Two preg_replace Escaping Gotchas
November 13th, 2005
preg_replace is a major workhorse function in PHP. Unfortunately, there are some less than obvious issues with using it properly. Here are two:
The e modifier causes the replacement value of preg_replace (including backreferences) to be evaluated as PHP code. This is a powerful capability. If you’ve ever seen an SQL injection, this sounds dangerous. It would be, too, but PHP automatically escapes any backreferences before building the string to evaluate. So this is safe:
$input = '" . die() . "';
var_dump(preg_replace('|^(.*)$|e', '"\1"', $input));
// output: string(13) "" . die() . ""
However, if you use double quotes inside your replacement string, as in this example, then variable parsing is still active. This can lead to problems with syntax errors or with PHP variable values inserted into the string:
$password = 'secret';
$input = '$password';
var_dump(preg_replace('|^(.*)$|e', '"\1"', $input));
// output: string(6) "secret"
So obviously, we want to be careful to use single quotes to avoid variable parsing. However, we aren’t done with single quotes. preg_replace doesn’t know which quote style you use, so it escapes both of them. That means that if your input actually does contain a quote, your quote will gain an unwanted slash. How many times have you seen bad PHP code do that?
$input = '"';
var_dump(preg_replace('|^(.*)$|e', "'\\1'", $input));
// output: string(2) "\""
A naive solution might run the value through stripslashes to fix that, but if your input actually has a slash in it, it will be unexpectedly removed:
$input = '\\';
var_dump(preg_replace('|^(.*)$|e', "stripslashes('\\1')", $input));
// output: string(0) ""
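The three results above can be reproduced without the e modifier, which was removed in PHP 7.0. What follows is only an emulation, not PHP's actual implementation: it assumes the modifier escapes each backreference the way addslashes() does, and emulate_e() is a made-up helper that handles nothing but the \1 backreference.

```php
<?php
// Hypothetical emulation of the e modifier for a single backreference.
// Assumption: the backreference is escaped as addslashes() would escape it
// (single quote, double quote, backslash, NUL) before the code is evaluated.
function emulate_e(string $template, string $input): string
{
    // Substitute the escaped input for \1 in the replacement template.
    $code = str_replace('\1', addslashes($input), $template);
    // Evaluate the resulting PHP expression and hand back its value.
    return eval('return ' . $code . ';');
}

// Double-quoted template: the escaping keeps die() from running.
var_dump(emulate_e('"\1"', '" . die() . "'));
// output: string(13) "" . die() . ""

// Single-quoted template: \" is not an escape sequence there, so the slash stays.
var_dump(emulate_e("'\\1'", '"'));
// output: string(2) "\""

// The single-quoted literal already turns \\ back into one backslash,
// so stripslashes then removes the backslash that was really in the input.
var_dump(emulate_e("stripslashes('\\1')", '\\'));
// output: string(0) ""
```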
So what is the best solution? Well, in my book, it is to use preg_replace_callback and avoid the e modifier altogether. This has the dual advantage of avoiding all the escaping issues and also not triggering an eval on every call if you happen to be in a loop.
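As a minimal sketch of that approach, here is the same pattern driven by preg_replace_callback. The bracket() helper and the square brackets it adds are made up for illustration; the point is that the callback receives the captured text exactly as it appeared in the input.

```php
<?php
// Hypothetical helper: wraps the whole input in square brackets using a
// callback instead of an evaluated replacement string.
function bracket(string $input): string
{
    return preg_replace_callback(
        '|^(.*)$|',
        function (array $m): string {
            // $m[1] is the raw captured text: no added slashes, no eval,
            // and no variable parsing.
            return '[' . $m[1] . ']';
        },
        $input
    );
}

$password = 'secret';
var_dump(bracket('" . die() . "'));
// output: string(15) "[" . die() . "]"
var_dump(bracket('$password'));
// output: string(11) "[$password]"
var_dump(bracket('"'));
// output: string(3) "["]"
var_dump(bracket('\\'));
// output: string(3) "[\]"
```

None of the four troublesome inputs from the examples above is altered on its way through the callback.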
Second, most users of the preg_ functions are familiar with preg_quote for escaping strings to use them as literals in regular expression patterns. However, many people don’t realize that the replacement parameter of preg_replace also has special characters:
$input = '$5 dollars';
$replacement = '$10';
var_dump(preg_replace('|^(.*) dollars$|', $replacement . ' dollars', $input));
// output: string(8) " dollars"
Where did the $10 go? Well, it got turned into backreference 10, which was empty. A naive solution would be to use preg_quote:
$input = '$5 dollars';
$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_quote($replacement) . ' dollars', $input));
// output: string(15) "$10\+$5 dollars"
But now we’ve got that spare slash that tells the world that this code ain’t quite right. The reason for this is that the characters that are special in the replacement value of preg_replace are not the same characters that are special in the pattern. So here is a solution:
function preg_replacement_quote($str) {
    return preg_replace('/(\$|\\\\)(?=\d)/', '\\\\\1', $str);
}
$input = '$5 dollars';
$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote($replacement) . ' dollars', $input));
// output: string(14) "$10+$5 dollars"
Now we get the expected output. Proper data handling is a good thing.
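One gap worth noting: the lookahead in preg_replacement_quote only catches a $ or \ that is followed directly by a digit, so the ${1} form of a backreference would still slip through. A more defensive variant, sketched here under the assumption that a backslash in front of any $ or \ makes it literal in the replacement, escapes every occurrence. The name quote_replacement is made up for this example.

```php
<?php
// Hypothetical helper: escape every backslash and dollar sign so that
// preg_replace treats the whole string as literal replacement text.
function quote_replacement(string $str): string
{
    // strtr with an array handles each character once, without rescanning
    // the backslashes it has just inserted.
    return strtr($str, ['\\' => '\\\\', '$' => '\\$']);
}

$input = '$5 dollars';
var_dump(preg_replace('|^(.*) dollars$|', quote_replacement('$10+$5') . ' dollars', $input));
// output: string(14) "$10+$5 dollars"
var_dump(preg_replace('|^(.*) dollars$|', quote_replacement('${1} off'), $input));
// output: string(8) "${1} off"
```

When the replacement is entirely literal, handing preg_replace_callback a callback that simply returns the string sidesteps the quoting question altogether.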
November 14th, 2005 at 1:59 am
I wrote a small paper on how the “e” modifier can be abused by attackers a couple of weeks ago ( http://hauser-wenz.de/playground/papers/RegExInjection.pdf ). I was prompted to do that because I was discussing in some security talks this year whether upcoming attacks like “XPath Injection” are ridiculous or a real threat. I rather thought of the former, but then I found the “e” modifier in a real-world application I audited earlier this year, *ouch*.
Nice examples, btw!
November 17th, 2005 at 2:04 am
Sorry for the unrelated comment, but thanks to your site I found out that someone shamelessly stole an article from my site. Not you.
You have a “recent bookmarks” sidebar with the link “Why is PHP a Pain? installing php applications”, and the linked article http://www.designbytim.com/blog/2005/10/27/22/ is actually stolen from my blog http://blog.enargi.com/programming/php/why-is-it-so-hard/ . Unbelievable. I never saw someone actually stealing articles, especially on the subject of professional PHP.
February 11th, 2008 at 2:22 pm
chmod(IMAGES.”avatars/”.$avatarname,0644);