preg_replace is a major workhorse function in PHP. Unfortunately, there are some less than obvious issues with using it properly. Here are two:
First, the e modifier causes the replacement value of preg_replace (including backreferences) to be evaluated as PHP code. This is a powerful capability, and if you’ve ever seen an SQL injection, it sounds dangerous. It would be, too, but PHP automatically escapes quotes and backslashes in any backreferences before building the string to evaluate. So this is safe:
$input = '" . die() . "';
var_dump(preg_replace('|^(.*)$|e', '"\1"', $input));
// output: string(13) "" . die() . ""
However, if you use double quotes inside your replacement string, as the example above does, then variable parsing is still active. This can lead to syntax errors or to PHP variable values being inserted into the string:
$password = 'secret';
$input = '$password';
var_dump(preg_replace('|^(.*)$|e', '"\1"', $input));
// output: string(6) "secret"
So obviously, we want to be careful to use single quotes to avoid variable parsing. However, single quotes bring a problem of their own. preg_replace doesn’t know which quote style you use, so it escapes both of them. That means that if your input actually does contain a quote, your quote will gain an unwanted slash. How many times have you seen bad PHP code do that?
$input = '"';
var_dump(preg_replace('|^(.*)$|e', "'\\1'", $input));
// output: string(2) "\""
A naive solution might run the value through stripslashes to fix that, but if your input actually has a slash in it, it will be unexpectedly removed:
$input = '\\';
var_dump(preg_replace('|^(.*)$|e', "stripslashes('\\1')", $input));
// output: string(0) ""
So what is the best solution? Well, in my book, it is to use preg_replace_callback and avoid the e modifier altogether. This has the dual advantage of avoiding all the escaping issues and also not triggering an eval on every call if you happen to be in a loop.
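As a minimal sketch of that approach, here is the same pattern run through preg_replace_callback. The function name passthrough is an assumption made up for illustration; the callback simply returns the captured text, which arrives as plain data. It is also worth knowing that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current versions a callback is the only option anyway.

```php
<?php
// Hypothetical helper for illustration: same pattern as the examples above,
// minus the e modifier. The callback receives the match array as plain data.
function passthrough($input) {
    return preg_replace_callback(
        '|^(.*)$|',
        function ($m) {
            // $m[1] is the raw capture: no added slashes, no variable parsing.
            return $m[1];
        },
        $input
    );
}

// The inputs that caused trouble with the e modifier.
$inputs = array('" . die() . "', '$password', '"', '\\');
foreach ($inputs as $input) {
    var_dump(passthrough($input) === $input); // bool(true) for each input
}
```

Because the callback's return value is inserted as-is, there is nothing to escape and nothing to strip afterwards.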
Second, most users of the preg_ functions are familiar with preg_quote for escaping strings to use them as literals in regular expression patterns. However, many people don’t realize that the replacement parameter of preg_replace also has special characters:
$input = '$5 dollars';
$replacement = '$10';
var_dump(preg_replace('|^(.*) dollars$|', $replacement . ' dollars', $input));
// output: string(8) " dollars"
Where did the $10 go? Well, it got turned into backreference 10, which was empty. A naive solution would be to use preg_quote:
$input = '$5 dollars';
$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_quote($replacement) . ' dollars', $input));
// output: string(15) "$10\+$5 dollars"
But now we’ve got that spare slash that tells the world that this code ain’t quite right. The reason for this is that the characters that are special in the replacement value of preg_replace are not the same characters that are special in the pattern. So here is a solution:
function preg_replacement_quote($str) {
    // Escape a "$" or "\" that precedes a digit so it is not read as a backreference.
    return preg_replace('/(\$|\\\\)(?=\d)/', '\\\\\1', $str);
}
$input = '$5 dollars';
$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote($replacement) . ' dollars', $input));
// output: string(14) "$10+$5 dollars"
Now we get the expected output. Proper data handling is a good thing.
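One caveat worth hedging: preg_replacement_quote only escapes a $ or \ that is directly followed by a digit, so the brace form of a backreference, ${1}, slips through. A stricter variant, sketched here under the assumption that every backslash and dollar sign in the string is meant literally, escapes all of them. The name preg_replacement_quote_all is hypothetical, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): backslash-escape every "\" and
// "$" so preg_replace treats the whole string as literal replacement text.
function preg_replacement_quote_all($str) {
    // strtr with an array replaces in a single pass, so the backslashes it
    // adds in front of "$" are not escaped a second time.
    return strtr($str, array('\\' => '\\\\', '$' => '\\$'));
}

$input = '$5 dollars';
$replacement = '${1}0+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote_all($replacement) . ' dollars', $input));
// output: string(16) "${1}0+$5 dollars"
```

The trade-off is that a replacement string run through this function can no longer contain any intentional backreferences, so it only suits values that are meant to be inserted verbatim.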
I wrote a small paper on how the “e” modifier can be abused by attackers a couple of weeks ago ( http://hauser-wenz.de/playground/papers/RegExInjection.pdf ). I was prompted to do that because I was discussing in some security talks this year whether upcoming attacks like “XPath Injection” are ridiculous or a real threat. I rather thought of the former, but then I found the “e” modifier in a real-world application I audited earlier this year, *ouch*.
Nice examples, btw!
Sorry for the unrelated comment, but thanks to your site I found out that someone shamelessly stole an article from my site. Not you.
You have a “recent bookmarks” sidebar with the link “Why is PHP a Pain? installing php applications”, and the linked article http://www.designbytim.com/blog/2005/10/27/22/ is actually stolen from my blog http://blog.enargi.com/programming/php/why-is-it-so-hard/ . Unbelievable. I never saw someone actually stealing articles, and especially on the subject of professional PHP.
Can u please make function preg_replacement_quote for double quoted strings? (Mail is not fake!)
This is a very old blog entry, but I’d like to thank the author for the preg_replacement_quote function. Something like this should definitely be in the PHP library. Thanks again!
Thank you!!!