Professional PHP

PHP Programming, Web Development, PHP Advocacy and PHP Best Practices.

Two preg_replace Escaping Gotchas

November 13th, 2005

preg_replace is a major workhorse function in PHP. Unfortunately, there are some less-than-obvious issues with using it properly. Here are two:

First, the e modifier causes the replacement value of preg_replace (including backreferences) to be evaluated as PHP code. This is a powerful capability, and if you’ve ever seen an SQL injection, it sounds dangerous. It would be, too, but PHP automatically escapes quotes and backslashes in any backreferences before building the string to evaluate. So this is safe:

$input = '" . die() . "';
var_dump(preg_replace('|^(.*)$|e', '"\1"', $input));
// output: string(13) "" . die() . ""

However, if you use double quotes inside your replacement string, as in the example above, then variable parsing is still active in the evaluated code. This can lead to syntax errors or to PHP variable values being inserted into the string:

$password = 'secret';
$input = '$password';
var_dump(preg_replace('|^(.*)$|e', '"\1"', $input));
// output: string(6) "secret"

So obviously, we want to use single quotes in the replacement to avoid variable parsing. However, we aren’t done with single quotes. preg_replace doesn’t know which quote style you use, so it escapes both of them. That means that if your input actually does contain a double quote, that quote will gain an unwanted slash. How many times have you seen bad PHP code do that?

$input = '"';
var_dump(preg_replace('|^(.*)$|e', "'\\1'", $input));
// output: string(2) "\""

A naive solution might run the value through stripslashes to fix that, but if your input actually has a slash in it, it will be unexpectedly removed:

$input = '\\';
var_dump(preg_replace('|^(.*)$|e', "stripslashes('\\1')", $input));
// output: string(0) ""

So what is the best solution? Well, in my book, it is to use preg_replace_callback and avoid the e modifier of preg_replace altogether. This has the dual advantage of avoiding all the escaping issues and of not triggering an eval on every call if you happen to be in a loop.
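As a rough sketch of that approach, here are the earlier inputs run through preg_replace_callback with a callback that simply returns the captured text. The function name echo_match is made up for illustration, and the closure syntax assumes PHP 5.3 or later. Since the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, this form also has the advantage of still running on current PHP:

```php
<?php
// Hypothetical helper for illustration; not part of PHP.
function echo_match($input)
{
    return preg_replace_callback(
        '|^(.*)$|',
        function ($matches) {
            // $matches[1] holds the raw captured text: nothing is escaped,
            // and nothing is evaluated as PHP code.
            return $matches[1];
        },
        $input
    );
}

$password = 'secret';
var_dump(echo_match('" . die() . "'));  // string(13) "" . die() . ""
var_dump(echo_match('$password'));      // string(9) "$password"
var_dump(echo_match('"'));              // string(1) """
var_dump(echo_match('\\'));             // string(1) "\"
```

Each input comes back byte for byte: no variable parsing, no extra slashes, and no lost backslash.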

Second, most users of the preg_ functions are familiar with preg_quote for escaping strings to use them as literals in regular expression patterns. However, many people don’t realize that the replacement parameter of preg_replace also has special characters:

$input = '$5 dollars';
$replacement = '$10';
var_dump(preg_replace('|^(.*) dollars$|', $replacement . ' dollars', $input));
// output: string(8) " dollars"

Where did the $10 go? Well, it got turned into backreference 10, which was empty. A naive solution would be to use preg_quote:

$input = '$5 dollars';
$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_quote($replacement) . ' dollars', $input));
// output: string(15) "$10\+$5 dollars"

But now we’ve got that spare slash that tells the world that this code ain’t quite right. The reason is that the characters that are special in the replacement value of preg_replace (only $ and the backslash, when they form a backreference) are not the same characters that are special in the pattern. So here is a solution:

function preg_replacement_quote($str) {
    return preg_replace('/(\$|\\\\)(?=\d)/', '\\\\\1', $str);
}

$input = '$5 dollars';
$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote($replacement) . ' dollars', $input));
// output: string(14) "$10+$5 dollars"

Now we get the expected output. Proper data handling is a good thing.
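One caveat with preg_replacement_quote as written: it only escapes a $ or backslash that is directly followed by a digit, so a braced reference such as ${1} would still be treated as a backreference. A minimal sketch of a broader variant, assuming it is acceptable to escape every backslash and dollar sign, could look like this (the name preg_replacement_quote_all is hypothetical):

```php
<?php
// Hypothetical variant of preg_replacement_quote; the name is illustrative.
function preg_replacement_quote_all($str)
{
    // Put a backslash in front of every backslash and every dollar sign,
    // so none of \n, $n or ${n} can be read as a backreference.
    return strtr($str, array('\\' => '\\\\', '$' => '\\$'));
}

$input = '$5 dollars';

var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote_all('$10+$5') . ' dollars', $input));
// output: string(14) "$10+$5 dollars"

var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote_all('${1}') . ' dollars', $input));
// output: string(12) "${1} dollars"
```

Another option, in the same spirit as the first gotcha, is preg_replace_callback: the string a callback returns is inserted as-is and is not scanned for backreferences.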

Filed Under

  • PHP


6 Responses to “Two preg_replace Escaping Gotchas”

  1. Christian says:
    11/14/2005 at 1:59 am

    I wrote a small paper on how the “e” modifier can be abused by attackers a couple of weeks ago ( http://hauser-wenz.de/playground/papers/RegExInjection.pdf ). I was prompted to do that because I was discussing in some security talks this year whether upcoming attacks like “XPath Injection” are ridiculous or a real threat. I rather thought of the former, but then I found the “e” modifier in a real-world application I audited earlier this year, *ouch*.
    Nice examples, btw!

  2. Roan says:
    11/17/2005 at 2:04 am

    Sorry for the unrelated comment, but thanks to your site I found out that someone shamelessly stole an article from my site. Not you :) You have a “recent bookmarks” sidebar with the link “Why is PHP a Pain? installing php applications”, and the linked article http://www.designbytim.com/blog/2005/10/27/22/ is actually stolen from my blog http://blog.enargi.com/programming/php/why-is-it-so-hard/ . Unbelievable. I never saw someone actually stealing articles, and especially on the subject of professional PHP.

  3. AllThingsDev.com » preg_replace gotchas says:
    11/19/2005 at 6:44 pm

    [...] I’ve been reading Professional PHP for a few weeks now and I’m really enjoying it. It is one of the few blogs out there that actually writes about code and coding in general. For example, their latest post gives a quick overview of some gotchas with the php function preg_replace. Professional PHP gives some good pointers about how to remove those unwanted extra slashes and things of that nature. [...]

  4. SitePoint Blogs » The Joy of Regular Expressions [3] says:
    9/27/2006 at 11:42 am

    [...] Another read, specific to escaping regular expressions and the types of security holes you might fall into with preg_replace(), is Jeff’s explanation of two preg_replace() escaping gotcha’s, which describes the exact nature of the problem plus provides a solution to escaping replacement strings. [...]

  5. Benjamin A. Shelton | Blog » Blog Archive » Symfony 1.3/1.4 and Suhosin says:
    3/2/2010 at 10:03 pm

    [...] a good source on preg_replace, why you should always use single quotes, common mistakes, and why you should really just avoid [...]

  6. Readlf says:
    5/4/2010 at 7:18 am

    Can u please make function preg_replacement_quote for double quoted strings? (Mail is not fake!)

  • About

    My name is Jeff Moore. I'm a PHP programmer living in San Francisco and working for a startup.
