Professional PHP

PHP Programming, Web Development, PHP Advocacy and PHP Best Practices.

Two preg_replace Escaping Gotchas

November 13th, 2005

preg_replace is a major workhorse function in PHP. Unfortunately, there are some less than obvious issues with using it properly. Here are two:

First, the e modifier causes the replacement value of preg_replace (including backreferences) to be evaluated as PHP code. This is a powerful capability, and if you’ve ever seen an SQL injection, it should sound dangerous. It would be, too, but PHP automatically escapes quotes and backslashes in any backreferences before building the string to evaluate. So this is safe:

$input = '" . die() . "';
var_dump(preg_replace('|^(.*)$|e', '"\1"', $input));
// output: string(13) "" . die() . ""
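
To make the escaping visible, here is a rough sketch that builds the code string by hand instead of running it. It assumes the escaping applied to backreferences is equivalent to addslashes(), and the $template and $code names are purely illustrative:

```php
<?php
// Assumption: the e modifier escapes each backreference the way addslashes() does.
$input = '" . die() . "';
$template = '"\1"'; // the replacement string from the example above

// Substitute the escaped capture into the template by hand.
$code = str_replace('\1', addslashes($input), $template);

echo $code, "\n";
// prints: "\" . die() . \""
```

The result is a single double-quoted string literal, so evaluating it yields the original text rather than a call to die().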

However, if you use double quotes inside your replacement string, as in this example, then variable parsing is still active. This can lead to problems with syntax errors or with PHP variable values inserted into the string:

$password = 'secret';
$input = '$password';
var_dump(preg_replace('|^(.*)$|e', '"\1"', $input));
// output: string(6) "secret"

So obviously, we want to use single quotes to avoid variable parsing. However, we aren’t done with single quotes. preg_replace doesn’t know which quote style you use, so it escapes both of them. That means that if your input actually does contain a double quote, it will gain an unwanted backslash. How many times have you seen bad PHP code do that?

$input = '"';
var_dump(preg_replace('|^(.*)$|e', "'\\1'", $input));
// output: string(2) "\""

A naive solution might run the value through stripslashes to fix that, but if your input actually has a slash in it, it will be unexpectedly removed:

$input = '\\';
var_dump(preg_replace('|^(.*)$|e', "stripslashes('\\1')", $input));
// output: string(0) ""

So what is the best solution? Well, in my book, it is to use preg_replace_callback and avoid the e modifier of preg_replace altogether. This has the dual advantage of avoiding all the escaping issues and of not triggering an eval on every call if you happen to be in a loop.
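
As a rough sketch of that approach, the example below feeds the same troublesome inputs through preg_replace_callback. The $echoCapture name is hypothetical, and the closure syntax assumes PHP 5.3 or later (older code would pass the name of a regular function instead). It matters for newer code as well, since the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.

```php
<?php
$password = 'secret';

// Hypothetical callback: returns the first capture group untouched.
// The callback receives the raw match, so nothing is escaped or evaluated.
$echoCapture = function (array $matches) {
    return $matches[1];
};

$inputs = array('" . die() . "', '$password', '"', '\\');
foreach ($inputs as $input) {
    $result = preg_replace_callback('|^(.*)$|', $echoCapture, $input);
    var_dump($result === $input);
}
// prints bool(true) four times
```

Each input comes back byte for byte: no code runs, no variable is interpolated, and no slashes are added or lost.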

Second, most users of the preg_ functions are familiar with preg_quote for escaping strings to use them as literals in regular expression patterns. However, many people don’t realize that the replacement parameter of preg_replace also has special characters:

$input = '$5 dollars';
$replacement = '$10';
var_dump(preg_replace('|^(.*) dollars$|', $replacement . ' dollars', $input));
// output: string(8) " dollars"

Where did the $10 go? Well, it got turned into backreference 10, which was empty. A naive solution would be to use preg_quote:

$input = '$5 dollars';
$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_quote($replacement) . ' dollars', $input));
// output: string(15) "$10\+$5 dollars"

But now we’ve got that spare slash that tells the world that this code ain’t quite right. The reason for this is that the characters that are special in the replacement value of preg_replace are not the same characters that are special in the pattern. So here is a solution:

function preg_replacement_quote($str) {
    // Backslash-escape any $ or \ that is directly followed by a digit.
    return preg_replace('/(\$|\\\\)(?=\d)/', '\\\\\1', $str);
}

$input = '$5 dollars';
$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote($replacement) . ' dollars', $input));
// output: string(14) "$10+$5 dollars"

Now we get the expected output. Proper data handling is a good thing.
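
One limitation worth noting: the lookahead in preg_replacement_quote only catches a $ or backslash that is directly followed by a digit, so the ${1} form of a backreference still slips through. A blunter alternative, sketched here under the hypothetical name preg_replacement_quote_all, is to escape every backslash and dollar sign. It rests on the assumption that preg_replace treats \$ and \\ in the replacement as the literal characters, which matches the behavior seen in the example above:

```php
<?php
// Hypothetical helper (not part of PHP): escape every backslash and dollar sign
// so that preg_replace treats the whole string as literal replacement text.
function preg_replacement_quote_all($str) {
    // strtr() with an array makes a single pass, so the backslashes it adds
    // are not escaped a second time.
    return strtr($str, array('\\' => '\\\\', '$' => '\\$'));
}

$input = '$5 dollars';

$replacement = '$10+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote_all($replacement) . ' dollars', $input));
// output: string(14) "$10+$5 dollars"

$replacement = '${1}';
var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote_all($replacement) . ' dollars', $input));
// output: string(12) "${1} dollars"
```

Another option, in keeping with the first half of this post, is preg_replace_callback, whose return value is inserted as-is and never scanned for backreferences.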

categories PHP
tags preg replace, regular expressions, security


5 Responses to “Two preg_replace Escaping Gotchas”

  1. #1 Christian responds...
    November 14th, 2005 at 1:59 am

    I wrote a small paper on how the “e” modifier can be abused by attackers a couple of weeks ago ( http://hauser-wenz.de/playground/papers/RegExInjection.pdf ). I was prompted to do that because I was discussing in some security talks this year whether upcoming attacks like “XPath Injection” are ridiculous or a real threat. I rather thought of the former, but then I found the “e” modifier in a real-world application I audited earlier this year, *ouch*.
    Nice examples, btw!

  2. #2 Roan responds...
    November 17th, 2005 at 2:04 am

    Sorry for the unrelated comment, but thanks to your site I found out that someone shamelessly stole an article from my site. Not you :) You have a “recent bookmarks” sidebar with the link “Why is PHP a Pain? installing php applications”, and the linked article http://www.designbytim.com/blog/2005/10/27/22/ is actually stolen from my blog http://blog.enargi.com/programming/php/why-is-it-so-hard/ . Unbelievable. I never saw someone actually stealing articles, especially on the subject of professional PHP.

  3. AllThingsDev.com » preg_replace gotchas pingbacked on November 19th, 2005 at 6:44 pm
  4. SitePoint Blogs » The Joy of Regular Expressions [3] pingbacked on September 27th, 2006 at 11:42 am
  5. #5 Anonymous responds...
    February 11th, 2008 at 2:22 pm

    chmod(IMAGES."avatars/".$avatarname,0644);
