<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Professional PHP &#187; preg_replace</title>
	<atom:link href="http://www.procata.com/blog/archives/tag/preg_replace/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.procata.com/blog</link>
	<description>PHP Programming, Web Development, PHP Advocacy and PHP Best Practices.</description>
	<lastBuildDate>Fri, 10 Dec 2010 17:23:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Two preg_replace Escaping Gotchas</title>
		<link>http://www.procata.com/blog/archives/2005/11/13/two-preg_replace-escaping-gotchas/</link>
		<comments>http://www.procata.com/blog/archives/2005/11/13/two-preg_replace-escaping-gotchas/#comments</comments>
		<pubDate>Mon, 14 Nov 2005 05:51:11 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[preg_replace]]></category>
		<category><![CDATA[regular-expressions]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://www.procata.com/blog/?p=155</guid>
		<description><![CDATA[preg_replace is a workhorse PHP function, but it has a couple of escaping gotchas that can cause it to yield unexpected or undesirable results.]]></description>
			<content:encoded><![CDATA[<p><a href="http://us2.php.net/manual/en/function.preg-replace.php">preg_replace</a> is a major workhorse function in PHP.  Unfortunately, there are some less than obvious issues with using it properly.  Here are two:</p>
<p>First, the e modifier causes the replacement value of preg_replace (including backreferences) to be evaluated as PHP code.  This is a powerful capability, and if you&#8217;ve ever seen an SQL injection, it should sound dangerous.  It would be, too, except that PHP automatically escapes quotes and backslashes in any backreferences (as addslashes would) before building the string to evaluate.  So this is safe:</p>
<p><code><span style="color: #000000"></span><span style="color: #0000BB">$input </span><span style="color: #007700">= </span><span style="color: #DD0000">'" . die() . "'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">var_dump</span><span style="color: #007700">(</span><span style="color: #0000BB">preg_replace</span><span style="color: #007700">(</span><span style="color: #DD0000">'|^(.*)$|e'</span><span style="color: #007700">, </span><span style="color: #DD0000">'"\1"'</span><span style="color: #007700">, </span><span style="color: #0000BB">$input</span><span style="color: #007700">));<br /></span><span style="color: #FF8000">// output: string(13) "" . die() . ""<br /></span></code><br />
However, if you use double quotes inside your replacement string, as the example above does, then variable parsing is still active when the code is evaluated.  This can lead to syntax errors or to PHP variable values being inserted into the string:</p>
<p><code><span style="color: #000000"></span><span style="color: #0000BB">$password </span><span style="color: #007700">= </span><span style="color: #DD0000">'secret'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">$input </span><span style="color: #007700">= </span><span style="color: #DD0000">'$password'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">var_dump</span><span style="color: #007700">(</span><span style="color: #0000BB">preg_replace</span><span style="color: #007700">(</span><span style="color: #DD0000">'|^(.*)$|e'</span><span style="color: #007700">, </span><span style="color: #DD0000">'"\1"'</span><span style="color: #007700">, </span><span style="color: #0000BB">$input</span><span style="color: #007700">));<br /></span><span style="color: #FF8000">// output: string(6) "secret"<br /></span></code><br />
So obviously, we want to use single quotes inside the replacement to avoid variable parsing.  However, single quotes have a problem of their own.  preg_replace doesn&#8217;t know which quote style you use, so it escapes both of them.  That means that if your input actually does contain a double quote, it will gain an unwanted backslash.  How many times have you seen bad PHP code do that?</p>
<p><code><span style="color: #000000"></span><span style="color: #0000BB">$input </span><span style="color: #007700">= </span><span style="color: #DD0000">'"'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">var_dump</span><span style="color: #007700">(</span><span style="color: #0000BB">preg_replace</span><span style="color: #007700">(</span><span style="color: #DD0000">'|^(.*)$|e'</span><span style="color: #007700">, </span><span style="color: #DD0000">"'\\1'"</span><span style="color: #007700">, </span><span style="color: #0000BB">$input</span><span style="color: #007700">));<br /></span><span style="color: #FF8000">// output: string(2) "\""<br /></span></code><br />
A naive solution might run the value through stripslashes to fix that, but if your input actually has a backslash in it, that backslash will be unexpectedly removed:</p>
<p><code><span style="color: #000000"></span><span style="color: #0000BB">$input </span><span style="color: #007700">= </span><span style="color: #DD0000">'\\'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">var_dump</span><span style="color: #007700">(</span><span style="color: #0000BB">preg_replace</span><span style="color: #007700">(</span><span style="color: #DD0000">'|^(.*)$|e'</span><span style="color: #007700">, </span><span style="color: #DD0000">"stripslashes('\\1')"</span><span style="color: #007700">, </span><span style="color: #0000BB">$input</span><span style="color: #007700">));<br /></span><span style="color: #FF8000">// output: string(0) ""<br /></span></code><br />
So what is the best solution?  Well, in my book, it is to use <a href="http://us2.php.net/manual/en/function.preg-replace-callback.php">preg_replace_callback</a> and avoid the e modifier of preg_replace altogether.  This has the dual advantage of avoiding all the escaping issues and also not triggering an eval on every call, which matters if you happen to be in a loop.</p>
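<p>As a rough illustration of that recommendation, here is a minimal sketch of the callback approach.  The function name shout_words and the choice of strtoupper are made up for this example, and the anonymous function assumes PHP 5.3 or later (on older versions you would pass the name of a regular function instead).</p>

```php
<?php
// Hypothetical helper for illustration; not part of the original post.
function shout_words($input)
{
    // The callback receives the matches as an array, so captured text is
    // never pasted into PHP source code and never gains extra slashes.
    return preg_replace_callback(
        '|^(.*)$|',
        function ($matches) {
            return strtoupper($matches[1]);
        },
        $input
    );
}

$password = 'secret';
var_dump(shout_words('$password')); // the literal text is uppercased; $password is never read
var_dump(shout_words('"'));         // the quote comes back without a slash
var_dump(shout_words('\\'));        // the single backslash survives
```

<p>Because nothing is evaluated, it no longer matters which quote style the replacement logic uses, and there is nothing for stripslashes to undo.</p>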
<p>Second, most users of the preg_ functions are familiar with preg_quote for escaping strings to use them as literals in regular expression patterns.  However, many people don&#8217;t realize that the replacement parameter of preg_replace also has special characters:</p>
<p><code><span style="color: #000000"></span><span style="color: #0000BB">$input </span><span style="color: #007700">= </span><span style="color: #DD0000">'$5 dollars'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">$replacement </span><span style="color: #007700">= </span><span style="color: #DD0000">'$10'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">var_dump</span><span style="color: #007700">(</span><span style="color: #0000BB">preg_replace</span><span style="color: #007700">(</span><span style="color: #DD0000">'|^(.*) dollars$|'</span><span style="color: #007700">, </span><span style="color: #0000BB">$replacement </span><span style="color: #007700">. </span><span style="color: #DD0000">' dollars'</span><span style="color: #007700">, </span><span style="color: #0000BB">$input</span><span style="color: #007700">));<br /></span><span style="color: #FF8000">// output: string(8) " dollars"<br /></span></code><br />
Where did the $10 go?  Well, it got parsed as a reference to capture group 10, which doesn&#8217;t exist in this pattern and so expands to an empty string.  A naive solution would be to use preg_quote:</p>
<p><code><span style="color: #000000"></span><span style="color: #0000BB">$input </span><span style="color: #007700">= </span><span style="color: #DD0000">'$5 dollars'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">$replacement </span><span style="color: #007700">= </span><span style="color: #DD0000">'$10+$5'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">var_dump</span><span style="color: #007700">(</span><span style="color: #0000BB">preg_replace</span><span style="color: #007700">(</span><span style="color: #DD0000">'|^(.*) dollars$|'</span><span style="color: #007700">, </span><span style="color: #0000BB">preg_quote</span><span style="color: #007700">(</span><span style="color: #0000BB">$replacement</span><span style="color: #007700">) . </span><span style="color: #DD0000">' dollars'</span><span style="color: #007700">, </span><span style="color: #0000BB">$input</span><span style="color: #007700">));<br /></span><span style="color: #FF8000">// output: string(15) "$10\+$5 dollars"<br /></span></code><br />
But now we&#8217;ve got that stray backslash that tells the world that this code ain&#8217;t quite right.  The reason for this is that the characters that are special in the replacement value of preg_replace are not the same characters that are special in the pattern.  So here is a solution:</p>
<p><code><span style="color: #000000"></span><span style="color: #007700">function </span><span style="color: #0000BB">preg_replacement_quote</span><span style="color: #007700">(</span><span style="color: #0000BB">$str</span><span style="color: #007700">) {<br />&nbsp;&nbsp;&nbsp;&nbsp;return </span><span style="color: #0000BB">preg_replace</span><span style="color: #007700">(</span><span style="color: #DD0000">'/(\$|\\\\)(?=\d)/'</span><span style="color: #007700">, </span><span style="color: #DD0000">'\\\\\1'</span><span style="color: #007700">, </span><span style="color: #0000BB">$str</span><span style="color: #007700">);<br />}<br /><br />
</span><span style="color: #0000BB">$input </span><span style="color: #007700">= </span><span style="color: #DD0000">'$5 dollars'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">$replacement </span><span style="color: #007700">= </span><span style="color: #DD0000">'$10+$5'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">var_dump</span><span style="color: #007700">(</span><span style="color: #0000BB">preg_replace</span><span style="color: #007700">(</span><span style="color: #DD0000">'|^(.*) dollars$|'</span><span style="color: #007700">, </span><span style="color: #0000BB">preg_replacement_quote</span><span style="color: #007700">(</span><span style="color: #0000BB">$replacement</span><span style="color: #007700">) . </span><span style="color: #DD0000">' dollars'</span><span style="color: #007700">, </span><span style="color: #0000BB">$input</span><span style="color: #007700">));<br /></span><span style="color: #FF8000">// output: string(14) "$10+$5 dollars"<br /></span></code><br />
Now we get the expected output.  Proper data handling is a good thing.</p>
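<p>One case the function above does not cover is the braced backreference form, ${1}, because there the dollar sign is followed by a brace instead of a digit.  If your replacement text might contain that form, a variation is to escape every dollar sign and backslash regardless of what follows.  This is only a sketch: the name preg_replacement_quote_all is made up here, and it relies on preg_replace treating a backslash-escaped dollar sign or backslash in the replacement as a literal character, which is the same behavior the preg_quote example above depends on.</p>

```php
<?php
// Hypothetical variant for illustration; not the post's preg_replacement_quote().
function preg_replacement_quote_all($str)
{
    // Put a backslash in front of every "\" and every "$", whether or not
    // a digit follows. strtr() with an array never rescans text it has
    // already replaced, so the added backslashes are not escaped again.
    return strtr($str, array('\\' => '\\\\', '$' => '\\$'));
}

$input = '$5 dollars';
$replacement = '${1}0+$5';
var_dump(preg_replace('|^(.*) dollars$|', preg_replacement_quote_all($replacement) . ' dollars', $input));
// output: string(16) "${1}0+$5 dollars"
```

<p>If the replacement is fully dynamic, preg_replace_callback sidesteps the question entirely, since whatever the callback returns is inserted as-is.</p>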
]]></content:encoded>
			<wfw:commentRss>http://www.procata.com/blog/archives/2005/11/13/two-preg_replace-escaping-gotchas/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
	</channel>
</rss>

