<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Professional PHP &#187; input-filtering</title>
	<atom:link href="http://www.procata.com/blog/archives/tag/input-filtering/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.procata.com/blog</link>
	<description>PHP Programming, Web Development, PHP Advocacy and PHP Best Practices.</description>
	<lastBuildDate>Fri, 10 Dec 2010 17:23:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Problem with Markup Languages</title>
		<link>http://www.procata.com/blog/archives/2007/03/14/the-problem-with-markup-languages/</link>
		<comments>http://www.procata.com/blog/archives/2007/03/14/the-problem-with-markup-languages/#comments</comments>
		<pubDate>Wed, 14 Mar 2007 17:30:14 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Software Design]]></category>
		<category><![CDATA[Usability]]></category>
		<category><![CDATA[html-markup]]></category>
		<category><![CDATA[input-filtering]]></category>
		<category><![CDATA[markup-languages]]></category>
		<category><![CDATA[regular-expressions]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[wiki-syntax]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://www.procata.com/blog/archives/2007/03/14/the-problem-with-markup-languages/</guid>
		<description><![CDATA[Chris Shiflett has a post today, Allowing HTML and Preventing XSS.  The problem is how to allow users to format their contributed content without introducing security vulnerabilities.  The answer is usually some sort of markup language or filtering and sanitization of HTML.
BBCODE was designed for this purpose.  There is no actual standard, [...]]]></description>
			<content:encoded><![CDATA[<p>Chris Shiflett has a post today, <a href="http://shiflett.org/blog/2007/mar/allowing-html-and-preventing-xss">Allowing HTML and Preventing XSS</a>.  The problem is how to allow users to format their contributed content without introducing security vulnerabilities.  The answer is usually some sort of markup language or filtering and sanitization of HTML.</p>
<p>BBCODE was designed for this purpose.  There is no actual standard, but the core syntax seems fairly uniform.  It&#8217;s good for those used to forums, where it seems to be the norm.</p>
<p>HTML markup is nice because it is a standard, even if varying subsets are supported.  Learning a little HTML isn&#8217;t going to hurt anyone, at least for the next 20 years or so.  The problem is that HTML was never intended to be hand edited.  The syntax is not the most inviting, and different HTML-like markup languages handle whitespace differently than the HTML standard.</p>
<p>Wiki markup syntaxes were designed to be human friendly. The main problem I have with wiki syntax is that there is no standard.  It seems like every wiki has a different way to formulate a link, for example.  I guess there is some progress with <a href="http://www.wikicreole.org/">Wiki Creole</a>, but I still have a bad taste in my mouth.</p>
<p>The other problem I have with wiki markup is that I find it to be non-deterministic.  When I edit any given wiki and try to use more than basic formatting, I never know what I am going to get.  Most of the markup processing engines for these wikis are impenetrable morasses of regular expressions.  It can be hard to gauge interactions.  Are you really sure they are secure?</p>
<p>Speaking of impenetrable morasses of regular expressions, have you ever looked at WordPress&#8217;s input path?  I&#8217;m sure everyone with a WordPress blog who likes to blog about PHP code knows that it is a code eater.  I&#8217;ve been particularly disappointed with WordPress in this area.  Most of the &#8220;code formatting&#8221; plugins still have problems protecting code from WordPress&#8217;s heavy hand.</p>
<p>But the WordPress preg_replace gauntlet doesn&#8217;t just mangle code.  I have a post which has been sitting in draft mode for several weeks because I can&#8217;t figure out how to give it the proper markup.  WordPress is somehow taking my perfectly balanced input markup and producing &#8220;unbalanced&#8221; output markup.  I haven&#8217;t yet tracked down the problem to either submit a fix or to do a good bug report.  Frankly, I&#8217;m not looking forward to trudging through all those regular expressions.</p>
<p>In Chris&#8217; post, he takes the regular expression approach.  Folks in the comments have pointed out a few problems with his approach, including the problem of interleaved tags.  If you can&#8217;t tell by now, I am not a fan of the regular expression gauntlet approach to markup languages.  I prefer a defined syntax and a traditional computer science style parser (which may use regular expressions).</p>
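<p>As a rough illustration of the parser approach, here is a minimal sketch in Python (used only for illustration).  It assumes a hypothetical BBCode-like syntax with three made-up tags; the names <code>ALLOWED_TAGS</code>, <code>MarkupError</code> and <code>parse</code> are inventions for this example, not part of any real library.  The regular expression only splits the input into tokens, while a stack decides what is well formed, so interleaved tags are rejected instead of being passed through.</p>

```python
import re

# Hypothetical tag set for this sketch; a real markup language defines its own.
ALLOWED_TAGS = {"b", "i", "quote"}

# The regular expression is used only as a tokenizer: it splits the input
# into tag tokens such as [b] or [/b] and the text between them.
TOKEN_RE = re.compile(r"(\[/?[a-z]+\])")
TAG_RE = re.compile(r"\[(/?)([a-z]+)\]")


class MarkupError(ValueError):
    """Raised when the input does not follow the defined syntax."""


def parse(source):
    """Parse BBCode-like text into a nested tree.

    Text becomes str nodes; each element becomes a (tag, children) tuple.
    """
    root = []
    stack = [("", root)]  # (open tag name, children); "" marks the root
    for token in TOKEN_RE.split(source):
        if not token:
            continue
        match = TAG_RE.fullmatch(token)
        if match is None or match.group(2) not in ALLOWED_TAGS:
            # Plain text, or an unknown tag that is kept as literal text.
            stack[-1][1].append(token)
        elif match.group(1) == "":
            # Opening tag: start a new element and descend into it.
            children = []
            stack[-1][1].append((match.group(2), children))
            stack.append((match.group(2), children))
        else:
            # Closing tag: it must match the most recently opened tag.
            open_tag = stack[-1][0]
            if open_tag != match.group(2):
                raise MarkupError(
                    "closing tag [/%s] does not match open tag %r"
                    % (match.group(2), open_tag)
                )
            stack.pop()
    if len(stack) != 1:
        raise MarkupError("unclosed tag [%s]" % stack[-1][0])
    return root
```

<p>Because the result is a tree rather than a rewritten string, output can be generated from known-good nodes only, and a preview can report the exact syntax error to the user.</p>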
<p>The other must-have is a preview option.  With so much variation in markup languages, not having a preview leaves the user to play Russian roulette with their submitted content.  I&#8217;ve talked about that before in the <a href="http://www.procata.com/blog/archives/2005/03/31/the-usability-of-input-filtering/">usability of input filtering</a>.  This is another area where WordPress leaves the user high and dry.  </p>
<p>The complex input path in WordPress combined with its reliance on global variables seems to leave it unable to do an in-page preview.  The admin area preview is an IFRAME so that it launches a separate request.  The various live preview plugins are JavaScript based and don&#8217;t work when it is disabled.  They also don&#8217;t pass the input through the same input path that WordPress uses, so they are not a true preview.</p>
<p>I don&#8217;t mean for this to be a WordPress rant; on the whole, I like WordPress.  Rather, I just wanted to point out how hard it can be to do input filtering that is safe, reliable, deterministic, and usable.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.procata.com/blog/archives/2007/03/14/the-problem-with-markup-languages/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>The Usability of Input Filtering</title>
		<link>http://www.procata.com/blog/archives/2005/03/31/the-usability-of-input-filtering/</link>
		<comments>http://www.procata.com/blog/archives/2005/03/31/the-usability-of-input-filtering/#comments</comments>
		<pubDate>Fri, 01 Apr 2005 06:21:05 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Usability]]></category>
		<category><![CDATA[Web Design]]></category>
		<category><![CDATA[input-filtering]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://www.procata.com/blog/archives/2005/03/31/the-usability-of-input-filtering/</guid>
		<description><![CDATA[There seems to be much interest lately in input filtering in PHP, especially in cross site scripting prevention.  I&#8217;ve always preferred input validation to input filtering, but I am giving filtering a new examination.  My problem with filtering is with usability.  The comments to this post are a good example.  There [...]]]></description>
			<content:encoded><![CDATA[<p>There seems to be much interest lately in input filtering in PHP, especially in cross site scripting prevention.  I&#8217;ve always preferred input validation to input filtering, but I am giving filtering a new examination.  My problem with filtering is with usability.  The comments to <a href="http://www.procata.com/blog/archives/2005/03/08/microbenchmarks-of-single-and-double-qouting/#comments">this post</a> are a good example.  There are obviously some usability issues going on here.</p>
<p>I think the fundamental problem with input filtering and especially XSS filtering is that it violates <a href="http://en.wikipedia.org/wiki/Principle_of_least_astonishment">the principle of least surprise</a>.   User input is silently modified without the user&#8217;s knowledge.  If the violation is innocent, then the software surprises the user.  This is bad.  At least with validation, the user gets a heads up on the problem.</p>
<p>Let me try to name and enumerate some scenarios:</p>
<p><strong>Direct Filter</strong><br />
This is what WordPress did in the example post.  It simply accepted the user input and silently changed it.  The filtered value is stored directly into the database.  The original input is lost.  There is no preview.  I think this has to be a usability worst-case scenario.</p>
<p><strong>Filter with Preview</strong><br />
This scenario adds a preview capability to the last.  The filter is still applied.  A validation failure or explicit preview button causes the form values to be re-displayed and a preview panel to be shown.  However, the previous input value is silently modified and sent back to the user.  The user may or may not realize that his original input has been changed during the round trip.</p>
<p>This also seems like a usability problem, but every once in a while it happens to me when entering legitimate input into professionally written programs.</p>
<p><strong>Filter with Buffered Preview</strong><br />
This scenario adds an additional buffer to the last.  The filter is applied, but the original input is sent back to the user in the form field.  However, the preview panel shows the modified value.</p>
<p>I don&#8217;t really see this very often outside of fields with a dedicated markup language (for example BBCode).</p>
<p><strong>Filter with Forced Preview</strong><br />
The input value is silently filtered.  However, the user is forced to preview the output at least once.  It&#8217;s up to the user to notice the results of the filter.</p>
<p>I think Slashdot does this.</p>
<p><strong>Filter with Confirmation</strong><br />
A stricter variation of Forced Preview where, as the last stage, the user must confirm their input once without the ability to change it.  It is up to the user to notice the results of the filter.</p>
<p>I think this is popular as the last stage of a wizard style interface.</p>
<p><strong>Filter with Confirmation and Warning</strong><br />
The filter is applied and the user&#8217;s input is changed, however, the user is warned exactly which value was changed by the filter.</p>
<p>I don&#8217;t think I&#8217;ve ever seen this one.</p>
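<p>For what it&#8217;s worth, a minimal sketch of this scenario might look like the following (Python, used only for illustration).  Everything here is an assumption made for the example: <code>apply_filters</code> and the two small filters are hypothetical stand-ins, and a real XSS filter would take their place.  The point is only that the filter step can keep the original value and report exactly which fields it changed.</p>

```python
import unicodedata


def strip_control_chars(value):
    """Hypothetical filter: drop control characters except newline and tab."""
    return "".join(
        ch for ch in value
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


def collapse_whitespace(value):
    """Hypothetical filter: trim the ends and collapse runs of whitespace."""
    return " ".join(value.split())


def apply_filters(form, filters):
    """Filter each field and report which ones changed.

    Returns (filtered_form, warnings). The original form is not modified,
    so the caller can show both the submitted and the filtered value.
    """
    filtered = {}
    warnings = []
    for field, value in form.items():
        new_value = value
        for func in filters.get(field, []):
            new_value = func(new_value)
        filtered[field] = new_value
        if new_value != value:
            warnings.append(
                "The field %r was changed from %r to %r."
                % (field, value, new_value)
            )
    return filtered, warnings
```

<p>The confirmation page would then list the warnings next to the filtered values, so the user does not have to spot the difference on their own.</p>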
<p><strong>Validation</strong><br />
The program notifies the user that the input value is bad, but does not modify it.  The user must change the value to proceed.</p>
<p>I tend to use this one.  I escape all output, so I don&#8217;t worry too much about displaying XSS in the preview panel.</p>
<p>Obviously, you can mix and match scenarios for different input rules and fields. I&#8217;m sure there are other scenarios.  Please suggest some.</p>
<p>I guess I&#8217;ve been programming for about 23 years now.  The longer I do it, the more reluctant I am to be strict with user input.  Ultra-sanitized, ultra-structured data may seem attractive to the programmer, but it&#8217;s a pain for the user and it&#8217;s only a matter of time before a legitimate exception comes along.  A European phone number, the 51st state, a Canadian postal code, a new millennium, etc.  The exception is the rule.  Understandably, XSS must be prevented, but it&#8217;s easy to go too far.</p>
<p>Which of these scenarios do you think are best from the user&#8217;s perspective?  From the programmer&#8217;s perspective?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.procata.com/blog/archives/2005/03/31/the-usability-of-input-filtering/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Even the Big Guys Get Validation Wrong</title>
		<link>http://www.procata.com/blog/archives/2004/05/13/even-the-big-guys-get-validation-wrong/</link>
		<comments>http://www.procata.com/blog/archives/2004/05/13/even-the-big-guys-get-validation-wrong/#comments</comments>
		<pubDate>Thu, 13 May 2004 12:46:17 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Usability]]></category>
		<category><![CDATA[Web Design]]></category>
		<category><![CDATA[input-filtering]]></category>
		<category><![CDATA[input-validation]]></category>

		<guid isPermaLink="false">http://www.procata.com/blog/archives/2004/05/13/even-the-big-guys-get-validation-wrong/</guid>
		<description><![CDATA[I ordered a computer for someone from Dell last night.  When I got to the end of the order, I mistyped a digit on the credit card number and the form was redisplayed with an &#8220;invalid credit card number&#8221; error.  I added spaces between the digits (as they appear on the card) to [...]]]></description>
			<content:encoded><![CDATA[<p>I ordered a computer for someone from Dell last night.  When I got to the end of the order, I mistyped a digit on the credit card number and the form was redisplayed with an &#8220;invalid credit card number&#8221; error.  I added spaces between the groups of digits (as they appear on the card) to check the number.  Sure enough, one digit was wrong.  I fixed it and re-submitted, but the &#8220;invalid card number&#8221; error remained.  I was sure the card was valid and that I had typed in the correct number.  After a little experimenting, it turned out that the order form could not handle the spaces that I had added.  The person I was ordering the computer for was looking over my shoulder and said that he would never have figured out that the spaces had to be removed.  I wonder how many people enter their credit card numbers as XXXX XXXX XXXX XXXX, just as they appear on the card.</p>
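<p>I have no idea what the Dell form actually does internally, but the fix is not hard.  Here is a minimal sketch (Python, used only for illustration) of a more forgiving check: strip the spaces and hyphens first, then validate what is left.  The function names, the error messages, and the 13 to 19 digit length range are assumptions made for this example; the checksum is the standard Luhn algorithm.</p>

```python
import re


def normalize_card_number(raw):
    """Remove the spaces and hyphens people copy from the face of the card."""
    return re.sub(r"[\s\-]", "", raw)


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            # Double every second digit from the right; divmod adds the two
            # digits of the product (for example 16 becomes 1 + 6).
            value = sum(divmod(value * 2, 10))
        total += value
    return total % 10 == 0


def validate_card_number(raw):
    """Return (normalized_number, error); error is None when input is fine."""
    digits = normalize_card_number(raw)
    if not (digits.isascii() and digits.isdigit()):
        return digits, "Card numbers may contain only digits, spaces, and hyphens."
    # Assumed length range for this sketch.
    if len(digits) not in range(13, 20):
        return digits, "Card numbers are 13 to 19 digits long."
    if not luhn_valid(digits):
        return digits, "That card number does not look right; please check it."
    return digits, None
```

<p>With this approach, a number typed as XXXX XXXX XXXX XXXX and the same digits typed without spaces are treated identically, and the error message only appears when a digit is actually wrong.</p>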
<p>(P.S. Dell seems to be a master at &#8220;Do you want fries with that?&#8221;)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.procata.com/blog/archives/2004/05/13/even-the-big-guys-get-validation-wrong/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

