Working with cross-platform newline characters & user form input
Posted September 26, 2011 by Nick Vogt in Programming
If your site accepts any form of user input, it's a good idea to understand the differences in newline characters between platforms and how to handle them (and it's just good to know in general). Say your site is running on a Unix host and receives comments or posts from Windows clients. Any HTML textarea form input sent from the Windows clients will contain newline characters that do not match those native to the Unix host. Likewise, if your site runs on a Windows host, the occasional Linux or Mac user will be sending non-native newline characters as well.

Here are the newline characters that each major system uses:

Platform                Line Break
Windows                 \r\n (Carriage Return & Line Feed)
Unix/Linux/MacOS X      \n (Line Feed)
MacOS 9 and earlier     \r (Carriage Return)

(See this Wiki and this Stack Overflow post for a run-down on what carriage returns and line feeds are)

A typical scenario is that you'll want to replace all newline characters with HTML break tags, or strip them out completely. You can't rely on PHP's PHP_EOL constant for this, as it only holds the line break of the server it's running on, not the line break the client sent. You have to search for each type of line break individually (or use a regular expression, but that may be slower).
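As a rough sketch of both options, the snippet below converts newlines to break tags and strips them out. The $input string is a made-up example. nl2br() is a PHP built-in that this post doesn't otherwise cover; it handles all three line break styles and leaves the original newline characters in place after each tag.

```php
<?php
// Hypothetical input that mixes all three newline styles.
$input = "one\r\ntwo\nthree\rfour";

// Option 1: insert an HTML break tag before every line break.
// nl2br() treats \r\n as a single line break, so no tags are doubled.
$html = nl2br($input);

// Option 2: strip every line break by listing each type individually.
$stripped = str_replace(array("\r\n", "\r", "\n"), '', $input);

// Same result with a regular expression instead of a list.
$strippedRegex = preg_replace('/\r\n|\r|\n/', '', $input);

var_dump($stripped);                    // string(15) "onetwothreefour"
var_dump(substr_count($html, '<br />')); // int(3)
```

Searching for PHP_EOL instead would only catch whichever of the three styles the server itself uses.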

What I do is manually replace all newline characters with the Unix \n on input. So if you submit a comment to this site, all occurrences of \r\n and \r are replaced with \n before it is stored in the database. This keeps things consistent across the site. Just make sure on your site that you search for the newline characters in that order, as searching for \r or \n first will match just half of the Windows \r\n, leaving you with potentially double line breaks.

Here is an example:

$str = str_replace("\r\n", "\n", $str);
$str = str_replace("\r", "\n", $str);
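To see why the order matters, here is a small comparison using a made-up Windows-style string. It is only an illustration; the variable names are mine.

```php
<?php
// Hypothetical comment submitted from a Windows client.
$input = "line one\r\nline two";

// Wrong order: replacing the lone \r first splits the \r\n pair,
// turning it into \n\n before the second call ever sees it.
$wrong = str_replace("\r", "\n", $input);
$wrong = str_replace("\r\n", "\n", $wrong);

// Right order: handle the two-character \r\n first, then any stray \r.
$right = str_replace("\r\n", "\n", $input);
$right = str_replace("\r", "\n", $right);

var_dump(substr_count($wrong, "\n")); // int(2), a doubled line break
var_dump(substr_count($right, "\n")); // int(1)
```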

Make sure you use double quotes, as PHP does not interpret escape sequences such as \r and \n inside single-quoted strings; there they are just a literal backslash followed by a letter.
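A quick way to check the quoting difference yourself (the $input string is just an example):

```php
<?php
var_dump(strlen("\n")); // int(1): a single line feed character
var_dump(strlen('\n')); // int(2): a backslash followed by the letter n

// With single quotes the search string is the literal text \r\n,
// which never appears in the input, so nothing is replaced.
$input = "a\r\nb";
$unchanged = str_replace('\r\n', "\n", $input);

var_dump($unchanged === $input); // bool(true)
```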

You can also do it this way:

$arr = array("\r\n", "\r");
$str = str_replace($arr, "\n", $str);

When using an array with str_replace, it will perform the replacements in the order that they appear in the array.
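If you normalize input in more than one place, it may be worth wrapping the replacement in a function. The sketch below shows one possible version, plus a regular expression alternative; the function names are hypothetical, and I haven't benchmarked one against the other.

```php
<?php
// Convert \r\n and \r to \n. The array order matters: \r\n must come first.
function normalize_newlines($str)
{
    return str_replace(array("\r\n", "\r"), "\n", $str);
}

// Regex alternative: match a \r with an optional \n after it in one pass,
// so the ordering problem does not come up.
function normalize_newlines_regex($str)
{
    return preg_replace('/\r\n?/', "\n", $str);
}

// Hypothetical input mixing Windows, old Mac and Unix line breaks.
$mixed = "a\r\nb\rc\nd";

var_dump(normalize_newlines($mixed) === "a\nb\nc\nd");       // bool(true)
var_dump(normalize_newlines_regex($mixed) === "a\nb\nc\nd"); // bool(true)
```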

Since we're talking about line breaks with user input, you'll probably need a way to prevent users from submitting more than two line breaks in a row. Here is a simple preg_replace you can use after you've converted all line breaks to \n:

$str = preg_replace("!\n{3,}!", "\n\n", $str);

This matches any sequence of three or more newline characters in a row, and replaces them with two newline characters. It won't prevent a user from making lots of newline characters in between lots of short sections of text though. For that, you may want to have a script that limits the total number of newline characters a user can input.
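One way such a limit might look is sketched below. Both functions and the idea of turning surplus line breaks into spaces are my own assumptions, not something this site necessarily does; they expect input that has already been converted to \n.

```php
<?php
// Returns true if the text contains more than $max newline characters.
function exceeds_newline_limit($str, $max)
{
    return substr_count($str, "\n") > $max;
}

// Keeps the first $max newlines and turns any later ones into spaces.
function cap_newlines($str, $max)
{
    $parts = explode("\n", $str);
    if (count($parts) <= $max + 1) {
        return $str; // already within the limit
    }
    $head = array_slice($parts, 0, $max + 1);
    $tail = array_slice($parts, $max + 1);
    return implode("\n", $head) . ' ' . implode(' ', $tail);
}

// Hypothetical input: many short lines separated by single newlines.
$text = "a\nb\nc\nd";

var_dump(exceeds_newline_limit($text, 2)); // bool(true)
var_dump(cap_newlines($text, 2));          // string(7) with two newlines left
```

Rejecting the submission with an error message is probably friendlier than silently rewriting it, so the check function may be all you need.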

H3XED © 2012 Nick Vogt | Web Design