9 Rules for Solid PHP Security

Mina Ayoub
Aug 16, 2010

Rule 1: Never trust external data or input

The first thing you must realize about web application security is that you should not trust external data. External data is any data that the programmer did not type directly into the PHP code. Data from any other source (such as GET variables, form POST data, databases, configuration files, session variables, or cookies) is untrusted until steps have been taken to make it safe.

For example, the following data elements can be considered safe because they are set in PHP.

Safe and secure code

Here is the code snippet:

<?php
$myUsername = 'tmyer';
$arrayUsers = array('tmyer', 'tom', 'tommy');
define("GREETING", 'hello there ' . $myUsername);
?>

However, the following data elements are flawed.

Unsafe, flawed code

Here is the code snippet:

<?php
$myUsername = $_POST['username']; //tainted!
$arrayUsers = array($myUsername, 'tom', 'tommy'); //tainted!
define("GREETING", 'hello there ' . $myUsername); //tainted!
?>

Why is the first variable, $myUsername, tainted? Because it comes directly from the form POST. Users can enter any string in this input field, including malicious commands to delete files or run previously uploaded files. You might ask, “Can’t you use a client-side (JavaScript) form validation script that only accepts the letters A-Z to avoid this danger?” Yes, that is always a good step, but as you will see later, anyone can download any form to their own machine, modify it, and resubmit any content they like.

The solution is simple: you must run cleanup code on $_POST['username']. If you don’t, every later use of $myUsername (such as in an array or a constant) contaminates those objects as well.

An easy way to clean up user input is to use regular expressions to handle it. In this example, you only want to accept letters. It may also be a good idea to limit the string to a specific number of characters, or to require all letters to be lowercase.

Making user input safe

Here is the code snippet:

<?php
$myUsername = cleanInput($_POST['username']); //clean!
$arrayUsers = array($myUsername, 'tom', 'tommy'); //clean!
define("GREETING", 'hello there ' . $myUsername); //clean!
function cleanInput($input) {
    $clean = strtolower($input); //string becomes lowercase
    $clean = preg_replace("/[^a-z]/", "", $clean); //remove everything except the letters a-z
    $clean = substr($clean, 0, 12); //keep the first 12 characters and discard the rest
    return $clean;
}
?>
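Stripping characters silently changes what the user typed. An alternative is to validate the input and reject anything that does not already fit the rule. The following is a minimal sketch under the same assumptions as above (lowercase letters only, at most 12 characters); the helper name isValidUsername() is hypothetical and not part of the original example.

```php
<?php
// Hypothetical helper: accept a username only if it already satisfies the
// whitelist (1 to 12 lowercase ASCII letters); reject everything else.
function isValidUsername($input)
{
    // Arrays can arrive in $_POST (e.g. username[]=x), so insist on a string.
    if (!is_string($input)) {
        return false;
    }
    // \A and \z anchor the pattern to the very start and end of the string.
    return preg_match('/\A[a-z]{1,12}\z/', $input) === 1;
}

var_dump(isValidUsername('tmyer'));         // bool(true)
var_dump(isValidUsername("tmyer'; DROP"));  // bool(false)
```

Whether to clean or to reject is a design choice: cleaning is more forgiving, while rejecting makes tampering visible and easier to log.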

Rule 2: Disable PHP settings that make security difficult to implement

You already know that you can’t trust user input; you should also know that you shouldn’t trust the way PHP is configured on the machine. For example, be sure to disable register_globals (the setting was removed entirely in PHP 5.4, but older installations may still have it enabled). If register_globals is enabled, you might do something careless, such as using $variable in place of the GET or POST value of the same name. With the setting disabled, PHP forces you to reference the correct variable in the correct namespace. To use a variable from the form POST, you reference $_POST['variable'], so that it cannot be mistaken for a cookie, session, or GET variable.

Rule 3: If you can’t understand it, you can’t protect it.

Some developers use strange syntax, or organize statements very tightly to form short but ambiguous code. This approach can be efficient, but if you don’t understand what the code is doing, you can’t decide how to protect it.

For example, which of the following two pieces of code do you like?

Making the code easy to protect

Here is the code snippet:

<?php
//obfuscated code
$input = (isset( $_POST['username']) ? $_POST['username']:'');
//unobfuscated code
$input = '';
if (isset( $_POST['username'])){
$input = $_POST['username'];
}else{
$input = '';
}
?>

In the second, clearer snippet, it’s easy to see that $input is tainted and needs to be cleaned up before it can be safely processed.

Rule 4: Prevent SQL injection attacks

This tutorial uses examples to show how to protect online forms and, at the same time, take the necessary measures in the PHP code that processes them. Similarly, even if you use a PHP regex to ensure that a GET variable is entirely numeric, you should still make sure that the SQL query uses escaped user input.

Defense in depth is not just a good idea, it ensures that you won’t get into serious trouble.

In a SQL injection attack, a user adds information to a database query by manipulating a form or a GET query string. For example, suppose you have a simple login database. Each record in this database has a username field and a password field. You build a login form that allows users to log in.

Simple login form

Here is the code snippet:

<html>
<head>
<title>Login</title>
</head>
<body>
<form action="verify.php" method="post">
<p><label for='user'>Username</label>
<input type='text' name='user' id='user'/>
</p>
<p><label for='pw'>Password</label>
<input type='password' name='pw' id='pw'/>
</p>
<p><input type='submit' value='login'/></p>
</form>
</body>
</html>

This form accepts the username and password entered by the user and submits the user input to a file called verify.php. In this file, PHP processes the data from the login form as follows:

Unsafe PHP form processing code

Here is the code snippet:

<?php
session_start(); //needed before $_SESSION can be used
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
//UNSAFE: user input is concatenated straight into the SQL string
$sql = "select count(*) as ctr from users where username='" . $username . "' and password='" . $pw . "' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)) {
    if ($data->ctr == 1) {
        //they're okay to enter the application!
        $okay = 1;
    }
}
if ($okay) {
    $_SESSION['loginokay'] = true;
    header("Location: index.php"); //a redirect needs the Location: prefix
} else {
    header("Location: login.php");
}
?>

This code looks fine, right? Code like this is used by hundreds (or even thousands) of PHP/MySQL sites around the world. What is wrong with it? Remember: you can’t trust user input. Nothing that comes from the user is escaped, so the application is vulnerable. Specifically, it is open to any kind of SQL injection attack.

For example, if the user enters foo as the username and ' or '1'='1 as the password, PHP builds the following string and passes it to MySQL as the query:

Here is the code snippet:

<?php
$sql = "select count(*) as ctr from users where username='foo' and password='' or '1'='1' limit 1";
?>

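Because AND binds more tightly than OR, the WHERE clause of this query is true for every row, so the password test no longer restricts anything. The count that comes back is the number of rows in the users table; when that is 1, or when the attacker refines the injected text so that exactly one row matches, PHP allows access. By injecting some malicious SQL at the end of the password string, the hacker can masquerade as a legitimate user.

The solution to this problem is to use PHP’s built-in mysql_real_escape_string() function as a wrapper around any user input. This function escapes the special characters in a string, such as apostrophes, so that MySQL treats them as data rather than as part of the SQL statement. (The mysql_* functions shown here were removed in PHP 7.0; on current versions the same idea applies with mysqli_real_escape_string() or with prepared statements.)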

Secure PHP form processing code

Here is the code snippet:

<?php
session_start(); //needed before $_SESSION can be used
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
//every piece of user input is escaped before it reaches the query
$sql = "select count(*) as ctr from users where username='" . mysql_real_escape_string($username) . "' and password='" . mysql_real_escape_string($pw) . "' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)) {
    if ($data->ctr == 1) {
        //they're okay to enter the application!
        $okay = 1;
    }
}
if ($okay) {
    $_SESSION['loginokay'] = true;
    header("Location: index.php"); //a redirect needs the Location: prefix
} else {
    header("Location: login.php");
}
?>

With mysql_real_escape_string() wrapped around user input, malicious SQL in that input is neutralized. If the user attempts to pass a malformed password via SQL injection, only the following query reaches the database:

Here is the code snippet:

select count(*) as ctr from users where username='foo' and password='\' or \'1\'=\'1' limit 1

Nothing in the database matches this password. In just one simple step, you have closed a big hole in the web application. The lesson here is that user input for SQL queries should always be escaped.

However, there are still several security holes that need to be closed. The next item is manipulation of GET variables.
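Escaping is one way to keep user input out of the SQL grammar; prepared statements are another, and they remain available on PHP versions where the mysql_* functions no longer exist. The following is a minimal sketch rather than the method used in this article. It assumes the PDO extension with its SQLite driver (an in-memory database stands in for MySQL), a hypothetical checkLogin() helper, and a users table that stores a password hash instead of the plain password.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real MySQL
// server, and the table layout is an assumption made for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT)');

$insert = $pdo->prepare('INSERT INTO users (username, password_hash) VALUES (?, ?)');
$insert->execute(array('foo', password_hash('secret', PASSWORD_DEFAULT)));

// Hypothetical helper: returns true only for a matching username and password.
function checkLogin(PDO $pdo, $username, $password)
{
    if (!is_string($username) || !is_string($password)) {
        return false;
    }
    // The placeholder keeps user input out of the SQL text entirely.
    $stmt = $pdo->prepare('SELECT password_hash FROM users WHERE username = ?');
    $stmt->execute(array($username));
    $hash = $stmt->fetchColumn();

    // fetchColumn() returns false when no row matched.
    return is_string($hash) && password_verify($password, $hash);
}

var_dump(checkLogin($pdo, 'foo', 'secret'));       // bool(true)
var_dump(checkLogin($pdo, 'foo', "' or '1'='1"));  // bool(false)
```

Because the query text never contains user data, the injected apostrophes are compared as ordinary characters and match nothing.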

Rule 5: Prevent users from manipulating GET variables

In the previous section, users were prevented from logging in with a malformed password. If you are smart, you should apply the methods you have learned to ensure that all user input to the SQL statement is escaped.

However, the user is now safely logged in. Having a valid password does not mean that a user will follow the rules; he has many opportunities to cause damage. For example, an application might allow users to view special content. All links point to locations like template.php?pid=33 or template.php?pid=321. The part after the question mark in the URL is called the query string. Because the query string is placed directly in the URL, it is also called a GET query string.

In PHP, if register_globals is disabled, this string can be accessed as $_GET['pid']. The template.php page might do something like the following.

Sample template.php

Here is the code snippet:

<?php
$pid = $_GET['pid'];
//we create an object of a fictional class Page
$obj = new Page;
$content = $obj->fetchPage( $pid);
//and now we have a bunch of PHP that displays the page
?>

Is there anything wrong here? First, the code implicitly trusts that the GET variable pid from the browser is safe. What could happen? Most users are not clever enough to construct semantic attacks. However, if they notice pid=33 in the browser’s URL field, they may start to play around. If they type in another number, that may be fine; but what happens if they type something else, such as a SQL command, the name of a file (such as /etc/passwd), or some other prank, such as a value 3,000 characters long?

In this case, remember the basic rules and don’t trust user input. The application developer knows that the personal identifier (PID) accepted by template.php should be a number, so you can use PHP’s is_numeric() function to ensure that non-numeric PIDs are not accepted, as follows:

Using is_numeric() to limit GET variables

Here is the code snippet:

<?php
$pid = $_GET['pid'];
if (is_numeric( $pid)){
//we create an object of a fictional class Page
$obj = new Page;
$content = $obj->fetchPage( $pid);
//and now we have a bunch of PHP that displays the page
}else{
//didn't pass the is_numeric() test, do something else!
}
?>

This method seems to work, but all of the following inputs pass the is_numeric() check:

100 (valid)
100.1 (there should be no decimal places)
+0123.45e6 (scientific notation: not good)
0xff33669f (hexadecimal: dangerous! Accepted by PHP 5; PHP 7 and later reject hexadecimal strings)
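The list above can be reproduced with a quick check. This is a small sketch that prints what is_numeric() says about each sample next to a strict digits-only test; the helper isDigitsOnly() is a hypothetical name introduced here, and the result for the hexadecimal sample depends on the PHP version in use.

```php
<?php
// Hypothetical helper: true only when the whole string is ASCII digits.
function isDigitsOnly($value)
{
    return is_string($value) && preg_match('/\A[0-9]+\z/', $value) === 1;
}

$samples = array('100', '100.1', '+0123.45e6', '0xff33669f');
foreach ($samples as $value) {
    // var_export() renders booleans as the words true and false.
    printf(
        "%-12s is_numeric=%-5s digits_only=%s\n",
        $value,
        var_export(is_numeric($value), true),
        var_export(isDigitsOnly($value), true)
    );
}
```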

So what should a security-conscious PHP developer do? Years of experience have shown that the best practice is to use regular expressions to ensure that the entire GET variable consists of numbers, as follows:

Using regular expressions to limit GET variables

Here is the code snippet:

<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    //preg_match() is used because ereg() was removed in PHP 7;
    //\A and \z anchor the pattern to the whole string
    if (!preg_match('/\A[0-9]+\z/', $pid)) {
        //do something appropriate, like maybe logging them out or sending them back to home page
        exit; //stop so the invalid $pid is never used below
    }
} else {
    //empty $pid, so send them back to the home page
    exit;
}
//we create an object of a fictional class Page, which is now
//moderately protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
?>

All you need to do is use strlen() to check that the length of the variable is non-zero; if so, use an all-digit regular expression to ensure that the data element is valid. If the PID contains letters, slashes, periods, or anything resembling hexadecimal, this routine catches it and stops the request before the page is processed. If you look behind the scenes at the Page class, you’ll see that the security-conscious PHP developer has escaped the user input $pid to protect the fetchPage() method as follows:

Escape the fetchPage() method

Here is the code snippet:

<?php
class Page{
    function fetchPage($pid) {
        //desc is a reserved word in MySQL, so it is quoted with backticks
        $sql = "select pid,title,`desc`,kw,content,status from page where pid='" . mysql_real_escape_string($pid) . "'";
    }
}
?>

You might ask, “Since you have already ensured that the PID is a number, why escape it?” Because you don’t know in how many different contexts and situations the fetchPage() method will be used. Every caller of this method would otherwise have to protect it; escaping inside the method is what defense in depth means.

What happens if a user tries to enter a very long value, say 1,000 characters, in an attempt to initiate a buffer overflow attack? The next section discusses this issue in more detail, but for now you can add another check to make sure the input PID has the correct length. You know that the maximum length of the database’s pid field is 5 digits, so you can add the following check.

Variable type matching, content matching, length matching

Using regular expressions and length checking to limit GET variables

Here is the code snippet:

<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    //reject the value if it is not all digits OR is longer than 5 characters
    //(preg_match() is used because ereg() was removed in PHP 7)
    if (!preg_match('/\A[0-9]+\z/', $pid) || strlen($pid) > 5) {
        //do something appropriate, like maybe logging them out or sending them back to home page
        exit; //stop so the invalid $pid is never used below
    }
} else {
    //empty $pid, so send them back to the home page
    exit;
}
//we create an object of a fictional class Page, which is now
//even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
?>

Now, no one can stuff a 5,000-character value into the database application, at least not where GET strings are involved. Imagine a hacker gnashing his teeth as he tries to break into your application! And with error reporting turned off, it is even more difficult for hackers to conduct reconnaissance.
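The type, content, and length checks can also be expressed as a single range-checked integer validation with PHP’s filter extension. This is an alternative sketch, not the approach used above: parsePid() is a hypothetical helper, and the upper bound of 99999 is an assumption derived from the 5-digit pid column.

```php
<?php
// Hypothetical helper: returns the page ID as an int, or null if the input is
// not a whole number between 1 and 99999 (five digits, per the pid column).
function parsePid($raw)
{
    if (!is_string($raw)) {
        return null;
    }
    $pid = filter_var($raw, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => 1, 'max_range' => 99999),
    ));
    // filter_var() returns false when validation fails.
    return $pid === false ? null : $pid;
}

var_dump(parsePid('33'));          // int(33)
var_dump(parsePid('100.1'));       // NULL
var_dump(parsePid('0xff33669f'));  // NULL
var_dump(parsePid('123456'));      // NULL
```

Returning a real integer (or null) means the rest of the page never handles the raw string at all.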

Rule 6: Prevent Buffer Overflow Attack

A buffer overflow attack attempts to overflow a memory allocation buffer in a PHP application (or, more precisely, in Apache or the underlying operating system). Keep in mind that although you may be writing the web application in a high-level language like PHP, it ultimately calls into C (in the case of Apache). Like most low-level languages, C has strict rules for memory allocation.

A buffer overflow attack sends a large amount of data to the buffer, causing some of the data to overflow into the adjacent memory buffer, thereby destroying the buffer or rewriting the logic. This can cause denial of service, corrupt data, or execute malicious code on a remote server.

The only way to prevent a buffer overflow attack is to check the length of all user input. For example, if there is a form element that asks for the user’s name, add a maxlength attribute of 40 on this field and check it with substr() on the back end.

Checking the length of the user input

Here is the code snippet:

<?php
//isset() avoids a notice when the form has not been submitted yet
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    $name = substr($_POST['name'], 0, 40);
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>

Why provide both the maxlength attribute and the substr() check on the back end? Because defense in depth is always good. The browser prevents users from typing overlong strings that PHP or MySQL can’t handle safely (imagine someone trying to enter a name 1,000 characters long), while the back-end PHP check ensures that no one manipulates the form data remotely or in the browser.

As you can see, this approach is similar to using strlen() in the previous section to check the length of the GET variable pid. In that example, any input value longer than 5 digits is rejected, but it is just as easy to truncate the value to the appropriate length, as follows:

Changing the length of the input GET variable

Here is the code snippet:

<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    //preg_match() is used because ereg() was removed in PHP 7
    if (!preg_match('/\A[0-9]+\z/', $pid)) {
        //if non numeric $pid, send them back to home page
        exit; //stop so the invalid $pid is never used below
    }
} else {
    //empty $pid, so send them back to the home page
    exit;
}
//we have a numeric pid, but it may be too long, so let's check
if (strlen($pid) > 5) {
    $pid = substr($pid, 0, 5);
}
//we create an object of a fictional class Page, which is now
//even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
?>

Note that buffer overflow attacks are not limited to long strings of numbers or letters. You may also see long hex strings (often looking like \xA3 or \xFF). Keep in mind that the purpose of any buffer overflow attack is to flood a particular buffer and push malicious code or instructions into the next buffer, destroying data or executing malicious code. The easiest way to deal with hex buffer overflows is, again, not to allow input beyond a certain length.

If you are working with a form text area that allows you to enter longer entries in the database, you cannot easily limit the length of the data on the client side. After the data reaches PHP, you can use a regular expression to clear any string like hexadecimal.

Preventing hex strings

Here is the code snippet:

<?php
//isset() avoids a notice when the form has not been submitted yet
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    $name = substr($_POST['name'], 0, 40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing….
}
function cleanHex($input) {
    //remove sequences that look like \xNN hex escapes;
    //the four backslashes yield one literal backslash in the pattern
    $clean = preg_replace('!\\\\[xX]([A-Fa-f0-9]{1,3})!', "", $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>

You may find this series of operations a bit too strict. After all, hex strings have legitimate uses, such as outputting characters from a foreign language. How you deploy the hexadecimal regex is up to you. A better strategy is to delete hex strings only when there are too many of them in one line, or when the string exceeds a certain number of characters (such as 128 or 255).
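The more lenient strategy can be sketched as follows. The helper name cleanHexIfExcessive() and the default threshold of 3 sequences are assumptions chosen for illustration; the pattern is the same \xNN match used by cleanHex().

```php
<?php
// Hypothetical helper: strip \xNN-style sequences only when the input holds
// more of them than $maxAllowed; otherwise return the input untouched.
function cleanHexIfExcessive($input, $maxAllowed = 3)
{
    // Four backslashes in a single-quoted string yield one literal
    // backslash in the regular expression.
    $pattern = '/\\\\[xX][A-Fa-f0-9]{1,3}/';
    $count = preg_match_all($pattern, $input, $matches);
    if ($count > $maxAllowed) {
        return preg_replace($pattern, '', $input);
    }
    return $input;
}

// Single quotes keep the backslashes literal in these test inputs.
var_dump(cleanHexIfExcessive('caf\xE9'));                    // unchanged
var_dump(cleanHexIfExcessive(str_repeat('\x90', 10) . ' ok')); // string(3) " ok"
```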

Rule 7: Prevent Cross-site Scripting Attack

In a cross-site scripting (XSS) attack, a malicious user typically enters information in a form (or through some other user input method) that inserts malicious client-side markup into a process or database. For example, suppose you have a simple guestbook program on your site that allows visitors to leave names, email addresses, and short messages. A malicious user can take advantage of this to insert something other than a short message, such as a picture that is inappropriate for other users, or JavaScript that redirects the user to another site or steals cookie information.

Fortunately, PHP provides the strip_tags() function, which removes HTML and PHP tags from a string. The strip_tags() function also lets you provide a list of allowed tags, such as <b> or <i>; note that attributes on allowed tags are left in place, so allow tags sparingly.

Clearing HTML tags from user input

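<?php
//isset() avoids a notice when the form has not been submitted yet
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    //strip_tags
    $name = strip_tags($_POST['name']);
    $name = substr($name, 0, 40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing….
}
function cleanHex($input) {
    //remove sequences that look like \xNN hex escapes
    $clean = preg_replace('!\\\\[xX]([A-Fa-f0-9]{1,3})!', "", $input);
    return $clean;
}
?>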

From a security perspective, it is necessary to use strip_tags() on public user input. If the form is in a protected area (such as a content management system) and you trust the users to perform their tasks correctly (such as creating HTML content for a Web site), then using strip_tags() may be unnecessary and may hurt productivity.

There is another problem: if you want to accept user input, such as comments on posts or guestbook entries, and need to display this input to other users, be sure to pass it through PHP’s htmlspecialchars() function. This function converts the ampersand, <, and > characters into HTML entities. For example, the ampersand (&) becomes &amp;. That way, even if malicious content slips past strip_tags() on input, htmlspecialchars() neutralizes it on output.
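As a brief illustration of output escaping, the sketch below wraps htmlspecialchars() in a hypothetical escapeForHtml() helper. The ENT_QUOTES flag and the UTF-8 charset are assumptions about how the page is served.

```php
<?php
// Hypothetical helper: escape a stored comment just before it is echoed
// into an HTML page.
function escapeForHtml($text)
{
    // ENT_QUOTES also converts single and double quotes, which matters when
    // the value ends up inside an HTML attribute.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escapeForHtml('<script>alert("x")</script> & more'), "\n";
// prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; &amp; more
```

The browser then displays the markup as text instead of executing it.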

Rule 8: Prevent Data manipulation within the browser

There is a class of browser plugins that allow users to tamper with the headers and form elements on a page. Using Tamper Data (a Mozilla plugin), you can easily manipulate simple forms with many hidden text fields to send commands to PHP and MySQL.

The user can launch Tamper Data before clicking Submit on the form. When the form is submitted, he will see a list of form data fields. Tamper Data allows users to tamper with this data and then the browser completes the form submission.

Let’s go back to the example we created earlier. The string length has been checked, the HTML markup has been cleared, and the hexadecimal characters have been removed. However, some hidden text fields have been added as follows:

Hidden variables

Here is the code snippet:

<?php
//isset() avoids a notice when the form has not been submitted yet
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    //strip_tags
    $name = strip_tags($_POST['name']);
    $name = substr($name, 0, 40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing….
}
function cleanHex($input) {
    //remove sequences that look like \xNN hex escapes
    $clean = preg_replace('!\\\\[xX]([A-Fa-f0-9]{1,3})!', "", $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="table" value="users"/>
<input type="hidden" name="action" value="create"/>
<input type="hidden" name="status" value="live"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>

Note that one of the hidden variables exposes the table name: users. You will also see an action field with a value of create. Anyone with basic SQL experience can see that these values probably drive a SQL engine in the middleware. Someone who wants to wreak havoc just needs to change the table name or supply another action, such as delete.
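One way to blunt this kind of tampering is to treat hidden fields like any other user input and check them against a fixed server-side list. The sketch below is an assumption about how such a check might look: resolveAction() is a hypothetical helper, the list of allowed actions is invented for illustration, and the table name is set on the server instead of being read from the form.

```php
<?php
// Hypothetical helper: map the posted "action" onto a fixed server-side list.
// The table name never comes from the form at all.
function resolveAction(array $post)
{
    $allowedActions = array('create'); // assumption: only "create" is public
    $action = isset($post['action']) ? $post['action'] : '';

    if (!is_string($action) || !in_array($action, $allowedActions, true)) {
        return null; // unknown or tampered value
    }
    return array('table' => 'users', 'action' => $action);
}

var_dump(resolveAction(array('action' => 'delete', 'table' => 'users')));  // NULL
```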

What is left of the problem now? Remote form submission.

Rule 9: Prevent Remote form submission

The benefit of the Web is that it can share information and services. The downside is also the ability to share information and services, because some people do things without scruples.

Take forms as an example. Anyone can visit a web site and create a local copy of a form using File > Save As in the browser. He can then modify the action parameter to point to a fully qualified URL (not formHandler.php, but http://www.yoursite.com/formHandler.php, because the form handler lives on that site), make whatever changes he wants, and click Submit; the server receives the form data as if it were legitimate traffic.

First of all, you might consider checking $_SERVER['HTTP_REFERER'] to determine whether the request came from your own server. This method blocks most malicious users, but it can’t stop the more sophisticated hackers. These people are smart enough to tamper with the referrer information in the header, making the remote copy of the form appear to have been submitted from your server.

A better way to handle remote form submissions is to generate a token based on a unique string or timestamp and place the token in both a session variable and the form. After the form is submitted, check whether the two tokens match. If they don’t, you know someone is trying to send data from a remote copy of the form.

To create a random token, you can use the built-in md5(), uniqid(), and rand() functions of PHP as follows:

Defending remote form submissions

Here is the code snippet:

<?php
session_start();
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    //check token; isset() ensures two missing tokens do not compare as equal
    if (isset($_POST['token'], $_SESSION['token']) && $_POST['token'] === $_SESSION['token']) {
        //strip_tags
        $name = strip_tags($_POST['name']);
        $name = substr($name, 0, 40);
        //clean out any potential hexadecimal characters
        $name = cleanHex($name);
        //continue processing….
    } else {
        //stop all processing! remote form posting attempt!
    }
}
//issue a fresh token for the form rendered below
$token = md5(uniqid(rand(), true));
$_SESSION['token'] = $token;
function cleanHex($input) {
    //remove sequences that look like \xNN hex escapes
    $clean = preg_replace('!\\\\[xX]([A-Fa-f0-9]{1,3})!', "", $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="token" value="<?php echo $token;?>"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>

This technique is effective because session data in PHP lives on the server and cannot be carried over to another one. Even if someone obtains your PHP source code, moves it to their own server, and submits information to your server from there, your server sees only an empty or malformed session token alongside the form token that was originally issued. The two do not match, and the remote form submission fails.
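On PHP 7.0 and later, the token can be drawn from the system’s cryptographic random source instead of md5(uniqid(rand(), true)), and compared in constant time. The following is a sketch under those assumptions; generateFormToken() and isValidFormToken() are hypothetical helper names.

```php
<?php
// Hypothetical helpers; random_bytes() requires PHP 7.0 or later.
function generateFormToken()
{
    // 32 random bytes from the system CSPRNG, hex-encoded to 64 characters.
    return bin2hex(random_bytes(32));
}

function isValidFormToken($sessionToken, $postedToken)
{
    // Missing or empty tokens must never count as a match.
    if (!is_string($sessionToken) || !is_string($postedToken) || $sessionToken === '') {
        return false;
    }
    // hash_equals() compares the two strings in constant time.
    return hash_equals($sessionToken, $postedToken);
}

// Typical use inside the form handler (session_start() already called):
//   $_SESSION['token'] = generateFormToken();
//   isValidFormToken($_SESSION['token'], $_POST['token'])
```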
