Preparing for
prepared statements

A guide towards migrating away from mysql_ functions to PDO

· · ·

I have decided to pen this guide after encountering a multitude of questions concerning migration away from mysql_ functions on StackOverflow, as well as my personal experience from updating the codebase of dated PHP files in my line of work.

This guide is written under the preface that you have some basic understanding of PHP and object-oriented programming (OOP), but believe me, when I started writing PDO I have no idea how OOP worked. It’s never a bad place or time to start now.


One of the major complaints I see while prowling through StackOverflow is that people are still using mysql_ functions. These functions will be rendered obsolete because of the deprecation of the mysql function in PHP 5.5 and onwards, and marked for removal in PHP 7. In fact, if you attempt to execute a mysql_ function with recent versions of PHP, you will be thrown an exception:

Deprecated: The mysql extension is deprecated and will be removed in the future: use mysqli or PDO instead

Oh no! You say. Why is this happening?

The mysql extension is not insecure…
but simply harder to use securely.
Exploits of a Mom”, an xkcd comic comically addressing the not-so-comical issue of sanitizing user inputs (or rather, there lack thereof).

There are many good reasons to why mysql_ functions are not worthy of future support. A commonly cited reason in support of migrating away from said functions is because of the inherent security flaws in the mysql extension. While not entirely true, using mysql_ functions requires the programmer, i.e. you, to take proactive, conscious steps to prevent attacks and hijacking attempts. In other words, mysql is not insecure, but simply harder to use securely compared to other available extensions, such as mysqli and PDO. The dated extension also does not support a variety of other recent features, which makes it difficult to work with mysql_ functions in the near future.

The reason why the mysql functions have stayed around for so long is for one simple reason — legacy support. Disabling the extension outright will disrupt plenty of dated web services and scripts that still use the mysql extensions, but this is no decent excuse to why you should start looking at alternatives this moment forward.

While suppressing E_DEPRECATED errors or, god-forbid, falling back to an older version of PHP, are quick and dirty tricks to tiptoe around this issue, your code will not be future proof. Since it’s introduction in the 1990s, no new functions have been added to the mysql extension since 2006, and it is not even being actively maintained anymore.

The two best alternatives out there are the mysqli and PDO extension. There is great amount of debate which one is better, but I personally prefer the latter for very simple reasons — its support for other drivers than mysql itself, and that it accepts named parameters. Both are fast, object-oriented and supports stored procedures.

Establishing a database connection

We need several snippets of information in order to establish a database connection. When you define a new PDO object, it will ask you a few questions:

  • What host name is the database located on?
  • Which database are we working with?
  • What are the logic credentials that allows me access to reading and/or writing to the said database?

It is always safer to store these information in a configuration file, which you can use .htaccess (if you are on an Apache server) to block any requests for the file. This will keep your database connection information safe from (external) prying eyes. In this file, you can define named constants that will store these snippets of information, at the disposal of any database connection requests:

Securing your database connection details.

With this information you can craft a database connection out of thin air, pronto! Below I am presenting a barebones database connection, followed by one that also defines a character set (here I am using utf8, but your flavours may vary) and port (defaults to 3306, but good to know):

Connecting to the MySQL database using PDO.

Easy peasy. Variables and objects declared are considered to exist throughout this article.

p/s: A little note for advanced users — if you’re using the try/catch method of retrieving errors, remember to enabled that via setAttribute():

Catching exceptions thrown by PDO with the try/catch regime.

By catching the exception and dealing with it manually prevents PDO from displaying a backtrace, which contains sensitive information about your database. The trace is very useful for troubleshooting in a development setting, but definitely a no-no for production.


Querying

Now the fun part—let’s construct some queries! But where do we start?

Fetching data

Sorry, Miranda Priestly. I really just have to put this gif right here.

Let’s begin with the basics with an extremely reduced example — say you want to execute a MySQL query that will fetch user data from a database. How should you get around to do it?

For example, we want to know how many users from each country we have in our database. We are not supplying any user input at this point, and are only concerned about retrieving information.

Fetching data from MySQL database.

Accepting user inputs

A typing cyborg. Ghost in the Shell by Production I.G.

Remember that mysql 101 rule — always sanitize your user input — nailed hard into your psyche by your programming teacher? Remember the countless times you have to use mysql_real_escape_string()? One of the reason why mysql_ functions are deprecated because the programmer will have to make a conscious effort to sanitize user input. For PDO, prepared statements come to the rescue. The magic behind PDO is that it sanitizes parameters for you during the process of parameterisation.

PDO makes the need to escape or sanitize input redundant.

This what what PDO actually does — unlike a mysql_ function, it keeps the prepared statement and strings separate, and only combines them upon execution. It protects you from injection attacks, which happens when hackers know you are not sanitizing your data (properly).

In PDO, you are offered several options on how to combine your variables as strings into the prepared statement, before the latter is executed. This is done by:

  1. using named placeholders
  2. using positional placeholders
  3. binding parameters (by reference)
  4. binding values

1 + 2: Named or Positional Placeholders

Both named and positional placeholders work in a similar fashion —their corresponding values are passed as an array in the execute() statement. The only differences are that for the former, you use ‘?’ as a placeholder in your statement, while for the latter, you use ‘:placeholderName’ instead. Moreover, positional placeholders require you to order the array items in the order of which the placeholders appear in your statement, while named placeholders rely on object keys to identify the correct placeholders, therefore making order irrelevant.

Named vs positional placeholders.

3 + 4: Binding Values or References

Alternatively, you can choose to bind variables by reference or value before executing the prepared statement. The difference is that binding by reference (hence using bindParam()) will only evaluate the value of the reference upon execution, unlike binding by value (bindValue()) which will bind an immutable value upon calling of the method which prevents further manipulation before execution. In majority of implementations, the difference is not of great significance.

bindParam() vs bindValue().

Along the same vein, you will have to be careful how you bind values or parameters to placeholders in a PHP for loop. Since bindParam() binds the reference and the reference only, you should modify the array elements directly by prepending the ‘&’ sign, i.e.:

However, this argument is moot considering that you can simply pass the entire array to the execute() command without the need of trigger the overhead of a for loop. This is the recommended way of dealing with multiple placeholders.

Further reading: bindParam() vs bindValue().

Placeholders don’t have a place
in the entire prepared statement

Just like how nice jeans are always priced beyond my reach.

Parameterised prepared statements are not an all-encompassing panacea when it comes to security concerns. Given that you can only bind parameters to placeholders within the WHERE clause, if you are accepting user inputs to (1) select tables or columns to query from, or (2) to dictate the grouping, order and limit of their results, performing sanity checks on user inputs is a reality that you cannot, and should not, run away from.

Parameterisation is not a one-size-fits-all solution.

A sensible solution would be to use a whitelist approach for parameters that have a limited range of keywords, like column and table names, for table and columns, and GROUP and ORDER BY clauses; and forced conversions to integers for the LIMIT clause.

One example: if you are allowing the end user to retrieve data from a list of tables, compare that user input against an internal list of tables — if there are any discrepancy, reject any attempts to connect to, or query from, the database:


Advanced examples

Here are some advanced examples that I feel should not be addressed in the main section in the guide simply because it requires basic understanding of constructing prepared statements and binding values or references to placeholders — which we have already conveniently covered in the previous section.

Executing multiple statements

If you want to execute multiple statements that differ only by values, such as inserting multiple rows of data into your database, you can use beginTransaction() and commit():

Executing multiple prepared statements.

Inquiring about IN queries

Sorry for the pun. Jokes (or rather the lack thereof) aside, preparing IN queries is a bit of a challenge in PDO, but it isn’t exceptionally difficult. If you have an array of n size you want to pass into the IN query, you will have to duplicate the positional placeholder (‘?’) n number of times so that each positional placeholder will be assigned a value upon execution of the statement.

For example, if you have an array of values (10, 38, 56, 93) you want to query with, you will have to use the positional placeholders (?, ?, ?, ?) in your WHERE clause. The trick is to generate the correct number of placeholders so that each will be mapped back to your array of values.

With this in mind, you can use any of the following methods for an array of n size to generate the correct placeholders:

  1. Use array_fill() to fill an array with n placeholders, and then implode() it with a comma as the separator.
  2. Use str_repeat() to repeat the placeholder (‘?’) n times, and str_split() it into a array, for which you then implode() using comma as the separator.
  3. Use str_repeat() to repeat the placeholder and its separator (‘?,’) n times, and remove the last comma with substr().
  4. Use str_repeat() to repeat the placeholder and its separator (‘?,’) n-1 number of times, followed by appending by a closing placeholder (‘?’).
Filling placeholders in the IN clause.

Which method is the best? It is up to you. This is where personal style rules over sense, although I personally prefer the last method because it is the simplest (and likely to have the least overhead).

Repeat n-1 times and append is the fastest.

Which method is the fastest? To ask this question you are running the risk of micro-optimisation, and you would be better off spending time optimizing your query instead. However, for the sake of completeness, I have performed three independent tests with 100,000 repeated runs for each method, and verdict is clear: method 4 clenches the title of fast and furious:

Execution time, measured in microseconds, of 100,000 iterations of methods 1 through 4. Values represent arithmetic averages from three independent runs. Error bars represent standard deviation.

Preparing LIKE queries

Sometimes you would want to query your database using the LIKE operator, which allows you to use ‘%’ as a wildcard symbol. The best part about using prepared statements is that they will automatically parse the ‘%’ as a wildcard symbol when appended and/or prepended to your variable:

Formulating LIKE queries using prepared statements and named placedholders.

Counting rows

At times it is desirable to either access the number of rows returned, or to determine if any rows have been returned from the query. This can be done by using rowCount():

Count the number of rows returned

Note: While using FETCH_ALL is a possible alternative, it is memory inefficient, especially for very large datasets.


Closing the connection

By default, the PDO connection is only open when the PHP script is running and will be closed once the script end. Closing the conection is as simple as assigning null to the PDO object, i.e.:

$db = null;

Closing the connection is considered good practice as it frees up resources for other incoming database connections, although not absolutely necessary — PHP scripts typically have a finite execution time, and will be terminated anyway if it exceeds a pre-determined threshold set by the server. Terminating it will close the connection, but that also means that the database connection might be held open for an unnecessary amount of time.

For the interested:


If you have any more useful tips about migrating to PDO, or have spotted any mistakes in my Gists, do let me know! Leave a note, or tweet me at @teddyrised.