Memcached with PHP

I’m the CTO and Co-founder of Checkout51.com, a grocery coupon website that uses data collected on receipts to verify your purchases and give you cash back for the products you buy. To make all of this work we use a large array of tools and systems that include Ubuntu, PHP, MySQL, Memcached and much more, all hosted on AWS.

First, a bit about Memcached.

Memcached allows you to store copies of objects in memory so that you can avoid having to look them up in your primary data store (in our case MySQL). Storing copies of objects means your database will experience less load, and therefore you can handle higher throughput before you need to spend time and money scaling up your database instance or further optimizing queries.

Unfortunately, like many optimizations in the programming world, introducing a caching layer to your application also introduces complexity, and that complexity increases the likelihood that you will overlook an edge case. For example, one of the most common bugs you will encounter after introducing caching is an object that is removed from cache incorrectly (or not removed at all) when the data changes in your database. If you look up a user and store the user object in cache, then when the user changes an attribute of their account (e.g. their name, password or email address) you need to keep your cache in sync by invalidating the key or updating the object stored in that key to reflect what is now in your database.
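To make the invalidation step concrete, here is a minimal sketch of the look-up-then-invalidate pattern. It is an illustration only, not our production code: the function names, the `$loadFromDb` and `$saveToDb` callables (stand-ins for your own MySQL code) and the 300 second expiry are all assumptions.

```php
<?php
// Sketch only: $loadFromDb and $saveToDb are hypothetical callables standing
// in for your own database access code.

function getUser(Memcached $cache, callable $loadFromDb, int $userId): ?array
{
    $key = 'user::' . $userId;

    $user = $cache->get($key);
    if ($cache->getResultCode() === Memcached::RES_SUCCESS) {
        return $user; // cache hit, the database is never touched
    }

    // Cache miss: fall back to the database and keep a copy for next time.
    $user = $loadFromDb($userId);
    if ($user !== null) {
        $cache->set($key, $user, 300); // 300 seconds is an arbitrary expiry
    }

    return $user;
}

function updateUserEmail(Memcached $cache, callable $saveToDb, int $userId, string $email): void
{
    $saveToDb($userId, $email);

    // Invalidate after the write so the next read repopulates from the database.
    $cache->delete('user::' . $userId);
}
```

Forgetting the `delete` call in the second function is exactly the kind of bug described above: reads keep returning the old email address until the key expires.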

Real life Memcached

PHP has a Memcached class that is fairly decent at supporting most of what you need when you initially introduce Memcached to your application. Regardless, I strongly advise that you write a simple interface for this class so you can handle some of the issues you’re likely to encounter down the road without large re-writes.

Here is a quick example class outlining where I would suggest you get started in terms of a custom interface for Memcached for a PHP application.

A basic Memcached interface supporting Get, Set and Delete functions
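As a rough idea of what such an interface could look like, here is a minimal sketch. It is not our production class: the class name `Cache`, the default expiry and the log messages are assumptions, and it targets PHP 7.4 or newer with the memcached extension installed. The only extension calls it relies on are `get`, `set`, `delete`, `getResultCode` and `getResultMessage`.

```php
<?php
// Sketch only: the class name and the 300 second default expiry are illustrative.
class Cache
{
    private Memcached $client;

    public function __construct(Memcached $client)
    {
        $this->client = $client;
    }

    /** Returns the cached value, or null on a miss or an error. */
    public function Get(string $key)
    {
        $value = $this->client->get($key);
        $code = $this->client->getResultCode();

        if ($code === Memcached::RES_SUCCESS) {
            return $value;
        }
        if ($code !== Memcached::RES_NOTFOUND) {
            // A miss is normal; anything else is worth knowing about.
            error_log("Memcached get failed for '$key': " . $this->client->getResultMessage());
        }

        return null;
    }

    public function Set(string $key, $value, int $ttl = 300): bool
    {
        $ok = $this->client->set($key, $value, $ttl);
        if (!$ok) {
            // Oversized values and unsupported key names end up here.
            error_log("Memcached set failed for '$key': " . $this->client->getResultMessage());
        }

        return $ok;
    }

    public function Delete(string $key): bool
    {
        if ($this->client->delete($key)) {
            return true;
        }
        // Deleting a key that is not in cache is not an error.
        if ($this->client->getResultCode() === Memcached::RES_NOTFOUND) {
            return true;
        }
        error_log("Memcached delete failed for '$key': " . $this->client->getResultMessage());

        return false;
    }
}
```

To use it, create a `Memcached` instance, register your server with `addServer('127.0.0.1', 11211)` and pass the instance to the constructor. Because the rest of the application only ever talks to `Cache`, logging, metrics or key handling can change later without touching the callers.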

Granted, our current production implementation is no longer this simple, but this is very much in line with how we first approached Memcached. That is not to say this is the best path to take; it is just the path we took, and it has worked fairly well.

Some problems you might encounter

Big objects causing sets to fail

Memcached comes with a default max_item_size limit of 1 MB. Initially this will be fine for most applications, but eventually you’ll probably encounter a good reason to cache something that is larger than this default value.

You can adjust the item size limit, but before you do this I would suggest you reconsider the need for your big objects. If you do not have a way to reduce the size of your problematic objects, make sure that at the very least you know when your sets are failing. Our class discussed above deals with this. The default Memcached implementation provided by PHP simply returns false, which is easy to miss if you’re not always checking the return value.
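If you want an early warning before a set is even attempted, a rough size check is one option. The sketch below is an estimate under stated assumptions: `looksTooBigForCache` is a hypothetical helper, it measures PHP’s native `serialize` output, and the real stored size depends on the serializer and compression settings of your client plus some per-item overhead on the server. Treat a positive result as a prompt to investigate rather than a guarantee that the set will fail.

```php
<?php
// Sketch only: a rough estimate, not an exact prediction of what Memcached will accept.
function looksTooBigForCache($value, int $limitBytes = 1048576): bool
{
    // 1048576 bytes is the 1 MB default limit discussed above.
    return strlen(serialize($value)) > $limitBytes;
}
```

Logging the key name whenever this returns true gives you a list of objects worth slimming down before you consider raising the server limit.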

Bad key names causing sets to fail

Another very common cause of a failed set is a key name that is not supported. Keys cannot contain spaces, new lines or carriage returns. If your keys do contain any of these characters, your sets will fail and you’ll be hitting your database far more often than you intended. Again, by monitoring the return value of the set function, as we do in the sample code above, you’ll notice when this is happening.
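A cheap way to catch these before they reach the server is to validate key names up front. This is a minimal sketch: `isValidCacheKey` is a hypothetical helper name, and on top of the characters mentioned above it also rejects other control characters and keys longer than 250 bytes, which is the key length limit in the Memcached text protocol.

```php
<?php
// Sketch only: isValidCacheKey is an illustrative helper, not part of the memcached extension.
function isValidCacheKey(string $key): bool
{
    // Memcached keys are limited to 250 bytes and cannot be empty.
    if ($key === '' || strlen($key) > 250) {
        return false;
    }

    // Reject spaces and control characters (including \r, \n and \t).
    return preg_match('/[\x00-\x20\x7F]/', $key) === 0;
}
```

For example, `isValidCacheKey('user::42')` passes while `isValidCacheKey('user 42')` does not. Keys built from user input (search terms, names) are the usual culprits, so hashing that part of the key is one way to keep it safe.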

Biased key distribution

This is a really big problem, and it’s my favourite Memcached problem. When Memcached sets a key, it only sets it on one of the servers (know this!), meaning that when you execute a get, you’re only hitting one of your Memcached servers. If every page load requires that key, and your application is popular, you’re going to find one of your Memcached servers is handling a larger portion of the load than the others. This problem will be amplified further if that key happens to contain a very large object.

The lovely people at Facebook have also encountered this issue and developed the MCRouter project to resolve it, along with many other issues that you will eventually encounter. In my opinion, implementing MCRouter is a lot of work, and while it is a great solution you should probably consider some other options before taking this rather large step.

Dealing with biased key distribution without MCRouter

Most of the common Memcached problems you’ll encounter are easily solved. This one is a little more difficult and requires careful attention to a few of your performance metrics, or it might sneak up on you in the middle of the night.

There are generally two performance metrics you’ll want to watch really carefully:

  • CPU
  • Network Bytes Out

CPU is important for obvious reasons, but network traffic is a less obvious place to look. When you see the network usage of one server running higher than that of its counterparts, you almost certainly have one very popular (and possibly large) key being requested very regularly. You have a key distribution problem.

With AWS ElastiCache servers running Memcached on r3.2xlarge instances, you’ll find that your network out flatlines at around 8,000,000,000. When your keys are unevenly distributed, one server’s network out graph will sit noticeably higher than the others. Interestingly, in these cases you do not always see your CPU graphs unevenly distributed, so you’ll need to check both.

To address this, you could go and use MCRouter as I previously mentioned. Unfortunately, MCRouter means some setup, testing and the introduction of more complexity to your technology stack (not something I would suggest you rush into). Personally, I would rather take a minor performance hit to avoid introducing another tool that you might not really need just yet.

“So let’s just set the key on all the Memcached servers…” — Me

This is an ugly hack, right? And you’re probably cringing saying something like “This will make all the Memcached sets slower”. Well, yes and yes but not by much. Most sets in our application are between 1–2ms, and you do not need to do this for all of your keys, just the ones you know are causing distribution issues.

Fortunately, using this approach there will be no impact on your Memcached gets, as you can simply perform your “get” against any one of your Memcached servers, chosen at random. While this is not ideal, in reality unless you have a very large number of Memcached servers, you’re probably adding less than 20ms to your requests (less than MCRouter will add). I’m not a fan of ugly hacks, but if you’re still hunting for that elusive product-market fit, I would suggest you focus your efforts there and not on finding a perfect Memcached configuration.

Let’s apply this suggestion to our Memcached class from earlier:

A more advanced Memcached interface that allows you to manually manage key distribution
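One way the set-everywhere, get-anywhere idea could be wired up is sketched below. It is an illustration under assumptions rather than our production class: the class name `ShardedCache`, the host names and the key prefixes are placeholders, and it keeps one pooled client for ordinary keys plus one single-server client per host for the keys you choose to copy everywhere.

```php
<?php
// Sketch only: hosts and prefixes are placeholders for your own values.
class ShardedCache
{
    /** @var Memcached pooled client, keys are hashed across all hosts */
    private $pool;
    /** @var Memcached[] one single-server client per host */
    private $singles = [];

    public function __construct()
    {
        $this->pool = new Memcached();
        foreach ($this->GetHosts() as [$host, $port]) {
            $this->pool->addServer($host, $port);
            $single = new Memcached();
            $single->addServer($host, $port);
            $this->singles[] = $single;
        }
    }

    private function GetHosts(): array
    {
        return [['cache-1.example.internal', 11211], ['cache-2.example.internal', 11211]];
    }

    private function IsShardedKey(string $key): bool
    {
        foreach (['country::', 'config::'] as $prefix) {
            if (strpos($key, $prefix) === 0) {
                return true;
            }
        }
        return false;
    }

    /** Sharded keys live on every server, everything else goes through the pool. */
    private function ClientsFor(string $key): array
    {
        return $this->IsShardedKey($key) ? $this->singles : [$this->pool];
    }

    public function Get(string $key)
    {
        // Any copy will do, so pick one server at random to spread the reads.
        $clients = $this->ClientsFor($key);
        $client = $clients[array_rand($clients)];
        $value = $client->get($key);

        return $client->getResultCode() === Memcached::RES_SUCCESS ? $value : null;
    }

    public function Set(string $key, $value, int $ttl = 300): bool
    {
        $ok = true;
        foreach ($this->ClientsFor($key) as $client) {
            if (!$client->set($key, $value, $ttl)) {
                error_log("Memcached set failed for '$key': " . $client->getResultMessage());
                $ok = false;
            }
        }
        return $ok;
    }

    public function Delete(string $key): bool
    {
        $ok = true;
        foreach ($this->ClientsFor($key) as $client) {
            if (!$client->delete($key) && $client->getResultCode() !== Memcached::RES_NOTFOUND) {
                error_log("Memcached delete failed for '$key': " . $client->getResultMessage());
                $ok = false;
            }
        }
        return $ok;
    }
}
```

One trade-off to keep in mind with this design: if a set or delete fails on just one server, that server can hand out a stale copy until the key expires, so a short expiry on these keys is a sensible precaution.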

To get this working in your application you just need to update the array in the GetHosts function, and the array in the IsShardedKey function. Any key prefix listed in the IsShardedKey array will get pushed to all servers, and gets will be performed against one of your Memcached servers chosen at random, solving your biased key issue.

Debugging Memcached

It’s worth knowing a little bit about how to get useful information out of Memcached. The most useful thing I’ve ever extracted from Memcached is information about what gets, sets and deletes are being performed against the server. To do this you only need to know a few key commands.

First connect to your Memcached server using your favourite terminal:

telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

Next, let’s flush the existing stats buffer and then turn on collection of detailed stats:

stats reset
RESET
stats detail on
OK

Once you’ve waited long enough (probably 30 seconds?) for some activity within your application to generate informative stats, you can turn stats collection back off and dump the details. Keep in mind that collecting detailed stats does introduce a pretty significant performance cost, so I would not suggest leaving it running for too long.

stats detail off
OK
stats detail dump
PREFIX User get 3 hit 2 set 1 del 0
PREFIX notif_list get 1 hit 0 set 1 del 0
PREFIX user get 1 hit 0 set 0 del 0
PREFIX country get 2 hit 0 set 2 del 0
PREFIX unread_notif_count get 2 hit 0 set 0 del 0
END

You will now have a list of the recent activity between Memcached and your application. If you do have an issue with biased keys, this output will likely help you track down which key is causing the trouble.
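If you paste that dump into a script, a few lines of PHP will sort it so the busiest prefixes come first. This is a minimal sketch: `parsePrefixStats` is a hypothetical helper, it assumes PHP 7.4 or newer, and it only understands the `PREFIX ... get ... hit ... set ... del ...` line format shown above.

```php
<?php
// Sketch only: parses the text of a "stats detail dump" into an array keyed by prefix.
function parsePrefixStats(string $dump): array
{
    $stats = [];
    foreach (preg_split('/\r?\n/', trim($dump)) as $line) {
        $pattern = '/^PREFIX (\S+) get (\d+) hit (\d+) set (\d+) del (\d+)$/';
        if (preg_match($pattern, trim($line), $m)) {
            $stats[$m[1]] = [
                'get' => (int) $m[2],
                'hit' => (int) $m[3],
                'set' => (int) $m[4],
                'del' => (int) $m[5],
            ];
        }
    }

    // Busiest prefixes first, ordered by number of gets.
    uasort($stats, fn ($a, $b) => $b['get'] <=> $a['get']);

    return $stats;
}
```

A prefix with a very high get count relative to the others is a good candidate for the key distribution problem described earlier, and a prefix with many gets but few hits points at sets that are failing or keys that are being invalidated too eagerly.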

It is important to note that if you do not separate attributes within your Memcached key names, you will get a much less useful output here. Ideally your keys would be constructed similar to the following:

$key_name = "user::" . $this->user_id;

If you reversed this key to start with the user_id, or omitted the “::” delimiters but kept the same content, when you did your dump of key information you would find the output was much larger, and you would struggle to extract meaningful information.
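A small helper makes it hard to get the key format wrong. This is a minimal sketch with a hypothetical function name, `cacheKey`; it assumes the server is using its default `:` prefix delimiter for detailed stats, so everything before the first colon is reported as the prefix.

```php
<?php
// Sketch only: builds keys as prefix::part::part so detailed stats group by prefix.
function cacheKey(string $prefix, ...$parts): string
{
    return $prefix . '::' . implode('::', $parts);
}
```

With this in place, `cacheKey('user', 42)` produces `user::42` and `cacheKey('notif_list', 42, 'unread')` produces `notif_list::42::unread`, so both show up under a readable prefix in the dump.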

Closing thoughts

Your approach to caching is an ongoing battle. For most of their life, popular applications are just struggling to keep load away from their database servers and push it over to something like Memcached or Redis. Eventually you get to a point where you’ve moved so much load over to your caching system that a new bottleneck has been created. This is OK. It just means you’ve graduated to a new class of problem and your product is being used, so congrats, and keep up the hard work!

For bonus points, if you’re using New Relic, complement the error_log statements in the Get and Set functions of the sample code above with the New Relic PHP API function newrelic_notice_error to track how often failures are occurring. Doing this will allow you to trigger PagerDuty alerts based on your app’s error rate, and hopefully avoid some late night phone calls.
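A minimal sketch of what that could look like follows. The helper name `reportCacheFailure` and the message format are assumptions; the `function_exists` guard is there because `newrelic_notice_error` is only defined when the New Relic agent is loaded, which is often not the case on development machines.

```php
<?php
// Sketch only: reportCacheFailure is a hypothetical helper to call from Get and Set.
function reportCacheFailure(string $operation, string $key, string $reason): string
{
    $message = "Memcached $operation failed for key '$key': $reason";
    error_log($message);

    // Only defined when the New Relic PHP agent is installed and loaded.
    if (function_exists('newrelic_notice_error')) {
        newrelic_notice_error($message);
    }

    return $message;
}
```

Calling this from a single place means every failed get, set or delete is counted the same way, which keeps the error rate you alert on meaningful.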

TL;DR

When dealing with caching in a PHP application that uses Memcached, keep in mind the following:

  • Track failing sets and deletes so you can address them before they take you offline. Failing sets mean missed gets, which mean increased database load.
  • Add an abstraction layer (like something I’ve offered above) to your application so when you do need to get between your code and Memcached you do not need to refactor everything.
  • Consider looking into tools like MCRouter before you run into key distribution problems.
  • Prefix your keys, use a delimiter and do not start keys with the ID of the object being stored. Otherwise debugging key stats is very difficult.
  • If you do run into key distribution problems, remember this article because you do not need to drop Memcached and re-write your app. There are always multiple ways to solve any problem, especially this one!
  • Monitor Memcached CPU and network performance (at the least). If these graphs are not even for all your Memcached servers, something is probably wrong.