Migrating from ElastiCache for Memcached to DAX (DynamoDB Accelerator)

Santosh Bhat
Jun 15, 2020 · 4 min read

The objective of this article is to explain why the ad server team at GumGum migrated from AWS ElastiCache for Memcached to AWS DAX as a cache for DynamoDB, and how it benefited us.

What is Caching?

Caching is the process of storing data in a cache. A cache is a temporary storage location where data is kept for later use, so that expensive queries against databases and data stores do not have to be repeated for the same data.

At GumGum, one of our primary data stores for contextual targeting is DynamoDB, which stores ~1.2 billion rows of page data and ~50 billion rows of visitor data. Latency is critical for our ad server, as we need to respond to an ad request in less than 60 ms. To consume this huge amount of data efficiently while keeping latency as low as possible, caching is a necessity.

Why was moving away from Memcached necessary?

We started seeing frequent latency issues with Memcached, which caused ~3–5 million errors/misses per day. Each miss resulted in another request to DynamoDB and more read units being consumed, which affected the latency of the entire application.

The latency and network issues we saw with Memcached in Java can also be attributed to Spymemcached, the Java client for Memcached. Spymemcached does not attempt to reconnect if a connection was terminated remotely. It is also an asynchronous, single-threaded client: each client instance uses a single IO thread. That is a trap in an application with many threads, because the Spymemcached IO thread may be allocated very few CPU time slices, which reduces its performance.

What is AWS DAX?


DAX, short for DynamoDB Accelerator, is a caching service that enables you to benefit from fast in-memory performance for demanding applications. As the name suggests, it is a service specifically designed for DynamoDB.

It has two types of cache stores internally:

  1. Item Cache: stores the results of GetItem and BatchGetItem operations
  2. Query Cache: stores the results of Query and Scan operations

DAX itself implements a read-through/write-through policy, but at GumGum we write around DAX: writes go directly to DynamoDB, so data enters DAX only through reads, and once cached it leaves the cache only when it expires. This keeps the latency of write requests low. A TTL of 20 minutes is short enough that we are not worried about serving outdated data for long periods.
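The write-around behaviour with a TTL can be illustrated with a small in-memory model. This is only a sketch and not the DAX API: the class `WriteAroundTtlCache`, the `table` map standing in for DynamoDB, and the injectable clock are all hypothetical names introduced for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Toy model of a write-around cache with a fixed TTL (hypothetical, not the DAX client). */
final class WriteAroundTtlCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, String> table;               // stands in for the DynamoDB table
    private final Map<String, Entry> cache = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;                       // injectable so expiry can be demonstrated

    WriteAroundTtlCache(Map<String, String> table, long ttlMillis, LongSupplier clock) {
        this.table = table;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Writes go straight to the table; the cached copy is left untouched. */
    void write(String key, String value) {
        table.put(key, value);
    }

    /** Reads populate the cache; an entry is served until its TTL elapses. */
    String read(String key) {
        long now = clock.getAsLong();
        Entry cached = cache.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.value;                            // may be stale for up to ttlMillis
        }
        String fresh = table.get(key);
        if (fresh != null) {
            cache.put(key, new Entry(fresh, now + ttlMillis));
        } else {
            cache.remove(key);                              // nothing to cache for a missing row
        }
        return fresh;
    }
}
```

In this model, a value written after it has been cached is not visible to readers until the entry expires, which is the trade-off accepted in exchange for cheaper writes.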

The following steps outline the process for a read-through cache:

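  1. Given a key-value pair, the application first tries to read the data from DAX. If the cache is populated with the data (cache hit), the value is returned. If not, on to step 2.
  2. Transparently to the application, on a cache miss DAX fetches the key-value pair from DynamoDB.
  3. DAX updates the cache so that the data is available for subsequent reads.
  4. DAX returns the value to the application.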
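The read-through flow can be sketched in a few lines of plain Java. This is an illustration of the pattern only, not how DAX is implemented: `ReadThroughCache` and the `backingStore` function (a stand-in for a DynamoDB read) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Minimal read-through cache: callers only ever talk to the cache. */
final class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> backingStore;              // hypothetical stand-in for a DynamoDB read
    private final AtomicInteger backingStoreReads = new AtomicInteger();

    ReadThroughCache(Function<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    /**
     * Cache hit: the stored value is returned directly.
     * Cache miss: the value is fetched from the backing store, kept for
     * later reads, and then returned. A null result is not cached.
     */
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            backingStoreReads.incrementAndGet();
            return backingStore.apply(k);
        });
    }

    /** Number of reads that had to go to the backing store. */
    int backingStoreReads() {
        return backingStoreReads.get();
    }
}
```

The caller never decides whether to go to the backing store; that decision lives inside the cache, which is why a separate cache client and the glue code around it are not needed with this pattern.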

How did we migrate to DAX?

If you already have a Java application that uses Amazon DynamoDB, migrating to DAX should be fairly easy. You have to modify the application so that it can access your DAX cluster, but rewriting it entirely is not necessary, because the DAX Java client is similar to the DynamoDB low-level client included in the AWS SDK for Java. For our asynchronous implementation, we used ClusterDaxAsyncClient along with Java's Future.
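As a rough sketch of the Future side of this, the snippet below waits for an asynchronous lookup within a fixed time budget and treats a slow or failed lookup as "no data". It uses only the JDK: the helper `BoundedLookup.await` is hypothetical, the lookups are simulated with an executor, and the budget values are illustrative; no DAX client call is shown.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedLookup {
    /** Waits at most budgetMillis for the result; a slow or failed lookup yields empty. */
    static <T> Optional<T> await(Future<T> pending, long budgetMillis) {
        try {
            return Optional.ofNullable(pending.get(budgetMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            pending.cancel(true);                   // stop waiting; the ad request moves on
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty();                // the lookup itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve the interrupt for the caller
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Simulated lookups; in a real application these Futures would come from the async client.
            Future<String> fast = pool.submit(() -> "page-data");
            Future<String> slow = pool.submit(() -> {
                Thread.sleep(2_000);
                return "too late";
            });
            System.out.println(await(fast, 1_000).isPresent());
            System.out.println(await(slow, 50).isPresent());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Bounding the wait keeps one slow lookup from consuming the whole response-time budget of an ad request.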

The main hurdle we faced was optimizing the Query requests we send to DAX. For the query cache, the entire QueryRequest is the key, so it was essential to eliminate highly variable values from the request. Values such as timestamps, or fine-grained requests with many parameters, make almost every request unique, so the same key is rarely seen twice and the cache cannot work effectively. We moved these variables/parameters out of the request and into a post-processing method until we reached a cache hit rate of ~99%.
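The effect of keeping variable values out of the cache key can be shown with a small simulation. This is a sketch under assumptions, not our production code: `QueryKeyDemo`, the `table` map of visitor IDs to event timestamps, and both query methods are hypothetical, and a list of parameters stands in for the QueryRequest that serves as the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class QueryKeyDemo {
    private final Map<String, List<Long>> table;            // visitorId -> event timestamps (made-up data)
    private final Map<List<Object>, List<Long>> queryCache = new HashMap<>();
    int hits = 0;
    int misses = 0;

    QueryKeyDemo(Map<String, List<Long>> table) {
        this.table = table;
    }

    /** Cache lookup keyed by every parameter of the request. */
    private List<Long> cachedQuery(List<Object> requestParams, Supplier<List<Long>> runQuery) {
        List<Long> cached = queryCache.get(requestParams);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        List<Long> result = runQuery.get();
        queryCache.put(requestParams, result);
        return result;
    }

    /** Before: the cutoff timestamp is part of the request, so nearly every key is new. */
    List<Long> eventsSinceInRequest(String visitorId, long cutoffMillis) {
        return cachedQuery(Arrays.<Object>asList(visitorId, cutoffMillis),
                () -> filterSince(table.getOrDefault(visitorId, Collections.emptyList()), cutoffMillis));
    }

    /** After: the request holds only the stable key; the cutoff is applied in post-processing. */
    List<Long> eventsSincePostProcessed(String visitorId, long cutoffMillis) {
        List<Long> all = cachedQuery(Collections.<Object>singletonList(visitorId),
                () -> new ArrayList<>(table.getOrDefault(visitorId, Collections.emptyList())));
        return filterSince(all, cutoffMillis);
    }

    private static List<Long> filterSince(List<Long> timestamps, long cutoffMillis) {
        return timestamps.stream().filter(t -> t >= cutoffMillis).collect(Collectors.toList());
    }
}
```

Both methods return the same events, but only the second lets repeated requests for the same visitor share one cache entry. The cost is that each cached result is larger and the filtering moves into application code.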

How did it benefit us?

As mentioned earlier, adopting DAX and its client is particularly easy when you already have a DynamoDB client and code in place, which helped us greatly in our transition. Because DAX implements the read-through policy internally, we could also get rid of the verbose boilerplate code that came from using two separate clients, Memcached and DynamoDB.

[Graph: daily errors due to cache misses, before and after the migration]

As indicated by the graph above, errors due to cache misses dropped from ~5–10 million a day to fewer than 100 a day. Since the migration, thanks to the improved number of cache hits, the read units consumed by DynamoDB have also been halved.



