Migrating from ElastiCache for Memcached to DAX (DynamoDB Accelerator)
This article explains why the ad server team at GumGum migrated from AWS ElastiCache for Memcached to AWS DAX as a cache for DynamoDB, and how the migration benefited us.
What is Caching?
Caching is the process of storing data in a cache, a temporary storage location where data is kept for later use. It avoids running expensive queries against databases and data stores for the same data over and over.
At GumGum, one of our primary data stores for contextual targeting is DynamoDB, which holds ~1.2 billion rows of page data and ~50 billion rows of visitor data. Latency is one of the most critical aspects of our ad server, as we need to respond to an ad request in less than 60 ms. To consume this huge amount of data efficiently while keeping latency as low as possible, caching becomes essential.
Why was moving away from Memcached necessary?
We started seeing significant latency issues with Memcached, resulting in ~3–5 million errors/misses per day. Each miss meant another request to DynamoDB and more read units consumed, which affected the latency of the entire application.
The latency and network issues we saw with Memcached in Java can also be attributed to Spymemcached, the Java client we used for Memcached. Spymemcached does not attempt to reconnect if a connection is terminated remotely. It is also an asynchronous, single-threaded client: a single client instance uses a single IO thread. This is a trap, because an application has many threads, so the CPU time slices allocated to the Spymemcached client's thread may be few, which degrades its performance.
What is AWS DAX?
DAX, short for DynamoDB Accelerator, is a caching service that enables you to benefit from fast in-memory performance for demanding applications. As the name suggests, it is a service specifically designed for DynamoDB.
It has two types of cache stores internally:
- Item Cache — Used to store results of GetItem and BatchGetItem operations
- Query Cache — Used to store results of Query and Scan operations
DAX itself implements a read-through/write-through policy, but at GumGum we write around DAX, which means data can be written to DAX only through reads; once written, an item is removed only after its TTL expires. This results in low latency for write requests. A TTL of 20 minutes is short enough that we don't have to worry about data staying stale for long periods.
The following steps outline the process for a read-through cache:
- Given a key-value pair, the application first tries to read the data from DAX. If the cache is populated with the data (cache hit), the value is returned. If not, on to step 2.
- Transparent to the application, if there was a cache miss, DAX fetches the key-value pair from DynamoDB.
- To make the data available for any subsequent reads, the key-value pair is then populated in the DAX cache.
- DAX then returns the value to the application.
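The read-through steps above, combined with our write-around policy, can be sketched in plain Java. This is a minimal illustration, not DAX's implementation: a `ConcurrentHashMap` stands in for DAX's item cache, a `Map` stands in for DynamoDB, and all class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a write-around, read-through cache with a fixed TTL
// (hypothetical names; DAX handles all of this internally).
public class WriteAroundCache {
    private record Entry(String value, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Map<String, String> backingStore; // stands in for DynamoDB
    private final long ttlMillis;

    public WriteAroundCache(Map<String, String> backingStore, long ttlMillis) {
        this.backingStore = backingStore;
        this.ttlMillis = ttlMillis;
    }

    // Write-around: writes go straight to the backing store, bypassing the cache.
    public void put(String key, String value) {
        backingStore.put(key, value);
    }

    // Read-through: a hit is served from the cache; a miss falls through to
    // the backing store and populates the cache for subsequent reads.
    public String get(String key) {
        Entry e = cache.get(key);
        if (e != null && e.expiresAtMillis > System.currentTimeMillis()) {
            return e.value; // step 1: cache hit
        }
        String value = backingStore.get(key); // step 2: fetch on miss
        if (value != null) {
            // step 3: populate for subsequent reads, with a TTL
            cache.put(key, new Entry(value, System.currentTimeMillis() + ttlMillis));
        }
        return value; // step 4: return to the application
    }
}
```

Note the trade-off this sketch makes visible: after a write, a cached reader keeps seeing the old value until the TTL expires, which is why a short TTL (20 minutes in our case) keeps staleness acceptable.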
How did we migrate to DAX?
If you already have a Java application that uses Amazon DynamoDB, migrating to DAX should be fairly easy. You have to modify the application so that it can access your DAX cluster, but rewriting it is not necessary, because the DAX Java client is similar to the DynamoDB low-level client included in the AWS SDK for Java. For the asynchronous implementation, we used ClusterDaxAsyncClient.
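A configuration sketch of the async client setup is shown below. The cluster endpoint is a placeholder, and the builder calls follow the pattern in the AWS SDK for Java v1 DAX client documentation, so verify the method names against the SDK version you use.

```java
import com.amazon.dax.client.dynamodbv2.ClientConfig;
import com.amazon.dax.client.dynamodbv2.ClusterDaxAsyncClient;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsync;

public class DaxClientSetup {
    public static AmazonDynamoDBAsync buildDaxClient() {
        // The DAX client implements the same AmazonDynamoDBAsync interface as
        // the low-level DynamoDB client, so existing call sites stay unchanged.
        ClientConfig daxConfig = new ClientConfig()
                .withEndpoints("my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111"); // placeholder
        return ClusterDaxAsyncClient.builder()
                .withConfiguration(daxConfig)
                .build();
    }
}
```

Because the returned client is an `AmazonDynamoDBAsync`, swapping it in for the existing DynamoDB async client is mostly a wiring change.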
The main hurdle we faced was optimizing the query requests we send to DAX. Since we use Query requests, it was essential to eliminate any variables from the query: for the query cache, the entire QueryRequest is the cache key, so variables like timestamps, or finely granulated requests with many parameters, mean the request is never repeated exactly and the cache cannot work effectively. We moved these variables/parameters out of the request and into a post-processing method until we achieved ~99% cache hits.
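The effect of keeping variable values out of the request can be sketched with plain strings standing in for serialized requests. Everything here is illustrative (hypothetical class, method, and table names), but it shows why a per-request timestamp makes every cache key unique, while a normalized key plus a post-processing filter keeps the key stable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative sketch: DAX keys its query cache on the full request,
// so any per-request value in the parameters defeats caching.
public class QueryKeyNormalization {
    // Naive key: embeds a per-request timestamp, so no two keys ever match.
    static String naiveKey(String table, Map<String, String> params, long nowMillis) {
        return table + "|" + new TreeMap<>(params) + "|asOf=" + nowMillis;
    }

    // Normalized key: only the stable parameters take part in the key.
    static String normalizedKey(String table, Map<String, String> params) {
        return table + "|" + new TreeMap<>(params);
    }

    // Post-processing step that replaces the in-query timestamp filter:
    // the broader cached result is trimmed after it comes back.
    static List<Long> filterByTimestamp(List<Long> rowTimestamps, long nowMillis) {
        return rowTimestamps.stream()
                .filter(ts -> ts <= nowMillis)
                .collect(Collectors.toList());
    }
}
```

With the naive key, two otherwise identical queries issued a second apart can never hit the cache; with the normalized key they collide as intended, and the time filter runs on the cached result instead.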
How did it benefit us?
As mentioned earlier, adopting DAX and using its client is particularly easy when you already have a DynamoDB client & code in place. This helped us greatly in our transition and as the read-through policy is implemented in DAX internally, it also helped in getting rid of the verbosity and boilerplate code which was a result of using two separate clients, Memcached and DynamoDB.
As indicated by the graph above, errors due to cache misses dropped from ~5–10 million a day to fewer than 100 a day. Since the migration, thanks to the improved cache hit rate, the read units consumed by DynamoDB have also been halved.