Cache Repetitive Ecto Queries
Basic Caching with GenServer — Part 1
Caching is a complicated subject, and can add a lot of complexity to any application development. But, as per the title of the article, we are going to build a very simple kind of caching. We will focus on repetitive operations that have a high impact on the DB and take advantage of GenServer.
Can we store those queries as a GenServer state? Is it faster than the normal DB query? Let’s find out!
The example app is an online shop. More precise, a very small functionality that handles the products discounts. Let’s assume we have a homepage. It is identical for all our shop users. It will display the top 10 of our products, ordered by discount percentage. We do not care about the front-end part, but only about the backend API. A JSON response will be enough.
Another assumption is that our shop has lots of visitors, and a moderate rate of adding new products. We will see why this is an important factor to consider.
Why does this example qualify for a simple caching implementation?
- the top discounts is the same for all users. That means we have exactly the same query executed over and over again on each page visit.
- race conditions are not a major problem. Even if you add a new product that changes the top, the order of the operations is not critical. The new product will be there on the next visit.
- high traffic means a lot of requests for the discounts. Those do not change the data, instead just query it. Adding products will be the only operation that will trigger a cache refresh. We decided above that we have a moderate rate of adding new products. Think about an opposite example. We receive lots of data from many sources, but the number of visitors is very low. Caching top products query after each update would not make any sense. It would probably make our system slower in the end.
As a first conclusion, you need to know very well your system and user behaviour. Only then you can decide to cache any DB operations, even if you find them repetitive.
The shop will be a simple Phoenix app. To make things more interesting we’ll be using the new version: 1.3, rc.0 at the time I’m writing this.
Phoenix uses a new generator:
mix phx.new shop
The article will not include every single line of code in the app, but you can check the full version on github
The excellend Phoenix json generator will create all the boilerplate code we need:
▶ mix phx.gen.json Sales Product products previous:integer actual:integer
If you are not familiar with Phoenix 1.3, please follow the link above and read about the new structure. Otherwise, you may not find some things in the place you expect them to be.
For the purpose of our test, we seed the DB with 5000 products with random prices.
For easy tracking, we change the default controller functions to
new_product. Those are the only 2 actions available in the app. For benchmarking simplicity both are
get requests. The new product prices are random generated.
Change the router to accept
:version as param. We will use this to benchmark different implementation versions for the same function.
The v1 is the default implementation. We query the DB for top 10 discounts, every time a user is accessing the page.
Shop.Sales.list_products() will query the discounts.
With that, we are ready for the first benchmark. My tool of choice for this example is Siege. It’s an HTTP testing and benchmarking utility. It allows us to measure our system under heavy stress. It is really easy to install, use and interpret the results, at least for our very simple test case.
Benchmarking Top Discounts
We start by benchmarking the top discounts. We want our shop users to have a smooth experience, and the home page to load as fast as possible.
First, we reset the database and repopulate it with the seeds. Please note that even not mentioned, I will repeat this step before every benchmark that requires creating new products.
▶ mix ecto.reset
▶ mix run priv/repo/seeds.exs
Then we run the siege with the following options. We’ll keep those identical for all future benchmarks:
- first we pass in the URL we want to benchmark: in this case
http://127.0.0.1:4000/api/top_discounts/v1. This points to the first version of the implementation of our
-t60s- the test will run for 60 seconds
-c100- is the number of concurrent simulated users
▶ siege http://127.0.0.1:4000/api/top_discounts/v1 -t60s -c100
Transactions: 2677 hits
Availability: 100.00 %
Elapsed time: 59.16 secs
Data transferred: 1.49 MB
Response time: 1.92 secs
Transaction rate: 45.25 trans/sec
Throughput: 0.03 MB/sec
Successful transactions: 2677
Failed transactions: 0
Longest transaction: 2.46
Shortest transaction: 0.15
We are mostly interested in the number of successful transactions 2,677 and the average response time 1.92s. The response time looks quite high when the system is under stress. It’s time to improve it.
Implementing the Cache
As said before, this is an extremely simple version of cache. It’s designed to handle only this kind of particular repetitive task. We do not go into complex subjects as cache key expiration. The top discounts cache will refresh each time we add a new product. For that we will implement
v2 version of our controller function:
The top_discount v2 gets the top discounts from the Cache (which we will detail next).
The Cache itself is just the state of a basic GenServer implementation:
When the GenServer starts, the
init function sets the server state to the current top discounts. This is done by calling
Sales.list_products(), the same function used in top_discounts v1. But there is a catch. In the v1 it runs every time a user access the home page. Now it is called only once, when the GenServer starts, and then again when new products are added. For top_discounts v2, the user request will not hit the database, but the GenServer state.
Remember to add Cache server to the Phoenix supervision tree (application.ex):
..... worker(Shop.Cache, ) .....
We can now test the new implementation:
▶ siege http://127.0.0.1:4000/api/top_discounts/v2 -t60s -c100
Transactions: 17434 hits
Availability: 100.00 %
Elapsed time: 59.54 secs
Data transferred: 9.71 MB
Response time: 0.09 secs
Transaction rate: 292.81 trans/sec
Throughput: 0.16 MB/sec
Successful transactions: 17434
Failed transactions: 0
Longest transaction: 0.56
Shortest transaction: 0.02
Let’s analyse the results:
The results are impressive, way over my expectations. Under heavy load, the cached version is a lot faster than the initial query based implementation. It is a huge improvement that requires little effort. However, you will need to decide when this kind of implementation makes sense. If you think that some of your system components will go out of sync because of the async cache setter, this simple caching may not be right for your app.
But if you find yourself running the same query over and over again, you can try to use this GenServer caching implementation.
Don’t forget to measure the results.
In the article next article, we will continue this example. We’ll be adding new products and refresh the cache.
Originally published at iacobson.net — March 31, 2017.