Caching and Scaling Django

Eralp Bayraktar
Published in The Startup · May 17, 2020

Hi! Caching is the most important aspect of scaling, along with database indices. You can get up to 99% speed improvements using either one, when done right. I’ve been using Django for over 6 years, and I will share what I learned when I had to scale beyond millions of users. The techniques and insights here are by no means limited to Python or Django; they are universal. For this tutorial Django’s built-in in-memory cache will be enough, so you don’t need to set up a Redis or Memcached server.
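For reference, here is what that backend looks like in settings.py. This local-memory backend is also what Django falls back to when no CACHES setting is defined, so you may not even need to declare it explicitly:

```python
# settings.py: Django's built-in in-memory cache, no external server required.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
    }
}
```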

You can find all the code I use at https://github.com/EralpB/djangocachingshowcase; the different versions are marked as separate releases.


Should you cache?

The downside of caching is that it introduces hard-to-reproduce errors and bugs, and sometimes inconsistencies such as stale or no-longer-existing objects/data. Considering this, I wouldn’t keep a cache that only brings a 20% improvement or one that has a 20% hit rate; I’d aim for 80%.

Is this cache helping? A cache that is well designed for one workload might bring down the website under another workload. This is why, when designing a cache, you must absolutely focus on real use cases and request patterns, not artificial tests.

Example Scenario

Let’s start with a bookshop scenario, where we have Book and Author models and two views: BookDetail and AuthorDetail.

Imagine that on the BookDetail page we want to show the trending/top books of the author as a recommendation. The same logic can obviously go on AuthorDetail as well. To share this logic between both views, it is best put as a function on the Author model. You will see that caching also forces best practices like DRY (Don’t Repeat Yourself).
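To make the scenario concrete, here is a minimal sketch of the models and the uncached method. The field names (name, title, purchase_count) are my assumptions, not the repo’s exact code, and the sleep stands in for whatever makes the real recommendation query expensive:

```python
# models.py: a hedged sketch of the v0 (uncached) version of the scenario
import time

from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=255)

    def top_books(self):
        # shared by BookDetail and AuthorDetail; deliberately expensive here,
        # the sleep stands in for a slow recommendation query (several seconds)
        time.sleep(5)
        return list(self.book_set.order_by('-purchase_count')[:5])


class Book(models.Model):
    title = models.CharField(max_length=255)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    purchase_count = models.PositiveIntegerField(default=0)
```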

If you don’t cache this function, you don’t just risk slower response times, which is by itself an awful thing, BUT you also risk clogging all your web workers and returning 504s to all other requests. If you have 30 gunicorn workers, your web server can handle 30 simultaneous requests, so if 30 requests hit this function at the same time, your website will be down for 5 seconds. This can obviously cascade into worse problems.

You can find the source code for this version at this tag; it’s marked v0 in Releases.

https://github.com/EralpB/djangocachingshowcase/tree/b93dafaf7f7fd2962334b22e78e0d10e9e14cff0

Loadtesting Version 0 (no caching): I created a loadtest using Locust that randomly performs one of 3 actions: query the index page, query an author detail page, or query a book detail page. You can see the index page is super fast, but the other 2 are very slow because our function takes a long time to calculate.

There’s no need to test this version further; as expected, it’s awful.

Step 1: Manual caching, transparent to the caller

Django has amazing cache-management utilities, so we will use them.

A couple of things: how do you choose a cache key? I personally go with classname_functionname_extraparameters_objectid, or if it's about a specific task I start it with the task name or filename. This shouldn't really matter; the only thing that matters is that it's unique. We don't want to store 2 authors' top books under the same cache key.

What this function does is:

1) Check whether a previously cached value exists in the cache backend.
2a) If yes, return it.
2b) If no, calculate the expected value.
3b) Store the calculated value in the cache.
4b) Return the calculated value.

A sketch of this function is shown below.
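Here is a minimal sketch of that approach (v1), assuming the models sketched earlier; the key format and the 4-hour timeout are illustrative choices, not necessarily the repo’s exact values:

```python
from django.core.cache import cache
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=255)

    def top_books(self):
        cache_key = f'Author_top_books_{self.id}'
        cached_value = cache.get(cache_key)
        if cached_value is not None:
            return cached_value  # cache hit: skip the slow calculation entirely
        # cache miss: do the expensive work once, then remember the result
        top_books = list(self.book_set.order_by('-purchase_count')[:5])
        cache.set(cache_key, top_books, 60 * 60 * 4)  # keep for 4 hours
        return top_books
```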

This is transparent to the caller! A very good property indeed: callers notice nothing except the latency, so you can make this change in one place and suddenly the whole codebase benefits from the improvement. How good is that? :)

Is this cache helpful?

I have no idea. Rule 0 is that caching is about the workload. If an author gets 1 query every 6 hours, this cache actually degrades performance, because the hit rate is 0 and instead of just calculating you now also make additional cache database calls. Still, there is some advantage: it might be slower under the regular workload, but it gives you protection against burst requests or DDoS. If an attacker sends requests to this function, subsequent requests will be very, very fast (a couple of milliseconds instead of 5 seconds). This doesn’t need to be an attack; a viral effect or a marketing campaign could do the same thing at the moment you least expect it.

Another thing to keep in mind is that this cache favors popular authors: if an author gets 1 req/s (request per second), this cache will be very helpful for his/her page, whereas an author in the long tail will see a performance drop. You will have to weigh the ups and downs to make the final decision and change the 4 hours to maybe 24 hours, or maybe to 1 hour. How to keep track of these statistics is another post’s topic, but you definitely have to have a feel for the curve and the request pattern.

Loadtesting version 1:

Here you can see that response times are initially high, but in a short time they all drop, and our server can handle many, many more requests per second thanks to caching.

You can find the source code for this version at this tag; it’s marked v1 in Releases.

https://github.com/EralpB/djangocachingshowcase/tree/f28e8098074f0b7a65db7c2de7474fafbea77004

Step 2: Fixing stale/non-existing objects

Now, with this in place, you might get an unexpected call: some pages not opening, old prices being shown, and so on! Can you guess what’s wrong? Let’s get back to the drawing board:

Imagine Redis is storing a cached array like [Book ID #1, Book ID #5]. The danger is that if an admin deletes Book ID #5, suddenly you are listing a non-existing book in the top-books module. This can create very weird problems; Django might think DB integrity is broken if it cannot resolve a foreign key, and so on. Or you might correct a typo or make an important price change, and the cached value would still be old and stale. The best solution is to cache object IDs and fetch fresh objects from the database using the cached IDs. Almost always the hard and slow part is executing the logic that decides which objects to show; if you have the IDs ready, fetching them from the database should take a couple of milliseconds. DBs are optimized for this!

I changed 2 lines: the first return and the cache-setting line (a sketch of the updated function is below). Now your function executes super fast and does not have a stale-object or update problem!
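Here is a hedged sketch of the updated method (v2), with the same assumed field names as before: only the IDs go into the cache, and fresh rows are fetched on every call.

```python
from django.core.cache import cache
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=255)

    def top_books(self):
        cache_key = f'Author_top_books_{self.id}'
        cached_ids = cache.get(cache_key)
        if cached_ids is not None:
            # cheap primary-key lookup for fresh data; a deleted book simply
            # drops out of the result instead of breaking the page
            books = {b.id: b for b in self.book_set.filter(id__in=cached_ids)}
            return [books[pk] for pk in cached_ids if pk in books]
        top_books = list(self.book_set.order_by('-purchase_count')[:5])
        cache.set(cache_key, [b.id for b in top_books], 60 * 60 * 4)
        return top_books
```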

Loadtesting version 2: There’s no need to loadtest version 2; we haven’t made any performance-related changes, and fetching fresh objects will slow down the endpoint by at most a couple of milliseconds. Again, nothing to loadtest here.

You can find the source code for this version at this tag; it’s marked v2 in Releases.

Step 3: Making the cache transparent to the developer

Now the bad thing about this function is that caching and application logic are mixed, and that’s never good news in programming, so I will separate these two concerns.
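A sketch of that separation, under the same assumptions as before; the helper name _top_books_uncached is mine, not necessarily the repo’s:

```python
from django.core.cache import cache
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=255)

    def _top_books_uncached(self):
        # application logic only: the lines a feature developer actually touches
        return list(self.book_set.order_by('-purchase_count')[:5])

    def top_books(self):
        # caching logic only: cache IDs, re-fetch fresh rows on a hit
        cache_key = f'Author_top_books_{self.id}'
        cached_ids = cache.get(cache_key)
        if cached_ids is not None:
            books = {b.id: b for b in self.book_set.filter(id__in=cached_ids)}
            return [books[pk] for pk in cached_ids if pk in books]
        top_books = self._top_books_uncached()
        cache.set(cache_key, [b.id for b in top_books], 60 * 60 * 4)
        return top_books
```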

This looks much better! Imagine a developer is tasked with ordering by like or favorite count instead of purchase count. That developer only needs to inspect 2 lines of code and shouldn’t care at all about the caching logic. This reduces the complexity they touch from 8 lines to 2 lines. In the first part we made caching transparent to the caller; now we have made it transparent to the logic developer.

What’s next?

Our code looks and performs lovely, and it also protects us from DDoS to some extent, but there’s one thing that’s not ideal: 7 lines of code overhead per function. If the Author model had 5 functions we wanted to cache, we would be repeating ourselves many times. In part 2 I will use the library I wrote to put the caching logic into a function wrapper, so you can enable it with 1 line in a very unobtrusive way.
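To be clear, the snippet below is not the author’s library; it is just a generic sketch of the function-wrapper idea, so you can see why enabling the cache becomes a one-line change. It caches the raw return value, so the stale-object caveat from Step 2 would still need handling.

```python
import functools

from django.core.cache import cache


def cached_method(timeout):
    """Generic sketch of a caching decorator for model methods."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            key = f'{self.__class__.__name__}_{func.__name__}_{self.pk}'
            value = cache.get(key)
            if value is None:
                value = func(self, *args, **kwargs)
                cache.set(key, value, timeout)
            return value
        return wrapper
    return decorator


# usage: enabling caching becomes a single decorator line
# class Author(models.Model):
#     @cached_method(timeout=60 * 60 * 4)
#     def top_books(self): ...
```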

Loadtesting Code
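In case the embedded loadtest file doesn’t render for you, here is a minimal sketch of what it could look like with the current Locust API; the URL patterns and the ID range are my assumptions, not necessarily the repo’s exact routes.

```python
import random

from locust import HttpUser, task, between


class BookshopUser(HttpUser):
    wait_time = between(1, 3)  # each simulated user pauses 1-3s between actions

    @task
    def index(self):
        self.client.get("/")

    @task
    def author_detail(self):
        # group all author pages under one name in the Locust statistics
        self.client.get(f"/authors/{random.randint(1, 100)}/", name="/authors/[id]/")

    @task
    def book_detail(self):
        self.client.get(f"/books/{random.randint(1, 100)}/", name="/books/[id]/")
```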

Learn Locust and loadtest your server; if you are not testing, don’t bother “optimizing”.

To run this, you need to install Locust and then run the command locust -f filename.py. Then you can manage your loadtest in your browser; it has a great UI.

Congratulations! Keep caching like a champion, and please follow me on @EralpBayraktar :)
