Caching: the single most helpful strategy for improving app performance

Dr Milan Milanović
5 min read · Apr 14, 2024

In system architecture, caching stores frequently accessed data in a faster, more readily accessible location. This technique allows you to retrieve it quickly without hitting the original, potentially slower source every single time. The core benefit is improved performance and efficiency.

By keeping copies of frequently accessed data in a readily accessible location, caching reduces access time and offloads primary data sources. It is particularly important in monolithic systems, where all components are tightly integrated and run as a single service, because a well-placed cache plays a crucial role in enhancing performance and scalability.


The advantages of caching are the following:

  • Reduced Latency: By keeping frequently used data close at hand, the cache significantly reduces the time it takes to access that data, resulting in a faster and more responsive user experience.
  • Lower Load: By handling frequent requests, caching reduces the burden on the original data source (databases, servers). This frees up resources for other tasks and improves overall system scalability.
  • Improved Scalability: A well-designed cache can handle more requests than the original data source, making your system more scalable.

Some everyday use cases for caching are:

  • Frequently accessed data: Any data that users or applications access repeatedly is a prime candidate for caching. Don’t limit caching to database queries: reading from a cache is also much faster than a remote API call, so external API responses are good candidates too.
  • Read-heavy workloads: Systems with a high volume of reads compared to writes benefit most from caching. The cache absorbs the brunt of read requests, minimizing the load on the primary data source.
  • Reducing network traffic: Caching can be particularly helpful in distributed systems where data resides on remote servers. By storing a local copy, you can avoid frequent network calls, improving performance, especially for geographically dispersed users.
  • Performance-critical workloads: Caching can significantly improve performance in scenarios where low latency and high throughput are essential.

But how do we decide how to cache something? Adding a cache comes with costs, so for each candidate, we need to evaluate the following:

  • Is it faster to hit the cache?
  • Is it worth storing?
  • How often do we need to validate?
  • How many hits per cache entry will we get?
  • Is it a local or a shared cache?

Yet, cached data can become stale, so there are situations in which caching is inappropriate.

Caching Strategies

Caching speeds up reads of frequently accessed data, but populating and updating the cache is nontrivial. We have the following strategies for this:

Read Strategies

There are two main reading strategies, as follows:

1. Cache-Aside

The application manually manages data storage and retrieval from the cache. On a cache miss, the application fetches data from the primary storage and then adds it to the cache. We should use it when cache misses are relatively rare and the occasional extra DB read is acceptable.


Pros:

  • Simpler implementation: application logic handles cache updates.
  • More granular control over what’s cached.
  • The cache only holds actively requested data.


Cons:

  • Extra database access on a cache miss (potentially three round trips: cache check, DB read, cache write).
  • Data can become stale if the origin isn’t updated through the cache.
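
The cache-aside pattern can be sketched in a few lines of Python. This is a minimal illustration using plain dicts as stand-ins for the cache and the primary store; the names `db`, `cache`, and `get_user` are illustrative, not from the article:

```python
db = {"user:1": {"name": "Ada"}}  # stand-in for the primary data store
cache = {}                        # stand-in for an in-process cache

def get_user(key):
    """Cache-aside read: check the cache first, fall back to the DB on a miss."""
    if key in cache:
        return cache[key]         # cache hit: no database round trip
    value = db.get(key)           # cache miss: read from the primary store
    if value is not None:
        cache[key] = value        # populate the cache for future reads
    return value
```

Note that the application owns all three steps (check, fetch, populate); the cache itself is a dumb store.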

2. Read-Through

When a cache miss occurs, the cache automatically loads data from the primary storage. This simplifies data retrieval by handling cache misses internally. We use it to abstract DB logic from the application code.


Pros:

  • Simpler application code; the cache handles data retrieval.
  • Ensures data consistency (cache and origin are always the same).


Cons:

  • More database load on reads (the cache might not be hit every time, leading to unnecessary database access).
  • Increased complexity if the cache and origin have different data formats (data transformation might be needed).

Write Strategies

There are three main writing strategies, as follows:

1. Write-Around

Data is written directly to the primary storage, bypassing the cache. This strategy is effective when data is written often but read back infrequently. We should use it when written data doesn’t need to be immediately read from the cache.


Pros:

  • Writes are not slowed down by cache updates, and the cache isn’t filled with data that may never be read.
  • The database is always up to date and acts as the single source of truth.


Cons:

  • The cache can serve stale data (entries no longer reflecting the origin) unless they are invalidated on write.
  • Reads of recently written data always miss the cache, adding latency.
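
A minimal write-around sketch, paired with a cache-aside read path; as an assumption not stated in the article, the write also invalidates any existing cache entry so readers don’t see stale data:

```python
db = {}     # stand-in for the primary data store
cache = {}  # stand-in for the cache

def write_around(key, value):
    """Write goes straight to the primary store, bypassing the cache."""
    db[key] = value
    cache.pop(key, None)  # invalidate any stale copy held by the cache

def read(key):
    """Plain cache-aside read path paired with write-around."""
    if key not in cache:
        cache[key] = db.get(key)  # miss: fetch from the origin
    return cache[key]
```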

2. Write-Back

Data is first written to the cache and later synchronized with the primary storage. This reduces the number of write operations but risks data loss if the cache fails before syncing. We should use it in write-heavy workloads where slight data loss is acceptable.


Pros:

  • Faster writes since the cache update is decoupled from the origin (improves write performance).
  • Reduces load on the origin database for writes.


Cons:

  • Potential data inconsistency during failures (data might be in the cache but not the origin).
  • Requires additional logic to handle retries and ensure eventual consistency (data eventually gets written to the origin).
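
Write-back can be sketched with a "dirty" set that tracks entries not yet flushed to the origin; real systems flush on a timer or on eviction, but here a manual `flush()` stands in for that (all names are illustrative):

```python
db = {}        # stand-in for the primary data store
cache = {}     # stand-in for the cache
dirty = set()  # keys written to the cache but not yet synced to the DB

def write_back(key, value):
    """Write lands in the cache only; the origin is updated later."""
    cache[key] = value
    dirty.add(key)

def flush():
    """Periodically (or on eviction) sync dirty entries to the primary store."""
    for key in dirty:
        db[key] = cache[key]
    dirty.clear()
```

If the process dies between `write_back` and `flush`, the dirty entries are lost, which is exactly the data-loss risk described above.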

3. Write-Through

Data is simultaneously written to both the cache and the primary storage, ensuring consistency but potentially increasing write latency. This is ideal for scenarios where data integrity is crucial.


Pros:

  • Ensures data consistency (cache and origin are always the same).
  • Simpler implementation: both updates happen together.


Cons:

  • Slower writes due to the double update (cache and origin).
  • Increased load on the origin database for writes (can become a bottleneck).
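
A minimal write-through sketch; as a simplifying assumption, the two updates here are sequential rather than transactional, whereas a production system would need to handle the case where one of them fails:

```python
db = {}     # stand-in for the primary data store
cache = {}  # stand-in for the cache

def write_through(key, value):
    """Update the origin and the cache in the same operation."""
    db[key] = value     # write the origin first so it stays the source of truth
    cache[key] = value  # then mirror the value into the cache
```

After every write, a read can be served from the cache with no risk of staleness.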

Common Caching Problems

In addition to this, some things can go wrong with cache systems, namely:

  • The Thundering Herd Problem. When the cache expires, numerous requests bombard the backend simultaneously, overwhelming it. We can solve this problem by implementing staggered cache expiration and using locks or message queues to manage request flow. This will prevent the overload and ensure smooth backend operations. Also, we should not set all cache keys to expire simultaneously. Add a bit of randomness to spread out the load.
  • Cache Breakdown. During intense load, the cache fails, directing all traffic to the database and causing performance bottlenecks. We can solve it by setting optimal cache expiration times, employing rate limiting, and layering caching mechanisms (like in-memory and distributed caches) to distribute the load and protect the backend.
  • Cache Crash. The caching service crashes, causing a complete loss of cached data and direct database querying. To solve this, we can design a resilient caching architecture (cache cluster) with regular backups and a secondary failover cache to ensure continuity and performance stability. We can also implement a circuit breaker mechanism. When the cache fails repeatedly, the application temporarily bypasses it to prevent overloading the database.
  • Cache Penetration happens when queries for non-existent data bypass the cache, increasing database load unnecessarily. We can solve it by adopting a cache-aside pattern, where all data requests check the cache first. We use negative caching to handle missing data efficiently, reducing unnecessary database hits. Also, we can store a placeholder (like “null”) for non-existent keys in the cache, preventing unnecessary database queries.
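
Two of these mitigations, jittered expiration (against the thundering herd) and negative caching (against cache penetration), fit in one small sketch. This is a minimal illustration; the TTL value, the `MISSING` sentinel, and all names are assumptions for the example:

```python
import random
import time

BASE_TTL = 300      # base expiration in seconds (illustrative value)
MISSING = object()  # sentinel stored for keys that don't exist in the DB

db = {"user:1": "Ada"}  # stand-in for the primary data store
cache = {}              # key -> (value, expires_at)

def jittered_ttl():
    """Add up to 10% randomness so keys written together don't expire together."""
    return BASE_TTL + random.uniform(0, BASE_TTL * 0.1)

def get(key):
    now = time.time()
    entry = cache.get(key)
    if entry is not None and entry[1] > now:  # fresh cache entry
        value = entry[0]
        return None if value is MISSING else value
    value = db.get(key)
    # Negative caching: remember that the key does not exist, so repeated
    # lookups for it don't hammer the database (cache penetration).
    cache[key] = (MISSING if value is None else value, now + jittered_ttl())
    return value
```

Storing the `MISSING` placeholder with its own (ideally short) TTL means a flood of lookups for a non-existent key costs one database query instead of thousands.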

Read the rest of the article, including bonus chapters in the Tech World With Milan Newsletter.

Originally published at on April 11, 2024.