Cacheable Entities: A Laravel package with a story
At Studocu, where we handle tons of data — hundreds of millions of records and a whopping 2 million people visiting us daily (around 60 million monthly users) — every tiny moment of a user’s request counts. Back in the early days, like any other Laravel project, we kicked off using the default cache-aside strategy, teaming up with Redis as our go-to cache driver.
Studocu started small: a small team, a growing codebase, and a few features. Fast forward to today, and we proudly sit among the top 750 most-visited websites globally. But getting here wasn't a walk in the park. Scaling up meant dealing with new challenges, like figuring out how to smoothly cache our fetched data without making life complicated. Stick around as we spill the beans on how we tackled the cache puzzle, aiming for top-notch performance and easy scalability.
Amidst the growth in teams and the introduction of new features, we found ourselves accumulating some legacy code and practices. At a certain point, we grappled with divergent caching strategies, stray cache keys, and uncertainty surrounding what to cache and how to go about it.
That is when we took a step back, looked at our caching practices from a bird's-eye view, and came up with a standardized approach to deal with them.
—
After reading this article, you will:
- Gain insights into the challenges we encountered amid a rapidly expanding codebase and company.
- Explore Cacheable Entities: an opinionated infrastructure for standardized caching.
- Leverage cacheable entities for the “stale cache” technique to further optimize response times.
- Discover the art of serializing/deserializing cache values to significantly trim down cache sizes.
The Problems We Faced
Before we jump into the solution, let’s talk about the issues we were dealing with:
- Messy Cache Keys and TTLs: Imagine cache keys and TTLs scattered everywhere in the code — Controllers, Event Listeners, Tests, you name it. Changing one key meant going on a file-editing spree to keep everything in sync. It was a bit of a headache.
- Inaccessible Real-time Values: Some classes, or “entities,” always get their results from the cache. But figuring out how to grab real-time values when needed? That wasn’t clear, leading to some inconsistency.
- Async Update Challenges: Dealing with cached values that could be updated asynchronously wasn’t a walk in the park. Developers started to come up with different solutions for their use cases.
- Serialization/Deserialization: Making sense of serializing and deserializing values became a bit of a puzzle. The process wasn’t consistent across our cached values, adding another layer of complexity.
These were the problems we had to sort out, setting the stage for our journey to a more straightforward and standardized cache-aside strategy.
What We Needed
As mentioned earlier, we’ve been interacting with the cache infrastructure directly in our application code. These separate parts of the codebase had to know the TTL and the cache key themselves.
We had two primary areas to address:
Key and TTL Management:
- Crafting cache keys.
- TTL encapsulation.
Caching and Accessing Cacheable Values:
- Dealing with the same cacheable values using varied caching strategies (more on this below).
- Making the process of serializing and deserializing cached values more straightforward.
These were the specific needs that nudged us toward a more cohesive and standardized approach.
Our Approach to Caching
In our scenario, we navigated between two distinct caching strategies: Blocking and Non-blocking cache.
- Blocking Cache (Synchronous): If we don’t have the value, we compute it, cache it, and serve up the result right away.
- Non-blocking Cache (Asynchronous): If we don’t have the value, we dispatch a job to compute it, and return an empty state (like null, empty collection, or an empty array).
Every cacheable value had to be flexible enough for clients to cache or access it using either strategy. Sometimes, the same entity needed to be cached or accessed differently — either via the blocking or non-blocking route — depending on the context. For instance, handling a Web request might use the non-blocking strategy, while an API (Async) request could opt for the blocking strategy.
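To make the distinction concrete, here is a minimal sketch of the two strategies in plain Laravel (the cache key, TTL, computeValue() helper, and ComputeExpensiveValue job are all hypothetical, not part of the package):
<?php
use Illuminate\Support\Facades\Cache;
// Blocking (synchronous): on a cache miss, compute and cache the value,
// then return it right away.
$value = Cache::remember('expensive:value', 3600, fn () => computeValue());
// Non-blocking (asynchronous): return the cached value if present; otherwise
// dispatch a job to warm the cache and fall back to an empty state.
$value = Cache::get('expensive:value');
if ($value === null) {
    ComputeExpensiveValue::dispatch();
    $value = collect(); // empty state: null, empty collection, or empty array
}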
The solution, a.k.a. Cacheable Entities
To bring a uniform approach to caching across our codebase, we cooked up an internally standardized (or, as we like to call it, opinionated) infrastructure known as "Cacheable Entities". This infrastructure acts as an abstraction layer that takes over cache-related responsibilities.
We isolated this infrastructure into a standalone Laravel package and made it open-source. Feel free to check out the Cacheable Entities repository, and let’s dive into how you can make the most of this infrastructure.
Defining a cacheable entity
To make a class a cacheable entity, it has to implement the StuDocu\CacheableEntities\Contracts\Cacheable contract.
The interface implementation requires defining the following methods:
- getCacheTTL: Returns the TTL of the cache in seconds.
- getCacheKey: Returns the cache key.
- get: Computes the Entity value to be cached.
Reading the value of a cacheable entity
Using the two caching strategies mentioned above with Cacheable Entities is made easy thanks to two available utility classes: SyncCache and AsyncCache.
- StuDocu\CacheableEntities\SyncCache@get: Accepts a cacheable entity; if the value is not cached yet, it computes and caches it, then returns the value.
- StuDocu\CacheableEntities\AsyncCache@get: Accepts a cacheable entity and returns the cached value if it is a cache hit. Otherwise, it dispatches a job to compute the cache value asynchronously and returns an empty state (null by default).
Example
<?php
use App\Models\Author;
use App\Models\Book;
use Illuminate\Database\Eloquent\Collection;
use StuDocu\CacheableEntities\Contracts\Cacheable;
/**
* @phpstan-type ReturnStructure Collection<int, Book>
*
* @implements Cacheable<ReturnStructure>
*/
class AuthorPopularBooksQuery implements Cacheable
{
public const DEFAULT_LIMIT = 8;
public function __construct(
protected readonly Author $author,
protected readonly int $limit = self::DEFAULT_LIMIT,
) {
}
public function getCacheTTL(): int
{
return 3600 * 24;
}
public function getCacheKey(): string
{
return "authors:{$this->author->id}:books:popular.v1";
}
public function get(): Collection
{
return Book::query()
->join('book_popularity_scores', 'book_popularity_scores.book_id', '=', 'books.id')
->where('author_id', $this->author->id)
->whereValid()
->whereHas('ratings')
->orderByDesc('book_popularity_scores.score')
->take($this->limit)
->get();
}
}
// Usage
$query = new AuthorPopularBooksQuery($author);
// Get a non-blocking cache result in the web endpoint.
resolve(\StuDocu\CacheableEntities\AsyncCache::class)->get($query);
// Get a blocking cache result in the API endpoint.
resolve(\StuDocu\CacheableEntities\SyncCache::class)->get($query);
Serialization/Deserialization
Serializable cache works by providing a behind-the-scenes layer for the cacheable entity that handles serializing and deserializing the cache value without the client code being aware of it.
Why it is needed
This pattern is used in cases where we want to save space. We cache only metadata about the value, say an array of model IDs; later, we fetch the data again from the database. The second time we fetch the data, we get the models directly via their IDs. This is a fast query because we are directly fetching what we need rather than computing it (which is what takes time).
The pros and cons are:
- + Small cache footprint
- + It allows excluding no longer valid items from the response (like deleted/deactivated models after they were cached).
- - It doesn’t necessarily reduce the number of DB requests (although it reduces the complexity of the queries).
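To make the pattern concrete before we get to the contract, here is a minimal standalone sketch of the idea in plain Laravel (the cache key, TTL, and popularity_score column are illustrative, not part of the package):
<?php
use App\Models\Book;
// Write path: cache only the model IDs, not the fully hydrated models.
$ids = Book::query()
    ->orderByDesc('popularity_score') // hypothetical column
    ->take(8)
    ->pluck('id')
    ->all();
cache()->put('books:popular:ids', $ids, 3600 * 24);
// Read path: re-fetch by primary key. This is cheap compared to recomputing
// the ranking, and models deleted since caching simply drop out of the result.
$books = Book::query()->findMany(cache()->get('books:popular:ids', []));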
Usage
To make a cacheable entity serializable, you need to implement the StuDocu\CacheableEntities\Contracts\SerializableCacheable contract.
The interface implementation requires defining the following methods:
- serialize(mixed $value): mixed: Prepares the result for the cache. It is called anytime a cacheable entity is about to be cached; its result becomes the cache value.
- unserialize(mixed $value): mixed: Restores the original state of the cached value. It is called anytime a cache value is read; its result is what is returned as the cache value.
Example,
<?php
// [...]
use StuDocu\CacheableEntities\Contracts\SerializableCacheable;
class AuthorPopularBooksQuery implements Cacheable, SerializableCacheable
{
// [...]
/**
* @param Collection<int, Book> $value
* @return array<int>
*/
public function serialize(mixed $value): array
{
// `$value` represents the computed value of this query; it will be what we will get when calling self::get().
return $value->pluck('id')->all();
}
/**
* @param int[] $value
* @return Collection<int, Book>
*/
public function unserialize(mixed $value): Collection
{
// `$value` represents what we've already cached previously; it will be the result of self::serialize(...).
$booksFastAccess = array_flip($value);
$books = Book::query()
->findMany($value)
->sortBy(fn (Book $book) => $booksFastAccess[$book->id] ?? 999)
->values();
$this->setRelations($books);
return $books;
}
/**
* @param ReturnStructure $books
*/
private function setRelations(Collection $books): void
{
$books->each->setRelation('author', $this->author);
// Generally speaking, you can do eager loading and such in a similar fashion (for ::get and ::unserialize).
}
}
// Usage is still unchanged.
$query = new AuthorPopularBooksQuery($author);
// Get a non-blocking cache result in the web endpoint.
resolve(\StuDocu\CacheableEntities\AsyncCache::class)->get($query);
// Get a blocking cache result in the API endpoint.
resolve(\StuDocu\CacheableEntities\SyncCache::class)->get($query);
Given that this query is now an instance of a "serializable cacheable", it will be serialized/unserialized on the fly anytime it is read through the SyncCache or AsyncCache utilities. The client code doesn't have to change; it uses the same API to read the value.
💡 Thanks to PHP type covariance, we were able to override the return types of serialize and unserialize from mixed to something more specific.
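As a quick illustration of that covariance rule, here is a standalone sketch (Serializer and IdSerializer are hypothetical names, not package classes):
<?php
interface Serializer
{
    public function serialize(mixed $value): mixed;
}
class IdSerializer implements Serializer
{
    // Covariance: narrowing the declared `mixed` return type to the more
    // specific `array` is allowed when implementing the interface.
    public function serialize(mixed $value): array
    {
        return (array) $value;
    }
}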
Caveat when unserializing
Depending on how you serialize your models, you might lose the original order when unserializing, e.g. when only caching the IDs.
For entities where the order matters, make sure to retain the original order when unserializing.
Here are some examples of how to do so,
// Retaining the original order with array_search
$books = Book::query()
->findMany($value)
->sortBy(fn (Book $book) => array_search($book->id, $value))
->values();
// Retaining the original order with array_flip.
// A faster alternative to the above, using direct array access instead of `array_search`.
$booksFastAccess = array_flip($value);
$books = Book::query()
->findMany($value)
->sortBy(fn (Book $book) => $booksFastAccess[$book->id] ?? 999)
->values();
// Retaining the original order with SQL (MySQL's FIELD function).
$books = Book::query()
->whereIn('id', $value)
->orderByRaw('FIELD(id, ' . implode(',', $value) . ')')
->get();
Purging the cache
Anytime you want to invalidate the cache value of a cacheable entity, use the SyncCache::forget method. We use SyncCache for this because the invalidation happens on the spot.
Here is an example,
<?php
$query = new AuthorPopularBooksQuery($author);
// Invalidate the cache (for example, in an event listener).
resolve(\StuDocu\CacheableEntities\SyncCache::class)->forget($query);
Async cache default value
When using the AsyncCache utility, it will return null on a cache miss. In some cases, you might need to change that default value. All you need to do is make the cacheable entity implement the StuDocu\CacheableEntities\Contracts\SupportsDefaultValue interface.
The interface implementation requires defining the following method:
- getCacheMissValue: Specifies the default value on a cache miss (when using the async strategy).
Example,
<?php
// [...]
use Illuminate\Database\Eloquent\Collection;
use StuDocu\CacheableEntities\Contracts\SupportsDefaultValue;
class AuthorPopularBooksQuery implements Cacheable, SupportsDefaultValue
{
// [...]
public function getCacheMissValue(): Collection
{
return Collection::empty();
}
}
Generic Annotation
At Studocu, we use PHPStan (via Larastan) for static analysis. Given that the SyncCache and AsyncCache utilities are annotated with something as generic as mixed, we needed to make sure we were safely using our cacheable entities. For that reason, the Cacheable Entities infrastructure was built with generics in mind. This allows PHPStan to understand the return types of cacheable entities (via the cache utilities) and safeguard our codebase.
Let’s see what generics are available.
- Cacheable: This contract accepts one generic definition, <TReturn>, which is what the entity will return when calling get to compute its value.
- SerializableCacheable: This contract accepts two generic definitions, <TUnserialized, TSerialized>. TUnserialized is the type returned when unserializing the cache value; it should have the same shape as TReturn to ensure consistency. TSerialized is the type returned when serializing the result.
- SupportsDefaultValue: This contract accepts one generic definition, <TDefault>, which is what the entity will return on a cache miss while using the AsyncCache utility. It should have the same shape as TReturn to ensure consistency.
Example,
<?php
/**
* @phpstan-type ReturnStructure Collection<int, User>
* @implements Cacheable<ReturnStructure>
* @implements SerializableCacheable<ReturnStructure, int[]>
* @implements SupportsDefaultValue<ReturnStructure>
*/
class CourseQuery implements Cacheable, SerializableCacheable, SupportsDefaultValue
{}
Noteworthy Improvements
Now that we’ve rolled out our standardized caching solution, let’s see how it’s made a difference in practice.
Streamlined Developer Experience
With our standardized approach in place, developers no longer need to second-guess themselves when it comes to caching. The infrastructure provides a straightforward way to read from and write to the cache, making their lives easier.
Also, it promoted more visibility and knowledge about what you might and might not cache and possible ways to access it.
Turbocharging Performance with Async Cache
Beyond the initial goal of using the async cache to avoid blocking requests, we took it a step further to supercharge the performance of some of our most sluggish queries, using the stale cache technique.
This is how it works: every time we compute a value, we also stash it in a stale cache with an extended TTL. Then, we consistently access the cache asynchronously, paired with the SupportsDefaultValue contract. In the event of a cache miss, we retrieve data from the stale cache (using getCacheMissValue). Meanwhile, behind the scenes, an async job updates both the cache and the stale cache for the next request. Thus, no user-facing request has to wait for the heavy computation to finish.
The result? A massive performance boost for some of our previously sluggish pages.
Note: Using this technique comes after making sure we’ve optimized our queries first. Sometimes, we have technical limitations or legacy structures that we need to live with for a while until we migrate to more optimized solutions like offloading the computation to another service. In the meantime, we can make it way faster for our end-users with a Stale cache.
Example,
<?php
use App\Models\Author;
use App\Models\Book;
use Illuminate\Database\Eloquent\Collection;
use StuDocu\CacheableEntities\Contracts\Cacheable;
use StuDocu\CacheableEntities\Contracts\SupportsDefaultValue;
/**
* @phpstan-type ReturnStructure Collection<int, Book>
*
* @implements Cacheable<ReturnStructure>
* @implements SupportsDefaultValue<ReturnStructure>
*/
class AuthorPopularBooksWithStaleCacheQuery implements Cacheable, SupportsDefaultValue
{
public const DEFAULT_LIMIT = 8;
public function __construct(
protected readonly Author $author,
protected readonly int $limit = self::DEFAULT_LIMIT,
) {
}
public function getCacheTTL(): int
{
return 3600 * 24;
}
public function getCacheKey(): string
{
return "authors:{$this->author->id}:books:popular.v1";
}
public function get(): Collection
{
$books = Book::query()
->join('book_popularity_scores', 'book_popularity_scores.book_id', '=', 'books.id')
->where('author_id', $this->author->id)
->whereValid()
->whereHas('ratings')
->orderByDesc('book_popularity_scores.score')
->take($this->limit)
->get();
$this->setRelations($books);
return tap(
$books,
function (Collection $results) {
cache()->put(
$this->getStaleCacheKey(),
$results,
$this->getStaleCacheTTL(),
);
},
);
}
private function getStaleCacheKey(): string
{
return $this->getCacheKey() . ':stale';
}
private function getStaleCacheTTL(): int
{
return $this->getCacheTTL() + (3600 * 24);
}
public function getCacheMissValue(): Collection
{
$books = cache()->get($this->getStaleCacheKey(), Collection::empty());
if (! ($books instanceof Collection) || $books->isEmpty()) {
// When we neither have the up-to-date results nor the stale results cached, we compute
// them synchronously as a last resort.
return $this->get();
// Or you can return an empty collection if you don't want to have a value every time.
}
return $books;
}
/**
* @param ReturnStructure $books
*/
private function setRelations(Collection $books): void
{
$books->each->setRelation('author', $this->author);
// Generally speaking, you can do eager loading and such in a similar fashion (for ::get and ::unserialize).
}
}
Some caveats of this approach:
- We are treating an otherwise blocking value as a non-blocking value, which might not feel natural.
- On a cache miss for both the original and the stale cache, we compute the value on the fly, which makes for an exceptionally slow request (for that user).
Shrinking Cache Size with Serialization
Thanks to SerializableCacheable
, we achieved a swift and uncomplicated reduction in the cache footprint of our models. It allowed us to streamline the process, opting to cache only the essential IDs.
The result? We downsized the largest cached key in Redis by 97.17%, which freed up space for more keys to be cached. We continue to use the same strategy for other large keys.
Trimming down the size of cached data is significant, especially when working with eviction policies like LRU (Least Recently Used): a smaller footprint safeguards against data loss when storage overflows, preventing cached values from being dropped before their time is up.
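For context, this eviction behavior is governed by Redis's maxmemory settings; a typical LRU setup looks like the following (the values here are illustrative):
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru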
Conclusion
Sticking to internal standards becomes increasingly vital to maintain uniformity as teams expand. Caching and cache management become particularly delicate in large-scale applications holding tens of millions of records and serving requests from all corners of the globe. However, it's crucial to note that caching isn't a magic fix for poorly performing queries. Always prioritize improving the query itself before resorting to caching.
Thanks for taking the time to read. Don’t hesitate to raise issues or submit pull requests if you have suggestions or improvements aligning with the package’s goals.
Curious about the Query classes we touched upon? Dive deeper into their usage at Studocu: