Distributed Caching in Elixir using Nebulex

I remember around 2008 when I was submerged in the Java world I had a change to try awesome caching libraries and frameworks, such as: Hazelcast, EHCache, Spring Caching, etc. At that time, between researching and learning about distributed caching, one of the most interesting things I came across with, was a presentation from Cameron Purdy titled “Distributed Caching Essential Lessons”, it got my attention immediately, since it was one of the first materials talking about distributed caching topologies, taxonomies, pros/cons, etc. So I began to put all this in practice and there were tons of options to make it “reasonably easy to achieve”. In other words, I never felt like something was missing to do so.

Some of the best-known caching tools in the Java world I remember to have worked with.

Years later …

Three years later I came across with Erlang (and later on with Elixir). That was a big change for me, in the best sense. I had found a great programming language for the kind of things I used to build (distributed systems, fault-tolerant, highly scalable, available, reliable, etc.), and of course the distributed caching topic showed up again, since caching is one of the best-known and used techniques to improve performance and scalability.

I began to investigate some available cache options in the Erlang/Elixir world, and I found only a few (but good ones), such as: Epocxy Cache, Cachex and ConnCache. All of them were specific cache implementations (and mostly for for local caching). However I didn’t find a real caching framework, a cache abstraction layer integrating different caching options (local and distributed) which would enable developers to craft and deploy different caching topologies such as the ones described in “Distributed Caching Essential Lessons”.

That inspired me to write Nebulex, a real caching framework for Erlang/Elixir which aims to implement all those features just mentioned above.

Official Image for Hubble 25th Anniversary – Source: Mic/NASA

Now, let’s go over these concepts and nice features we would like to have in a caching framework and how to use them with Nebulex.

Cache Abstraction Layer

Nebulex provides support for transparently adding caching into an existing Elixir (or Erlang) application. Similar to Ecto, the caching abstraction allows consistent use of various caching solutions with minimal impact on the code.

Easy to add caching support to an existing Elixir app

In the same way we define a repo in Ecto we define our cache, like so:

defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app
adapter: Nebulex.Adapters.Local
end

As you may have noticed, we are using the Nebulex built-in local adapter in this example; we will talk a little bit more about adapters later.

We then have to add MyApp.Cache within the application’s supervision tree; normally located at lib/my_app/application.ex (or lib/my_app.ex for elixir versions < 1.4.0), inside the start/2 function:

def start(_type, _args) do
children = [
MyApp.Cache
]

...

That’s it, now we are ready to use our cache!

iex> MyApp.Cache.set "foo", "bar"
"bar"
iex> MyApp.Cache.get "foo"
"bar"

Agnostic to any particular caching solution

Nebulex has a flexible and pluggable architecture based on adapters (inspired by Ecto). If you want to use different caching solutions, it is only a matter of changing the adapter. Let’s go over the Nebulex built-in adapters briefly:

  • Nebulex.Adapters.Local — This adapter implements a generational cache and is built on top of Shards, an Erlang library which provides Sharding support for ETS tables. This adapter is the one we used in our first cache definition.
  • Nebulex.Adapters.Dist — This adapter runs on top of an existing local cache providing the missing distributed behavior. In this way, it is possible to reuse it in any other local adapter different than the built-in one.
  • Nebulex.Adapters.Multilevel — This is just a simple layer on top of existing caches, either local or distributed. It enables to setup multi-level cache hierarchy. Multi-level caches generally operate by checking the fastest, level 1 (L1) cache first; if it hits, the adapter proceeds at high speed. If that first cache misses, the next fastest cache (level 2, L2) is checked, and so on, before accessing external memory (most likely the Database).

Aside from these adapters, there are more, such as: Redis adapter, replicated cache adapter, etc.

The Nebulex roadmap includes adapters for: Mnesia, Memcached and Cachex. Check out the issue tracker for more info.

For example, suppose we want to use Redis as a caching solution, we only have to modify the MyApp.Cache module, like so:

defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app,
adapter: NebulexRedisAdapter
end
It is important and highly recommended to check out the adapter’s documentation you are going to use, so you can be aware about additional options, features, limitations, etc.

Distributed Caching Topologies

As of yet, I haven’t found any formal or standard documentation about caching topologies, each vendor provides its own documentation in its own way. That’s why I decided to take as base the “Distributed Caching Essential Lessons” presentation (by Cameron Purdy), and also the docs provided by Oracle Coherence — IMHO, they could be among the best docs around this topic. The main topologies described there are: Replicated Cache, Partitioned Cache and Near Cache — there are some variations of them, but let’s focus on these which are the most relevant.

Thanks to the set of adapters Nebulex brings with it, it is possible and very easy to craft and deploy all these mentioned topologies. Nevertheless, the ability and how easy is to do so, depends mostly on the adapter itself. For example, in case of the Redis adapter, since Redis supports local and distributed caching (via Redis Cluster), all these topologies are easily achievable as well.

Now let’s take a closer look at some of these topologies and how they could be implemented using Nebulex.

Partitioned Cache

  • Requirement: Extreme Scalability.
  • Solution: Shared-Nothing Architecture. Automatically Partition data across all cluster members.
  • Result: Linear Scalability. By partitioning the data evenly, the per-port throughput (the maximum amount of work that can be performed by each server) remains constant as servers are added, up to the capacity of the switch fabric.

Benefits

  • Partitioned — The size of the cache and the processing power available grow linearly with the size of the cluster.
  • Load-Balanced — The responsibility for managing the data is automatically load-balanced across the cluster.
  • Ownership — Exactly one node in the cluster is responsible for each piece of data in the cache.
  • Point-To-Point — The communication for the partitioned cache is all point-to-point, enabling linear scalability.
Partitioned Cache Reads
Partitioned Cache Writes

Partitioned Cache with Nebulex

For this example, we will work over the same cache we defined previously and we will use the built-in distributed adapter. Since the distributed adapter works on top of an existing local cache, we need to define two cache modules, one for the local cache and the other one for the distributed cache, and tell the distributed cache which cache to use locally. Something like this:

defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Dist
  defmodule Local do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Local
end
end

We also have to add some config to our config.exs file:

use Mix.Config
# Distributed Cache
config :my_app, MyApp.Cache,
local: MyApp.Cache.Local
# Internal local cache used by MyApp.Cache
config :my_app, MyApp.Cache.Local,
gc_interval: 86_400

The important bit here is the line where we tell the distributed cache which cache to use as local (local: MyApp.Cache.Local).

Finally, within our application module, inside the start/2 function:

def start(_type, _args) do
children = [
MyApp.Cache.Local,
MyApp.Cache
]
  ...
...

That’s pretty much it, now we can setup a cluster of Elixir nodes and test our cache out, super easy isn’t?

Check out the full partitioned cache example HERE.

Near Cache

  • Requirement: Extreme Performance. Extreme Scalability.
  • Solution: Local “L1” In-Memory Cache in front of a Clustered “L2” Partitioned Cache.
  • Result: Zero Latency Access to recently-used and frequently-used data. Scalable cache capacity and throughput, with a fixed cost for worst-case. A Near Cache provides local cache access to recently and/or often-used data, backed by a centralized or multi-tiered cache that is used to load-on-demand for local cache misses. The result is a tunable balance between the preservation of local memory resources and the performance benefits of truly local caches.

Reads

Multi-level caches generally operate by checking the fastest, level 1 (L1) cache first (local cache), if it hits, the adapter proceeds at high speed. If that first cache misses, the next fastest cache (L2, maybe distributed cache) is checked, and so on, before accessing external storage, maybe the Database. The Database may serve also as backup, hence, in the case the data in the cache becomes unavailable it is recovered from DB on-demand.

Near Cache Reads

Writes

For write functions, the “Write Through” policy is applied by default, this policy ensures that the data is stored safely as it is written throughout the hierarchy; it might be possible to force the write operation in a specific level (this depends on the cache options).

Near Cache Writes

Near Cache with Nebulex

As you can see, the Near Cache is composed by N levels of caches. For that reason, we will use the built-in multi-level adapter.

Let’s turn our previous cache into a Near Cache, like so:

defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Multilevel
  defmodule L1 do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Local
end
  defmodule L2 do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Dist
    defmodule Primary do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Local
end
end
end

We defined MyApp.Cache as a multi-level cache and two nested modules: MyApp.Cache.L1 as local cache, and MyApp.Cache.L2 as distributed cache. Within MyApp.Cache.L2 there is other nested module MyApp.Cache.L2.Primary, which is the local in-memory storage for the distributed cache. These modules represent our multi-level hierarchy.

BTW, you can define each module separately, instead of nested.

In the config.exs file:

use Mix.Config
# Multilevel Cache – wrapper for L1 and L2 caches
config :my_app, MyApp.Cache,
cache_model: :inclusive,
levels: [MyApp.Cache.L1, MyApp.Cache.L2]
# L1 Cache
config :my_app, MyApp.Cache.L1,
gc_interval: 86_400
# L2 Cache
config :my_app, MyApp.Cache.L2,
local: MyApp.Cache.L2.Primary
# Internal local cache used by MyApp.Cache.L2
config :my_app, MyApp.Cache.L2.Primary,
gc_interval: 86_400

Notice how we tell the multi-level cache MyApp.Cache what levels to use by means of the :levels option — levels: [MyApp.Cache.L1, MyApp.Cache.L2].

And finally, within our application module, inside the start/2 function:

def start(_type, _args) do
children = [
MyApp.Cache.L2.Primary,
MyApp.Cache.L2,
MyApp.Cache.L1,
MyApp.Cache
]
  ...
Check out the full near cache example HERE.

Failover for Partitioned and Near Cache

Failover has to be implemented on top of Nebulex according to our needs. For instance, a simple way to support failover and avoid data vulnerabilities is implementing the Read-Through pattern. Since we are backing up the data in the database, it doesn’t matter if a cache node crashes or dies, once we spin up the node again the data will be loaded from database on-demand (by means of Read-Through pattern).

But let’s review some of these cache usage patterns and how they can be implemented using Nebulex.

Cache Usage Patterns via Nebulex

There are several common access patterns when using a cache, such as: Read-Through, Write-Through, Cache-as-SoR, etc. These patterns can be easily implemented in different ways with Nebulex:

Via Nebulex Hooks

When we define a cache, it implements the Nebulex.Hook behavior under the hood by providing a default implementation (do nothing). The Nebulex.Hook behavior defines two callbacks: post_hooks/0 and pre_hooks/0 (both are overridable). These functions tell Nebulex which function pipelines must be evaluated after and/or before the command execution.

Let’s implement the Read-Through pattern using a simple post hook.

defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app
adapter: Nebulex.Adapters.Local
  alias MyApp.Repo
require Logger
  def post_hooks do
{:pipe, [&repo_hook/2]}
end
  def repo_hook(nil, {_, :get, [{schema, id} = key, _]}) do
Logger.debug("Get key #{inspect(key)} from database")
    schema
|> Repo.get(id)
|> case do
nil -> nil
res -> set(key, res)
end
end
  def repo_hook(result, _), do: result
end

Note that we can implement the Write-Through pattern in the same way.

Check out the ecto_fallback example to learn more about it.

Multi-level adapter with fallback

Based on the previous example, the Read-Through pattern can be achieved just passing the :fallback option when we perform a read operation — see Nebulex.Adapters.Multilevel. The fallback function acts as L3 and loads the data directly from a Database or any other data store. Therefore, we avoid data vulnerabilities.

Via NebulexEcto (Cache-as-SoR)

NebulexEcto is a wrapper on top of Ecto and Nebulex which implements Cache-as-SoR pattern out-of-box. If we are planning to use Ecto, This might be the simplest option to achieve these patterns.

Supposing we are using Ecto, we should have a repo, like so:

defmodule MyApp.Repo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres
end

Assuming we are going to use our previous local cache, we just need to define our Cacheable Repo:

defmodule MyApp.CacheableRepo do
use NebulexEcto.Repo,
cache: MyApp.Cache,
repo: MyApp.Repo
end

That’s all, now we can start using our CacheableRepo:

MyApp.CacheableRepo.insert(schema)
MyApp.CacheableRepo.get!(MyApp.MySchema, schema.id)
MyApp.CacheableRepo.get_by!(MyApp.MySchema, [id: schema.id])
MyApp.CacheableRepo.update(changeset)
MyApp.CacheableRepo.delete(schema)

Every time we read data using CacheableRepo (via get or get_by), the Read-Through pattern is applied under the hood doing the work for us. In the same way Write-Through pattern is performed every time we write with CacheableRepo (via insert, update or delete).

Check out nebulex_ecto_example.

Wrapping Up

With Nebulex we don’t have only a particular cache implementation, but a real caching framework that allows us to implement different patterns and topologies, agnostic to any particular cache solution, several adapters included (and more on the way), and most importantly, very easy to use.

Visit Nebulex GitHub to learn more about it, you will find several interesting links including examples, guides, related projects, etc.

Next time we will cover introspection stats and troubleshooting!