Re-design with Elixir/OTP and Pattern Matching

Mustafa Turan
Published in ElixirLabs
7 min read · Sep 7, 2016

Software design is a difficult job, and a design should change according to the language to get the benefits of that language. Elixir is a fast-growing, elegant functional language on top of the Erlang VM. One of its most important features is the ability to use the Open Telecom Platform (OTP), which allows supervised supervisors/workers to run concurrently on the same node and also across multiple machines. Secondly, it allows the use of pattern matching to keep code clear and more readable.

To get the benefit of these great features, the first rule is to stick with the single responsibility principle. It is one of the core principles in software development for reusability, extensibility and mobility. Modularize the components as much as you can; that will make it easier to turn them into new workers/supervisors.

Secondly, use the ‘if-less’ approach with pattern matching. You will see that, with pattern matching, the code in this article does not need a single ‘if’ statement.
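To make the ‘if-less’ idea concrete, here is a minimal sketch. The module `IfLessExample` and its `describe/1` function are hypothetical and not part of the ImgOut project; they only show how function clauses and guards can stand in for a chain of `if`/`else` branches.

```elixir
defmodule IfLessExample do
  # Hypothetical helper: classify an HTTP status code without any `if`.
  # Clauses are tried top to bottom, so the specific 404 clause must come
  # before the broader 400..499 guard.
  def describe(status) when status in 200..299, do: :success
  def describe(404), do: :not_found
  def describe(status) when status in 400..499, do: :client_error
  def describe(_status), do: :other
end

IO.inspect(IfLessExample.describe(204))
IO.inspect(IfLessExample.describe(404))
```

Each clause reads like one row of a decision table, which is the property the rest of this article leans on.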

Finally, put the core modules under separate supervisors/workers. This great feature comes by default with the Erlang VM.

Let's implement a basic on-demand image thumbnailing microservice behind an HTTP endpoint, with and without OTP, but using the single responsibility principle in both cases. I preferred to start with the interface definitions to make the modules replaceable when necessary. So, we can easily replace AWS S3 with Google Cloud, or Memcache with Redis, and so on… By the way, I love Italian food so “le Mani in codice!” (hands in the code!)

Routes and flow:

# Routes for server: 
get /:id/:widthx:height -> /123123-1231-3-13-12/64x64
get /:id/:widthx -> /123123-1231-3-13-12/64x
get /:id/x:height -> /123123-1231-3-13-12/x64
# Request/Response Flow:
# Client -> Server -> CacheInterface Impl -> StorageInterface Impl -> ImageInterface Impl -> Client

Dependencies: Cowboy(web server), Poolboy(pooling), Plug(routing and http handling), ExMagick(thumbnail), Memcache(caching)

Note: Web server and routes will be the common parts for both of the designs.

The Interfaces:

defmodule ImgOut.CacheInterface do
  @callback read(charlist) :: {:ok, binary} | charlist
  @callback write(charlist, binary) :: {:ok, binary}
end

defmodule ImgOut.ImageInterface do
  @callback thumb({:error, integer, map}, any) :: {:error, integer, map}
  @callback thumb(binary, map) :: {:ok, binary} | {:error, integer, map}
end

defmodule ImgOut.StorageInterface do
  @callback read({:ok, binary}) :: {:ok, binary}
  @callback read({:error, integer, map}) :: {:error, integer, map}
  @callback read(charlist) :: {:ok, binary} | {:error, integer, map}
end
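To illustrate why starting from interfaces makes a module replaceable, here is a minimal sketch of an alternative cache backend. Everything in it is an assumption for illustration: `Sketch.CacheInterface` is a simplified copy of the cache contract above (keys are typed as strings here), and `Sketch.InMemoryCache` is a hypothetical Agent-backed implementation that does not exist in the ImgOut project.

```elixir
defmodule Sketch.CacheInterface do
  # Simplified, hypothetical copy of the cache contract so the sketch is
  # self-contained. A miss returns the key itself, as in the original.
  @callback read(String.t()) :: {:ok, binary()} | String.t()
  @callback write(String.t(), binary()) :: {:ok, binary()}
end

defmodule Sketch.InMemoryCache do
  @behaviour Sketch.CacheInterface

  # Start an Agent that holds the cache as a plain map.
  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  @impl true
  def read(key) do
    case Agent.get(__MODULE__, &Map.fetch(&1, key)) do
      {:ok, value} -> {:ok, value}
      # On a miss, hand the key back so the next stage can fetch the original.
      :error -> key
    end
  end

  @impl true
  def write(key, value) do
    Agent.update(__MODULE__, &Map.put(&1, key, value))
    {:ok, value}
  end
end

{:ok, _pid} = Sketch.InMemoryCache.start_link()
miss = Sketch.InMemoryCache.read("photo-1")
written = Sketch.InMemoryCache.write("photo-1", "bytes")
hit = Sketch.InMemoryCache.read("photo-1")
IO.inspect({miss, written, hit})
```

Because callers only rely on `read/1` and `write/2`, swapping this module for a Memcache- or Redis-backed one should not require changes elsewhere.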

Implementing modules without OTP and without pooling support:

All that is required is to implement the functions declared by the behaviours above. Then, from your web route, you can call a `thumb` function with the id and `opts = %{width: 256, height: 256}`:

# As func
def thumb(id, opts) do
  id
  |> StorageService.read
  |> ImageService.thumb(opts)
end
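The pipeline above works because every stage accepts the tagged tuple returned by the previous one, including the error case. The following sketch shows that flow with stand-in functions; `PipelineSketch` is hypothetical, its `read/1` only knows the id "cat", and its `thumb/2` only tags the data instead of doing real image work.

```elixir
defmodule PipelineSketch do
  # Hypothetical stand-in for a storage service: only the id "cat" exists.
  def read("cat"), do: {:ok, "raw-image-bytes"}
  def read(_id), do: {:error, 404, %{id: "not found"}}

  # An error from the previous stage passes through untouched.
  def thumb({:error, status, reason}, _opts), do: {:error, status, reason}

  # A successful read is "resized"; here we only tag the data.
  def thumb({:ok, data}, %{width: w, height: h}),
    do: {:ok, "#{data}@#{w}x#{h}"}

  def run(id, opts) do
    id
    |> read()
    |> thumb(opts)
  end
end

IO.inspect(PipelineSketch.run("cat", %{width: 64, height: 64}))
IO.inspect(PipelineSketch.run("dog", %{width: 64, height: 64}))
```

The error clause is what replaces an early return: a failed read never reaches the resizing clause.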

Directory structure will be:

interfaces
-> cache_interface.ex
-> image_interface.ex
-> storage_interface.ex
services
-> cache_service.ex
-> image_service.ex
-> storage_service.ex

Sample code for cache_service.ex:

defmodule ImgOut.CacheService do
  @behaviour ImgOut.CacheInterface

  def read(key) do
    response = Memcache.Client.get(key)
    case response.status do
      :ok -> {:ok, response.value}
      _ -> key
    end
  end

  def write(key, val) do
    Memcache.Client.set(key, val)
    {:ok, val}
  end
end

Benefits of implementing modules without OTP and without pooling support:

  • No OTP knowledge required; anybody who understands the request flow can do it. You can apply the approach in any language.

OTP Basics

GenServer: Client/Server behaviours

Generic Finite State Machine: Finite state machine programming (only using Erlang)

Generic Event Handler: Event-driven programming such as logging, stats, metrics, web sockets.

Supervisor: Fault tolerant trees

Application: Encapsulate resources and functionality

Supervising Strategies

Elixir/Erlang provides fault-tolerant supervision trees as a behaviour with 4 different strategies. Take a look at the visualisation of supervision strategies: https://medium.com/@mustafaturan/visualisation-of-elixir-supervision-tree-strategies-4d4cb8123138
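As a small illustration of the `:one_for_one` strategy, the sketch below kills a supervised worker and waits for the supervisor to start a replacement. The module names are hypothetical, and the sketch assumes a recent Elixir version where supervisors take child specifications as a list of modules; the samples in this article use the older `worker/2` and `supervise/2` helpers instead.

```elixir
defmodule RestartSketch.Worker do
  use GenServer

  # Registered under its module name so it can be looked up after a restart.
  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, nil}
end

defmodule RestartSketch do
  # Poll until the supervisor has registered a replacement process.
  def await_new_pid(name, old_pid, attempts \\ 50)
  def await_new_pid(_name, _old_pid, 0), do: :timeout

  def await_new_pid(name, old_pid, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(name, old_pid, attempts - 1)
    end
  end
end

{:ok, _sup} = Supervisor.start_link([RestartSketch.Worker], strategy: :one_for_one)

old_pid = Process.whereis(RestartSketch.Worker)
# Simulate a crash; only this child is restarted under :one_for_one.
Process.exit(old_pid, :kill)
new_pid = RestartSketch.await_new_pid(RestartSketch.Worker, old_pid)

IO.inspect({old_pid, new_pid})
```

The worker comes back under the same registered name with a different pid, so callers that address it by name do not need to know a restart happened.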

Implementing modules with OTP and pooling support:

Now we have a general idea of which OTP behaviour is used for which kind of operation. We will create a worker for each service using GenServer, supervise each worker with its own supervisor module, and then supervise all of them under the application supervisor.

Directory structure will be:

interfaces
-> cache_interface.ex
-> image_interface.ex
-> storage_interface.ex
services
-> cache_service.ex
-> image_service.ex
-> storage_service.ex
supervisors
-> cache_supervisor.ex
-> image_supervisor.ex
-> storage_supervisor.ex
workers
-> cache_worker.ex
-> image_worker.ex
-> storage_worker.ex

Sample code for cache_worker.ex:

defmodule ImgOut.CacheWorker do
  use GenServer

  ## Public api

  @doc """
  Read data from cache.
  """
  def read(key),
    do: GenServer.call(__MODULE__, {:read, key})

  @doc """
  Write data to cache.
  """
  def write(key, val) do
    GenServer.cast(__MODULE__, {:write, key, val})
    {:ok, val}
  end

  ## Callbacks

  @doc false
  def start_link,
    do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  @doc false
  def init(_opts),
    do: {:ok, []}

  @doc false
  def handle_call({:read, key}, _from, state),
    do: {:reply, ImgOut.CacheService.read(key), state}

  @doc false
  def handle_cast({:write, key, val}, state) do
    ImgOut.CacheService.write(key, val)
    {:noreply, state}
  end
end

I use two main calls here: GenServer.call and GenServer.cast. The `call` function is used when we need the reply, that is, for synchronous operations. Its opposite, `cast`, is used when we do not need the reply and the operation can be asynchronous.
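The difference between the two is easy to see on a tiny counter. `CounterSketch` below is a hypothetical module written only for this illustration: `increment/1` uses `cast` and returns immediately, while `value/1` uses `call` and blocks until the server replies.

```elixir
defmodule CounterSketch do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  # cast: fire-and-forget, returns :ok without waiting for the server.
  def increment(pid), do: GenServer.cast(pid, :increment)

  # call: blocks the caller until the server sends a reply.
  def value(pid), do: GenServer.call(pid, :value)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

{:ok, pid} = CounterSketch.start_link(0)
:ok = CounterSketch.increment(pid)
:ok = CounterSketch.increment(pid)
count = CounterSketch.value(pid)
IO.inspect(count)
```

Messages from one process to another are handled in the order they were sent, so the `call` here is answered only after both casts have been processed.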

It is also good to limit the resources used by the image processor, so let's use poolboy to create a configured pool for image_worker:

defmodule ImgOut.ImageWorker do
  use GenServer

  @timeout Application.get_env(:imgout, :gm_timeout)

  ## Public api

  @doc """
  Generate thumbs using ImgOut.ImageService thumb
  """
  def thumb({:ok, img}, dimensions) do
    :poolboy.transaction(:gm_worker_pool, fn(worker) ->
      GenServer.call(worker, {:thumb, {:ok, img}, dimensions}, @timeout)
    end)
  end
  def thumb({:error, status, reason}, _),
    do: {:error, status, reason}

  ## Callbacks

  @doc false
  def start_link(_opts),
    do: GenServer.start_link(__MODULE__, :ok, [])

  @doc false
  def init(_opts) do
    {:ok, []}
  end

  @doc false
  def handle_call({:thumb, {:ok, img}, dimensions}, _from, state) do
    data = ImgOut.ImageService.thumb({:ok, img}, dimensions)
    {:reply, data, state}
  end
end

One last thing is to implement the supervisors to supervise workers. Sample code for cache_supervisor.ex:

defmodule ImgOut.CacheSupervisor do
  use Supervisor

  @doc false
  def start_link,
    do: Supervisor.start_link(__MODULE__, [], name: __MODULE__)

  @doc false
  def init([]) do
    children = [
      worker(ImgOut.CacheWorker, [])
    ]
    opts = [strategy: :one_for_one, name: __MODULE__]
    supervise(children, opts)
  end
end

And sample supervisor code with pooling for image_supervisor.ex:

defmodule ImgOut.ImageSupervisor do
  use Supervisor

  @pool_size Application.get_env(:imgout, :gm_pool_size)
  @pool_max_overflow Application.get_env(:imgout, :gm_pool_max_overflow)

  @doc false
  def start_link,
    do: Supervisor.start_link(__MODULE__, [], name: __MODULE__)

  @doc false
  def init([]) do
    worker_pool_options = [
      name: {:local, :gm_worker_pool},
      worker_module: ImgOut.ImageWorker,
      size: @pool_size,
      max_overflow: @pool_max_overflow
    ]
    children = [
      :poolboy.child_spec(:gm_worker_pool, worker_pool_options, [])
    ]
    opts = [strategy: :one_for_one, name: __MODULE__]
    supervise(children, opts)
  end
end

Then we will call the workers instead of services from the server:

# As func
def thumb(id, opts) do
  id
  |> StorageWorker.read
  |> ImageWorker.thumb(opts)
end

Benefits of implementing modules with OTP and pooling support:

  • Easy to determine bottlenecks,
  • Trace supervisors, workers and their resource usage,
  • Move any supervisor or child to a different machine at any time and continue the same flow without interruption,
  • Pooling for the image service allows predetermined resource allocation for the image processing operations.

[Figure: ImgOut Supervision Tree]

Do I need to spawn more workers for StorageWorker and CacheWorker?

I registered each GenServer under its module name (__MODULE__), which prevents us from starting more than one worker each for StorageWorker and CacheWorker (and which may cause timeouts if the single worker is busy at that time). So, it is better to create more workers. We can do this by spawning more processes, but then we also have to track their names and need to know whether a worker is busy or not… It's complicated, but there are a few ways to handle this complexity.

[Figure: ImgOut Supervision Tree after Pooling Cache&Storage workers]

IF-LESS Approach:

Forget “if statements” and use pattern matching. There is no return statement in Elixir, so it is time to replace your old early returns with pattern matching. In this approach, I tried to use tuples to implement pattern matching across modules and functions. To sum up, pattern matching is one of the key features of Elixir, and it makes code clear and reusable.

Let's parse the dimensions in the server and pass them to the other workers for thumbnail generation:

get "/:id/:dimensions" do
  {:ok, binary, content_type} = thumb(id, dimensions)
  conn |> put_resp_content_type(content_type, nil) |> send_resp(200, binary)
end

defp thumb(_, "x"),
  do: {:error, 422, %{dimensions: "Invalid dimension format. Tip: {w}x{h}"}}
defp thumb(id, %{"width" => "", "height" => height}),
  do: thumb(id, %{height: height})
defp thumb(id, %{"width" => width, "height" => ""}),
  do: thumb(id, %{width: width})
defp thumb(id, %{"width" => width, "height" => height}),
  do: thumb(id, %{width: width, height: height})
defp thumb(id, %{} = dimensions) do
  id
  |> StorageWorker.read
  |> ImageWorker.thumb(dimensions)
end
defp thumb(id, dimensions) do
  thumb(id,
    Regex.named_captures(~r/(^(?<width>\d*)x(?<height>\d*)$)/, dimensions))
end
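The parsing step can also be tried in isolation. The sketch below is a standalone variant, not the code from the repository: `DimensionSketch` is a hypothetical module, it converts the captured strings to integers, and it adds an explicit clause for input that does not match the `{w}x{h}` format at all (where `Regex.named_captures/2` returns `nil`).

```elixir
defmodule DimensionSketch do
  # Parse "64x64", "64x" or "x64" into a map of integer dimensions.
  def parse(string) when is_binary(string) do
    ~r/^(?<width>\d*)x(?<height>\d*)$/
    |> Regex.named_captures(string)
    |> to_dimensions()
  end

  # No match at all: Regex.named_captures/2 returned nil.
  defp to_dimensions(nil), do: {:error, :invalid_format}

  # A bare "x" matches the regex but carries no dimension.
  defp to_dimensions(%{"width" => "", "height" => ""}),
    do: {:error, :invalid_format}

  defp to_dimensions(%{"width" => "", "height" => h}),
    do: {:ok, %{height: String.to_integer(h)}}

  defp to_dimensions(%{"width" => w, "height" => ""}),
    do: {:ok, %{width: String.to_integer(w)}}

  defp to_dimensions(%{"width" => w, "height" => h}),
    do: {:ok, %{width: String.to_integer(w), height: String.to_integer(h)}}
end

IO.inspect(DimensionSketch.parse("64x64"))
IO.inspect(DimensionSketch.parse("x64"))
IO.inspect(DimensionSketch.parse("banana"))
```

As in the route code, the clause order matters: the more specific patterns with empty strings have to come before the general width-and-height clause.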

Why did I prefer a NIF-based GraphicsMagick library over system calls to the GraphicsMagick commands?

A NIF package lets me process images inside the Erlang VM instead of spawning an operating system (OS) process for each call. So, I avoid the extra resource usage of OS processes; instead, I keep my pools ready to process an image whenever other processes need one. (Thanks to the exmagick library)

Why did I not add a cache for thumbnails?

I prefer caching thumbs by using HTTP headers, on the assumption that I will use a CDN that understands the ETag and Expires headers to cache my static (dynamic but fake static) content.
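As a rough sketch of what header-based caching could look like on the server side, the hypothetical `EtagSketch` module below derives an ETag from the response body and answers with 304 when the client already holds the same version. None of this is taken from the ImgOut source; the MD5-based tag and the `{status, headers, body}` return shape are assumptions made for illustration.

```elixir
defmodule EtagSketch do
  # Derive a quoted ETag value from the body bytes (MD5 is used here only
  # as a cheap fingerprint, not for security).
  def etag(body),
    do: "\"" <> Base.encode16(:erlang.md5(body), case: :lower) <> "\""

  # `if_none_match` is the value of the request's If-None-Match header,
  # or nil when the header is absent.
  def respond(body, if_none_match) do
    tag = etag(body)

    case if_none_match do
      # Client (or CDN) already has this exact version: send no body.
      ^tag -> {304, [{"etag", tag}], ""}
      _ -> {200, [{"etag", tag}], body}
    end
  end
end

{status, _headers, _body} = EtagSketch.respond("thumb-bytes", nil)
IO.inspect(status)
```

With a CDN in front, most repeated requests would never reach this code; the sketch only covers the revalidation case.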

Why did I add a cache for original images?

I do not want to hit the remote server every time I need the image, because that is expensive both in operations and in time.

Demo: Feel free to deploy to Heroku using the “Heroku Deploy” button in the GitHub source.

Source code (master branch): https://github.com/mustafaturan/imgout

Source code (no pool for cache&storage workers branch): https://github.com/mustafaturan/imgout/tree/no_pool_for_cache_and_storage_workers

Source code (call services directly from server branch): https://github.com/mustafaturan/imgout/tree/call_services_directly_from_server

Some load tests: http://bit.ly/2bYRnpp (3260 thumbnail generations per minute on a Heroku free dyno). Each git branch has its own ‘Deploy to Heroku’ button; feel free to deploy each of them and see the results. The load test metrics are valid for the master branch.

Final notes:

I have tried to explain and demonstrate “the world of Elixir” with a simple microservice approach. I hope this article helps people who would like to start learning Elixir and implement things the Elixir way.

Do not hesitate to join me, comment and please recommend if you like the article…
