Elixir Agent vs GenServer

(and also, Agent.cast/2 vs Agent.update/3)

Published in Scientific breakthrough of the afternoon, Oct 8, 2018

`Agent vs GenServer`

GenServer is a behaviour for implementing generic servers, but most of the time it is only used as a process wrapper around some state. This is where the Agent module comes in: it hides the boilerplate and makes it easy to access data held in a separate process. Agent is just a convenience wrapper, as it is itself implemented with GenServer. See the source:
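To make the "hides the boilerplate" point concrete, here is a minimal sketch of the same counter written twice: once with Agent, once as a hand-written GenServer. `CounterServer` and its function names are hypothetical, made up for this comparison.

```elixir
# Agent version: no module and no callbacks needed.
{:ok, agent} = Agent.start_link(fn -> 0 end)
:ok = Agent.update(agent, fn n -> n + 1 end)
agent_value = Agent.get(agent, fn n -> n end)

# GenServer version of the same thing (hypothetical module, for illustration).
defmodule CounterServer do
  use GenServer

  # Client API
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  # Server callbacks
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, n), do: {:reply, :ok, n + 1}
  def handle_call(:value, _from, n), do: {:reply, n, n}
end

{:ok, server} = CounterServer.start_link(0)
:ok = CounterServer.increment(server)
server_value = CounterServer.value(server)

IO.inspect({agent_value, server_value})
```

Both versions end up holding the same state; the Agent one just skips the client/callback split.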

agent.ex (client functions)
server.ex (GenServer callbacks)

Example use case

One scenario is when multiple admins try to add the same user/client/etc. at (almost) the same time. (Seems improbable? Imagine working for a large company using a ticketing system. The easy tasks are gone fast, and not all systems use locks. Or people just forget.)

An Agent process could run with a simple claim function that checks whether the username already exists in its list. If it does not, the function saves it and lets the user creation go ahead; otherwise it raises an error (or updates the site, etc.).

(A simple database check would suffice in most cases of course, but if using CQRS/ES and the user creation requests are issued before the read store could update, then this extra check is needed.)
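The claim function described above could be sketched roughly like this. `UsernameRegistry`, `claim/2` and the `{:error, :taken}` return value are all hypothetical names chosen for the example; it returns an error tuple instead of raising, but the idea is the same.

```elixir
defmodule UsernameRegistry do
  # Hypothetical module: the Agent state is a MapSet of claimed usernames.
  def start_link, do: Agent.start_link(fn -> MapSet.new() end)

  # Returns :ok for the first claim of a username, {:error, :taken} afterwards.
  def claim(registry, username) do
    # get_and_update expects {value_to_return, new_state} from the function.
    Agent.get_and_update(registry, fn claimed ->
      if MapSet.member?(claimed, username) do
        {{:error, :taken}, claimed}
      else
        {:ok, MapSet.put(claimed, username)}
      end
    end)
  end
end

{:ok, registry} = UsernameRegistry.start_link()
IO.inspect(UsernameRegistry.claim(registry, "alice"))
IO.inspect(UsernameRegistry.claim(registry, "alice"))
```

The check and the insert happen inside a single `Agent.get_and_update/2` call, so two admins claiming the same name cannot both succeed: the Agent process handles their requests one after the other.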

Why not use a state machine?

Because the term “state” in (finite) state machines refers to the state of an entire (sub)system, and not just to a collection of data.

Some good reads:

Rage Against The Finite-State Machines
State Machine in Elixir with Machinery
ericentin/gen_state_machine (Elixir wrapper around gen_statem)
gen_statem Erlang behaviour

(Also including: Event Machines github gist. Not strictly related, but it seems to be a good way to ensure valid aggregate state in a CQRS/ES system with a state machine. That is, having a means to disallow commands that would otherwise lead to an inconsistent state. We’ll see.)

`Agent: cast/2` vs `update/3`

According to the documentation, both Agent.cast/2 and Agent.update/3 return :ok, but the docs do not spell out explicitly that cast/2 is asynchronous and update/3 is synchronous. (Looking at the source above shows clearly, though, that the former is implemented using GenServer.cast/2 and the latter with GenServer.call/3.)
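One way to see the difference is to time both calls. This is only a sketch: the `slow` helper and the 200 ms sleep are made up for the demonstration.

```elixir
{:ok, agent} = Agent.start_link(fn -> [] end)

# Hypothetical helper: builds a state function that sleeps, then records a tag.
slow = fn tag ->
  fn state ->
    Process.sleep(200)
    [tag | state]
  end
end

# :timer.tc/1 returns {elapsed_microseconds, result}.
{cast_us, :ok} = :timer.tc(fn -> Agent.cast(agent, slow.(:cast)) end)
{update_us, :ok} = :timer.tc(fn -> Agent.update(agent, slow.(:update)) end)

IO.puts("cast returned after #{div(cast_us, 1000)} ms")
IO.puts("update returned after #{div(update_us, 1000)} ms")
```

Both return `:ok`, but cast/2 comes back right away, while update/3 only returns once the Agent has worked through the queued cast and then the update itself.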

Why did the final case (i.e., calling both `cast/2` and `update/3`) time out (and try to crash the caller process)?

The GenServer.call/3 documentation has a section on timeout:

timeout is an integer greater than zero which specifies how many milliseconds to wait for a reply, or the atom :infinity to wait indefinitely. The default value is 5000. If no reply is received within the specified time, the function call fails and the caller exits. If the caller catches the failure and continues running, and the server is just late with the reply, it may arrive at any time later into the caller’s message queue. The caller must in this case be prepared for this and discard any such garbage messages that are two-element tuples with a reference as the first element.

Calling cast/2 is “fire and forget”: it only asks the Agent process to execute the given commands, and we don’t care about the results. (Not that there would be any, as both cast/2 and update/3 return only :ok, and the Agent's purpose is to maintain an internal state.) At this point, the Agent server starts sleeping for 3 seconds.

(This happens in server.ex's handle_cast/2. There is no timeout when using cast/2, and keep in mind that a process handles its requests sequentially! Try it out with f.(:cast, 12000); f.(:cast, 12000) and wait 24 seconds for both to finish; the console will of course stay available in the meantime.)

update/3 is called immediately after cast/2, and it blocks the console until it receives confirmation (i.e., an :ok message) that the computation is complete. The default timeout is 5000 ms (= 5 seconds). The cast/2 is already consuming its 3 seconds, and the update's own work is queued behind it, so the reply cannot arrive until both have run. Once that total exceeds the timeout, the call fails.
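The same situation can be reproduced in miniature. This sketch uses made-up numbers (a 300 ms cast and an explicit 100 ms timeout passed to update/3) so it finishes quickly; the `work` helper is hypothetical.

```elixir
{:ok, agent} = Agent.start_link(fn -> :idle end)

# Hypothetical helper: a state function that sleeps for `ms`, then sets :done.
work = fn ms ->
  fn _state ->
    Process.sleep(ms)
    :done
  end
end

# Fire and forget: the Agent is now busy for about 300 ms.
:ok = Agent.cast(agent, work.(300))

# The update is queued behind the cast, so a 100 ms timeout cannot be met.
result =
  try do
    Agent.update(agent, work.(0), 100)
  catch
    :exit, {:timeout, _details} -> :timed_out
  end

IO.inspect(result)
```

Note that only the caller gives up. The Agent keeps going and still runs both functions, which is why a late reply can show up afterwards.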

Clean up stray messages after timeout

As the quote from the documentation states above, “If the caller catches the failure and continues running, and the server is just late with the reply, it may arrive at any time later into the caller’s message queue. The caller must in this case be prepared for this and discard any such garbage messages that are two-element tuples with a reference as the first element.”

iex(4)> f.(:update, 7000)
** (exit) exited in: ...
|#PID<0.104.0>| update finished
iex(4)> :erlang.process_info(self(), :messages)
{:messages, [{#Reference<0.434605023.2143289346.233483>, :ok}]}

The 2-tuple in the :messages list above, {reference, :ok}, is the message the docs are talking about.
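A caller that catches the timeout could clean up with something like the sketch below. `Mailbox.discard_late_replies/1` is a hypothetical helper, and the late reply is simulated with `send/2` so the example does not depend on timing.

```elixir
defmodule Mailbox do
  # Hypothetical helper: drops every {reference, reply} message currently in
  # the caller's mailbox and returns how many were discarded.
  def discard_late_replies(count \\ 0) do
    receive do
      {ref, _reply} when is_reference(ref) -> discard_late_replies(count + 1)
    after
      # Nothing (more) matching in the mailbox: stop without blocking.
      0 -> count
    end
  end
end

# Simulate a late reply plus an unrelated message that must survive.
send(self(), {make_ref(), :ok})
send(self(), :keep_me)

discarded = Mailbox.discard_late_replies()
{:messages, remaining} = Process.info(self(), :messages)
IO.inspect({discarded, remaining})
```

This is deliberately blunt: it throws away any two-element tuple that starts with a reference, so it only makes sense in a process that does not expect other messages of that shape.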

If you use Observer (:observer.start()) to monitor process mailboxes, be aware that sometimes you will see extraneous messages ending with :get_status atoms. This is a side effect of using Observer.

Refresher: the handle_call callback

update/3 starts out in agent.ex, calling GenServer.call/3, which in turn calls :gen.call/4.

From here it becomes fuzzy for me. I think this line in gen.erl sends a message to the server process, which handles it via the handle_msg clauses in gen_server.erl. At some point, try_handle_call/4 calls the handle_call/3 callback implementation in agent/server.ex. Then, based on the result (via handle_msg in gen_server.erl), we get back to the receive loop of :gen.call/4, which returns {:ok, res}, or in this case {:ok, :ok}, resulting in :ok.
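The round trip described above can be imitated by hand, which also shows where the {reference, reply} message shape comes from. This is only a sketch of the mechanics: `EchoState` is a hypothetical module, and the `:"$gen_call"` tuple is an OTP internal that real code should never send directly (the real :gen.call/4 also monitors the server and handles the timeout).

```elixir
defmodule EchoState do
  use GenServer

  @impl true
  def init(state), do: {:ok, state}

  # Same shape as an update: replace the state and reply with :ok.
  @impl true
  def handle_call({:update, fun}, _from, state), do: {:reply, :ok, fun.(state)}
end

{:ok, pid} = GenServer.start_link(EchoState, 1)

# What a call boils down to: send a tagged request...
ref = make_ref()
send(pid, {:"$gen_call", {self(), ref}, {:update, fn n -> n + 1 end}})

# ...then wait for a {tag, reply} message carrying the same tag.
reply =
  receive do
    {^ref, value} -> value
  after
    1_000 -> :no_reply
  end

new_state = :sys.get_state(pid)
IO.inspect({reply, new_state})
```

If the caller stopped waiting before the reply was sent, that {ref, :ok} tuple would be left sitting in its mailbox, which is exactly the stray message from the previous section.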

(Note to self: re-read Learn You Some Erlang, especially the section “What is OTP?”.)
