
What is wrong with gen_event?
gen_event confused me from the beginning, so I wanted to investigate the topic more deeply. I did that here. Then I left the topic, and it came back to me recently when I wondered how the situation had changed since then. Here is the updated version of the initial investigation, which started with the following statement:
I never used a gen_event, I think it is a bad pattern.
At first, it may look like a controversial statement, but I have heard a lot of similar complaints from other people. I first heard that exact statement during Garrett Smith’s presentation about pattern language, when someone asked about that behavior at the end. I also heard a similar thing a while ago in José Valim’s presentation about the future of Elixir, when he introduced the GenStage and GenRouter ideas for the first time.
However, before we dive into reasons and explanations, let’s recall what the purpose of this behavior is.
What is gen_event?
OTP introduces two terms for that behavior: the event manager and event handler modules.
The event manager is a named object that can receive events. An event can be, for example, an error, an alarm, or some information that we log. Inside the manager, we can have zero, one, or more event handlers installed. The responsibility of a handler is to process an event.
When the event manager is notified about an event, all handlers process it. The easiest way to imagine that is to think about the manager as a sink for incoming messages and about the handlers as different implementations that write messages to disk, a database, or the terminal.
Another example is my implementation of Francesco Cesarini’s assignment called the Wolves, Rabbits and Carrots simulation. The primary purpose of that assignment is to introduce you to concurrency, but internally it is a simulation. Inside, multiple events happen at the same time, and the rest of the entities receive notifications.
In that case, simulation_event_stream is the event manager.
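The original listing is not reproduced here. As a rough illustration only, a thin wrapper around an event manager could look like the sketch below. The function names start_link/0, attach_handler/1 and notify/1 are assumptions made for this example and are not taken from the simulation's source.

```erlang
%% Hypothetical sketch, not the original module from the simulation.
-module(simulation_event_stream).
-export([start_link/0, attach_handler/1, notify/1]).

%% Start the event manager and register it locally under the module name.
start_link() ->
    gen_event:start_link({local, ?MODULE}).

%% Install a gen_event callback module as a handler (no init arguments).
attach_handler(Handler) ->
    gen_event:add_handler(?MODULE, Handler, []).

%% Dispatch an event asynchronously to every installed handler.
notify(Event) ->
    gen_event:notify(?MODULE, Event).
```

Note that the module itself contains no callbacks: the manager is just a generic gen_event process, and all behavior comes from the handlers installed in it.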
One of the handler implementations, simulation_cli_handler, writes messages to the console. It is an actual gen_event callback module, and every handler is an implementation of that abstraction.
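The original handler is not reproduced here either. A minimal sketch of such a callback module, assuming a hypothetical state (a counter of printed events) and a hypothetical count request, might look like this:

```erlang
%% Hypothetical sketch, not the original handler from the simulation.
-module(simulation_cli_handler).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Handler state lives here, not in the manager: we count printed events.
init(_Args) ->
    {ok, 0}.

%% Runs inside the manager process for every dispatched event.
handle_event(Event, Count) ->
    io:format("event: ~p~n", [Event]),
    {ok, Count + 1}.

%% Synchronous request addressed to this handler via gen_event:call/3.
handle_call(count, Count) ->
    {ok, Count, Count}.

handle_info(_Msg, Count) -> {ok, Count}.
terminate(_Reason, _Count) -> ok.
code_change(_OldVsn, Count, _Extra) -> {ok, Count}.
```

With a manager `M` running, `gen_event:add_handler(M, simulation_cli_handler, [])` installs the handler, and `gen_event:call(M, simulation_cli_handler, count)` reads its private state.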
The essential part regarding the complaints mentioned above is this: when we start an event manager, we spawn it as a process, while each event handler is implemented as a callback module. It means that all the processing logic executes in the same manager process.
Why is it problematic?
That causes the two most significant issues: handlers are not concurrent, and they are not isolated from each other, at least not in the sense of a process. Unfortunately, there is more. We heard explicitly that I never used gen_event, I think it is a bad pattern, and the whole argumentation behind that can be summed up as:
- That behavior is not used anywhere besides error_logger and the alarm mechanism in OTP.
- It causes problems with supervision (because combining the manager and the handlers in one process is not a natural approach for Erlang).
- It is tricky to use it in a fault-tolerant way (as above, all handlers are bound together in a single process).
- It is tricky to manage state in the manager. It may be tempting to use, e.g., the process dictionary, but you should push the state down to the handlers (which is not apparent at first sight).
So let’s analyze the causes of each complaint separately.
Not widely used in the erts and OTP
It is the same process for all handlers
To dispatch an event to the manager, you can use one of two gen_event functions: notify and sync_notify. With the first, you dispatch an event as quickly as possible, but no backpressure is applied, so you can end up in a situation where events arrive at high speed while processing is slower. That causes the manager’s message queue to grow, and eventually it can even cause a crash. It also does not check whether the manager is present, so you can throw messages into the void. On the other hand, synchronous dispatch waits until all handlers have processed the event, which can be slow and eventually become a system bottleneck.
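The difference in how the two functions treat a missing manager can be sketched as follows. The module name notify_demo is made up for this example, and the manager is addressed by pid: according to the gen_event documentation, notify only fails for a missing manager when it is addressed by a locally registered name.

```erlang
%% Hypothetical demo module, written for this article only.
-module(notify_demo).
-export([run/0]).

run() ->
    {ok, Manager} = gen_event:start(),
    ok = gen_event:stop(Manager),
    %% Asynchronous dispatch to a dead pid still returns ok: the event is lost.
    Async = gen_event:notify(Manager, lost_event),
    %% Synchronous dispatch notices the missing manager and exits the caller.
    Sync = try gen_event:sync_notify(Manager, lost_event)
           catch exit:_Reason -> manager_is_gone
           end,
    {Async, Sync}.
```

Calling `notify_demo:run()` should return `{ok, manager_is_gone}`, which shows that only the synchronous variant gives the caller any feedback.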
This problem is also very nicely described in Nick DeMonner’s talk from this year’s ElixirConf US; check it out if you are interested. Elixir’s GenEvent implementation also has a third function, ack_notify, which acknowledges the incoming messages. It is something softer than sync_notify, but still asynchronous when it comes to processing.
It is hard to supervise
This behavior hides the complexity underneath, and its assumptions suit that model of dispatching perfectly (if we separate the handlers from the manager, reliable dispatch is much harder to achieve, e.g., when it comes to fault tolerance). Still, it is counterintuitive with respect to the Erlang philosophy, especially for newcomers.
One thing worth mentioning is that you can work around that, for example by making a synchronous call to a different process (in particular, even to another gen_server) from inside the gen_event handler.
Those solutions are feasible, but they complicate the implementation and do not provide an easy way of handling failures. What happens if our process is overloaded by messages and the synchronous call times out? Replacing it with an asynchronous call does not help either.
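A minimal sketch of that workaround is shown below, assuming a hypothetical handler named forwarding_handler that receives the pid (or registered name) of some gen_server as its init argument. The {process, Event} request and the one-second timeout are also assumptions made for illustration.

```erlang
%% Hypothetical handler that delegates the real work to another process.
-module(forwarding_handler).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2]).

%% Worker is the pid or registered name of some gen_server (an assumption).
init(Worker) ->
    {ok, Worker}.

handle_event(Event, Worker) ->
    %% The manager process blocks here, for at most one second per event.
    try
        _ = gen_server:call(Worker, {process, Event}, 1000),
        ok
    catch
        exit:Reason ->
            %% Timeout or dead worker: the event is dropped, nothing retries.
            io:format("worker call failed for ~p: ~p~n", [Event, Reason])
    end,
    {ok, Worker}.

handle_call(_Request, Worker) ->
    {ok, ok, Worker}.

%% Ignore stray messages, e.g. a late reply after a timed-out call.
handle_info(_Msg, Worker) ->
    {ok, Worker}.
```

The sketch makes the trade-off visible: the work runs in a separate, supervisable process, but the manager still waits for it, and a timeout leaves us with a dropped event and no obvious recovery path.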
Failure handling
It may sound strange at first, but gen_event silently removes a faulty event handler. It does produce an error report printed on the terminal, but nothing more. Moreover, well-known monitoring techniques such as links or monitors cannot be used with an event handler module, because it is not a process. Finally, faulty event handler code does not crash the manager.
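The silent removal can be observed with a small experiment. The module crashy_handler and the boom event are made up for this sketch:

```erlang
%% Hypothetical handler that crashes on purpose, plus a small demo.
-module(crashy_handler).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, terminate/2, demo/0]).

init(_Args) -> {ok, nostate}.

%% The event boom makes the callback crash; everything else is ignored.
handle_event(boom, _State) -> error(deliberate_crash);
handle_event(_Event, State) -> {ok, State}.

handle_call(_Request, State) -> {ok, ok, State}.
terminate(_Reason, _State) -> ok.

demo() ->
    {ok, Manager} = gen_event:start(),
    ok = gen_event:add_handler(Manager, ?MODULE, []),
    Before = gen_event:which_handlers(Manager),
    %% sync_notify returns ok even though the handler crashes inside it.
    ok = gen_event:sync_notify(Manager, boom),
    After = gen_event:which_handlers(Manager),
    {Before, After, is_process_alive(Manager)}.
```

`crashy_handler:demo()` should return `{[crashy_handler], [], true}`: the handler is gone, the manager is still alive, and the dispatching process learned nothing about the failure apart from the logged error report.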
We can use a different facility exposed by gen_event called add_sup_handler. It means that the connection between the calling process and the handler is supervised. What does that mean? If the event handler is deleted due to a fault, the manager sends a message {gen_event_EXIT, Handler, Reason} to the caller. It means that we need to provide an additional process, often called a guardian, for the possibly faulty handler. Then we dispatch events through that guardian process, and when it receives the failure message (via handle_info), we can act according to the requirements.
Keep in mind that underneath it uses links, not monitors. The event handler chapter from Learn You Some Erlang For Great Good! has an excellent explanation of why that may be dangerous and what issues it causes. Long story short, after using add_sup_handler you need to be cautious when it comes to the event manager shutdown.
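The guardian idea can be sketched in a few lines. To keep the example self-contained, the calling process plays the guardian role with a bare receive; in a real system this would more likely be a gen_server receiving the message in handle_info/2. The module name sup_handler_demo, the boom event, and the decision to reinstall the handler are assumptions made for illustration.

```erlang
%% Hypothetical handler plus a caller acting as its guardian.
-module(sup_handler_demo).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, terminate/2, run/0]).

init(_Args) -> {ok, nostate}.
handle_event(boom, _State) -> error(deliberate_crash);
handle_event(_Event, State) -> {ok, State}.
handle_call(_Request, State) -> {ok, ok, State}.
terminate(_Reason, _State) -> ok.

run() ->
    {ok, Manager} = gen_event:start(),
    %% Links the caller to the manager and ties the handler to the caller.
    ok = gen_event:add_sup_handler(Manager, ?MODULE, []),
    ok = gen_event:notify(Manager, boom),
    %% In a gen_server guardian this message would arrive in handle_info/2.
    receive
        {gen_event_EXIT, ?MODULE, Reason} ->
            %% React as required: here we simply reinstall the handler.
            ok = gen_event:add_sup_handler(Manager, ?MODULE, []),
            {reinstalled, Reason}
    after 1000 ->
        no_exit_message
    end.
```

Because add_sup_handler links the caller to the manager, a guardian that does not trap exits goes down together with the manager, which is exactly the shutdown caveat described above.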
Interestingly, Elixir’s version of that behavior solved that problem in the past by exposing add_mon_handler/3, which used a monitor under the hood. Still, both solutions share another problem: the {gen_event_EXIT, Handler, Reason} message is not delivered when the manager process crashes. You need to prepare for this additional edge case: either monitor the manager, or link to it and trap exit signals in all handlers.
Moreover, Elixir does not expose add_mon_handler/3 anymore and the whole GenEvent behavior is deprecated, so you should either use the Erlang one directly or pick one of the alternatives.
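For the manager-crash edge case, the monitoring option can be sketched as follows. The module name manager_monitor_demo is made up, and killing the manager with exit/2 merely simulates a crash:

```erlang
%% Hypothetical demo: detect a crashed manager with a monitor.
-module(manager_monitor_demo).
-export([run/0]).

run() ->
    {ok, Manager} = gen_event:start(),
    Ref = erlang:monitor(process, Manager),
    %% Simulate a manager crash; no gen_event_EXIT is sent in this case.
    exit(Manager, kill),
    receive
        {'DOWN', Ref, process, Manager, Reason} ->
            {manager_down, Reason}
    after 1000 ->
        timeout
    end.
```

`manager_monitor_demo:run()` should return `{manager_down, killed}`. A guardian would typically restart the manager or stop itself at this point so that its supervisor can rebuild the whole setup.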
State management
Alternatives?
If you are interested, Elixir has the GenStage behavior, which solves that case. José has explained the idea many times, and the official documentation suggests that path as well. The official blog post with the announcement contains a perfect explanation of how you can use GenStage instead of GenEvent.
Another solution mentioned there is replacing GenEvent with a Supervisor and GenServer; this article presents the concept clearly. However, do not use it blindly: at the end of the post, there is a list of drawbacks. One of them is that it still does not provide any mechanism for backpressure, so your mileage may vary. 😉
Summary
It is a typical example of caveat emptor — let the buyer beware.
Credits
- Original article posted in 2015.
- Erlang Documentation — Event handling principles.
- Erlang Documentation — gen_event.
- Chapter about “Event Handlers” inside “Learn You Some Erlang for Great Good” book.
- Erlang Factory SF 2015 — The Timeless Way of Building Erlang Apps.
- ElixirConf 2015 — Keynote by José Valim.
- LambdaDays 2017 — GenStage and Flow by José Valim.
- Replacing GenEvent by a Supervisor + GenServer.
Originally published at https://pattern-match.com on August 31, 2018.
