Elixir: a few things about GenStage I wish I knew some time ago

Andrey Chernykh
Jun 14, 2018 · 5 min read

Today I want to talk about GenStage. Although this behaviour was introduced a while ago in Elixir ecosystem not every developer has tried it out. And there is a bunch of questions that you’ll probably face while dealing with GenStage for the first time.

Image for post
Image for post
Stages are data-exchange steps that send and/or receive data from other stages. When a stage sends data, it acts as a producer. When it receives data, it acts as a consumer.

I don’t want to duplicate all the information available in the GenStage’s documentation. I’d rather to make an emphasis on some key aspects I’ve faced on my own during GenStage implementation and usage in production.

Two usage approaches

You own events source (demand handling)

An events source might be a database or third-party application, API, etc. Producer is going to fetch events only in order to cover an incoming demand from its consumers.

Image for post
Image for post

Example: you need to import data from another application and this application exposes an API which allows you to fetch data by batches or pages.

Once your Consumers handled a batch they start asking for more and the producers makes a call to that external API in order to fetch exact amount of data and emit events, satisfy the Consumers demand.

If your GenStages where designed properly there is no chance for a Producer to be overloaded by data.

In this case GenStages are implemented for parallel or even distributed data handling.

You don’t own events source (events pushing)

Producer is going to receive various and unpredictable amount of events from time to time or serve infinite data stream.

Image for post
Image for post

Example: you need to send push notifications depending on some events from your application (received from RabbitMQ for example). You can not predict the amount of push notifications needed to be sent. At some moment you should to handle only 10 requests for pushes but in a next few minutes this amount can turn into 10000 (marketing team did their job great and users started to use your application like crazy).

In this case Consumers wait for work and Producer itself ignores Consumers demand, receives incoming external events and dispatches them to the Consumers while trying not to die under the pressure.

So GenStages should be implemented for backpreassure purpose mainly.


Now when I introduced two main GenStage’s usage approaches, let’s take a closer look at each of them and things you should be aware of.

Events pushing

Here is a Producer and a Consumer examples:

Under the hood, actually, the Consumer sends a demand to the Producer, but the Producer ignores it end replies with no events. “No events” response is then successfully ignored by the Consumer. So both the Consumer and the Producer await for an incoming event constantly. None of them initiate work to do. Work comes from outside of GenStages.

Demand handling

Let’s start with a code:

There are Producer’s and Consumer’s code above. The Producer should take care of a demand: store Consumer’s demand that wasn’t satisfied by this Producer and at the same time store events that were odd on previous demand handling. Why?

One thing you should keep in mind working with a Consumer’s min. demand: it doesn’t define the lower limit when a Consumer will go for more events. Actually, a Consumer asks for more job when it has max_demand — min_demand events in its queue. For example above it equals to 3.

  1. A Consumer initially asks for 4 events
  2. Receives only 2
  3. Awaits for 1 more (and while it waits for this event it does nothing)
  4. Receives the required 1 event
  5. Starts to handle those 3 events
  6. Asks for more events then

So if you have a requirement to handle an event immediately this could be a surprise for you that a Consumer doesn’t serve events when they come, but awaits for more to satisfy this formula.

This is why you need to keep demand in a Producer — to let it know that not all demand was covered and some Consumers still waiting for a work (not asking for events explicitly). And may be you should go and try to fetch more events on timeout/schedule or just try to fetch them constantly in a recursion.

If you forget about it and your Producer listens only for incoming demand you will get hanging events in one of your Consumers at some moment.

A few tips

How to start Consumers dynamically?

Here a simple example code shows how to spawn Consumers dynamically from a Producer.

Consumers Supervisor:

Producer:

What are Consumer’s subscription approaches?

Learn more about GenStage.Dispatcher behaviour

You can dramatically change the way how your GenStages behave with a right Dispatcher usage.

During an initialisation you can apply one of three predefined Dispatchers to a Producer:

  • GenStage.DemandDispatcher - dispatches the given batch of events to the consumer with the biggest demand in a FIFO ordering. This is the default dispatcher.
  • GenStage.BroadcastDispatcher - dispatches all events to all consumers. The demand is only sent upstream once all consumers ask for data.
  • GenStage.PartitionDispatcher - dispatches all events to a fixed amount of consumers that works as partitions according to a hash function.

…or create one on your own implementing the behaviour.

Thank you for your time. Happy coding!

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store