Wrapping Up

Concurrent Data Processing in Elixir — by Svilen Gospodinov (41 / 49)

The Pragmatic Programmers


When it comes to data-processing operations like map, filter, and reduce, there is no easier way to take advantage of GenStage than using Flow. At the same time, ease of use does not come at the expense of versatility: most functions let you configure the level of concurrency, the demand for events, and much more.
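As a minimal sketch of what that looks like (assuming the `:flow` package is added as a dependency), the familiar map/filter shape carries over directly, and options such as `:stages` and `:max_demand` on `Flow.from_enumerable/2` tune concurrency and demand; the specific values here are illustrative, not recommendations:

```elixir
# Double every number, keep the multiples of three.
# Flow emits results in nondeterministic order, so sort before comparing.
result =
  1..100
  |> Flow.from_enumerable(stages: 4, max_demand: 10)
  |> Flow.map(&(&1 * 2))
  |> Flow.filter(&(rem(&1, 3) == 0))
  |> Enum.sort()

# result is [6, 12, 18, ..., 198] — the 33 multiples of 6 up to 198
```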

You may be tempted to replace all Enum and Stream usage in your code with Flow. This is not a good idea. Flow pays off only with large data sets or hardware-intensive work. As you know, under the hood, Flow creates and manages a number of stage processes. While processes are lightweight, they still add overhead, so small workloads are often processed faster synchronously.
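To make that trade-off concrete, here is a small sketch: for an input of a hundred elements, a plain `Enum.map/2` runs in the current process with no setup cost, while the equivalent Flow pipeline (shown in a comment, since it requires the `:flow` dependency) would spend most of its time starting and coordinating stage processes:

```elixir
# Small input: synchronous Enum is the right tool.
small = 1..100

enum_result =
  small
  |> Enum.map(&(&1 * &1))
  |> Enum.sum()

# The same shape under Flow — correct, but for inputs this small the
# process setup dominates the actual work:
#
#   flow_result =
#     small
#     |> Flow.from_enumerable()
#     |> Flow.map(&(&1 * &1))
#     |> Enum.sum()
```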

However, we covered a range of great use cases for Flow in this chapter. You optimized a CSV parsing function, first using the Stream module, and then using Flow to perform the data transformation concurrently. Then you saw how to partition data for reducer operations like reduce and group_by. We touched upon working with slow-running flows, where the same approach can be applied to unbounded streams of data. Finally, you integrated Flow into the GenStage data-processing pipeline in the scraper project, reducing the amount of code needed to…
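The partitioning idea from the summary above can be sketched like this (again assuming the `:flow` dependency; the input list and the `word_counts` name are hypothetical). `Flow.partition/2` hash-routes events so that all copies of the same word land on the same reducer stage, which is what makes the per-stage accumulators safe to merge:

```elixir
# Classic word count: flat_map into words, partition by word,
# then reduce each partition into a map of counts.
word_counts =
  ["the quick fox", "the lazy dog"]
  |> Flow.from_enumerable()
  |> Flow.flat_map(&String.split/1)
  |> Flow.partition()
  |> Flow.reduce(fn -> %{} end, fn word, acc ->
    Map.update(acc, word, 1, &(&1 + 1))
  end)
  |> Enum.into(%{})

# word_counts["the"] is 2; every other word appears once
```

Without the `Flow.partition/0` call, occurrences of "the" could end up on different stages, and each stage's map would hold only a partial count.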
