Introduction to Akka Streams

If you’re new to the world of Akka, I recommend reading the first part of this series, Introduction to Akka Actors, before continuing. The rest of this article assumes some familiarity with the content outlined in that post, as well as a high-level understanding of Akka.

Why Akka Streams?

If you’re new to the world of stream processing, I recommend reading this blog A Journey into Reactive Streams

First and foremost — why use Streams? What kind of advantage do they give us over the standard ways (e.g. callbacks) of handling data.

The answer is simple — it abstracts away from the imperative nature of how the data is inputted into the application giving us a declarative way of describing, handling it and hiding details that we don’t care about. Streaming helps you ingest, process, analyze, and store data in a quick and responsive manner.

Actors can be seen as dealing with streams as well: they send and receive series of messages in order to transfer knowledge (or data) from one place to another. It is tedious and error-prone to implement all the proper measures in order to achieve stable streaming between actors, since in addition to sending and receiving we also need to take care to not overflow any buffers or mailboxes in the process. Another problem is that Actor messages can be lost and must be retransmitted in that case. Failure to do so would lead to holes at the receiving side. When dealing with streams of elements of a fixed given type, Actors also do not currently offer good static guarantees that no wiring errors are made: type-safety could be improved in this case.

What is Akka Streams

Akka Streams is a module built on top of Akka Actors to make the ingestion and processing of streams easy. It provides easy-to-use APIs to create streams that leverage the power of the Akka toolkit without explicitly defining actor behaviors and messages. This allows you to focus on logic and forget about all of the boilerplate code required to manage the actor. Akka Streams follows the Reactive Streams manifesto, which defines a standard for asynchronous stream processing. Akka Streams provide a higher-level abstraction over Akka’s existing actor model. The Actor model provides an excellent primitive for writing concurrent, scalable software, but it still is a primitive; it’s not hard to find a few critiques of the model.

Akka Streams is a library to process and transfer a sequence of elements using bounded buffer space. This latter property is what we refer to as boundedness and it is the defining feature of Akka Streams. Translated to everyday terms it is possible to express a chain of processing entities, each executing independently (and possibly concurrently) from the others while only buffering a limited number of elements at any given time. This property of bounded buffers is one of the differences from the actor model, where each actor usually has an unbounded, or a bounded, but dropping mailbox. Akka Stream processing entities have bounded “mailboxes” that do not drop.

Akka Streams in nonblocking that means a certain operation does not hinder the progress of the calling thread, even if it takes a long time to finish the requested operation.

The Akka Streams API is completely decoupled from the Reactive Streams interfaces. While Akka Streams focus on the formulation of transformations on data streams the scope of Reactive Streams is to define a common mechanism of how to move data across an asynchronous boundary without losses, buffering or resource exhaustion.The relationship between these two is that the Akka Streams API is geared towards end-users while the Akka Streams implementation uses the Reactive Streams interfaces internally to pass data between the different operators

Stream Terminology

  • Source: This is the entry point to your stream. There must be at least one in every stream.Source takes two type parameters. The first one represents the type of data it emits and the second one is the type of the auxiliary value it can produce when ran/materialized. If we don’t produce any we use the NotUsed type provided by Akka.There are various ways of creating Source:
val source = Source(1 to 10) 
source: akka.stream.scaladsl.Source[Int,akka.NotUsed] = ...
val s = Source.single("single element")
s: akka.stream.scaladsl.Source[String,akka.NotUsed] = ...
  • Sink: This is the exit point of your stream. There must be at least one in every stream.The Sink is the last element of our Stream. Basically it’s a subscriber of the data sent/processed by a Source. Usually it outputs its input to some system IO.It is the endpoint of a stream and therefore consumes data. A Sink has a single input channel and no output channel. Sinks are especially needed when we want to specify the behavior of the data collector in a reusable way and without evaluating the stream
val sink = Sink.fold[Int, Int](0)(_ + _) //Creating a sink
sink:akka.stream.scaladsl.Sink[Int,scala.concurrent.Future[akka.Done]] =
  • Flow: The flow is a processing step within the stream. It combines one incoming channel and one outgoing channel as well as some transformation of the messages passing through it.If a Flow is connected to a Source a new Source is the result. Likewise, a Flow connected to a Sink creates a new Sink. And a Flow connected with both a Source and a Sink results in a RunnableFlow. Therefore, they sit between the input and the output channel but by themselves do not correspond to one of the flavors as long as they are not connected to either a Source or a Sink.
val source = Source(1 to 3)
val sink = Sink.foreach[Int](println)
val doubler = Flow[Int].map(elem => elem * 2)
doubler: akka.stream.scaladsl.Flow[Int(Source Output),Int(Sink Input),akka.NotUsed(Result of Flow Operation)] =
val runnable = source via doubler to sink

Via the via method we can connect a Source with a Flow. We need to specify the input type because the compiler can't infer it for us. As we can already see in this simple example, the flowdouble is completely independent from any data producers and consumers. They only transform the data and forward it to the output channel. This means that we can reuse a flow among multiple streams.

  • ActorMaterializer: To run a stream this is required. It is responsible for creating the underlying actors with the specific functionality you define in your stream. Since ActorMaterializer creates actors, it also needs an ActorSystem. It basically allocates all the necessary resources to run a stream. It is important to remember that even after constructing the stream by connecting all the source, sink and different operators, no data will flow through it until it is materialized

Source can be considered as publisher and Sink as subscriber.

Basics and working with Flows

Back-pressure: A possible problematic scenario is when the Source produces values too fast for the Sink to handle and can possibly overwhelm it. As it gets more data that it cannot process at the moment it constantly buffers it for processing in the future.To combat this the Sink would need to communicate with the Source to inform it that it should “slow down” with pushing new data until it finishes handling the current batch. This enables a constant size buffer for the Sink as it will inform the Source to stop sending new data when it’s not ready. In the context of Akka Streams back-pressure is always understood as non-blocking and asynchronous.

When we talk about asynchronous, non-blocking backpressure we mean that the operators available in Akka Streams will not use blocking calls but asynchronous message passing to exchange messages between each other, and they will use asynchronous means to slow down a fast producer, without blocking its thread.

Graph:A description of a stream processing topology, defining the pathways through which elements shall flow when the stream is running.

RunnableGraph: A Flow that has both ends “attached” to a Source and Sink respectively, and is ready to be run().

It is possible to attach a Flow to a Source resulting in a composite source, and it is also possible to prepend a Flow to a Sink to get a new sink. After a stream is properly terminated by having both a source and a sink, it will be represented by the RunnableGraph type, indicating that it is ready to be executed.

It is important to remember that even after constructing the RunnableGraph by connecting all the source, sink and different operators, no data will flow through it until it is materialized.

Basic Code

import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import scala.concurrent.Future
object StreamExample {
def main(args: Array[String]): Unit = { 
implicit val system = ActorSystem("Sys")
implicit val materializer = ActorMaterializer()
    val numbers = 1 to 100  

//We create a Source that will iterate over the number sequence
val numberSource: Source[Int, NotUsed] = Source.fromIterator(() => numbers.iterator)
 //Only let pass even numbers through the Flow 
val isEvenFlow: Flow[Int, Int, NotUsed] = Flow[Int].filter((num) => num % 2 == 0)
//Create a Source of even random numbers by combining the random number Source with the even number filter Flow   
val evenNumbersSource: Source[Int, NotUsed] = numberSource.via(isEvenFlow)

//A Sink that will write its input onto the console
val consoleSink: Sink[Int, Future[Done]] = Sink.foreach[Int](println)
//Connect the Source with the Sink and run it using the materializer
evenNumbersSource.runWith(consoleSink) }

In the example above we’ve created the:

  1. ActorSystem and ActorMaterializer instances that we will use to run the Stream. This is needed because Akka Streams is backed by Akka’s Actor model.
  2. Source based of the static number sequence’s iterator
  3. Flow that filters that only let’s through even numbers
  4. Sink that will print out its input to the console using println
  5. Complete Stream by connecting evenNumbers with consoleSinkand running it by using runWith.

Conclusion

Streaming is the ultimate game changer for data-intensive systems.It is best suited for big data-based applications.Main goal of Akka Stream is to build concurrent and memory bounded computations. Akka Streams also handles much of the complexity of timeouts, failure, backpressure, and so forth, freeing us to think about the bigger picture of how events flow through our systems.Here we learned more about Source,Sink and Flow and how can we run the stream using materializer.For more details you can follow the documentation: https://doc.akka.io/docs/akka/2.5/stream/index.html

In the next blog we will see more about graph and fan in and fan out functions.