Using Akka Streams
At Hootsuite, the Play framework has been an important part of our Scala-based microservices. As we refined our practices, however, we began to experiment at a lower level of abstraction than the Play framework provides. In fact, since version 2.5, Play’s streaming API has been based on Akka Streams. Since our newer microservice skeletons come with Akka dependencies built in, we decided to use Akka Streams directly for all our streaming needs.
What is Streaming?
“Big Data” is a buzzword that’s been thrown around a lot in recent memory. With the growth of landscapes such as Social Media and the IoT, an increased emphasis has been placed on reacting to large amounts of data in real time. In order to analyze data in real time, we need a way to stream this data. From this necessity, the Reactive Streams interface was born.
Akka Streams is an implementation of the Reactive Streams interface, which describes itself as “an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure.” The four main abstractions in Akka’s streaming API are:
- Source: A processing stage with exactly one output, emitting data elements whenever downstream processing stages are ready to receive them.
- Sink: A processing stage with exactly one input, requesting and accepting data elements, possibly slowing down the upstream producer of elements.
- Flow: A processing stage which has exactly one input and output, which connects its up- and downstreams by transforming the data elements flowing through it.
- RunnableGraph: A Flow that has both ends “attached” to a Source and Sink respectively, and is ready to be run().
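To make the four abstractions concrete, here is a minimal sketch that wires them together. It is not taken from our services; the object name, the numbers, and the doubling step are purely illustrative, and it assumes the Akka 2.4/2.5-era ActorMaterializer API.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Keep, RunnableGraph, Sink, Source}

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical example object, for illustration only.
object StreamBasics {
  def sumOfDoubles(): Int = {
    implicit val system: ActorSystem = ActorSystem("stream-basics")
    implicit val materializer: ActorMaterializer = ActorMaterializer()

    // Source: exactly one output, emits 1 to 5 as downstream demands them.
    val numbers: Source[Int, NotUsed] = Source(1 to 5)

    // Flow: one input and one output, doubles each element passing through.
    val double: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)

    // Sink: exactly one input, folds everything it receives into a sum.
    val sum: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

    // RunnableGraph: both ends attached; nothing happens until run() is called.
    val graph: RunnableGraph[Future[Int]] =
      numbers.via(double).toMat(sum)(Keep.right)

    try Await.result(graph.run(), 3.seconds)
    finally system.terminate()
  }

  def main(args: Array[String]): Unit =
    println(sumOfDoubles()) // 2 + 4 + 6 + 8 + 10 = 30
}
```

Blocking with Await is only there to keep the sketch short; in a service we would keep working with the Future.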
It’s worth noting that whenever we create a Source from an HTTP response, the Source must always be consumed by some Sink. If a Source is not consumed, Akka’s back pressure takes effect and queues up all streaming-related operations until that Source has been consumed.
Streaming at Hootsuite
A big part of Social Media is, well, media! Streams are an efficient way to transport media in the form of images and videos from the client to the server side, and from the server side to external services (e.g. Social Networks, screening services). Apart from reducing boilerplate, they rid us of the need to hold entire files in memory before moving them around. Overall, streaming gives us cleaner code, more parallel processing, and reduced infrastructure needs.
Streaming in Action
Let’s dive into an example to see Akka Streams in action! In this example, we’ll stream an image from a client to a server via a “stream_image” POST endpoint. The server will download the image, and stream the latest image it received to any client that makes a GET request to the “display_image” endpoint. Note that writing the image to disk isn’t good practice; we do it here only to emphasize the concepts of Sources and Sinks.
We begin by adding the following two dependencies to our build.sbt:
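The two lines might look like the sketch below. The artifact names are the standard Akka ones, but the version numbers are assumptions and should be matched to whatever the project already uses.

```scala
// build.sbt (sketch; versions are assumed, not taken from our services)
libraryDependencies ++= Seq(
  "com.typesafe.akka" %% "akka-http"   % "10.0.9", // HTTP client and server
  "com.typesafe.akka" %% "akka-stream" % "2.4.19"  // Source, Sink, Flow, RunnableGraph
)
```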
We use the first dependency on the client side for making HTTP requests, and on the server side for receiving HTTP requests. The second dependency is used on both the client and server side for creating and operating on the four Akka Streams abstractions.
Next, in the main class on both client and server side, we’ll add the following implicit parameters:
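A sketch of what these might look like is shown here. The object name and the actor system name are hypothetical, and the third implicit (an execution context for composing Futures) is an assumption on top of the two discussed next.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer

import scala.concurrent.ExecutionContextExecutor

// Hypothetical main object, for illustration only.
object Main {
  // The "universe" in which all Actors live.
  implicit val system: ActorSystem = ActorSystem("image-streaming")

  // Creates the Actors that actually run our streams inside `system`.
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // Assumed extra: lets us map and flatMap over the Futures Akka HTTP returns.
  implicit val executionContext: ExecutionContextExecutor = system.dispatcher

  def main(args: Array[String]): Unit = {
    // Routes and requests would be set up here.
  }
}
```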
These first two parameters introduce Akka’s implementation of the Actor concurrency model to our project. In this model, an Actor is a computation entity that can alter other Actors, or have its own state altered, by sending and receiving messages to and from other Actors. An Actor System is the “universe” in which the Actors live, and the Actor Materializer creates the Actors needed to run a stream inside that Actor System, so we don’t have to worry about manually wiring Actors together.
On the server side, we set up a POST endpoint for consuming an image stream, and a GET endpoint for emitting an image stream. In the POST endpoint, we expect the client to send a Source within the entity of a multipart request. We then consume this Source by “running” it into a Sink, with the Sink being a file. Connecting this Source to a Sink is an example of forming a RunnableGraph.
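One way to sketch the POST endpoint is with Akka HTTP’s fileUpload directive, which hands us the uploaded part as a Source of bytes. The form field name "image", the file path "latest_image", the host and the port are all assumptions made for this sketch.

```scala
import java.nio.file.{Path, Paths}

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.FileIO

// Hypothetical server object, for illustration only.
object UploadServer {
  implicit val system: ActorSystem = ActorSystem("image-server")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // Assumed location of the most recently uploaded image.
  val imagePath: Path = Paths.get("latest_image")

  val uploadRoute: Route =
    path("stream_image") {
      post {
        // "image" is an assumed multipart field name.
        fileUpload("image") { case (metadata, byteSource) =>
          // Source (request bytes) run into a Sink (the file): a RunnableGraph.
          val written = byteSource.runWith(FileIO.toPath(imagePath))
          onSuccess(written) { ioResult =>
            complete(s"Stored ${ioResult.count} bytes from ${metadata.fileName}")
          }
        }
      }
    }

  def main(args: Array[String]): Unit = {
    Http().bindAndHandle(uploadRoute, "localhost", 8080)
    ()
  }
}
```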
In the GET endpoint, we create a Source from the file that serves as a Sink in the POST endpoint. After we create the Source, we return the stream as the response.
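The GET endpoint could be sketched as follows. The image/jpeg content type is an assumption (the stored file could be any image format), and the path of the stored file is passed in as a parameter rather than fixed.

```scala
import java.nio.file.Path

import akka.http.scaladsl.model.{ContentType, HttpEntity, MediaTypes}
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.scaladsl.FileIO

// Hypothetical route holder, for illustration only.
object DisplayRoute {
  def route(imagePath: Path): Route =
    path("display_image") {
      get {
        // The file that was a Sink in the POST endpoint now becomes a Source.
        val imageSource = FileIO.fromPath(imagePath)
        // Returning an entity backed by a Source streams the file to the client.
        complete(HttpEntity(ContentType(MediaTypes.`image/jpeg`), imageSource))
      }
    }
}
```

This route can be combined with the POST route using the ~ operator before binding the server.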
And that’s it for the server side!
On the client side, we’ll POST to the stream_image endpoint, but first we need a Source to send. In this example, we’ll create that Source from an online image.
In order to create this stream, we first make a GET request to the image URL. Once we receive a response, Akka creates a Source (from the image host), which we send in a multipart request.
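A minimal sketch of that first step, using Akka HTTP’s request-level client API, is shown here. The object and method names are hypothetical, and the implicit Materializer is only required by older Akka HTTP versions.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.Materializer
import akka.stream.scaladsl.Source
import akka.util.ByteString

import scala.concurrent.Future

// Hypothetical helper object, for illustration only.
object FetchImage {
  // Issues the GET request; the response arrives asynchronously.
  def fetch(imageUrl: String)(implicit system: ActorSystem,
                              materializer: Materializer): Future[HttpResponse] =
    Http().singleRequest(HttpRequest(uri = imageUrl))

  // The response body is exposed as a Source of byte chunks.
  def imageSource(response: HttpResponse): Source[ByteString, Any] =
    response.entity.dataBytes
}
```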
Even if the response turns out to be invalid, we must still consume the response’s Source. If we don’t, back pressure will block all further producer-consumer interactions. If a Source containing an image was created successfully, we then POST the Source in a multipart request:
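Both branches can be sketched as below. The URLs, the "image" field name and the filename are assumptions; discardEntityBytes() is Akka HTTP’s way of draining a response we don’t intend to read.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.marshalling.Marshal
import akka.http.scaladsl.model._
import akka.stream.{ActorMaterializer, Materializer}
import akka.stream.scaladsl.Source

import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

// Hypothetical client object, for illustration only.
object ImageClient {
  val imageUrl  = "http://example.com/image.jpg"       // assumed image location
  val uploadUrl = "http://localhost:8080/stream_image" // assumed server address

  def upload(imageResponse: HttpResponse)(implicit system: ActorSystem,
                                          materializer: Materializer,
                                          ec: ExecutionContext): Future[HttpResponse] =
    if (imageResponse.status == StatusCodes.OK) {
      // Wrap the response's Source in a multipart body part without buffering it.
      val part = Multipart.FormData.BodyPart(
        "image",
        HttpEntity.IndefiniteLength(imageResponse.entity.contentType,
                                    imageResponse.entity.dataBytes),
        Map("filename" -> "image.jpg"))
      Marshal(Multipart.FormData(Source.single(part))).to[RequestEntity].flatMap { entity =>
        Http().singleRequest(HttpRequest(HttpMethods.POST, uri = uploadUrl, entity = entity))
      }
    } else {
      // Invalid response: still consume its Source so the connection is released.
      imageResponse.discardEntityBytes()
      Future.failed(new RuntimeException(s"Image request failed: ${imageResponse.status}"))
    }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("image-client")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    implicit val ec: ExecutionContext = system.dispatcher

    Http().singleRequest(HttpRequest(uri = imageUrl)).flatMap(r => upload(r)).onComplete {
      case Success(response) =>
        response.discardEntityBytes() // the upload response must be consumed too
        println(s"Upload finished with status ${response.status}")
        system.terminate()
      case Failure(error) =>
        println(s"Upload failed: ${error.getMessage}")
        system.terminate()
    }
  }
}
```

Using HttpEntity.IndefiniteLength means the client never needs to know the image size up front; the bytes flow from the image host to our server as they arrive.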
And there we have it! Our very own application that uses Akka’s streaming API. Social Media is a real-time landscape, and Hootsuite employs Akka’s implementation of the Reactive Streams interface to move around all the media being shared. A full runnable version of the server side and client side code can be found here.
Rajdeep Kambo is a co-op Student on the Plan and Create team at Hootsuite. He is currently studying Computer Engineering at The University of British Columbia.