FizzBuzz in Scala: basic → parallel → reactive → distributed
In case you haven’t heard of FizzBuzz:
Fizz buzz is a group word game for children to teach them about division.
Players take turns to count incrementally, replacing any number divisible by three with the word “fizz”, and any number divisible by five with the word “buzz”.
I intentionally picked one of the simplest algorithms there is, so that I can talk about Scala and its surrounding libraries, frameworks and approaches instead of discussing a particular algorithm.
Let’s start with a straightforward (lazily computed) implementation in Scala:
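A minimal sketch of such an implementation could look like the following. It assumes Scala 2.13 or newer (on 2.12, `Stream` would play the role of `LazyList`), and it assumes the usual convention that numbers divisible by both three and five become "FizzBuzz". The names `BasicFizzBuzz`, `fizzBuzz` and `all` are illustrative, not taken from the original listing.

```scala
object BasicFizzBuzz {
  // Maps a single number to its FizzBuzz string.
  // Assumption: numbers divisible by both 3 and 5 become "FizzBuzz".
  def fizzBuzz(n: Int): String = (n % 3, n % 5) match {
    case (0, 0) => "FizzBuzz"
    case (0, _) => "Fizz"
    case (_, 0) => "Buzz"
    case _      => n.toString
  }

  // Infinite lazy sequence: an element is computed only when it is demanded.
  val all: LazyList[String] = LazyList.from(1).map(fizzBuzz)

  def main(args: Array[String]): Unit =
    all.take(15).foreach(println)
}
```

Because the sequence is lazy, `all.take(15)` only evaluates the first fifteen elements, even though `all` itself is unbounded.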
Next, let’s compute it in parallel using Scala Parallel Collections. However, before we implement it, we should decide what to do with the ordering.
Usually, you have the following options:
- Sort the results afterwards (generally an expensive option)
- Insert the results into some in-memory or persistent storage or structure that preserves a predefined ordering
- Skip ordering altogether when it’s not really required, and store the input value alongside the result of the computation instead.
In the case of FizzBuzz, we have no direct requirement to order the computation and there is no connection between result items, so, for simplicity’s sake, we can just return a structure holding a number and the corresponding String.
So, it would be the following:
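One way this might be written is sketched below. It assumes Scala 2.13 or newer with the separate `scala-parallel-collections` module on the classpath (on 2.12 parallel collections are built in and the `CollectionConverters` import is not needed). `ParallelFizzBuzz`, `Result` and `compute` are hypothetical names.

```scala
import scala.collection.parallel.CollectionConverters._

object ParallelFizzBuzz {
  // Each result carries the number it was computed from,
  // so nothing downstream has to rely on arrival order.
  final case class Result(number: Int, value: String)

  // Assumption: numbers divisible by both 3 and 5 become "FizzBuzz".
  def fizzBuzz(n: Int): String = (n % 3, n % 5) match {
    case (0, 0) => "FizzBuzz"
    case (0, _) => "Fizz"
    case (_, 0) => "Buzz"
    case _      => n.toString
  }

  // `.par` turns the range into a parallel collection;
  // `map` is then executed on multiple threads.
  def compute(upTo: Int): Vector[Result] =
    (1 to upTo).par.map(n => Result(n, fizzBuzz(n))).toVector

  def main(args: Array[String]): Unit =
    compute(15).foreach(r => println(s"${r.number}: ${r.value}"))
}
```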
And that’s it, it is now computed in parallel for you.
For the reactive implementation, we need an additional library. I used Akka Streams in my case:
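A minimal sketch of a throttled stream, assuming Akka 2.6 or newer (where an implicit `ActorSystem` is enough to materialize a stream). The rate of five elements per second and the name `ReactiveFizzBuzz` are arbitrary choices for illustration.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Source
import scala.concurrent.Await
import scala.concurrent.duration._

object ReactiveFizzBuzz {
  // Assumption: numbers divisible by both 3 and 5 become "FizzBuzz".
  def fizzBuzz(n: Int): String = (n % 3, n % 5) match {
    case (0, 0) => "FizzBuzz"
    case (0, _) => "Fizz"
    case (_, 0) => "Buzz"
    case _      => n.toString
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("fizzbuzz")
    try {
      val done = Source(1 to 15)
        .throttle(5, 1.second) // let at most 5 elements per second through
        .map(fizzBuzz)
        .runForeach(println)   // materializes the stream, returns Future[Done]
      Await.result(done, 1.minute)
    } finally {
      system.terminate()
    }
  }
}
```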
In this case, just to show one benefit of the reactive approach, the rate of computation is throttled.
The distributed version is more challenging. We definitely need some tools to help with an implementation that might run on a whole cluster.
I’ll do it with two different approaches:
- Apache Spark batch job: a very popular distributed processing framework, mostly used for micro-batching and analytics. You can run Spark jobs on a Spark cluster or on a Kubernetes cluster (with the specialised Spark operator).
- Cloudflow streamlets: a young, yet very promising, more general application toolkit for distributed computing, which can run on top of Spark, Flink, or Akka. You can run your solutions on a Kubernetes cluster.
Let’s create a Spark job that generates a batch of numbers and writes the results of the FizzBuzz computation to CSV files:
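Such a job might look roughly like the sketch below. It assumes Spark 3.x with the `spark-sql` module; the `local[*]` master, the range of 1 to 100, the column names and the output directory `fizzbuzz-output` are all placeholders chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

object FizzBuzzSparkJob {
  // Assumption: numbers divisible by both 3 and 5 become "FizzBuzz".
  def fizzBuzz(n: Int): String = (n % 3, n % 5) match {
    case (0, 0) => "FizzBuzz"
    case (0, _) => "Fizz"
    case (_, 0) => "Buzz"
    case _      => n.toString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fizzbuzz")
      .master("local[*]") // assumption: local run; omit when submitting to a cluster
      .getOrCreate()
    import spark.implicits._

    try {
      (1 to 100).toDS()                 // the batch of input numbers
        .map(n => (n, fizzBuzz(n)))     // computed on the executors
        .toDF("number", "value")
        .write
        .mode("overwrite")
        .option("header", "true")
        .csv("fizzbuzz-output")         // hypothetical output directory
    } finally {
      spark.stop()
    }
  }
}
```

Spark writes one CSV part file per partition into the output directory, which is why the result is a set of files rather than a single file.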
For Cloudflow, I’ve created a very simple pipeline that consists of:
- A FizzBuzz HTTP/JSON Ingress Streamlet, which accepts a number in a JSON structure from any HTTP client
- A FizzBuzz Processor Streamlet, which receives that message via Kafka and Avro and computes the FizzBuzz string
- A FizzBuzz Printer, which receives the result and just logs it
The JSON models are defined with Avro schemas like:
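As an illustration only, a schema for the ingress payload could look like the JSON embedded in the sketch below, which parses it with the Avro Java library to check that it is well formed. The record name `FizzBuzzInput`, the namespace and the single `number` field are assumptions, not the schema from the original project, where such a definition would normally live in its own `.avsc` file.

```scala
import org.apache.avro.Schema

object FizzBuzzSchemas {
  // Hypothetical Avro record for the ingress payload: a single number.
  val inputJson: String =
    """{
      |  "namespace": "fizzbuzz.example",
      |  "type": "record",
      |  "name": "FizzBuzzInput",
      |  "fields": [
      |    { "name": "number", "type": "int" }
      |  ]
      |}""".stripMargin

  // Parsing fails with an exception if the schema is not valid Avro.
  val input: Schema = new Schema.Parser().parse(inputJson)

  def main(args: Array[String]): Unit =
    println(input.toString(true)) // pretty-printed schema
}
```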
Now, if you have a Kubernetes cluster, you can easily run and scale the whole solution (if you don’t, just use the local sandbox with sbt runLocal):
curl --location --request POST 'http://localhost:3000' \
--header 'Content-Type: application/json' \