Fledge over the flow …
How to determine a streaming problem.
This is a story about our experience of integrating Akka-Stream, Alpakka-Kafka, Kafka, Alpakka-Slick and TimescaleDB (Part 1/2)
I think this blog post should have 3 different parts:
1- Why we chose these tools
2- How we integrated them
3- Which issues & challenges we faced (service clients)
Imagine that you have 3 services (for example A, B and C) that communicate with each other by passing messages. There are many ways to implement the communication channels, such as HTTP requests or Akka-remoting (over TCP, using Netty). We will look at some of them below:
Note: In DDD (Domain Driven Design), we call each service a “Bounded-Context”, and each context has 3 types of activities, called “Domain Activities”: “Events”, “Queries” and “Commands”. So each message passed between services has one of these types.
“Commands” are generated by a service to have an effect on other services or recipients.
“Queries” are generated by a service to ask recipients for something and wait for the answer.
“Events” are generated to tell recipients that something happened.
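To make the three message types concrete, here is a minimal sketch in plain Python. It is not code from our project: the `Message` class, the `Kind` enum and the example names (`RecordMetric`, `GetAverageCpu`, `UserLoggedIn`) are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    COMMAND = "command"  # asks a recipient to change something
    QUERY = "query"      # asks a recipient for an answer
    EVENT = "event"      # tells recipients that something happened


@dataclass(frozen=True)
class Message:
    kind: Kind
    sender: str
    name: str
    payload: dict = field(default_factory=dict)


def expects_reply(message: Message) -> bool:
    """A query always waits for an answer; commands and events may not."""
    return message.kind is Kind.QUERY


# Hypothetical messages exchanged between services A, B and C.
record_metric = Message(Kind.COMMAND, "service-a", "RecordMetric", {"cpu": 0.42})
get_average = Message(Kind.QUERY, "service-b", "GetAverageCpu", {"window": "5m"})
user_logged_in = Message(Kind.EVENT, "service-c", "UserLoggedIn", {"user": "alice"})
```

Tagging every message with its kind lets the channel decide later how to deliver it, for example whether the sender has to wait for a reply.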
How should we implement these communication channels?
As you may know, when we face a problem we shouldn’t start by recommending tools. Depending on the case, we first sketch the high-level, abstract answers that might solve the problem, and only then research which tools cover our requirements.
In my case: we have a big monolithic service as the back-end and some independent services that were created step by step over time. Now we want to break the monolith down into micro-services (again, step by step). So what should we do? Where should we start? What are our problems?
Note: Logs vs Metrics
Log: In a system, “logs” are text-based descriptions of events that follow a specific format.
Metric: In a system, “metrics” are measurable descriptions of events, expressed as numeric values.
For any event in your system, a log entry is generated. It carries a set of data describing that event, for example which resource was accessed, who accessed it, and at what time. Every event in your system carries various types of data.
While metrics measure system KPIs such as CPU utilisation or memory utilisation at a specific timestamp, logs are about a particular event such as a new login or a spike in bandwidth utilisation. A metric data point may have a timestamp, a value, and an identifier (a source or a tag) that says what it applies to. Logs are collected whenever an event occurs, whereas metrics are generally collected at pre-defined time intervals.
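The difference can be sketched as two small data structures. This is only an illustration in plain Python; the field names and the sample values are assumptions, not our real schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogEntry:
    """Text-based description of one event, written when the event occurs."""
    timestamp: float
    message: str
    fields: dict = field(default_factory=dict)  # who, which resource, ...


@dataclass(frozen=True)
class MetricSample:
    """Numeric measurement, usually taken at a pre-defined interval."""
    timestamp: float
    name: str
    value: float
    tags: dict = field(default_factory=dict)  # source or tag it applies to


def average(samples, name):
    """Metrics are numbers, so they can be aggregated directly."""
    values = [s.value for s in samples if s.name == name]
    return sum(values) / len(values) if values else None


# One log entry for one event (hypothetical values).
login = LogEntry(1700000003.2, "new login", {"user": "alice", "resource": "/dashboard"})

# Three metric samples collected every 10 seconds (hypothetical values).
samples = [
    MetricSample(1700000000.0, "cpu_utilisation", 0.25, {"host": "node-1"}),
    MetricSample(1700000010.0, "cpu_utilisation", 0.50, {"host": "node-1"}),
    MetricSample(1700000020.0, "cpu_utilisation", 0.75, {"host": "node-1"}),
]
mean_cpu = average(samples, "cpu_utilisation")
```

The log entry is useful for reading what happened, while the samples are what an analytic query would aggregate.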
First, we choose a part of the project that is easy to decompose and independent from the other parts. In the end, we chose the metric aggregator service. This service is responsible for aggregating our metrics and gathering them into a database so that we can run complex analytic queries on our data. The first step is decomposing the metric aggregator engine so that it can be used as a service. Well, this part of our story is the “creation of service”….
Note: We have 3 main message exchange patterns
1- Request/Response: this pattern is good for “Queries” and “Commands”, where we send a request to a service and wait for its response or its acknowledgment.
2- Fire and Forget: this pattern is suitable for sending less important messages to other services, like “Events” or, in some cases, “Commands”. Less important means that your message may never reach its recipients and you will not be notified about it.
3- Publish and Subscribe: this pattern is good when recipients want to fetch new messages whenever they decide to. (It works well with middleware, for example a queue.) Imagine a service publishing new messages into a queue, to be fetched and processed by some recipients. This pattern is useful for “Events” and some sorts of “Commands”.
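The three patterns can be compared side by side with a small in-process sketch. It is plain Python with no network involved, and every name in it (`Aggregator`, `QueueBroker`, the `delivered` flag) is hypothetical and only there to show the difference in who waits for whom.

```python
from collections import deque


class Aggregator:
    """Stands in for a recipient service."""

    def __init__(self):
        self.received = []

    def handle(self, message):
        self.received.append(message)
        return f"ack:{message}"


def request_response(service, message):
    # The caller waits for the recipient's answer or acknowledgment.
    return service.handle(message)


def fire_and_forget(service, message, delivered=True):
    # The caller never learns whether the message arrived.
    # `delivered=False` simulates a message lost on the way.
    if delivered:
        service.handle(message)
    return None


class QueueBroker:
    """Stands in for the middleware: publishers append, recipients fetch."""

    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        self._queue.append(message)

    def fetch(self, max_messages):
        # The recipient decides when to fetch and how much to take.
        batch = []
        while self._queue and len(batch) < max_messages:
            batch.append(self._queue.popleft())
        return batch


service = Aggregator()
reply = request_response(service, "query-1")
fire_and_forget(service, "event-1")
fire_and_forget(service, "event-2", delivered=False)  # lost, sender not notified

broker = QueueBroker()
for i in range(5):
    broker.publish(f"event-{i}")
first_batch = broker.fetch(max_messages=2)
second_batch = broker.fetch(max_messages=10)
```

Note how only `request_response` returns something to the sender, and how in the queue version the publisher and the recipient never call each other directly.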
So why have we stopped using this stack?
- Prometheus is powerful for one specific use case that we need (metric aggregation). But we need a way to break our monolithic service into micro-services, with decomposing the metric aggregator service as the first step. (In fact, we would like to try out a process that is general-purpose enough to meet our requirements.)
After that, we decided to migrate to TimescaleDB. As we had to create and expose some REST API endpoints for other internal projects, we used Akka-HTTP. From now on, other services send their HTTP requests to this new service. In this scenario, any service that wants to send a metric to our metric aggregator just makes one simple function call. This function lives in an external module (we call these modules service clients), and the clients are added to the metric-generating services as library dependencies.
This is one of the exchange patterns: Request/Response.
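A service client of this kind could look roughly like the following sketch. It is written in plain Python rather than with Akka-HTTP, and the class name, the `/metrics` path, the payload fields and the injected `transport` function are all assumptions made for the example, not our real API.

```python
import json
import time


class MetricClient:
    """Hypothetical service client: hides the request/response call behind one function."""

    def __init__(self, service_name, transport):
        self._service_name = service_name
        # transport(path, body) -> HTTP-like status code. It is injected so the
        # sketch runs without a network; a real client would do an HTTP POST here.
        self._transport = transport

    def send_metric(self, name, value, timestamp=None):
        """Send one metric and wait for the aggregator's answer."""
        body = json.dumps({
            "service": self._service_name,
            "name": name,
            "value": value,
            "timestamp": time.time() if timestamp is None else timestamp,
        }).encode("utf-8")
        status = self._transport("/metrics", body)  # waits until the reply arrives
        return 200 <= status < 300


received = []  # stands in for the aggregator's storage


def fake_aggregator(path, body):
    """Pretends to be the aggregator endpoint and always accepts the metric."""
    received.append((path, json.loads(body)))
    return 200


client = MetricClient("service-a", fake_aggregator)
ok = client.send_metric("cpu_utilisation", 0.42, timestamp=1700000000.0)
```

The metric-generating service only sees `send_metric`, but every call still waits for the aggregator to answer.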
Note: About backpressure
Backpressure: A means of flow-control, a way for consumers of data to notify a producer about their current availability, effectively slowing down the upstream producer to match their consumption speeds. In the context of Akka Streams, back-pressure is always understood as non-blocking and asynchronous.
The way we handled the requests in Step 2 has some performance issues, which emerged very quickly. When thousands of metrics are generated per minute and each function call that sends a metric to our aggregator service has to wait for a response, this HTTP model is a headache. So we migrated from Akka-HTTP to a socket-based API with a buffer. We then needed something to apply back-pressure to the bulk of requests coming into the aggregator service. For this purpose, we powered the socket API with Akka-Stream.
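The idea of a buffer plus back-pressure can be sketched without Akka at all. The example below uses Python's `asyncio.Queue` with a maximum size as a stand-in for the buffer: when it is full, the producer is suspended (not blocked on a thread) until the consumer catches up. This is only an analogy to what Akka-Stream does for us, and the function names and sizes are made up for the illustration.

```python
import asyncio


async def produce(queue, count, stats):
    """Fast producer: is suspended by `put` whenever the buffer is full."""
    for i in range(count):
        await queue.put(i)  # waits here, without blocking a thread, if full
        stats["max_depth"] = max(stats["max_depth"], queue.qsize())
    await queue.put(None)  # sentinel: no more metrics


async def consume(queue):
    """Slow consumer: its pace decides how fast the producer may go."""
    received = []
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)  # pretend to do some work per metric
        received.append(item)
    return received


async def run_pipeline(count, buffer_size):
    queue = asyncio.Queue(maxsize=buffer_size)  # the bounded buffer
    stats = {"max_depth": 0}
    producer_task = asyncio.create_task(produce(queue, count, stats))
    received = await consume(queue)
    await producer_task
    return received, stats["max_depth"]


received, max_depth = asyncio.run(run_pipeline(count=10, buffer_size=2))
```

However fast the producer is, the buffer never holds more than `buffer_size` items, so the aggregator side cannot be flooded.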
Step 3 worked for a few months, but it doesn’t scale well and it’s hard to develop. We need something better to be ready to scale out…
For example, imagine that you need to add many metrics to this system. You have to design endpoints, streams and models, and worry about changing the common endpoints and models you already use, because the changes are not backward compatible. And note that our use case is not just metrics: we should design a general-purpose communication channel, and metric aggregation is only one of its purposes.
Step 3 is suited to handling “Queries”, “Commands” and “Events”. But why should we use this pattern for a huge bulk of lazy “Commands” and “Events” when we could use the Pub/Sub pattern and make our system more effective, powerful and reactive, now that we have enough resources? (As mentioned previously, Pub/Sub is a good pattern for working with “Events”.)
So here is where Apache Kafka comes in….
In the next post we will read about:
1- How to design a stream pipeline with the Akka stack.
2- How to use Alpakka-Slick and Kafka.
3- How to write a Kafka producer & consumer client behind a simple function call.
4- How to work with time series and TimescaleDB in this use case.
See you soon!