Flow Control using your Monitoring Tool

Uday Sagar Shiramshetty
Mar 17, 2020


Flow control is the process of managing the rate of data flow between two systems. It prevents a fast sender from overwhelming a slow receiver. It ensures quality of service while optimizing for higher resource utilization, giving you peace of mind to focus more on new features and less on operational routines. Without proper flow control, parts of your distributed system can get overloaded and fail or perform poorly. In the GIF below, the producer doesn’t slow down its message enqueue rate while the queue size is growing, which eventually causes the queue to fail.

There are many ways to tell the producer that the queue size is increasing. The queue size can be stored in any kind of data store. But since it is generally reported to your monitoring tool anyway, why not use that and avoid costly data store setup and connections on the producer and consumer? Streaming congestion status from a monitoring tool is an easy, low-cost approach to implement. Whether it fits depends on how critical flow control is for you and how much you care about higher resource utilization. Your decision should also consider the latency of the streamed data and the service availability of your monitoring tool.

In this article, I will show how SignalFx (a subsidiary of Splunk and a leading monitoring solution) can be used to control data flow on a message queue that acts as a buffer between a producer service and a consumer service. The goal is to keep the consumption backlog as small as possible. The consumer reports the number of messages yet to be consumed as the queue size to SignalFx. The producer streams the real-time queue size from SignalFx to adjust its enqueue rate.

Note that this article has not been reviewed by SignalFx for correctness.

Impressively, it generally takes less than 2 seconds for the latest queue size to become available to the producer after the consumer reports it to SignalFx.

To keep things simple, let’s see how Additive Increase Multiplicative Decrease (AIMD) can be used as the flow control algorithm. With AIMD, the producer’s message enqueue rate increases linearly while the queue size stays at or below a static threshold, and decreases exponentially (by a constant factor at each step) as soon as the queue size goes above that threshold. The consumer has a fixed consumption rate.

This article uses the signalfx-java library to communicate with SignalFx. The following code reports the metrics used for flow control:

// from consumer
sfxMetrics.registerGauge("queueSize", queue::size);
// from producer
sfxMetrics.registerGauge("enqueueRate", producer::getRate);
sfxMetrics.registerGauge("queueSizeDatapointLatency", queueSizeProvider::getDatapointLatency);

The producer streams the queue size using the SignalFlow analytics language, a flexible language for executing program text on the SignalFx real-time analytics engine.
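On the producer side, the streamed datapoints have to be kept somewhere the rate controller can read them. Here is a minimal sketch of such a holder, assuming some streaming callback hands over each datapoint as a timestamp and a value. `QueueSizeProvider`, `onDatapoint`, and `maxAgeMillis` are hypothetical names and not part of signalfx-java; only the `get()` and `getDatapointLatency()` shapes mirror the snippets in this article.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder for the most recent queue-size datapoint.
// It is fed by whatever streaming client the producer uses.
class QueueSizeProvider implements Supplier<Integer> {
    private final LongSupplier clock;   // current time in milliseconds
    private final long maxAgeMillis;    // older datapoints count as missing
    private Integer latestValue;        // null until the first datapoint arrives
    private long receivedAtMillis;
    private long latencyMillis;

    QueueSizeProvider(LongSupplier clock, long maxAgeMillis) {
        this.clock = clock;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Called by the (assumed) streaming callback for every datapoint.
    synchronized void onDatapoint(long datapointTimestampMillis, int value) {
        receivedAtMillis = clock.getAsLong();
        latencyMillis = receivedAtMillis - datapointTimestampMillis;
        latestValue = value;
    }

    // Returns the latest queue size, or null if none arrived recently.
    @Override
    public synchronized Integer get() {
        if (latestValue == null) {
            return null;
        }
        boolean stale = clock.getAsLong() - receivedAtMillis > maxAgeMillis;
        return stale ? null : latestValue;
    }

    // Delay between the datapoint's timestamp and its arrival at the producer.
    synchronized long getDatapointLatency() {
        return latencyMillis;
    }
}
```

Returning null for a stale value lets the rate controller treat a silent monitoring tool the same way as a missing datapoint. With a real clock, the provider could be created as `new QueueSizeProvider(System::currentTimeMillis, 5_000L)`, where the 5 second limit is an arbitrary choice.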

Then, applying AIMD on top of queue size is the easiest part:

// additiveIncrement > 0
// 0 < multiplicativeDecrement < 1
Integer queueSize = queueSizeSupplier.get();
double newRate;
// A missing datapoint (null) is treated like congestion: back off.
if (queueSize == null || queueSize > queueSizeThreshold) {
    newRate = currentRate * multiplicativeDecrement;
} else {
    newRate = currentRate + additiveIncrement;
}
// producer can limit the newRate to be within a fixed range.
newRate = Math.min(maxRate, newRate);
newRate = Math.max(minRate, newRate);
currentRate = newRate;

When queueSize is not readily available from SignalFx, a defensive choice is to lower the enqueue rate and avoid overloading the queue. There are other choices too, such as reusing the last reported datapoint for up to n consecutive missing datapoints, or using a mean over the last x seconds.
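The two fallbacks mentioned above can be sketched as a small wrapper around the queue-size source. `FallbackQueueSize`, `maxExtrapolations`, and `windowSize` are hypothetical names, and to keep the sketch short the mean is taken over the last few datapoints rather than over a time window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical wrapper that fills in missing queue-size datapoints.
class FallbackQueueSize {
    private final Supplier<Integer> source;  // returns null when no datapoint is available
    private final int maxExtrapolations;     // n: how often the last value may be reused
    private final int windowSize;            // number of datapoints kept for the mean
    private final Deque<Integer> window = new ArrayDeque<>();
    private Integer lastValue;
    private int missesInARow;

    FallbackQueueSize(Supplier<Integer> source, int maxExtrapolations, int windowSize) {
        this.source = source;
        this.maxExtrapolations = maxExtrapolations;
        this.windowSize = windowSize;
    }

    // Latest value, or the last reported one for up to n consecutive misses, else null.
    Integer latestOrExtrapolated() {
        Integer value = source.get();
        if (value != null) {
            lastValue = value;
            missesInARow = 0;
            window.addLast(value);
            if (window.size() > windowSize) {
                window.removeFirst();
            }
            return value;
        }
        missesInARow++;
        return missesInARow <= maxExtrapolations ? lastValue : null;
    }

    // Mean of the datapoints currently in the window, or null if it is empty.
    Double windowMean() {
        if (window.isEmpty()) {
            return null;
        }
        double sum = 0;
        for (int v : window) {
            sum += v;
        }
        return sum / window.size();
    }
}
```

Once the extrapolation budget is used up, the wrapper returns null again, so the defensive "lower the rate" branch still applies when the signal stays away for too long.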

As expected, AIMD produces a beautiful sawtooth graph for the producer’s enqueue rate.

When the enqueue rate is overlaid on the queue size below, the relation between queue size and enqueue rate adjustment is evident. The enqueue rate increases linearly until the queue size rises above the threshold (100) and drops sharply afterwards. Then, once the queue size is back below 100, the enqueue rate steadily increases again.
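To see where the sawtooth comes from, here is a small self-contained simulation of the loop described above. Apart from the threshold of 100, all numbers (consumption rate 10, increment 1, decrement factor 0.5, rate range 1 to 50) are illustrative assumptions and not the values behind the graphs in this article. The queue is modeled as a plain counter updated once per tick.

```java
// Illustrative simulation: an AIMD producer feeding a fixed-rate consumer.
class AimdSimulation {
    static final double ADDITIVE_INCREMENT = 1.0;
    static final double MULTIPLICATIVE_DECREMENT = 0.5;
    static final double MIN_RATE = 1.0;
    static final double MAX_RATE = 50.0;
    static final double CONSUME_RATE = 10.0;
    static final double QUEUE_SIZE_THRESHOLD = 100.0;

    // One AIMD step: back off above the threshold, otherwise speed up, then clamp.
    static double nextRate(double currentRate, double queueSize) {
        double newRate = queueSize > QUEUE_SIZE_THRESHOLD
                ? currentRate * MULTIPLICATIVE_DECREMENT
                : currentRate + ADDITIVE_INCREMENT;
        return Math.max(MIN_RATE, Math.min(MAX_RATE, newRate));
    }

    // Runs the loop for the given number of ticks and returns the rate after each tick.
    static double[] simulate(int ticks) {
        double[] rates = new double[ticks];
        double rate = MIN_RATE;
        double queueSize = 0;
        for (int t = 0; t < ticks; t++) {
            // The queue grows by what is enqueued and shrinks by what is consumed.
            queueSize = Math.max(0, queueSize + rate - CONSUME_RATE);
            rate = nextRate(rate, queueSize);
            rates[t] = rate;
        }
        return rates;
    }
}
```

Printing the result of `AimdSimulation.simulate(120)` shows the rate climbing by one per tick, halving repeatedly once the simulated queue passes 100, and then climbing again.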

There it is, an easy solution to control data flow rate.

Closing thoughts:

Your monitoring tool may already have a vast amount of data that can be combined and leveraged in many new ways. With a robust and sophisticated monitoring tool like SignalFx, you can do some cool things to control data flow rate:

  1. Perform aggregations to merge thousands of time series and pick a mean, an exponential moving average, etc. in real time. For example, when Kafka is your message queue, you can pick the sum of topic partition lags or the maximum lag across all topic partitions to guide the producer’s message enqueue rate.
  2. Use historical signal data to compute a better signal for the current instant. Knowing daily or weekly patterns will allow you to slow down unimportant tasks early to leave room for forthcoming traffic.
  3. Create compound signals on top of different types of signals in real time. When your infrastructure and application services are instrumented to send metrics, you can create a compound condition on those independent signals for better flow control. For example, you can use the remaining network bandwidth on AWS EBS disks on top of the current queue size to guide the producer rate and ensure quality of service.
  4. Create rate limiters on some signals and circuit breakers on other signals. While queue size may be a good signal for adjusting the producer’s rate limiter, another signal can be used as a circuit breaker to stop message production onto the queue. For example, stop message production when additional consumer or database capacity is needed to proceed.
  5. Add or remove underlying signals, leading to an immediate effect on the final signal. For example, when additional capacity is added to your message queue cluster, your tolerable queue size may have increased and the producer rate can be increased immediately.
  6. Introduce artificial signals into the calculation in real time with minimum effort. Artificial signals can be created to reserve promised capacity for a high-paying customer. They can also be used to introduce chaos into the system without touching any production systems.
  7. Dynamically update flow control logic. SignalFx offers SignalFlow, a computation engine with a programming language, a comprehensive library, and a background computation technique that allows you to update your logic dynamically by simply updating the program text.
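As a rough sketch of ideas 1 and 4, the code below reduces per-partition lags to a single congestion signal and combines it with a separate circuit-breaker signal. In practice the aggregation could run inside the monitoring tool rather than in the producer; the class, method, and parameter names here are hypothetical.

```java
import java.util.Collection;

// Hypothetical helpers combining several signals into one flow-control decision.
class CompoundFlowControl {
    enum Decision { STOP, DECREASE, INCREASE }

    // Sum of all partition lags: the total backlog across the topic.
    static long sumLag(Collection<Long> partitionLags) {
        long sum = 0;
        for (long lag : partitionLags) {
            sum += lag;
        }
        return sum;
    }

    // Largest single partition lag: the most congested partition.
    static long maxLag(Collection<Long> partitionLags) {
        long max = 0;
        for (long lag : partitionLags) {
            max = Math.max(max, lag);
        }
        return max;
    }

    // The circuit breaker wins over the rate limiter: if it is open, stop producing.
    static Decision decide(long lagSignal, long lagThreshold, boolean circuitOpen) {
        if (circuitOpen) {
            return Decision.STOP;
        }
        return lagSignal > lagThreshold ? Decision.DECREASE : Decision.INCREASE;
    }
}
```

Choosing the maximum lag makes the producer react to the single worst partition, while the sum reacts to the overall backlog. Which one fits better depends on how evenly messages are spread across partitions.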

That’s it. Let me know if you have used your monitoring solution for any cool use-cases.
