Introducing Azkarra Streams V0.9

Florian Hussonnois
Published in StreamThoughts · Mar 11, 2021
Azkarra Streams Framework

A few days ago we released Azkarra Streams V0.9 and, like every new release, it brings new features that make it easier to develop stream processing applications based on Kafka Streams. This version is also a milestone: it includes important changes to Azkarra’s internal and public APIs in order to prepare the next steps of the project.

What is Azkarra Streams?

For readers discovering the framework: the Azkarra Streams project is dedicated to making the development of cloud-native streaming microservices based on Kafka Streams simple and fast!

I recommend checking out these blog posts to discover all the possibilities Azkarra has to offer:

This blog post summarizes the most important improvements.

Kafka Streams Memory Management

Kafka Streams uses RocksDB to maintain internal state stores. Depending on the number of state stores and the data throughput your application has to manage, it can be necessary to change the default settings of the internal RocksDB instances. This is done through the rocksdb.config.setter configuration and an implementation of the RocksDBConfigSetter interface.

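For reference, implementing this tuning yourself with plain Kafka Streams looks roughly like the sketch below, which follows the RocksDBConfigSetter example from the Kafka Streams documentation (class name and tuning values are illustrative only):

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

import java.util.Map;

public class CustomRocksDBConfig implements RocksDBConfigSetter {

    // Member variable so that it can be released in close().
    private final org.rocksdb.Cache cache = new org.rocksdb.LRUCache(16 * 1024L * 1024L);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(cache);
        tableConfig.setBlockSize(16 * 1024L);           // 16KB data blocks
        tableConfig.setCacheIndexAndFilterBlocks(true); // count index/filter blocks against the cache
        options.setTableFormatConfig(tableConfig);
        options.setMaxWriteBufferNumber(2);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Release the RocksDB objects allocated by this setter.
        cache.close();
    }
}

Such a setter is registered through the rocksdb.config.setter property (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG). With v0.9, Azkarra ships its own implementation so that common memory limits can be set from configuration alone.
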
This release provides a new default AzkarraRocksDBConfigSetter class that allows advanced RocksDB tuning and helps to control the total memory usage across all instances.

For example, the configuration below shows how to configure a 256MB block cache shared across all RocksDB instances, with a write buffer manager limited to half of that cache (block.cache.size * 0.5).

azkarra {
  streams {
    rocksdb.memory.managed = true
    rocksdb.memory.write.buffer.ratio = 0.5
    rocksdb.memory.high.prio.pool.ratio = 0.1
    rocksdb.block.cache.size = 268435456
  }
}

For more information about how to configure RocksDB, you can refer to the official RocksDB Tuning Guide.

Monitoring State Store Recovery Process

Azkarra now automatically configures a LoggingStateRestoreListener which logs and captures the state of the restoration process for each store. In addition, this information is exposed through the REST endpoint GET /api/v1/streams/:id/stores:

Example (JSON response):

[
  {
    "name": "Count",
    "partition_restore_infos": [
      {
        "starting_offset": 0,
        "ending_offset": 0,
        "total_restored": 0,
        "duration": "PT0.102756S",
        "partition": 0,
        "topic": "complex-word-count-topology-0-10-0-s-n-a-p-s-h-o-t-Count-changelog"
      }
    ],
    "partition_lag_infos": [
      {
        "partition": 0,
        "current_offset": 0,
        "log_end_offset": 0,
        "offset_lag": 0
      }
    ]
  }
]

Exporting Kafka Streams States Anywhere

Since Azkarra v0.7, you can use the StreamsLifecycleInterceptor called MonitoringStreamsInterceptor, which periodically publishes the state of your KafkaStreams instance directly into a Kafka topic, in the form of events that adhere to the CloudEvents specification.

The MonitoringStreamsInterceptor has been enhanced so that you can more easily report the states of your instances to third-party systems other than Kafka (e.g., Datadog). For this, Azkarra v0.9 introduces the new pluggable MonitoringReporter interface. Custom reporters can be configured through the monitoring.streams.interceptors.reporters property or declared as components.

Example:

@Component
public class ConsoleMonitoringReporter implements MonitoringReporter {

    @Override
    public void report(final KafkaStreamMetadata metadata) {
        System.out.println("Monitored KafkaStreams: " + metadata);
    }
}

Azkarra Dashboard

The UI of Azkarra Dashboard has been polished to provide a better user experience. In addition, a new tab has been added to get direct access to the state store lags and offsets, as well as information about their recovery process.

Azkarra Streams — Dashboard — v0.9.0

Azkarra API Changes

LocalStreamsExecutionEnvironment

This new release makes some changes to the Azkarra low-level APIs, including the existing StreamsExecutionEnvironment interface. Specifically, the DefaultStreamsExecutionEnvironment has been replaced by the new LocalStreamsExecutionEnvironment class, which is used to run local KafkaStreams instances.

Example:

LocalStreamsExecutionEnvironment
    .create("default", config)
    .registerTopology(
        WordCountTopologyProvider::new,
        Executed.as("wordcount")
    )
    .start();

Currently, these changes do not directly impact applications developed with Azkarra. They were motivated by the goal of bringing, in a future version, additional implementations that will allow deploying and managing Azkarra instances running remotely, e.g., on Kubernetes :)

Azkarra Client

Additionally, Azkarra 0.9 introduces a new module named azkarra-client that provides a simple Java client API for Azkarra Streams, generated from its OpenAPI specification. The client API is already used by Azkarra itself for executing remote interactive queries, and it will be leveraged in future versions to fully manage remote Azkarra instances.

KafkaStreamsContainerAware

Azkarra provides the new KafkaStreamsContainerAware interface, which can be implemented by classes that also implement one of the following (see the sketch after the list):

  • org.apache.kafka.streams.KafkaStreams.StateListener
  • org.apache.kafka.streams.processor.StateRestoreListener
  • io.streamthoughts.azkarra.api.streams.KafkaStreamsFactory
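
To give an idea of how this fits together, here is a minimal sketch of a StateRestoreListener declared as a component that also implements KafkaStreamsContainerAware. The Kafka Streams callbacks are the standard ones; the setKafkaStreamsContainer method name, the KafkaStreamsContainer type and the component registration are assumptions, since the exact contract of the new interface is not detailed in this post (Azkarra-specific imports are omitted):

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

@Component
public class LoggingRestoreListener implements StateRestoreListener, KafkaStreamsContainerAware {

    // Assumption: KafkaStreamsContainerAware hands the listener the container
    // that manages the KafkaStreams instance it is attached to.
    private KafkaStreamsContainer container;

    public void setKafkaStreamsContainer(final KafkaStreamsContainer container) {
        this.container = container;
    }

    @Override
    public void onRestoreStart(final TopicPartition partition, final String storeName,
                               final long startingOffset, final long endingOffset) {
        System.out.println("Restore started for store '" + storeName + "' on " + partition
            + " (container: " + container + ")");
    }

    @Override
    public void onBatchRestored(final TopicPartition partition, final String storeName,
                                final long batchEndOffset, final long numRestored) {
        // No-op: called after each restored batch of records.
    }

    @Override
    public void onRestoreEnd(final TopicPartition partition, final String storeName,
                             final long totalRestored) {
        System.out.println("Restore completed for store '" + storeName + "': "
            + totalRestored + " records");
    }
}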

Support for Apache Kafka 2.7

Finally, Azkarra Streams is always built against the most recent version of Apache Kafka. This new release therefore adds support for version 2.7.

Join the Azkarra Streams community on Slack

The Azkarra project has a dedicated Slack where the community can get help and discuss the project. Join us!

Conclusion

I would like to thank the early adopters of Azkarra who, through their feedback and support, help the project become more and more robust with each new version.

Please share this article if you like this project. You can also add a ⭐ to the GitHub repository to support us.

Also, the project is open to contributions, so feel free to propose your ideas or project needs and, of course, to create pull requests (PRs).

Thank you very much!

About Us

StreamThoughts is a French IT consulting company specialized in Apache Kafka and event-driven architectures, founded in 2020 by a group of technical experts. Our mission is to help our customers create value from their data as real-time event streams through our expertise, solutions and partners.

We deliver high-quality professional services and training in France, covering data engineering, event streaming technologies, the Apache Kafka ecosystem and Confluent Inc.’s streaming platform.

Florian Hussonnois
Lead Software Engineer @kestra-io | Co-founder @Streamthoughts | Apache Kafka | Open Source Enthusiast | Confluent Kafka Community Catalyst.