This blog post continues the discussion from these posts:
- Getting Your Feet Wet with Stream Processing — Part 2: Testing Your Streaming Application
- Testing Event-Driven Systems
The main purpose of this article is to share my experience of testing Apache Kafka based applications and data pipelines using different approaches.
- Kafka based application — any application that uses the Kafka API and communicates with a Kafka cluster.
- Data pipeline — a set of Kafka based applications that are connected into a single context.
Examples are built using Java and Docker. For detailed information, check this repository on GitHub.
- Unit tests of Kafka Streams application with kafka-streams-test-utils
- Integration tests with EmbeddedKafkaCluster
- End-to-end tests with Testcontainers
Many have faced the challenge of testing applications that communicate with a Kafka cluster. The common issues are how to write unit tests, how to automate tests of a Kafka application in isolation, and how to handle end-to-end and system tests for a whole data pipeline.
At the time of writing, there are several approaches, some of them not finalized yet:
- KIP-139: Kafka TestKit library is still open. Once it is ready, the Kafka dev community will have a proper test kit.
- #26 Encapsulate EmbeddedSingleNodeKafkaCluster in a separately-available maven/gradle/sbt dep. Kafka cluster in memory. At the time this post is published, there is still an open issue for releasing the embedded Kafka cluster. Yeva Byzek suggests using KafkaEmbeddedCluster in this blog post. Since there is no existing release of these utils, I decided to duplicate this part of the code base and build it as an internal module in the example.
- KIP-247 Add public test utils for Kafka Streams. This KIP is complete and has a release for unit testing Kafka Streams topologies — kafka-streams-test-utils.
- Testcontainers — testing with docker containers.
2. Unit tests of Kafka Streams application with kafka-streams-test-utils
Kafka-streams-test-utils is a test kit for testing stream topologies in memory, without the need to run a Kafka cluster. The test kit supports stateful operations and Avro serialization/deserialization via a mocked Schema Registry.
All you need is a TopologyTestDriver, which you initialize for each test with the topology under test.
For testing, I chose a topology with stateful operations and Avro.
Unit test of topology.
Pay attention to issue #877 — Passing Schema Registry URL twice to instantiate KafkaAvroSerializer.
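The shape of such a test can be sketched as follows. This is a minimal sketch with a hypothetical uppercase topology and illustrative topic names `input`/`output` (the real test in the repository uses stateful operations and Avro with a mocked Schema Registry); it assumes a recent kafka-streams-test-utils that provides `TestInputTopic`/`TestOutputTopic`:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;

public class TopologyUnitTestSketch {
  public static void main(String[] args) {
    // Hypothetical topology under test: uppercase every value from "input" into "output".
    StreamsBuilder builder = new StreamsBuilder();
    builder.<String, String>stream("input")
        .mapValues(v -> v.toUpperCase())
        .to("output");
    Topology topology = builder.build();

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "unit-test");
    // The driver never contacts a broker; the address is a dummy value.
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

    try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
      TestInputTopic<String, String> in = driver.createInputTopic(
          "input", Serdes.String().serializer(), Serdes.String().serializer());
      TestOutputTopic<String, String> out = driver.createOutputTopic(
          "output", Serdes.String().deserializer(), Serdes.String().deserializer());

      in.pipeInput("key", "hello");
      System.out.println(out.readValue()); // prints "HELLO"
    }
  }
}
```

The whole pipe-in/read-out cycle runs synchronously in memory, so no awaiting or cluster setup is needed.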
3. Integration tests with EmbeddedKafkaCluster
Integration testing is a way to test services in isolation, but together with their required dependencies. The embedded Kafka cluster combines broker, ZooKeeper and Schema Registry in one. To test this asynchronous system I chose the Awaitility library. It is a useful tool for handling timeouts and asynchronous asserts.
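For example, instead of sleeping for a fixed time and hoping the consumer has caught up, Awaitility polls a condition until it holds or a timeout expires. A minimal sketch, with a plain counter standing in for records consumed from the embedded cluster (the cluster and topic details are omitted here):

```java
import static org.awaitility.Awaitility.await;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitilitySketch {
  public static void main(String[] args) {
    // Hypothetical asynchronous work: a counter updated by another thread,
    // standing in for records arriving from the embedded Kafka cluster.
    AtomicInteger processed = new AtomicInteger();
    new Thread(() -> processed.set(3)).start();

    // Poll until the condition holds, instead of a brittle fixed Thread.sleep().
    await()
        .atMost(Duration.ofSeconds(10))
        .until(() -> processed.get() == 3);
  }
}
```

The same pattern applies to asserting that a record has landed in an output topic or that a state store has been updated.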
4. End-to-end tests with Testcontainers
In the context of Kafka based applications, end-to-end testing is applied to data pipelines to ensure that, first, data integrity is maintained between applications and, second, the data pipelines behave as expected. Testing requires all integrated components of the application to be up and running, in order to test them with different scenarios. The pipeline under test consists of:
- kafka broker
- confluent schema registry
- http-producer: REST producer
- http-materializer: a Kafka Streams application that consumes topic data and makes RPC calls to another REST service
- db-mock: a REST service with an in-memory database (using a .json file as persistent storage), all in one
I describe two approaches to preparing the container environment for testing. Both require a certain start-up sequence: start the Kafka cluster first, wait until it is ready to serve data, create the required topics, register schemas, and then start the application services.
Approach 1: Declare each container separately
Each container is defined in the test class under its own method and is started either in @BeforeClass or in a static block. Be aware of the differences mentioned in the documentation.
Testcontainers supports wait strategies that block until containers are ready for testing. A potential improvement would be to implement health checks for each component in the data pipeline.
To make containers interact with each other, you should put them on the same network. If one container depends on another, you may need a network alias to set up communication between them. You can provide your own network alias for a container or use an existing one. Aliases can be injected as environment variables.
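A sketch of two containers sharing a network and communicating via an alias. The image name `example/http-producer:latest` and the environment variable name are illustrative assumptions, not the repository's exact values:

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.containers.Network;

public class NetworkSketch {
  static final Network network = Network.newNetwork();

  // Broker reachable by other containers on the same network under the alias "broker".
  static final KafkaContainer kafka = new KafkaContainer()
      .withNetwork(network)
      .withNetworkAliases("broker");

  // Hypothetical application container resolving the broker via its alias,
  // injected as an environment variable.
  static final GenericContainer producer =
      new GenericContainer("example/http-producer:latest")
          .withNetwork(network)
          .withEnv("KAFKA_BOOTSTRAP_SERVERS", "broker:9092");
}
```

Note that inside the Docker network the application uses `broker:9092`, while the test itself talks to the broker through the mapped localhost port.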
Container ports can be mapped to localhost during tests if the container's port is exposed. Files and directories on the classpath can also be mapped as volumes into the container.
Another useful feature is executing commands inside a container, for example to create a Kafka topic before running the streams application.
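A sketch of creating a topic via `execInContainer`, assuming a Confluent Kafka image where the `kafka-topics` CLI is on the PATH; the topic name and counts are illustrative:

```java
import org.testcontainers.containers.Container;
import org.testcontainers.containers.KafkaContainer;

public class TopicSetup {
  // Run kafka-topics inside the broker container before starting the streams app.
  static void createTopic(KafkaContainer kafka, String topic) throws Exception {
    Container.ExecResult result = kafka.execInContainer(
        "kafka-topics", "--create",
        "--topic", topic,
        "--partitions", "1",
        "--replication-factor", "1",
        "--zookeeper", "localhost:2181"); // newer images: --bootstrap-server localhost:9092
    if (result.getExitCode() != 0) {
      // Surface the CLI error instead of failing later with a missing-topic error.
      throw new IllegalStateException(result.getStderr());
    }
  }
}
```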
Approach 2: Define containers in docker-compose file
Another approach to preparing the container environment for testing is to define all the environment information in a docker-compose.yml. In this case, Testcontainers manages the composition of containers. All services share a common network, and a service's name is used as its network alias. This approach also supports wait strategies. An example of how to create a topic in a docker-compose file can be found in the Confluent examples.
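A sketch of wiring a compose file into a test. The file path and the service names (`broker`, `schema-registry`) are illustrative and must match the compose file:

```java
import java.io.File;
import org.testcontainers.containers.DockerComposeContainer;
import org.testcontainers.containers.wait.strategy.Wait;

public class ComposeEnvironment {
  // Testcontainers starts all services from the compose file on a shared network.
  static final DockerComposeContainer environment =
      new DockerComposeContainer(new File("src/test/resources/docker-compose.yml"))
          // Expose service ports to the test and wait until each service is ready.
          .withExposedService("broker", 9092, Wait.forListeningPort())
          .withExposedService("schema-registry", 8081,
              Wait.forHttp("/subjects").forStatusCode(200));
}
```

In the test, the mapped address of a service is then obtained with `environment.getServiceHost(...)` and `environment.getServicePort(...)` rather than hard-coding localhost ports.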
Hopefully, the described approaches and code examples will make it easier to test Kafka client applications. If you would like to contribute, just open a ticket with your ideas on GitHub. Additionally, I recommend the Confluent blog, which can be helpful in learning more about Kafka.
Test examples of kafka-clients: unit, integration, end-to-end
Author: Nikita Zhevnitskiy