Connecting Kafka to MinIO

Alex
Alex
Jun 17, 2020 · 4 min read

How to connect data being distributed via a web-socket to Kafka and then onto an S3 bucket.

If you’ve made it here you’ve probably exhausted stack-overflow and just want to know how to stream data into an S3 bucket. Fear not, the code is included below.

But first, a couple of helpful resources (other than stack-overflow) that i came across on my quest for real-time data storage.

  1. The Confluent Slack group: You may have to put up being bombarded with sales emails but the first hand help you’ll get to specific problems is second to none.
  2. If you’re completely unfamiliar with Kafka then do check out this Medium post for a good intro.
  3. Confluent provide a load of test code on their git page which helped me towards forming a base configuration.

As you may have already guessed, this article uses Kafka Connect; a tool created by Confluent. My first mistake was to start with the original, vanilla Kafka released by the Apache foundation. Save yourself a load of trouble and head over to Confluent and use their set of client libraries. Set up by the same guys who developed Kafka at LinkedIn, they’ve created a whole eco-system of frameworks and tools centred around Kafka.

Why

The reason for all of this is to have all the data being produced by the web socket stored in a more “permanent” data store. This maybe be to do any data analytics, ML training or in my case, have recorded environmental data at hand for any models that may need it.
Now in this example we’re converting the received data over the web socket into json, so Minio may not be the best choice and you maybe better off with a proper document store like MongoDB. However, Minio still remains a solid choice as it can hold just about anything… but that’s a debate for another day.

Base Architecture

The goal of this article it to create something like this:

High level architectural overview

Implementation

There’s basically two parts to getting this up and running:

  • A python app that connects to the web socket and then streams the messages to our Kafka server over a pre-configured topic (test_topic).
  • A docker-compose file that pulls all the dependancies we need to run the pipeline. This includes, Minio, Kafka and all the Confluent tools.

The docker compose file also runs some in-line scripts to configure Minio and Kafka Connect with anything it needs to run. This is also handy as it means the only app that we need to deploy is the python data producer as any extra set up is pumped straight into the tools on start up. More on that below.

The Web-socket Data Producer

Given we got a web socket thats pumping out data, the first step is to write a small python script which will:

  • Connect to a web socket
  • Connect to Kafka
  • Convert the data from the web socket into JSON and pipe it into a Kafka topic

This should do the trick:

To aid deployment we’ll wrap all the above in a Docker file:

The script above using the following requirements.txt:

websockets==8.1
kafka-python==1.4.7
certifi==2019.9.11

We can then combine this docker file along with any other services we need ( like Kafka) into a docker-compose file to bring everything up together.

I’ll link it here as its quite large.

There’s alot going on in the compose file so i’ll try and break it down.

createbuckets:

This service waits for MINIO to start before it creates a bucket for the data to go into. This bucket name will be used by Kafka Connect later.

streaming:

This service is the dockerized kafka_websocket_producer.py file which streams the data into a Kafka topic. You’ll probably need to play around with the path to the Docker file. I had mine in a folder called “streaming”

connect:

This service is the Confluent tool which listens to a topic and and puts the data into an S3 bucket. Amongst other things it:

  • It downloads the Confluent s3 sink add-on
  • Tells the tool which topic to listen to
  • Sets up the Minio connection

There’s a load of settings which define how the data gets stored and partitioned in the Minio bucket which i’d recommend having a play with, primarily flush.size and partitioner.class. In this example it creates a new partition in the bucket every time 100mb has been received in the topic.

Other tools which form part of the Confluent Kafka eco-system

Core Kafka services:

zookeeper:
broker:

A user interface, viewable at http://localhost:9021/

control-center:

Others:

schema-registry:
ksql-server:
ksql-cli:
ksql-datagen:
rest-proxy:

Running it all up

docker-compose up

You should be able to see the data going into the bucket by navigating to the Minio server URL and having a look in test-bucket.

The Startup

Get smarter at building your thing. Join The Startup’s +786K followers.

Sign up for Top 10 Stories

By The Startup

Get smarter at building your thing. Subscribe to receive The Startup's top 10 most read stories — delivered straight into your inbox, once a week. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Alex

Written by

Alex

Software Engineer, Architect and General Polyglot

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +786K followers.

Alex

Written by

Alex

Software Engineer, Architect and General Polyglot

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +786K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store