Two Ways of Installing Kafka on Ubuntu 18.04 for the Novice Data Architect

Suraj Saha
Published in The Startup · Jun 8, 2020

I have kept this blog as short as possible. It covers two commonly used ways of running Kafka producer and consumer processes over a single-node, single-broker architecture, with only the minimal features required to keep things simple.

Prerequisite:

  1. Ubuntu 18.04 server and a non-root user with sudo privileges.
  2. At least 4 GB of RAM on the server. Installing with less may cause the Kafka service to fail, with the Java Virtual Machine (JVM) throwing an “Out Of Memory” exception during startup. Even when using Docker, one needs to make sure the host machine has more than 4 GB of RAM (8 GB is advisable), as this is an absolute requirement: Kafka will consume a big part of it. A quick way to verify this is shown after this list.
  3. OpenJDK 8 or 11 installed on the server. Kafka is written in Java, so it requires a JVM; however, in order to use the Confluent Open Source binaries one needs to use Java 8.
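Before going further, it is worth checking that the machine actually meets the RAM prerequisite. A minimal sanity check using standard Ubuntu tooling (nothing Kafka-specific is assumed here):

$ free -h    # the "total" column under Mem should read 4G or more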

Remember: while using sudo, you will be prompted for your password.

1. Use Kafka setup files licensed by Apache

In order to install Apache Kafka as described in the official Kafka documentation, we follow a series of steps:

  • First, update the package repository cache of the Ubuntu server with the following command:
$ sudo apt-get update
  • Then we will install OpenJDK 8 or 11 on the Ubuntu 18.04 server:
$ sudo apt-get install openjdk-8-jdk
$ sudo apt-get install openjdk-11-jdk
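Once the JDK is installed, it is worth confirming which Java version is active before proceeding:

$ java -version    # should report openjdk version 1.8.x or 11.x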
  • Then we download Kafka 2.4.0 (or any later version as available) from the official Kafka mirrors:
$ sudo wget http://apachemirror.wuchna.com/kafka/2.4.0/kafka_2.12-2.4.0.tgz
or
$ sudo wget http://mirrors.estointernet.in/apache/kafka/2.4.0/kafka_2.12-2.4.0.tgz
  • After downloading Kafka, we also have to download ZooKeeper:
$ sudo wget http://apachemirror.wuchna.com/zookeeper/stable/apache-zookeeper-3.5.6-bin.tar.gz
or the equivalent archive from the estointernet mirror (http://mirrors.estointernet.in/apachezookeeper/).
  • Now we need to extract kafka_2.12-2.4.0.tgz and apache-zookeeper-3.5.6-bin.tar.gz:
$ sudo tar -xzf Downloads/kafka_2.12-2.4.0.tgz
$ sudo tar -xzf Downloads/apache-zookeeper-3.5.6-bin.tar.gz
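If you want to be sure the archives unpacked cleanly, a quick listing of the two new directories does no harm:

$ ls kafka_2.12-2.4.0 apache-zookeeper-3.5.6-bin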
  • Now we enter the Kafka directory (note that we cd into the extracted directory, not the .tgz archive):
$ cd kafka_2.12-2.4.0
  • As Kafka uses ZooKeeper, we first need to start a ZooKeeper server if we don’t already have one. Always remember that the Kafka broker runs on top of ZooKeeper; it is the ZooKeeper server that is responsible for managing Kafka leaders and followers.
$ sudo bin/zookeeper-server-start.sh config/zookeeper.properties
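This command keeps the terminal occupied. If you would rather run ZooKeeper in the background, the same start script also accepts a -daemon flag:

$ sudo bin/zookeeper-server-start.sh -daemon config/zookeeper.properties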
  • Now that the ZooKeeper instance is up, we start the Kafka server in another terminal:
$ bin/kafka-server-start.sh config/server.properties
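To quickly confirm the broker is accepting connections (assuming netcat is installed, as it usually is on Ubuntu):

$ nc -vz localhost 9092    # 9092 is Kafka's default port, assuming an unchanged config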
  • We create a topic named “Test123”:
$ sudo bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic Test123
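To inspect the partition and replication details of the new topic, the same script offers a --describe option:

$ sudo bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic Test123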
  • We can check the created topic by running the command:
$ sudo bin/kafka-topics.sh --list --bootstrap-server localhost:9092
  • After starting the producer, we can write some messages:
$ sudo bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Test123
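The console producer presents a > prompt, and each line typed becomes one message, for example:

> hello kafka
> this is my first message

Press Ctrl+C to exit the producer when done.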

and then we can consume the data by running a consumer process in another terminal.

$ sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic Test123 --from-beginning
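Assuming the sample messages above were sent, the consumer should echo them back, one per line:

hello kafka
this is my first message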

That concludes the hands-on usage of Kafka for a single broker and a single node.

2. Install Kafka from Confluent Open Source

Confluent is an organisation founded by Neha Narkhede, Jun Rao and Jay Kreps, who originally developed Kafka while at LinkedIn. The organisation provides us with many additional open-source libraries and offers advantages over plain Kafka in the following ways:

  • Additional Clients: supports C, C++, Python, .NET and several other non-Java clients.
  • REST Proxy: provides universal access to Kafka from any network-connected device via HTTP (see the example after this list).
  • Schema Registry: a central registry for the format of Kafka data, guaranteeing that all data is always consumable.
  • Pre-Built Connectors: HDFS, JDBC, Elasticsearch, Amazon S3 and other connectors fully certified and supported by Confluent.
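As a small taste of the REST Proxy, once the platform is running (installation steps follow below), the existing topics can be listed over plain HTTP. This sketch assumes the REST Proxy is on its default port:

$ curl http://localhost:8082/topics    # 8082 is the REST Proxy's default port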

Now we will revisit the steps for the basic installation of Kafka, and later we will see the commands for using it.

  • First, update the package repository cache of the Ubuntu server with the following command:
$ sudo apt-get update
  • Next we will install the Confluent public key:
$ wget -qO - http://packages.confluent.io/deb/3.3/archive.key | sudo apt-key add -
  • Next we add the repository to the sources list:
$ sudo add-apt-repository "deb [arch=amd64] http://packages.confluent.io/deb/3.3 stable main"
  • Now we install the Confluent Open Source Platform:
$ sudo apt-get install confluent-platform-oss-2.11
  • Now we can start our Confluent OSS platform. Starting the platform brings up the Kafka server, ZooKeeper server, Schema Registry, REST Proxy and Kafka Connect at the same time in the background.
$ sudo confluent start
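The same bundled confluent CLI can report which of these services are actually up, which is a handy sanity check right after starting:

$ sudo confluent status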
  • As the servers are up, we can create a topic in our CLI using the command below (note: on the older Kafka versions bundled with Confluent 3.3, kafka-topics may require --zookeeper localhost:2181 in place of --bootstrap-server):
$ cd /usr/bin
$ sudo kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic Test123
  • We can already observe that we don’t have to add the .sh extension to our commands, but we do need to be in the /usr/bin folder (or have it on our PATH, which it normally is) to access these commands from now on. We can start our producer process using:
$ sudo kafka-console-producer --broker-list localhost:9092 --topic Test123
  • Write some messages in the Kafka producer. In another terminal, we run the consumer process in order to consume the messages:
$ cd /usr/bin
$ sudo kafka-console-consumer --bootstrap-server localhost:9092 --topic Test123 --from-beginning
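When you are done experimenting, the same confluent CLI can shut everything down; confluent stop halts the services, while confluent destroy additionally deletes their data, so use the latter with care:

$ sudo confluent stop
$ sudo confluent destroy    # also wipes the services' data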

Hopefully this is clear to all the Debian Linux users out there; let me know if any doubts come up.
