Kafka Hardware Requirements

Akash D Goel
4 min read · Mar 10, 2020

CPU

Kafka does not need a powerful CPU unless SSL or log compression is required. If compression is used, producers and consumers must spend some CPU cycles compressing and decompressing data. More cores also allow more parallelism.

Choose a modern processor with multiple cores. Common clusters use 24-core machines. If you need to choose between faster CPUs and more cores, choose more cores. The extra concurrency that multiple cores offer will far outweigh a slightly faster clock speed.

Memory

Kafka uses heap space very carefully and does not require a heap larger than 6 GB; it runs well with 6 GB of heap. On a 32 GB machine, this leaves most of the remaining RAM (roughly 26 GB with a 6 GB heap, more with a smaller one) for the file system cache. For especially heavy production loads, use machines with 32 GB or more. Extra RAM bolsters the OS page cache and improves client throughput. While Kafka can run with less RAM, its ability to handle load is hampered when less memory is available.

You need sufficient memory to buffer active readers and writers. You can do a back-of-the-envelope estimate of memory needs by assuming you want to be able to buffer for 30 seconds and compute your memory need as write_throughput * 30.
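The back-of-the-envelope rule above can be written as a one-line calculation. The following is a minimal sketch; the function name and the 50 MB/s ingest rate are illustrative assumptions, not figures from any particular cluster.

```python
MB = 1024 * 1024

def buffer_memory_bytes(write_throughput_bytes_per_sec, buffer_seconds=30):
    """Rule-of-thumb memory needed to buffer `buffer_seconds` of writes."""
    return write_throughput_bytes_per_sec * buffer_seconds

# Hypothetical broker ingesting 50 MB/s (assumed value for illustration).
needed = buffer_memory_bytes(50 * MB)
print(f"{needed / MB:.0f} MB")  # 1500 MB
```

This only covers the buffer for active readers and writers; heap and other OS usage come on top of it.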

A machine with 64 GB of RAM is a decent choice, but 32 GB machines are not uncommon. Less than 32 GB tends to be counterproductive (you end up needing many, many small machines).

Kafka brokers use both the JVM heap and the OS page cache:

  • The JVM heap is used for replication of partitions between brokers and for log compaction.
  • Consumers ideally read from memory, i.e. from data that was written to Kafka and is still held in the OS page cache. The amount of memory this requires depends on the rate at which data is written and how far behind you expect consumers to get. If you write 20 GB per hour per broker and you allow consumers to fall 3 hours behind in a normal scenario, you will want to reserve 60 GB for the OS page cache. When consumers are forced to read from disk, performance drops significantly.
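The page cache estimate in the second bullet is a simple product of write rate and tolerated lag. Here is a rough sketch of that arithmetic; the function names are hypothetical, and the total assumes the 6 GB heap ceiling mentioned earlier while ignoring memory used by the OS and other processes.

```python
def page_cache_gb(write_gb_per_hour, max_lag_hours):
    """Page cache needed so lagging consumers can still read from memory."""
    return write_gb_per_hour * max_lag_hours

def broker_ram_gb(write_gb_per_hour, max_lag_hours, heap_gb=6):
    """Heap plus page cache; OS and other overhead are not included (assumption)."""
    return heap_gb + page_cache_gb(write_gb_per_hour, max_lag_hours)

# The article's example: 20 GB per hour per broker, 3 hours of allowed lag.
print(page_cache_gb(20, 3))  # 60
print(broker_ram_gb(20, 3))  # 66
```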

Kafka Connect itself does not use much memory, but some connectors buffer data internally for efficiency. If you run multiple connectors that use buffering, you will want to increase the JVM heap size to 1GB or higher.

Disks

Use multiple drives to maximize throughput. To ensure good latency, do not share the drives used for Kafka data with application logs or other OS filesystem activity. You can either combine these drives into a single volume (RAID) or format and mount each drive as its own directory.

  • If you configure multiple data directories, the broker places a new partition in the path with the least number of partitions currently stored. Each partition will be entirely in one of the data directories. If data is not well balanced among partitions, this can lead to load imbalance among disks.
  • RAID can potentially do better at balancing load between disks (although it doesn’t always seem to) because it balances load at a lower level. The primary downside of RAID is that it reduces the available disk space. Another potential benefit of RAID is the ability to tolerate disk failures. Avoid RAID 5 and RAID 6; RAID 10 is recommended.
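To see why placing partitions by count can leave disks unevenly loaded, consider the following simplified sketch. It is not Kafka's actual code: it only mimics the "fewest partitions wins" rule described above, and the directory paths and partition sizes are made up for illustration.

```python
def pick_log_dir(partition_counts):
    """Return the directory that currently holds the fewest partitions."""
    return min(partition_counts, key=partition_counts.get)

def place_partitions(log_dirs, partition_sizes_gb):
    """Assign partitions one by one, tracking count and bytes per directory."""
    counts = {d: 0 for d in log_dirs}
    usage_gb = {d: 0.0 for d in log_dirs}
    for size in partition_sizes_gb:
        target = pick_log_dir(counts)
        counts[target] += 1
        usage_gb[target] += size
    return counts, usage_gb

# Hypothetical mount points and partition sizes (large, small, large, small).
counts, usage_gb = place_partitions(["/data/1", "/data/2"], [100, 1, 100, 1])
print(counts)    # {'/data/1': 2, '/data/2': 2}
print(usage_gb)  # {'/data/1': 200.0, '/data/2': 2.0}
```

Both directories end up with two partitions each, yet one holds 200 GB and the other 2 GB, which is the kind of imbalance the first bullet warns about.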

Avoid network-attached storage (NAS). NAS is often slower, displays larger latencies with a wider deviation in average latency, and is a single point of failure.

SSDs don’t deliver much of an advantage, both because Kafka’s disk I/O is sequential and because Kafka writes to disk asynchronously. That is, other than at startup and shutdown, no Kafka operation waits for a disk sync to complete; disk syncs always happen in the background.

Network

A fast and reliable network is an essential performance component in a distributed system. Low latency ensures that nodes can communicate easily, while high bandwidth helps partition movement and recovery. Modern data-center networking (1 GbE, 10 GbE) is sufficient for the vast majority of clusters.

You should avoid clusters that span multiple data centers, even if the data centers are in close proximity, and avoid clusters that span large geographic distances. Kafka clusters assume that all nodes are equal. Larger latencies can exacerbate problems in distributed systems and make debugging and resolution more difficult.

Filesystem

You should run Kafka on XFS or ext4.

  • XFS is a very high-performance, scalable file system and is routinely deployed in the most demanding applications. It is the default file system on RHEL 7 and is supported on all architectures. XFS has its advantages, but in a JBOD setup it doesn’t really provide a lot of benefits.
  • Ext4 does not scale to the same size as XFS, but it is fully supported on all architectures and continues to see active development and support.
