Deploying Kafka on EC2
Here is Confluent's write-up on deploying Kafka on EC2.
In summary, using EBS volumes will decrease network traffic when a broker fails or is replaced. Also, the replacement…www.confluent.io
It mainly covers the storage options and availability challenges of an EC2 deployment.
EBS? Ephemeral Disk?
Kafka has built-in fault tolerance through partition replication. So it looks like we don't need to choose EBS and pay its additional cost and latency. At least, that is what I thought.
However, the discussion there says EBS is the better choice once we think about broker replacement. EBS volumes exist independently of EC2 instances, so when we replace a failed node, the data on the disk persists and we can attach the volume to the new EC2 instance. This means the new machine doesn't have to fetch all of the data over the network when it comes up. It only has to catch up on what it missed while the broker was down.
One thing to note: Kafka itself has no EBS-aware node replacement capability, so we have to implement it ourselves.
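The replacement step could be sketched roughly as follows. This is only an illustration, not something Kafka provides: it assumes `ec2` behaves like a boto3 EC2 client (`boto3.client("ec2")`), and the function name `reattach_data_volume`, the default device `/dev/sdf`, and the return values are hypothetical. Mounting the filesystem and starting the broker with the same `broker.id` still have to happen on the new machine.

```python
def reattach_data_volume(ec2, volume_id, new_instance_id, device="/dev/sdf", force=False):
    """Move a Kafka data volume from a failed broker to its replacement.

    `ec2` is expected to behave like boto3.client("ec2"). The function name,
    default device, and return values are illustrative assumptions.
    """
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]

    # Detach from the old instance if the volume is still attached somewhere.
    for attachment in volume.get("Attachments", []):
        if attachment["InstanceId"] == new_instance_id:
            return "already-attached"
        ec2.detach_volume(
            VolumeId=volume_id,
            InstanceId=attachment["InstanceId"],
            Force=force,  # a forced detach may be needed if the old node is unresponsive
        )

    # Wait until the volume is free, then attach it to the replacement broker.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=new_instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
    return "attached"
```

Note that an EBS volume can only be attached to an instance in the same availability zone, so the replacement broker has to be launched in the zone of the failed one.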
Let's do zone-aware deployment, not only for Kafka but also for ZooKeeper
Since Kafka 0.10, we can do rack-aware replica assignment, thanks to the Netflix team. In AWS deployments, we should map each rack to an AWS availability zone.
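As a rough sketch of that mapping, a startup script could read the zone from the EC2 instance metadata service and write it into the `broker.rack` setting in `server.properties`. The helper names `fetch_availability_zone` and `with_broker_rack` are hypothetical, and the metadata calls assume IMDSv2 is reachable from the instance.

```python
import urllib.request

METADATA = "http://169.254.169.254/latest"


def fetch_availability_zone(timeout=2):
    """Ask the EC2 instance metadata service (IMDSv2) for this instance's zone."""
    token_request = urllib.request.Request(
        f"{METADATA}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()

    zone_request = urllib.request.Request(
        f"{METADATA}/meta-data/placement/availability-zone",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(zone_request, timeout=timeout) as response:
        return response.read().decode().strip()


def with_broker_rack(properties_text, zone):
    """Return server.properties content with broker.rack set to the given zone."""
    kept = [
        line
        for line in properties_text.splitlines()
        if not line.strip().startswith("broker.rack=")
    ]
    kept.append(f"broker.rack={zone}")
    return "\n".join(kept) + "\n"
```

On a broker host this might be used as `with_broker_rack(open(path).read(), fetch_availability_zone())`, with the result written back before the broker starts. Rack-aware assignment only applies when replicas are assigned, for example at topic creation, so existing partitions are not moved automatically.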
Of course, this goes for ZooKeeper too, not only Kafka :)
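ZooKeeper has no rack setting of its own, so zone awareness there comes down to where the nodes are placed. A minimal sketch of the check, assuming the usual majority quorum rule (the function name `survives_any_zone_loss` and the zone names are hypothetical):

```python
from collections import Counter


def survives_any_zone_loss(node_zones):
    """True if a majority of ensemble nodes remains after losing any single zone."""
    if not node_zones:
        return False
    total = len(node_zones)
    quorum = total // 2 + 1
    # The worst case is losing the zone that hosts the most nodes.
    largest_zone = max(Counter(node_zones).values())
    return total - largest_zone >= quorum
```

For example, three nodes spread over three zones pass the check, while three nodes with two of them in the same zone do not.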