An Honest Review of AWS Managed Apache Kafka: Amazon MSK

UPDATED ARTICLE

This article was first written when Amazon MSK was released in beta, and MSK has improved tremendously since then.

I have created an online course in partnership with AWS to help you learn it, and I’ve updated this post now that Amazon MSK is production-ready.

Why you would want to use Amazon MSK

Amazon MSK is one of the best ways to deploy Apache Kafka in your AWS VPC securely and quickly. The main advantages that you will get are:

  • Managed service: you don’t have to bring together an entire engineering team to set up Apache Kafka. You can start building your applications in less than 15 minutes.
  • Network security: Apache Kafka on Amazon MSK is deployed within your VPC, meaning that Apache Kafka network packets never go out over the internet. This is a big difference from public managed solutions such as Confluent Cloud.
  • Kafka security: MSK supports SSL-based security and SASL/SCRAM. I’ve set up Kafka security before, and I can tell you it’s hard and error-prone. On MSK, you get a secure Kafka cluster out of the box.

  • Cost savings: one HUGE advantage of using Amazon MSK is that you do not pay for Kafka replication traffic going across Availability Zones (AZs). If you run Apache Kafka on EC2 machines yourself with a replication factor of 3, the network bill can become pretty significant if you have huge data volumes. There’s a handy spreadsheet here to compute your potential cost savings when using MSK.
  • Managed upgrades: one simple API to upgrade your Kafka cluster with no downtime.
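
To get a feel for the replication savings, here is a rough back-of-the-envelope sketch in Python. The workload (50 MB/s) and the per-GB cross-AZ price are assumptions for illustration only, so check current AWS pricing for your region. The function name is hypothetical. It also assumes every follower replica sits in a different AZ than the leader, so each produced byte crosses an AZ boundary (replication factor - 1) times.

```python
def monthly_replication_cost_usd(
    ingress_mb_per_sec: float,
    replication_factor: int = 3,
    price_per_gb_usd: float = 0.02,  # ASSUMPTION: $0.01 out + $0.01 in per GB across AZs
    seconds_per_month: int = 30 * 24 * 3600,
) -> float:
    """Estimate the cross-AZ bill for follower replication on self-managed Kafka.

    Assumes each follower replica lives in a different AZ than the leader,
    so every produced byte is copied across AZs (replication_factor - 1) times.
    """
    copies_across_az = replication_factor - 1
    gb_per_month = ingress_mb_per_sec * seconds_per_month / 1000  # MB -> GB
    return gb_per_month * copies_across_az * price_per_gb_usd


if __name__ == "__main__":
    # Hypothetical workload: 50 MB/s of produced data, replication factor 3.
    cost = monthly_replication_cost_usd(50)
    print(f"Estimated replication traffic cost: ${cost:,.0f} per month")
```

Under these assumptions the estimate comes out at roughly $5,184 per month for replication traffic alone, which is the part of the bill you would not pay on MSK.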

The rest is a full Apache Kafka experience. You can still customize settings if you’re an advanced user. MSK uses the standard Apache Kafka distribution, so all your Kafka Streams, Kafka Connect, and other Kafka applications will still work.
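
Because MSK runs standard Apache Kafka, pointing an existing client at it is mostly a matter of configuration. As a minimal sketch, the Python below builds Java-style client properties for a SASL/SCRAM connection and renders them as a `.properties` file. The broker address, username, and password are placeholders, and the two helper functions are hypothetical names of my own; the property keys are standard Kafka client settings. In a real setup you would load the credentials from a secret store rather than hard-code them.

```python
def scram_client_properties(bootstrap_servers: str, username: str, password: str) -> dict:
    """Build standard Kafka client settings for SASL/SCRAM over TLS."""
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",   # SASL authentication over a TLS connection
        "sasl.mechanism": "SCRAM-SHA-512",
        "sasl.jaas.config": jaas,
    }


def to_properties(props: dict) -> str:
    """Render the settings as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Placeholder values: replace with your cluster's bootstrap brokers and credentials.
    props = scram_client_properties(
        "b-1.example.kafka.us-east-1.amazonaws.com:9096", "alice", "change-me"
    )
    print(to_properties(props))
```

The printed output can be saved to a file and passed to the stock Kafka command-line tools (for example via `--consumer.config`), which is a quick way to confirm that nothing MSK-specific is needed on the client side.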

Amazon did create some nice AWS service integrations with MSK:

  • You can use the Glue ETL service to run a managed Apache Spark job directly connected to your Amazon MSK cluster
  • You can use the Kinesis Data Analytics service to run a managed Apache Flink job directly connected to your Amazon MSK cluster
  • You can use Lambda functions (!) to create Kafka consumers and react to data flowing through your Kafka topics. Very, very neat
  • AWS Certificate Manager can be used to provide SSL certificates for your clients.
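
To make the Lambda integration concrete, here is a minimal sketch of a Python handler for an MSK event source. Lambda delivers records grouped under a `records` map keyed by topic and partition, with each message value base64-encoded. The topic name, the sample event, and the assumption that values are UTF-8 text are mine; adapt the decoding to your own serialization format (Avro, JSON, and so on).

```python
import base64


def decode_records(event: dict) -> list:
    """Return (topic, partition, offset, value) tuples from an MSK Lambda event."""
    decoded = []
    # "records" maps "<topic>-<partition>" to a batch of messages.
    for batch in event.get("records", {}).values():
        for record in batch:
            # ASSUMPTION: message values are UTF-8 text.
            value = base64.b64decode(record["value"]).decode("utf-8")
            decoded.append(
                (record["topic"], record["partition"], record["offset"], value)
            )
    return decoded


def handler(event, context):
    """Lambda entry point: log every message in the batch."""
    messages = decode_records(event)
    for topic, partition, offset, value in messages:
        print(f"{topic}[{partition}]@{offset}: {value}")
    return {"processed": len(messages)}


if __name__ == "__main__":
    # Hypothetical sample event for local testing; "SGVsbG8=" is base64 for "Hello".
    sample_event = {
        "eventSource": "aws:kafka",
        "records": {
            "demo-0": [
                {"topic": "demo", "partition": 0, "offset": 15, "value": "SGVsbG8="}
            ]
        },
    }
    print(handler(sample_event, None))
```

Keeping the decoding in its own function makes it easy to unit test locally without deploying anything.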

Conclusion

Amazon MSK is now a very good solution to implement Apache Kafka on AWS. I am recommending it to my clients for its ease of use.

If you want to learn Amazon MSK, I’ve created a 6-hour course. As a thank you for reading this article, use the coupon code MEDIUM15 to get a 15% discount at checkout ✌️

And if you want to learn how Apache Kafka works, my other tutorials at https://kafka-tutorials.com should help! Happy learning :)

Udemy Instructor, 5x AWS Certified, Kafka Evangelist, New Tech Hunter https://courses.datacumulus.com/