Kinesis vs SNS/SQS

i.e. first-class events vs ephemeral messages

John Gilbert
3 min read · Nov 13, 2019

I like to use Kinesis and SNS/SQS for entirely different purposes. There are plenty of posts available that compare and contrast the features, ilities and limits of these AWS services. You can somewhat use these services interchangeably. But I like to think of SNS/SQS as messaging services, whereas I think of Kinesis as a temporal persistence service. More specifically I think of SNS/SQS as ephemeral messaging and I think of Kinesis as a tool for creating and processing events as first-class citizens. I use SNS/SQS for intra-service messaging as needed and I use Kinesis for all inter-service communication as I discuss here and here.

All this is a matter of practice. As I mentioned, you can pretty much achieve the same messaging results with any of these services. But SNS and SQS delete messages once they have been acknowledged by a consumer, whereas Kinesis does not bother with acknowledgements and persists the records until they expire. This creates a totally different mental picture that is critical for creating event-first systems.
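To make that mental picture concrete, here is a minimal in-memory sketch of the two behaviours. It is a toy model only, not the AWS API: the class names `EphemeralQueue` and `RetainedStream`, the `now` argument and the integer sequence numbers are all assumptions made for illustration.

```python
from collections import deque


class EphemeralQueue:
    """Toy stand-in for queue semantics: a message is gone once acknowledged."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        self._messages.append(body)

    def receive(self):
        # Look at the oldest message without removing it.
        return self._messages[0] if self._messages else None

    def ack(self):
        # Acknowledging deletes the message; nobody can read it again.
        self._messages.popleft()


class RetainedStream:
    """Toy stand-in for stream semantics: records stay until they expire."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []  # list of (arrival_time, body)

    def put(self, body, now):
        self._records.append((now, body))
        return len(self._records) - 1  # the record's sequence number

    def read(self, from_sequence, now):
        # Reading never deletes. Each consumer tracks its own position and
        # can start again from an earlier sequence number to replay.
        # Expired records are filtered out rather than purged, to keep
        # the sketch short.
        return [
            body
            for seq, (arrived, body) in enumerate(self._records)
            if seq >= from_sequence and now - arrived < self.retention_seconds
        ]
```

With the queue, a second consumer arriving after the `ack` sees nothing. With the stream, any consumer can call `read(0, now)` again and get every record that has not yet expired.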

In event-first systems we treat events as first-class citizens. Autonomous services produce events as their state changes. These events are the ultimate source of truth. They live on forever in the event lake. We can replay them to seed new services and repair troubled services. We can use them as an audit trail and perform unforeseen historical analysis. In other words, events are not ephemeral.

I think using SNS/SQS within a service is great. For example, when I need an S3 notification I may hook it up to SNS then SQS then Lambda to publish a domain event to Kinesis to communicate the event to the system at large. But on the receiving side, here are a few more reasons why I prefer Kinesis for inter-service communication.
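As an aside, the S3 to SNS to SQS to Lambda to Kinesis hop could be sketched roughly as follows. This is a sketch under assumptions, not a definitive implementation: the domain event shape (`id`, `type`, `timestamp`, `partitionKey`, `file`), the `file-uploaded` type and the function names are hypothetical. It also assumes raw message delivery is off, so that the SQS body is the SNS envelope. The client is passed in, and in a real Lambda it would be `boto3.client("kinesis")`.

```python
import json
import uuid


def to_domain_events(sqs_event):
    """Unwrap the SQS -> SNS -> S3 envelopes into hypothetical domain events."""
    events = []
    for sqs_record in sqs_event["Records"]:
        sns_envelope = json.loads(sqs_record["body"])
        s3_notification = json.loads(sns_envelope["Message"])
        # S3 test notifications carry no "Records" list, hence the default.
        for s3_record in s3_notification.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            event_time = s3_record["eventTime"]
            events.append({
                # Deterministic id (an assumption of this sketch) so that a
                # redelivered message yields the same event id downstream.
                "id": str(uuid.uuid5(uuid.NAMESPACE_URL,
                                     f"s3://{bucket}/{key}#{event_time}")),
                "type": "file-uploaded",
                "timestamp": event_time,
                "partitionKey": key,
                "file": {"bucket": bucket, "key": key},
            })
    return events


def publish(events, kinesis_client, stream_name):
    """Publish the events with a single put_records call.

    Kept short on purpose: a real publisher would also chunk to the
    500-record limit per call and check FailedRecordCount in the response.
    """
    if not events:
        return 0
    kinesis_client.put_records(
        StreamName=stream_name,
        Records=[
            {
                "Data": json.dumps(event).encode("utf-8"),
                "PartitionKey": event["partitionKey"],
            }
            for event in events
        ],
    )
    return len(events)


def handle(sqs_event, kinesis_client, stream_name):
    """Entry point: translate the notifications and publish them."""
    return publish(to_domain_events(sqs_event), kinesis_client, stream_name)
```

Only the domain event leaves the service. The SNS and SQS envelopes stay an internal detail that downstream consumers never see.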

Stream processing is different because we have great control over the batch size. This allows us to optimize processing across related events. We can further optimize throughput with purposeful creation of partition keys. We can leverage the retry-until-expire nature to more easily self-heal and we alert on the increasing iterator age. This in turn forces us to design for idempotency. We can also utilize the stream-circuit-breaker pattern to set aside poison events by raising fault events, alerting on these faults and providing for later resubmission. This in turn forces us to design for order-tolerance. Idempotency and order-tolerance are crucial for maintaining data integrity in these eventually consistent systems.
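The following is a loose sketch of how a stream consumer might combine these ideas. It is one interpretation under stated assumptions, not the stream-circuit-breaker pattern as I have implemented it: the event fields (`partitionKey`, `timestamp`), the fault event shape, the in-memory `state` dict and the `apply_event` callback are all hypothetical. The record layout (`kinesis.data` base64-encoded, `kinesis.sequenceNumber`) follows what Lambda delivers for a Kinesis batch.

```python
import base64
import json


def decode(kinesis_record):
    """Decode the base64 payload that Lambda delivers for a Kinesis record."""
    return json.loads(base64.b64decode(kinesis_record["kinesis"]["data"]))


def process_batch(lambda_event, state, apply_event):
    """Process one batch and return fault events for the poison records.

    state: dict mapping an entity key to the timestamp of the newest event
           applied so far (a stand-in for a conditional write to a datastore).
    apply_event: hypothetical business-logic callback.
    Timestamps are assumed to be mutually comparable, such as epoch
    milliseconds or ISO-8601 strings in one consistent format.
    """
    faults = []
    for record in lambda_event["Records"]:
        try:
            event = decode(record)
            key = event["partitionKey"]
            last_applied = state.get(key)
            # Idempotency and order tolerance in one check: skip duplicates
            # and anything older than what was already applied for this key.
            if last_applied is not None and event["timestamp"] <= last_applied:
                continue
            apply_event(event)
            state[key] = event["timestamp"]
        except (ValueError, KeyError, TypeError) as err:
            # Malformed or unprocessable record: set it aside as a fault
            # event so that it does not block the rest of the shard.
            faults.append({
                "type": "fault",
                "error": str(err),
                "sequenceNumber": record["kinesis"]["sequenceNumber"],
                "data": record["kinesis"]["data"],
            })
        # Any other exception propagates, so the whole batch is retried
        # until it succeeds or the records expire.
    return faults
```

Because a retried batch is replayed from the start, the timestamp check is what makes the retry safe: events that were already applied are skipped the second time around. The returned fault events would be published and alerted on so that they can be resubmitted later.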

If you want to do more than build an event-driven system, if you want to build an event-first system, then your events cannot be ephemeral messages. I will dive further into related topics, such as stream topology, stream processing and the shape of events, in separate posts.

For more thoughts on serverless and cloud-native, check out the other posts in this series and my books: Software Architecture Patterns for Serverless Systems, Cloud Native Development Patterns and Best Practices and JavaScript Cloud Native Development Cookbook.


John Gilbert

Author, CTO, Full-Stack Cloud-Native Architect, Serverless-First Advocate