Kafka vs RabbitMQ vs SQS
On the consumer side, Kafka doesn’t push messages to consumers; instead, consumers pull messages from Kafka. For efficiency, messages are not pulled one by one but in (mini) batches, and it is not possible to acknowledge individual messages. For example, every few seconds the consumer polls for any messages published after a given offset, and Kafka returns the corresponding batch. Once all the messages have been processed, the consumer confirms the batch, the offset for the batch is committed in Kafka, and the next poll returns the subsequent messages. Now assume the consumer fails in the middle of a batch (i.e. some messages have been processed and others haven’t). When the consumer recovers, it starts again from the beginning of the exact same batch, reprocessing some messages that were already processed.
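This poll/commit cycle can be sketched with the kafka-python client. The broker address, topic, and group id below are illustrative assumptions, and the client code is wrapped in a function so nothing connects until you call it:

```python
# Sketch of Kafka's pull model: poll a batch, process it, then commit.
def consume_in_batches(handle):
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        "events",                          # illustrative topic name
        bootstrap_servers="localhost:9092",  # illustrative broker address
        group_id="worker-group",
        enable_auto_commit=False,          # commit manually, once per batch
    )
    while True:
        # poll() pulls a (mini) batch of records after the committed offset,
        # keyed by partition.
        batch = consumer.poll(timeout_ms=1000)
        for _partition, records in batch.items():
            for record in records:
                handle(record.value)       # process one message
        # Committing only after the whole batch means a crash mid-batch
        # restarts from the last committed offset, so already-processed
        # messages get reprocessed.
        if batch:
            consumer.commit()
```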
In SQS, the service must be polled for messages, with an optional wait timeout. If no timeout is specified (a short poll), the request may return immediately with no messages even when some are available. A timeout of up to the maximum of 20 seconds (a long poll) lets the client wait up to 20 seconds for a message before the request returns.
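A long poll looks like this with boto3; the queue URL is a placeholder, and the call is wrapped in a function so the block runs without AWS credentials:

```python
# Sketch of SQS long polling with boto3 (pip install boto3).
def poll_sqs(queue_url):
    import boto3

    sqs = boto3.client("sqs")
    # WaitTimeSeconds=0 (the default) is a short poll and may return no
    # messages even when some exist; 20 is the maximum long-poll wait.
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # hold the connection up to 20 s for a message
    )
    return resp.get("Messages", [])
```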
RabbitMQ, on the other hand, supports blocking connections, enabling a client to simply sit and wait for a message to become available without the need to poll. In many cases this is a more standard and familiar approach to consuming from message queues, and it is compatible with other messaging frameworks like Celery. Additionally, RabbitMQ can selectively consume messages based on topics, providing the opportunity to create robust message-processing schemes.
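The blocking style can be sketched with pika; the host and queue name are illustrative, and the client code is wrapped in a function so the block runs without a broker:

```python
# Sketch of RabbitMQ's blocking consume with pika (pip install pika).
def consume_forever(handle):
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="tasks", durable=True)  # illustrative queue

    def on_message(ch, method, properties, body):
        handle(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="tasks", on_message_callback=on_message)
    # start_consuming() blocks and waits for deliveries; no poll loop needed.
    channel.start_consuming()
```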
Kafka is used when you want to handle extensive streams of data from the producer side; this solution is more producer-centric than the others. In Kafka, each consumer maintains an offset of the messages it has already consumed, so in case of failure the subscriber can restart processing from where it left off. Offset management is an important design decision, as it affects the delivery guarantees. These offsets were historically managed by Kafka in ZooKeeper; newer versions store them in an internal Kafka topic. A consumer can rewind or skip to any desired offset of a topic at any time and read all the subsequent messages. Messages are not deleted when consumed: once the message at a specific offset is processed, it is the consumer’s responsibility to either commit that offset or leave it uncommitted so the message is consumed again later.
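Rewinding to an arbitrary offset can be sketched with kafka-python; the topic, partition, and offset arguments are illustrative:

```python
# Sketch of seeking a Kafka consumer to a chosen offset; the next poll()
# re-reads from that position onward.
def rewind(consumer, topic, partition, offset):
    from kafka import TopicPartition  # pip install kafka-python

    tp = TopicPartition(topic, partition)
    consumer.assign([tp])     # take manual control of this partition
    consumer.seek(tp, offset)  # subsequent polls start at `offset`
```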
SQS uses a visibility timeout for message acknowledgement. Whenever a client begins consuming a message from the queue, a clock starts; once the set time has passed, the message is automatically re-queued unless the client has deleted it. While this does work, it requires an extra step for any message that takes considerable time to process: resetting the visibility timeout. This can result in message duplication. It guarantees “at-least-once delivery”, unlike the SQS FIFO queue, which is AWS’s “exactly-once delivery” message broker solution.
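The extra step for slow consumers, extending the visibility timeout mid-processing, can be sketched with boto3; the queue URL and timing values are illustrative:

```python
# Sketch of extending an SQS message's visibility timeout so a slow
# consumer doesn't have the message re-queued out from under it.
def extend_visibility(queue_url, receipt_handle, extra_seconds=60):
    import boto3  # pip install boto3

    sqs = boto3.client("sqs")
    # Resets the visibility clock; once processing finishes, the client
    # should call sqs.delete_message(...) with the same receipt handle.
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=extra_seconds,
    )
```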
RabbitMQ has two modes of message acknowledgement: ack and noack. When using noack, messages are automatically acknowledged as soon as they’re consumed; if the consumer fails to actually process a message, it will not be re-queued. If ack is used, the client must acknowledge that the message was consumed, otherwise it is re-queued automatically once the worker disconnects. This means a worker that does not acknowledge a message but remains connected for an extended period of time will prevent the message from being re-queued. This mode ensures “at-least-once delivery”.
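The two modes map onto pika’s `auto_ack` flag; the queue name is illustrative, and the setup is wrapped in a function so the block runs without a broker:

```python
# Sketch contrasting RabbitMQ's ack and noack modes in pika.
def start_consumer(channel, handle, manual_ack=True):
    def on_message(ch, method, properties, body):
        handle(body)
        if manual_ack:
            # Explicit ack: an unacked message is re-queued only after
            # the worker disconnects.
            ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(
        queue="tasks",                 # illustrative queue name
        on_message_callback=on_message,
        auto_ack=not manual_ack,       # auto_ack=True is "noack":
    )                                  # no redelivery on consumer failure
```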
Kafka can maintain a message throughput rate of 100k+ msgs/sec which is significantly higher than most other alternatives.
Amazon places a limit on the message rate which can be increased on request and also charges extra based on the overall message rate.
With a message rate of about 20k+ msgs/sec, RabbitMQ is much slower than Kafka, but that is sufficient for most use cases. Thanks to its platform-independent framework and ease of use, RabbitMQ is a very mature message broker offering with a lot of extended support from the community.
Apache Kafka requires that you host and manage the framework. That means you are responsible for picking the right compute resources and storage capabilities, getting involved in capacity planning, and managing failure detection and recovery.
You can get started with Amazon SQS for free. All customers can make 1 million Amazon SQS requests for free each month. Some applications might be able to operate within this Free Tier limit.
In RabbitMQ, if a queue receives messages at a faster rate than it can pump them out to consumers, things get slower. As the queue grows, it requires more memory. Additionally, if a queue receives a spike of publications, it must spend time dealing with those publications, which takes CPU time away from sending existing messages out to consumers: a queue of a million messages can be drained out to ready consumers at a much higher rate if no publications are arriving at the queue to distract it. Not exactly rocket science, but worth remembering that publications arriving at a queue can reduce the rate at which the queue drives its consumers.