How to choose a Message Queue
I work in a team developing a mail server: James. This project is part of the OpenPaaS project, which involves multiple teams.
In this mail server we need to implement a distributed mail queue. A mail queue is a mandatory component of SMTP servers. It decouples receiving messages from processing them. The current implementation relies on an embedded ActiveMQ server with a decades-old JMS implementation. The time had come for a little facelift!
But a mail queue is a complex system… Not only should it be an efficient work queue, it should also implement many additional features:
- Priority: You might want to give a higher priority to your organisation's email, compared to spam
- Delays: Maybe you don’t want to send too many mails at once. Maybe you need to wait a bit before re-sending an email to a remote mail server in case of errors…
- Management: Mail server administrators expect to inspect the content of the mail queue and remove the elements they want, among other things…
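To make these three features concrete, here is a minimal in-memory sketch. It is not the James implementation: the class name `ToyMailQueue`, the "lower number is served first" priority convention and the explicit `now` parameter are all assumptions made for illustration, and the sketch is neither distributed nor durable.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QueuedMail:
    mail_id: str
    priority: int    # lower value is served first (hypothetical convention)
    ready_at: float  # earliest time at which the mail may be dequeued
    seq: int         # insertion counter, keeps FIFO order within a priority


class ToyMailQueue:
    """Illustration only: in-memory, single process, not durable."""

    def __init__(self) -> None:
        self._items: Dict[str, QueuedMail] = {}
        self._counter = itertools.count()

    def enqueue(self, mail_id: str, priority: int = 5,
                delay: float = 0.0, now: float = 0.0) -> None:
        # Delay: the mail only becomes eligible at now + delay.
        self._items[mail_id] = QueuedMail(
            mail_id, priority, now + delay, next(self._counter))

    def dequeue(self, now: float) -> Optional[str]:
        # Priority: among the mails that are ready, pick the most urgent one.
        ready = [m for m in self._items.values() if m.ready_at <= now]
        if not ready:
            return None
        chosen = min(ready, key=lambda m: (m.priority, m.seq))
        del self._items[chosen.mail_id]
        return chosen.mail_id

    # Management: inspect the queue content and remove arbitrary elements.
    def browse(self) -> List[str]:
        return sorted(self._items)

    def remove(self, mail_id: str) -> bool:
        return self._items.pop(mail_id, None) is not None

    def size(self) -> int:
        return len(self._items)
```

The linear scan in `dequeue` keeps the sketch short. What makes the real problem hard is offering `browse` and `remove` on top of a broker that only exposes "take the next message".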
We could not implement it in a straightforward, production-grade fashion on top of RabbitMQ. Thus we decided to look for alternative solutions and message queues.
Since every OpenPaaS team uses message queues, we had to choose the one that best fits everyone's needs.
The first step was to learn about the requirements of each team, then to summarize them.
To do this, we decided to interview each team leader. We defined a list of topics to discuss:
- features they implement, and will implement on top of a message queue
- limitations they currently encounter
- experience with this message queue technology
We dug quite deep, and discussed some hot topics:
- at-least-once vs at-most-once
- availability vs consistency
- management capabilities: queue size, browse, …
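The first of these topics can be illustrated with a toy simulation. It is only a sketch: `ToyBroker` and `consume` are hypothetical names, and a single consumer crash is simulated in-process rather than against a real broker. With acknowledgement at reception (at-most-once), a crash before the side effect loses the message. With acknowledgement after processing (at-least-once), a crash after the side effect but before the acknowledgement produces a duplicate.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory broker with optional acknowledgements."""

    def __init__(self, messages):
        self._pending = deque(messages)
        self._unacked = {}
        self._next_tag = 0

    def receive(self, auto_ack):
        if not self._pending:
            return None
        body = self._pending.popleft()
        if auto_ack:
            return None, body  # acknowledged as soon as it is handed out
        tag = self._next_tag
        self._next_tag += 1
        self._unacked[tag] = body
        return tag, body

    def ack(self, tag):
        self._unacked.pop(tag, None)

    def consumer_crashed(self):
        # Redeliver everything handed out but never acknowledged.
        for tag in sorted(self._unacked, reverse=True):
            self._pending.appendleft(self._unacked.pop(tag))


def consume(messages, auto_ack, crash_on, crash_before_effect):
    """Return the side effects applied, with one crash on `crash_on`."""
    broker = ToyBroker(messages)
    effects = []
    crashed = False
    while (delivery := broker.receive(auto_ack)) is not None:
        tag, body = delivery
        crash_now = body == crash_on and not crashed
        if crash_now and crash_before_effect:
            crashed = True
            broker.consumer_crashed()
            continue
        effects.append(body)  # the side effect, e.g. delivering a mail
        if crash_now:
            crashed = True
            broker.consumer_crashed()  # crash after the effect, before the ack
            continue
        if not auto_ack:
            broker.ack(tag)
    return effects
```

For example, `consume(["a", "b", "c"], auto_ack=True, crash_on="b", crash_before_effect=True)` returns `["a", "c"]`, while `consume(["a", "b", "c"], auto_ack=False, crash_on="b", crash_before_effect=False)` returns `["a", "b", "b", "c"]`. This is why at-least-once consumers generally need idempotent processing.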
In conclusion, OpenPaaS abstracts the message queue technology behind a messaging API. This API supports publish/subscribe on specific topics, broadcasts, and work sharing through work queues.
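As a rough idea of what such an abstraction could look like, here is a sketch of a messaging interface with an in-memory implementation. It is not the actual OpenPaaS API: every name here (`Messaging`, `InMemoryMessaging`, `register_worker`, `submit`, the topic and queue names) is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable

Handler = Callable[[str], None]


class Messaging(ABC):
    """Hypothetical facade hiding the message queue technology."""

    @abstractmethod
    def subscribe(self, topic: str, handler: Handler) -> None: ...

    @abstractmethod
    def publish(self, topic: str, message: str) -> None: ...

    @abstractmethod
    def broadcast(self, message: str) -> None: ...

    @abstractmethod
    def register_worker(self, queue: str, handler: Handler) -> None: ...

    @abstractmethod
    def submit(self, queue: str, message: str) -> None: ...


class InMemoryMessaging(Messaging):
    def __init__(self) -> None:
        self._subscribers = defaultdict(list)
        self._workers = defaultdict(list)
        self._next_worker = defaultdict(int)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Publish/subscribe: every subscriber of the topic gets a copy.
        for handler in self._subscribers.get(topic, []):
            handler(message)

    def broadcast(self, message):
        # Broadcast: every subscription gets a copy, whatever its topic.
        for handlers in self._subscribers.values():
            for handler in handlers:
                handler(message)

    def register_worker(self, queue, handler):
        self._workers[queue].append(handler)

    def submit(self, queue, message):
        # Work queue: exactly one worker handles each message (round-robin).
        workers = self._workers.get(queue)
        if not workers:
            raise LookupError(f"no worker registered for {queue!r}")
        index = self._next_worker[queue] % len(workers)
        self._next_worker[queue] += 1
        workers[index](message)
```

A RabbitMQ-backed or Kafka-backed class could implement the same interface, which is what lets the underlying technology change without touching the services.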
OpenPaaS also uses the message queue to let services communicate with each other. This includes information:
- from the Mail server, James
- from the calendar system, Sabre
- from the contacts service
Most of our use cases rely on at-least-once delivery, but some of them would benefit from exactly-once semantics. We tend to favor consistency, and want as many management capabilities as we can get.
This allows us to extract core requirements:
- we need basic messaging capabilities
- we need consistency and at-least-once semantics
- as a bonus, advanced management capabilities
But we also added some criteria that were not covered by the interviews:
- maturity of the project
- performance, …
Fortunately, the interviews revealed no contradictions, such as one team needing availability and another one consistency. They also helped us a lot in excluding some implementations from our final choice.
There are so many message queue implementations that we decided to limit the number of candidates. Here is the list of those selected for study:
I will present here the ones that caught our attention the most:
RabbitMQ is the message queue currently used by OpenPaaS, so no migration would be necessary. It has a good, mature community. Some problems regarding clustering have been reported, including message loss and manual reconciliation after a partition. The downside of this solution is that it doesn’t provide the advanced management features needed by James. We selected it for further investigation, and the quality of its documentation has been a decisive factor.
Kafka is a cutting-edge streaming platform. It fulfills the requested features. As it exposes a distributed log, some features of a mail queue are easier to implement. Its community is strong and mature, and it was designed with clustering as a primary concern. Replay is a core concept. However, its architecture is complex and involves a ZooKeeper quorum. We selected it as well.
RocketMQ is a promising, newly born Apache project. However, despite good performance and an impressive feature set, the community is not very mature and is mostly centered around the Alibaba company. The project is still under development. So we considered that, despite all of its advantages, choosing it would not be wise.
Artemis is the former HornetQ, donated to the Apache Foundation and adopted by the ActiveMQ project. It is a rock-solid message queue, but sadly clustering it is hard. Some old-school technologies (including XML!) are involved. Thus we decided not to investigate further.
NSQ is a decentralized messaging system. The messaging patterns we want are supported, but *durability* can only be obtained through a hack. Clustering is a first-class concept, but at the cost of features. AMQP is not supported, and neither is replay. We decided it was not worth paying the cost of a migration to it.
And then we had to choose
The war was raging between Kafka and RabbitMQ.
None of the technologies presented above fulfills James's needs in a satisfying manner while meeting our production standards. Thus we decided to exclude James features from the choice. The mail queue advanced capabilities will be implemented with a message queue/database combination.
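One possible shape for such a combination is sketched below. It is an assumption, not the design James adopted: the queue only carries mail identifiers, the database holds what administrators need to browse, and a mail removed from the database is silently skipped when its identifier is later consumed. A `deque` stands in for the message queue, an in-memory SQLite database stands in for the database, and the class name and schema are hypothetical.

```python
import sqlite3
from collections import deque


class QueueWithDatabase:
    def __init__(self):
        self._queue = deque()                   # stand-in for the message queue
        self._db = sqlite3.connect(":memory:")  # stand-in for the database
        self._db.execute(
            "CREATE TABLE mails (id TEXT PRIMARY KEY, sender TEXT, recipient TEXT)")

    def enqueue(self, mail_id, sender, recipient):
        # Store the browsable metadata first, then publish the identifier.
        self._db.execute(
            "INSERT INTO mails VALUES (?, ?, ?)", (mail_id, sender, recipient))
        self._queue.append(mail_id)

    def browse(self):
        # Management: list the queue content without consuming anything.
        return self._db.execute(
            "SELECT id, sender, recipient FROM mails ORDER BY rowid").fetchall()

    def size(self):
        return self._db.execute("SELECT COUNT(*) FROM mails").fetchone()[0]

    def remove(self, mail_id):
        # Management: removing only touches the database; the identifier
        # still sitting in the queue is discarded at consumption time.
        cursor = self._db.execute("DELETE FROM mails WHERE id = ?", (mail_id,))
        return cursor.rowcount > 0

    def dequeue(self):
        while self._queue:
            mail_id = self._queue.popleft()
            cursor = self._db.execute(
                "DELETE FROM mails WHERE id = ?", (mail_id,))
            if cursor.rowcount > 0:  # still present, so not removed by an admin
                return mail_id
        return None
```

The sketch ignores what happens when the two stores disagree after a failure between the database write and the queue publish. That gap is one of the things a production-grade version would have to handle.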
Our strategy then became to explore, with a POC, the limitations of using RabbitMQ.
- We provided a POC of OpenPaaS use cases on top of RabbitMQ
- We conducted experiments on top of a dockerized RabbitMQ cluster: stopping nodes, producing and consuming on different nodes, declaring exchanges and queues on different nodes.
- We tested durability and production/consumption order.
A strategic meeting will then decide whether we move to Kafka.