How to build an on-premise connection with RabbitMQ
In this blog post I'd like to show you how to build an on-premise connection with RabbitMQ to establish a connection between a server and its clients.
In our scenario, our server resides behind a firewall or inside a corporate network. So why would we want a message queue server, with all the implementation overhead that comes with it, when we could simply open a port in our firewall and set up a plain REST or SOAP server with our business logic? The simple answer: security, company policies, or just some sysadmin who doesn't want to open that port or any other.
After we have talked about the basic idea, I will give you some thoughts and ideas on how to migrate an existing API to the message queue concept. We will talk about different possibilities and prebuilt software that supports you in reaching your goal or solving your problem.
So why does a message queue server make our life a little easier? Well, it doesn't. Not in the first place, at least. Before we can use our fancy RabbitMQ server, we need to set up an instance on a publicly reachable server. Luckily the folks at RabbitMQ have a simple install guide / tutorial to follow. You can find it here: RabbitMQ install.
While your RabbitMQ instance is installing, one little question: how do we access the server if we can't access the server?
Answer: We don't. The server accesses the message queue, and we also access the message queue. Or, a little more technically: our on-premise server accesses a queue that is either predefined or self-defined (by the server implementation), but whose name must be known by the client. Let us call that queue Q1. To send messages to our server, we just publish a message to Q1. Our server pulls the message from the queue and answers to another queue; we name it Q2 (or Qn for n clients).
The question now is: how do we know which message comes from which client, and why do we need Q2 if we already have Q1?
The answer to the first question is a user ID. On the client side, we generate a unique string and attach it to the AMQP message. You may be wondering what this AMQP is and why we need it. AMQP stands for "Advanced Message Queuing Protocol" and is standardized under ISO/IEC 19464. For our purposes it is enough to know that it is the protocol over which our server and our clients talk to the RabbitMQ instance.
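As a small sketch of this (assuming Python and a JSON envelope; the helper names `make_client_id` and `build_message` are made up for illustration), generating the unique string and attaching it to a message could look like this. With a real AMQP client library you would typically carry this ID in the message's `correlation_id` property instead of the body.

```python
import json
import uuid

def make_client_id() -> str:
    # A UUID4 is a practically collision-free unique string per client.
    return uuid.uuid4().hex

def build_message(client_id: str, payload: dict) -> bytes:
    # AMQP message bodies are raw bytes, so we serialize an envelope
    # that carries the client ID alongside the actual payload.
    envelope = {"client_id": client_id, "payload": payload}
    return json.dumps(envelope).encode("utf-8")

cid = make_client_id()
msg = build_message(cid, {"action": "ping"})
decoded = json.loads(msg.decode("utf-8"))
```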
The second question is a little more complicated. Technically we do not need the second queue, or later a queue per client. We could just append to the user ID a flag for "processed by server" or "from server", or maybe use another field in the AMQP message to identify which packet comes from the server and whom it is for. At this point we slide into the next problem: a message can be pulled by more than one client at a time (client and server are both clients to the RabbitMQ instance). The message resides inside the queue until one client acknowledges it. Now for a thought experiment: what happens if we have one billion clients and one server, so one billion and one clients to the RabbitMQ instance? Each client would pull each new message and check whether it carries the right identifier for that client (or the server). That would cause some serious performance problems and a ton of unnecessary comparisons.
The idea is now: we have our server queue Q1. Each client that wants an answer from our server publishes its messages to Q1. The server handles the packages one at a time (or, if you start more threads, that many packages at a time). After the server has processed a package, it sends the answer to a queue named after the client that sent the message. This queue might be created at that moment by the client or by the server; it doesn't matter. Now our clients just have to check whether there is a new message in their message queue, pull it, and process it.
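To keep the example self-contained (no running broker required), here is a sketch of that pattern with Python's stdlib `queue` module standing in for the RabbitMQ queues; with a real client library you would declare Q1 and the per-client reply queues on the broker instead. The function names are made up for illustration.

```python
import queue

# Q1: the well-known server queue that every client publishes to.
q1 = queue.Queue()
# One reply queue per client, named after the client's unique ID.
reply_queues = {}

def client_send(client_id: str, request: str) -> None:
    # Make sure the client's own reply queue exists before publishing.
    reply_queues.setdefault(client_id, queue.Queue())
    q1.put((client_id, request))

def server_process_one() -> None:
    # The server pulls one message from Q1, handles it, and publishes
    # the answer to the queue named after the sending client.
    client_id, request = q1.get()
    answer = request.upper()  # stand-in for the real business logic
    reply_queues[client_id].put(answer)

client_send("client-a", "hello")
client_send("client-b", "world")
server_process_one()
server_process_one()
reply_a = reply_queues["client-a"].get()
reply_b = reply_queues["client-b"].get()
```

Note that each client only ever reads from its own queue, so nobody wastes time pulling and rejecting messages meant for someone else.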
Let us come back to our thought experiment. If our one billion clients all have their own queue and terminate before they acknowledge their messages, what happens to the messages that are still inside the queues? Here we have to bring some preconditions into play. Do we want to enable the clients to get messages from a previous run? Can we allow a time limit on messages inside a queue?
If our clients are allowed to, or need to, access messages from a previous run, we simply leave the messages inside the queue and risk an overflow of our RabbitMQ instance. If not, we can just clear the queue when the client disconnects. This can be accomplished if the client created the queue and we configure the queue to be deleted when its creator disconnects. Alternatively, we can purge the queue via a command from the client side. If we allow a time limit per message, the message is simply deleted after it times out.
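These three policies map onto RabbitMQ queue settings. A sketch of how the declaration options could look (the `x-message-ttl` argument, in milliseconds, and the `exclusive` flag are real RabbitMQ queue options; the surrounding dicts are just an illustration of what you would pass to your client library's queue declaration call):

```python
# Option 1: keep messages across runs — a plain durable queue, no extras.
durable_queue = {"durable": True, "arguments": {}}

# Option 2: drop everything when the creator disconnects — an exclusive
# queue is deleted automatically when its declaring connection closes.
exclusive_queue = {"exclusive": True, "arguments": {}}

# Option 3: time-limit messages — the broker drops a message once its
# per-queue TTL (in milliseconds) has elapsed.
ttl_queue = {"arguments": {"x-message-ttl": 60_000}}
```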
In conclusion, that is the whole story behind an on-premise connection.
Migrate an existing API
What if we have an existing REST or SOAP API which now needs to support this on-premise connection? At this point we have three possible solutions:
- We rewrite our whole interface and make it natively work on-premise.
- We write a wrapper which communicates on the one side with our API and on the other side with the RabbitMQ-instance.
- We use an existing wrapper solution.
(1): How to write a native on-premise connection was described above. To make the API transformation a little easier, here is some advice:
If your old API used JSON as its communication format, you can very easily use BSON for your payload. BSON is like JSON, but binary. This is important since you can only transport binary data inside the payload of an AMQP message. If you used XML (for example inside SOAP), you could use something like the .NET Binary Format or BiM. Since I never had to work with binary XML, my suggestion could be totally misleading. The important part here is that you can efficiently convert your existing data structure into a binary representation without handling the en- and decoding yourself.
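A minimal sketch of the point above, with the stdlib's `json` as a stand-in encoder (a BSON library would be a drop-in replacement for the encode/decode calls; the helper names are made up): the only requirement is that whatever goes into the AMQP body ends up as bytes, and that you get your structure back unchanged.

```python
import json

def to_payload(document: dict) -> bytes:
    # Whichever encoder you pick, the AMQP body must end up as bytes.
    return json.dumps(document).encode("utf-8")

def from_payload(body: bytes) -> dict:
    return json.loads(body.decode("utf-8"))

doc = {"order_id": 42, "items": ["a", "b"]}
body = to_payload(doc)
```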
If your old approach did not use HTTPS or any other security protocol, use one! Your new API will send data through public network connections. Clearly no connection is totally safe, and there have been some attempts at man-in-the-middle attacks on HTTPS, but any encryption or secure tunnel is better than plaintext (even if it is binary encoded). The team around RabbitMQ has published a small tutorial on how to enable TLS. You can find it here: RabbitMQ TLS Support.
If you do not want to enable TLS on your RabbitMQ instance, or you want some additional security, you could implement encryption at the payload level. This has another benefit: since this encryption exists end to end between your clients and your server, the RabbitMQ server cannot look into the payload without breaking the encryption. Even your own RabbitMQ instance, if hijacked, is not able to perform a man-in-the-middle attack.
(2): This method is good if you don't want to, or can't, rewrite the code of your existing API. The idea is pretty straightforward. You build your on-premise client as described earlier and glue an API client onto it for the server side. For the client side you do the message queue part and glue an API server onto it. The API server is just the endpoint definition of your existing API. This approach allows you to be somewhat efficient when it comes to message conversion between the API call and its AMQP representation. This can be important if your old API also transports binary large objects (BLOBs). In that case you could suspend sending a BLOB until the server really needs it, or chunk it and push the chunks through the message queue. Another idea is not to send the BLOB through RabbitMQ at all, but to send a peer-to-peer connection ID and transfer the BLOB directly to the server through a P2P connection.
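The chunking idea can be sketched like this (assuming Python; the chunk size and the `seq`/`total` envelope fields are made up for illustration): split the BLOB into fixed-size pieces, tag each with its position, and reassemble on the other side even if the pieces arrive out of order.

```python
def chunk_blob(blob: bytes, chunk_size: int) -> list[dict]:
    # Each chunk carries its position and the total count so the
    # receiver can detect missing pieces and reassemble in order.
    pieces = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]
    return [{"seq": i, "total": len(pieces), "data": p}
            for i, p in enumerate(pieces)]

def reassemble(chunks: list[dict]) -> bytes:
    ordered = sorted(chunks, key=lambda c: c["seq"])
    return b"".join(c["data"] for c in ordered)

blob = b"x" * 10_000
parts = chunk_blob(blob, 4096)
```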
If your API is not BLOB-heavy, if you don't care that much about efficient data transport, or if you don't want to implement the whole API interface again, you could use the following method. Build an API endpoint that accepts everything. Then extract the call URL and the data, pack it into a data structure that is easily parseable by BSON, and send it through your on-premise connection.
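A sketch of that catch-all idea (assuming Python; `pack_request` is a made-up helper, and `json` again stands in for a BSON encoder): everything the old API needs — method, path, query and body — goes into one generic structure that travels through the queue.

```python
import json

def pack_request(method: str, path: str, query: dict, body: dict) -> bytes:
    # One generic structure covers every endpoint of the old API.
    call = {"method": method, "path": path, "query": query, "body": body}
    return json.dumps(call).encode("utf-8")

def unpack_request(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))

packed = pack_request("POST", "/orders/42/cancel", {"force": "true"},
                      {"reason": "out of stock"})
call = unpack_request(packed)
```

On the server side, `unpack_request` gives you everything needed to replay the call against the existing API.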
(3): If you don't want to build your own on-premise connection, or just don't have the time to do it, you could use something like the Thinktecture RelayServer. The RelayServer is a powerful version of what I described at the end of (2). It provides an admin interface where you can create new on-premise connections. The server-side configuration is done via an XML file. The whole communication is also secured via HTTPS by default; you just need a certificate. A guide on how to import the SSL certificate can be found inside the GitHub repository.
No matter which solution you use, the result should be the same. There will be some differences in efficiency, performance, or usability, but that is always a matter of perspective.
Thank you for reading this post. If you have remarks or something to criticize, feel free to post it in the comments.