Scaling up your MOM-based architecture with ActiveMQ and a Network of Brokers

Youness Chara
7 min read · Oct 28, 2019


It’s already 9:30 when you show up after a dismal weekend (wasted binge-watching La Casa de Papel). You haven’t even had time to find an empty chair in this crowded, noisy open space when the IT-OPS guy catches you:

- IT-OPS: Nagios showed that the ActiveMQ cluster instances are down. It seems they didn’t restart automatically after the deployment of the security patch scheduled for last Friday. The manual restart procedure was applied, our DevOps triggered the Jenkins job to launch the Docker containers, and both the master and slave are now up.

- You: Nice… hope we are safer now!

- IT-OPS: Just have a look at the logs to double-check that the dependent modules didn’t get hurt. Have a good day.

- You: But what about the hosts on the Front-End VLAN?

- IT-OPS: NOPE… their patch is scheduled for the next Friday.

That’s when you break into a cold sweat. You’ve just realized that every business transaction from the whole weekend has been rolled back.

The Fright…

All web controllers were up and kept receiving requests. But as your brokers (both master and slave) were down, the controllers couldn’t transmit the messages to the backend services. Of course, you did your homework and marked your controllers’ endpoints as transactional. Thus, all requests involving backend services (like the shipping service, email notification, or your core banking system) returned an HTTP 503 status to your end-users (indicating that the server was temporarily unable to handle the request).

In other circumstances, you wouldn’t care. After all, thanks to the transactional endpoints, the client’s order state remains coherent with your backend state. As for your end-users, they are pretty used to service unavailability.

But as you know, at 10 o’clock the weekly email report will be sent to the whole management staff, and it will contain a PDF attachment generated by your ELK stack. The report displays, among other KPIs, the business loss due to interrupted services. It was one of your manager’s brilliant ideas to sell the ELK stack as a quick-win alternative to a full-fledged BI tool. Indeed, for each client purchase order, the BFF service logs the whole JSON payload (including the order amount) along with the response status. You then leverage the greedy Elasticsearch engine to index all that useful and useless data, and use Kibana’s Timelion to draw (and export) a coarse visualization showing the total amount of both processed and unprocessed purchase orders.

It’s now 9:35, and you have less than half an hour to remove this cursed KPI from the Kibana report and hope nobody notices. Tic tac, tic tac…

OK, I am sure you can easily figure out how to put all those business losses and that customer turnover on the IT-OPS team: their DevOps is still a Docker apprentice and doesn’t master the `--restart` flag, the IT-OPS patch planning wasn’t communicated… But you may still be wondering: what did you do wrong to end up with real business issues? Could you have avoided this?

Centralized Broker Architecture

Well, one of the most important features of a MOM (Message-Oriented Middleware) based architecture is the guarantee of message delivery: consumers will eventually receive the messages destined for them.

The standard MOM architecture is based on one central broker. Producers and consumers must agree on the central broker’s location (and other technical details like destinations). The central broker thus becomes the single point of failure of the system: it needs to be highly available and dynamically scalable.

To enhance the availability of the central broker, we could use a cluster of brokers. The common clustering strategy with ActiveMQ is a collection of brokers in a master/slave configuration, with JMS clients instructed to connect to them using the Failover Transport. A JMS client will then connect to one broker from the cluster; if that broker becomes unavailable, the client will automatically reconnect to the next one. So far so good…
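On the client side, the Failover Transport is just a matter of the broker URL. A minimal sketch as a Spring bean (the broker host names are hypothetical):

```xml
<!-- JMS connection factory using ActiveMQ's Failover Transport.
     randomize=false makes clients try the brokers in order:
     the master first, then the slave. -->
<bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <property name="brokerURL"
            value="failover:(tcp://broker-master:61616,tcp://broker-slave:61616)?randomize=false"/>
</bean>
```

With this URL, reconnection on broker failure is handled transparently by the transport; the application code keeps using the same `ConnectionFactory`.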

Not really! Broker clustering is a complex topic, and the approach above works only for a subset of unavailability issues. It requires that a slave take over the role of the master. Indeed, only the master can receive and dispatch messages. Therefore, the master’s unavailability must be globally synchronized (agreed upon by clients and slaves) for the failover mechanism to work, and that is actually hard to guarantee.
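One way ActiveMQ solves this synchronization problem is shared-storage master/slave: both instances point at the same persistence directory, and the file lock decides who the master is. A sketch for `activemq.xml` (the shared path is an assumption):

```xml
<!-- Identical configuration on both broker instances.
     Whichever instance grabs the KahaDB file lock first becomes the master;
     the other blocks on the lock and takes over only when the master dies. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="clusterBroker">
  <persistenceAdapter>
    <kahaDB directory="/shared/activemq/kahadb"/>
  </persistenceAdapter>
</broker>
```

The lock makes the master election unambiguous, but it moves the availability problem to the shared file system, which must itself be reliable.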

Anyway, clustering does mitigate broker unavailability even if it doesn’t eliminate it. And sooner or later, JMS consumers will receive their messages.

While the delivery guarantee holds for both broker and consumer outages, it only applies to messages already received (and acknowledged) by the broker.

Messages not received by the broker for any reason (connection failure, broker outage, security issue) aren’t taken care of natively; it’s up to the producer to deal with them. Generally, the producer has three options:

1. Keep retrying: hold on, retry later, and hope the broker becomes available.

2. Use a failover broker, if there is one.

3. Roll back the current transaction and recursively bubble the issue up to its caller, ultimately to the end-user, and let them decide.

In our typical MOM schema, with a central broker cluster, you could implement the three options in order. But there will always be a probability of ending up in the chilling situation where a customer has to make a decision: come back later, or look elsewhere.

Store and Forward

One common solution to enforce service continuity on the producer side is the store-and-forward pattern, whereby messages are first stored in a local broker. Then, as soon as the central broker is available, the messages are forwarded to it, and to the consumers thereafter.

The local broker can run in the same container (Docker, VM…) as the producer process, so local availability is assured. With ActiveMQ, the local broker can even be embedded in the same JVM as the producer, which is very suitable for performance, reliability, testing, and intra-application messaging.
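Embedding is achieved with ActiveMQ’s VM transport: the first client that connects to a `vm://` URL starts a broker inside its own JVM. A minimal sketch (the broker name is hypothetical):

```xml
<!-- The vm:// transport starts an embedded broker named "localbroker"
     inside the producer's JVM on first use; broker.persistent=true makes
     it persist stored messages to disk so they survive a JVM restart. -->
<bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <property name="brokerURL" value="vm://localbroker?broker.persistent=true"/>
</bean>
```

The producer talks to this broker over in-memory calls, so “connecting to the broker” can no longer fail for network reasons.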

While, in theory, you can always implement this solution manually (by adding an adapter layer between local and remote brokers or by using some integration framework like Apache Camel), it’s wiser to rely on your broker provider’s implementation.

ActiveMQ supports this messaging paradigm natively through the concept of a “Network of Brokers”, which connects different brokers to create complex network topologies.

When you use the ActiveMQ “Network of Brokers”, whenever a connection between the local and the remote broker is established, the remote broker informs the local broker about all its active and durable subscriptions. The local broker then forwards messages for all remote destinations that have a local match.
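A minimal store-and-forward bridge is declared on the local broker with a `networkConnector`. A sketch for the local broker’s `activemq.xml` (the central broker’s host name is an assumption):

```xml
<!-- Local broker with a one-way network bridge to the central broker.
     Messages accumulate locally while the central broker is down and are
     forwarded automatically once the connection is re-established. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localBroker">
  <networkConnectors>
    <networkConnector name="to-central" uri="static:(tcp://central-broker:61616)"/>
  </networkConnectors>
</broker>
```

The `static:` URI lists the remote broker(s) explicitly; demand-driven forwarding then follows the remote subscriptions as described above.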

Broker Network Topologies

The “Store and Forward” paradigm can be extended to other topologies (all based on links between brokers), such as:

  • Hub and Spoke: one central broker (the hub) for all local brokers, with duplex connections. It’s the most modular topology.
  • Concentrators: brokers structured in layers, with a decreasing number of brokers per layer. Useful when you expose your service to a huge number of clients.
  • Mesh: with geographically remote brokers, each broker is linked to its direct neighbors.

The local broker configuration gives you full control over which destinations to exclude from (or include in) the broker-to-broker flow. It also lets you specify whether the connection to the remote broker is one-way or duplex; a duplex connection makes it possible both to send to and listen on remote destinations. Another important configuration element is the maximum number of remote brokers a message can pass through before being discarded (networkTTL). You can think of it as the depth of your network seen from the local broker’s perspective.
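These options all live on the `networkConnector` element. A sketch combining them (host name and queue prefix are hypothetical):

```xml
<networkConnectors>
  <!-- duplex="true": one connection carries traffic in both directions,
       so the local broker can also consume from remote destinations.
       networkTTL="2": a message may cross at most two brokers. -->
  <networkConnector name="bridge"
                    uri="static:(tcp://remote-broker:61616)"
                    duplex="true"
                    networkTTL="2">
    <excludedDestinations>
      <!-- keep purely local queues from leaking across the bridge;
           ">" is ActiveMQ's multi-level wildcard -->
      <queue physicalName="local.>"/>
    </excludedDestinations>
  </networkConnector>
</networkConnectors>
```

Tuning `networkTTL` matters: set it too low and messages get discarded before reaching a consumer several hops away; set it too high and messages may ping-pong around the network.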

The “Network of Brokers” is definitely the most relevant solution for scaling your MOM architecture horizontally if you are using ActiveMQ. You can certainly implement the same topology with other broker providers, but with ActiveMQ it’s pretty straightforward.

If your system is based on microservices and you still use a centralized MOM, I encourage you to switch to the “Network of Brokers” architecture. You will certainly get it done before the launch of the next season of “La Casa de Papel” on Netflix.

In the next article, I will describe in detail the implementation of the “Hub and Spoke” topology in Karaz, a distributed BPM with an event-driven architecture.

Stay tuned…
