as seen by RubyOnRails Developers @ Selleo
It surprises me how long I managed without Pub/Sub in my toolbox. I had always associated Pub/Sub with services like Pusher and thought of it as a medium for communication between apps, e.g. in a microservice-oriented architecture. I had got a glimpse of event sourcing and Domain-Driven Design before, but Pub/Sub on its own? Somehow it never occurred to me how useful this pattern is in the context of a single application. That is, until the end of 2018.
What is Pub(lish)/Sub(scribe) pattern?
As Wikipedia says
“In software architecture, publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers, but instead categorize published messages into classes without knowledge of which subscribers, if any, there may be. Similarly, subscribers express interest in one or more classes and only receive messages that are of interest, without knowledge of which publishers, if any, there are.”
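Although it resembles the observer pattern, it differs significantly in where subscriptions are stored: observers register directly with the object they observe, whereas in Pub/Sub subscriptions are kept by an intermediary, so publishers and subscribers do not need to know about each other.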
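To make the difference tangible, here is a minimal sketch of an in-process bus. `SimpleBus` and its `subscribe` / `publish` methods are hypothetical names made up for illustration, not the API of any particular gem. The point is only that the subscriptions live in the bus rather than in the publisher.

```ruby
# Hypothetical minimal in-process bus; an illustration, not a real library API.
class SimpleBus
  def initialize
    # The bus, not the publisher, owns the list of subscriptions.
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, handler)
    @subscriptions[event_name] << handler
  end

  def publish(event_name, payload = {})
    @subscriptions[event_name].each { |handler| handler.call(payload) }
  end
end

bus = SimpleBus.new
received = []

# The subscriber registers with the bus and never sees the publisher.
bus.subscribe(:order_created, ->(payload) { received << payload[:order_id] })

# The publisher knows only the bus and the event name.
bus.publish(:order_created, order_id: 42)
bus.publish(:order_cancelled, order_id: 42) # nobody listens, nothing happens
```

In an observer implementation the list of handlers would typically sit inside the object that emits the change, which is exactly the coupling Pub/Sub avoids.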
Why should we use Pub/Sub?
Knowing more or less what Pub/Sub is, it is good to know its main benefits, and there are many of them. Just to name a few:
One of the simplest means to better implement SRP
SRP. Single Responsibility Principle. The first principle of SOLID. A very important one to prevent spaghetti-code buildup, yet also the one that sounds easy to achieve, but usually only on paper.
It might be simple to comprehend and apply in simple scenarios, but in large, production-grade applications with complex business logic it is challenging and requires techniques that may increase indirection significantly. This is because it is not always clear how many business logic steps/rules we can bundle together. Should we decompose a given class into smaller ones, or is it not worth it and the class is fine as it is?
Pub/Sub addresses this issue by letting you assume, most of the time, that your class/object should accomplish just one thing and communicate it to the rest of the application. Subscribers then take care of all the side effects of the thing you have just communicated.
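A small sketch of that idea, assuming a hypothetical in-process `SimpleBus` and a made-up `PlaceOrder` class (neither comes from a real library): the class records the order and announces it, while the side effects are attached from the outside.

```ruby
# Hypothetical minimal in-process bus; an illustration, not a real library API.
class SimpleBus
  def initialize
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, handler)
    @subscriptions[event_name] << handler
  end

  def publish(event_name, payload = {})
    @subscriptions[event_name].each { |handler| handler.call(payload) }
  end
end

# Does one thing: registers the order, then announces that it happened.
class PlaceOrder
  def initialize(bus, orders)
    @bus = bus
    @orders = orders
  end

  def call(order_id:, customer_email:)
    @orders << order_id
    @bus.publish(:order_placed, order_id: order_id, customer_email: customer_email)
  end
end

bus = SimpleBus.new
orders = []
side_effects = []

# Side effects live in subscribers, outside of PlaceOrder.
bus.subscribe(:order_placed, ->(event) { side_effects << "email to #{event[:customer_email]}" })
bus.subscribe(:order_placed, ->(event) { side_effects << "stock reserved for order #{event[:order_id]}" })

PlaceOrder.new(bus, orders).call(order_id: 1, customer_email: "jane@example.com")
```

Adding a third side effect later means adding a subscriber; `PlaceOrder` itself stays untouched.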
Facilitates spaghetti-code reduction
Very much related to the one above, this benefit is about decoupling the project’s domains / modules. It counters the coupling that is very common when using Service Objects or Callbacks. Both are very useful and powerful patterns, but they may become problematic if we decide to use them to integrate the behaviour of many different parts of the system. Services or classes implementing callbacks then become dependent on a number of other classes that would normally be outside of their interest. This is especially annoying when doing TDD, as it becomes hard to test objects in isolation without mocking those dependencies first. Pub/Sub provides a way to remove all of those dependencies through a middleman in the form of a subscription.
Makes applications more modular
As a consequence of the above, if we keep our application’s responsibilities organised around some “domains” or “modules”, which, with little effort, is very instinctive in Pub/Sub, we can make it very easy to extract the given modules from the app into separate services.
The potential simplicity is based on the fact that we have a ready-to-use medium for communicating with modules extracted this way — and those are events/messages. Sometimes it might be necessary to change a bus that is used to propagate those events, but the interface (or event’s payload) in most cases can stay the same. This makes moving towards decentralised/asynchronous architecture much more natural and can drive an application to be much better prepared for potential scaling.
If we decide to use an external bus for conveying events, we can also leverage the fact that we can implement event handlers in a different technology/stack than our core one. This allows us to use the best tool for a given task with ease.
Renders breaking large tasks into smaller ones easier
A virtue every team leader will value is Pub/Sub making it easy to split large tasks into self-contained sub tasks. As publishers, events, subscriptions and event handlers can all be implemented separately upon agreeing on event’s payload, those can be described and handled as completely independent tasks. Due to the high isolation of individual components, testing them is also very convenient.
Makes logging significant events in the system easy
Server logs can be very useful for tracking problems or comprehending the state of the system at a given point in time. However in many cases, server logs are just a bunch of gibberish that does not provide information that is easy to browse or consume.
In Pub/Sub, events occurring in the system convey lots of comprehensible data that outline what is happening in the system. Plugging into this stream of information is just a matter of creating a subscriber dedicated to logging and redirecting the aforementioned stream to a file, STDOUT, a database, CloudWatch, an external logging service or any other destination that might prove useful in our case.
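A logging subscriber of this kind could look roughly as follows. `AuditableBus`, its `subscribe_to_all` method and `EventLogger` are hypothetical names for this sketch, and a `StringIO` stands in for the real destination.

```ruby
require "json"
require "stringio"

# Hypothetical bus that also supports catch-all subscribers.
class AuditableBus
  def initialize
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
    @global_handlers = []
  end

  def subscribe(event_name, handler)
    @subscriptions[event_name] << handler
  end

  # Handlers registered here receive every event, whatever its name.
  def subscribe_to_all(handler)
    @global_handlers << handler
  end

  def publish(event_name, payload = {})
    @global_handlers.each { |handler| handler.call(event_name, payload) }
    @subscriptions[event_name].each { |handler| handler.call(payload) }
  end
end

# Writes one JSON line per event to any IO-like destination.
class EventLogger
  def initialize(io)
    @io = io
  end

  def call(event_name, payload)
    @io.puts(JSON.generate({ event: event_name, payload: payload }))
  end
end

log = StringIO.new # could just as well be $stdout or an open File
bus = AuditableBus.new
bus.subscribe_to_all(EventLogger.new(log))

bus.publish(:door_opened, door_id: 7, keycard_uuid: "abc-123")
bus.publish(:customer_created, customer_id: 1)
```

Because the logger is just another subscriber, switching the destination only means passing a different IO-like object.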
Observing such a stream of events can be beneficial in visually identifying anomalies that would otherwise be hidden in the abyss of regular server logs. Adding event tracing into the equation to indicate how events are correlated with each other can make it even more useful.
Simplifies data migrations
Last but not least, operating on events facilitates creating and applying data fixes / data migrations in systems that benefit from Pub/Sub. In many cases, changing the database state is just a matter of artificially generating and emitting a bunch of events, something that would otherwise usually be implemented with scripts that meddle with the data directly.
Using events for this purpose makes the process more comprehensible and readable. Also, all side-effects of such changes will be transparently applied as well, unless we explicitly unsubscribe from those events. This might be useful to handle some special cases, and is a powerful technique itself.
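As a rough sketch of such a data fix, assume a hypothetical `DetachableBus` that allows unsubscribing, a hash standing in for a database table, and two made-up handlers. The corrections are replayed as events, with the notification side effect switched off for the duration of the fix.

```ruby
# Hypothetical bus with unsubscribe support; not a real library API.
class DetachableBus
  def initialize
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, handler)
    @subscriptions[event_name] << handler
  end

  def unsubscribe(event_name, handler)
    @subscriptions[event_name].delete(handler)
  end

  def publish(event_name, payload = {})
    @subscriptions[event_name].each { |handler| handler.call(payload) }
  end
end

bus = DetachableBus.new
prices = { 1 => 100, 2 => 250 } # stand-in for a database table
notifications = []

update_price = ->(event) { prices[event[:product_id]] = event[:remote_price] }
notify_manager = ->(event) { notifications << event[:product_id] }

bus.subscribe(:pricing_mismatch_identified, update_price)
bus.subscribe(:pricing_mismatch_identified, notify_manager)

# Data fix: silence the notification side effect, then replay corrections as events.
bus.unsubscribe(:pricing_mismatch_identified, notify_manager)

corrections = [
  { product_id: 1, local_price: 100, remote_price: 120 },
  { product_id: 2, local_price: 250, remote_price: 200 }
]
corrections.each { |payload| bus.publish(:pricing_mismatch_identified, payload) }
```

The fix goes through the same handler that production traffic uses, so there is no second code path that could drift from the regular one.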
When to use Pub/Sub?
Pub/Sub is by no means a remedy for all problems, but there are situations in which it proves itself really useful. Applications in which it is the most beneficial can be characterised as:
- medium-sized to large-sized
- with easy-to-identify domains/modules that need to communicate with each other
- integrating with one or more external systems, especially if those integrations are broad
- huge, entangled monoliths in need of refactoring
- applications in need of modularisation for scaling or other purposes
The more characteristics shown above apply to a given application, the more benefits Pub/Sub will bring to the table.
When NOT to use Pub/Sub?
While Pub/Sub can be very favourable in lots of cases, there are cases in which it might turn out to be unproductive. Solutions that are not the best fit for this pattern share one or more of the following characteristics:
- small to medium sized
- with few domains/modules, or ones that are difficult to identify
- not integrating with external systems
- already implementing some specific/dedicated, well thought-through architectural solutions that would neither benefit nor play nice with Pub/Sub
- prototypes that are just experimental, in which case Pub/Sub would just add unnecessary overhead
This does not mean that if your application shares some of the traits above then Pub/Sub will definitely not work in your case, although it might prove not to be worth the extra effort needed to introduce it effectively.
A word on nomenclature
Before we go any further it is worth clarifying the nomenclature around the topic. Pub/sub as a pattern is usually described using terms like “publisher”, “subscriber”, “message”, “topic”, “message bus” etc. Still, in this article, I prefer to use different naming for those concepts to bring it a little bit closer to the DDD approach to building software.
And thus “messages” are referred to as “events”, “subscribers” are “event handlers”, “topics” become “domains” and the concept of “message bus / broker” turns to programmatically defined “subscriptions” (usually in pub/sub subscriptions are realised by the infrastructure itself).
BDD the Pub/Sub way
Incorporating a publish-subscribe pattern in a new or existing project requires some significant changes in mindset and also a dash of self-discipline. Not everything will feel natural at the beginning and the urge to do things “the old way” will be strong.
To expedite the introduction or transition to Pub/Sub approach it is good to have a set of guidelines one can follow when working with new requirements in the system. Following the behaviour-driven-development principles might be an example of such a rule-set. Some steps outlining this approach in the context of Pub/Sub are presented below.
Identify problem domains
First and foremost we need to get our heads around what domains/modules the problem we need to solve tackles. At this point, it might be necessary to introduce new domains in the system or reuse existing ones. Identifying those domains is not an easy task but it is an important one, as major changes to them later might be cumbersome to execute. Still, domains are usually just name(space)s. And as we already know
“There are only two hard things in Computer Science: cache invalidation and naming things.”
— Phil Karlton
I usually prefer to derive names from adjectives or nouns that revolve around concepts related to the business domain or around the types of services I integrate the application with.
IOT… There is no easy rule here: something that might be totally confusing in the context of one app can be crystal-clear in the context of another. A rule of thumb might be: if a domain name is clear to your customer, and when using the name you feel that both of you are talking about the same thing, then it is not a bad name. Once we know the right names, it is time to identify events.
Identifying events is another step we should not treat lightly, as events will be used for communication between domains, and in some cases also within the same domain. Firstly, we need to check whether any of the existing events can be used instead of introducing new ones. If we feel that we need to introduce a new one for any reason, there are a few rules that we might take into account. If we cannot find the right event name, or it sounds confusing or plain silly, then we need to reconsider whether the given context requires emitting an event at all.
Events are all about something that has just happened, so they should usually be named after verbs in their past-tense form. There might be a temptation to derive event names from commands, e.g. RequestEmailDelivery, yet those are no longer events; those are… commands. While commands live really close to events, it is good to keep both concepts separate, as it might be really useful in the future. In our case, EmailDeliveryRequested would fit the expected pattern better.
Event names should be unambiguous and, same as domain names, should be understandable by the customer. In fact domain names, event names and data they carry should become the new language we use to talk about the product and describe features. It would also be beneficial to maintain some sort of glossary clarifying the meaning of the names we use.
Some examples of valid event names are CustomerCreated, OrderNotificationSent, DoorOpened, NewOrderReceived, PricingMismatchIdentified, etc. If the name accurately describes what has just happened, then it is probably a good event name.
Events should also be assigned to specific domains. Still, it might happen that events sharing the same name could be published by different domains — this is not a problem at all.
Identifying events’ payload
An event’s payload is just the data it holds that is relevant for, and can be used by, all subscribers (event handlers) interested in the given event. The challenge is to identify how much data is good enough. Each event should be treated separately and the data it holds has to basically… make sense. Some antipatterns we should be aware of when planning event payloads are:
- Including irrelevant data: usually happens when the payload is planned around the needs of subscribers, and can result in data that is irrelevant in the context of the given event. Usually, such data should be included in a different event or retrieved by different means (e.g. fetched from the database when handling the event).
- Including too much data: similar to the one above; only data that makes sense in the scope of a given event should be included.
- Including too little data: especially if the missing data is necessary to describe the context of the event in full.
- Improperly nesting data: in particular when some data that should be nested is not. A good example is including all of an object’s attributes as separate fields of the event payload instead of wrapping them in the attributes field.
There is also a recommendation not to include anything in the payload that is not actively used by all subscribers and just introduce more, fine-grained events. In my opinion, it might be a bit too extreme in most cases though.
Examples of some potential payloads
CustomerCreated: customer_id, customer_attributes
OrderNotificationSent: order_id, recipient_id, recipient_email
DoorOpened: door_id, keycard_uuid
NewOrderReceived: order_id, order_placed_at
PricingMismatchIdentified: product_id, local_price, remote_price
The examples above are certainly not set in stone and are not the only right ones. The payload we define should always depend on the context in which a given event is broadcast.
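If events get dedicated classes, payloads like the ones above can be declared explicitly. The following is only a sketch: `BaseEvent` and its `attributes` macro are hypothetical helpers written for this example, and treating every declared field as required is an assumption rather than a rule.

```ruby
# Hypothetical base class: payload fields are declared up front and all are required.
class BaseEvent
  def self.attributes(*names)
    @attribute_names = names
    attr_reader(*names)
  end

  def self.attribute_names
    @attribute_names || []
  end

  def initialize(**payload)
    missing = self.class.attribute_names - payload.keys
    unknown = payload.keys - self.class.attribute_names
    raise ArgumentError, "missing: #{missing.join(', ')}" if missing.any?
    raise ArgumentError, "unknown: #{unknown.join(', ')}" if unknown.any?

    payload.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  def to_h
    self.class.attribute_names.to_h { |name| [name, public_send(name)] }
  end
end

class CustomerCreated < BaseEvent
  attributes :customer_id, :customer_attributes
end

class DoorOpened < BaseEvent
  attributes :door_id, :keycard_uuid
end

# Object attributes are nested under one field instead of being spread out.
event = CustomerCreated.new(customer_id: 1, customer_attributes: { name: "Jane" })
```

With this shape, a forgotten or misspelled field fails at the moment the event is built, not somewhere inside a handler.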
Time to TDD!
With all the conceptual work in place, we are finally ready to implement the solution. As usual, I do recommend following TDD to develop each part of the feature, especially because it is so convenient in the case of Pub/Sub.
An end-to-end (integration) test is optional, but for critical paths of the application it can provide a useful safety net. This is the only test that should be executed with subscribers actually subscribed to events, and it should focus on end-to-end behaviour instead of the events themselves.
Implementing the publisher does not necessarily have to be the first step, but in the end we should implement a class / object that will act as a source of events (unless it is already there and we are going to reuse existing events). It is the publisher that decides whether and which events should be broadcast. Therefore in tests we should focus on both the core responsibilities of the given class and the emission of events. At this point we should also check that the emitted events are populated with the correct payload.
It is worth mentioning that each unique event should preferably be emitted from just one place. Not only will this make it easier to comprehend the application flow later on but, in most if not all cases, it just feels right.
Testing event classes is another optional step, but sometimes it might prove useful. When testing event classes we should focus on the payload itself. If some fields of the payload are obligatory (rarely one is not) or need to be populated with a value of a specific type, this might be tested here. On the one hand, in some solutions events do not have dedicated classes and therefore cannot be test-driven; on the other, dedicated classes do significantly increase the visibility of what events are available in the system.
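A publisher test along these lines might be sketched as below. `RecordingBus` is a hypothetical test double and `SendOrderNotification` a made-up publisher; plain `raise ... unless` checks stand in for whatever test framework the project uses.

```ruby
# Test double that records events instead of delivering them.
class RecordingBus
  attr_reader :published

  def initialize
    @published = []
  end

  def publish(event_name, payload = {})
    @published << [event_name, payload]
  end
end

# Hypothetical publisher: sends a notification and announces the outcome.
class SendOrderNotification
  def initialize(bus, mailer)
    @bus = bus
    @mailer = mailer
  end

  def call(order_id:, recipient_id:, recipient_email:)
    return false unless @mailer.call(recipient_email)

    @bus.publish(:order_notification_sent,
                 order_id: order_id, recipient_id: recipient_id, recipient_email: recipient_email)
    true
  end
end

# Core responsibility succeeded: exactly one event with the full payload.
bus = RecordingBus.new
SendOrderNotification.new(bus, ->(_email) { true })
                     .call(order_id: 1, recipient_id: 2, recipient_email: "jane@example.com")
raise "event not emitted" unless bus.published == [
  [:order_notification_sent, { order_id: 1, recipient_id: 2, recipient_email: "jane@example.com" }]
]

# Delivery failed: no event should be emitted.
silent_bus = RecordingBus.new
SendOrderNotification.new(silent_bus, ->(_email) { false })
                     .call(order_id: 1, recipient_id: 2, recipient_email: "jane@example.com")
raise "unexpected event" unless silent_bus.published.empty?
```

No subscriber is involved at any point, which is what keeps this kind of test small and fast.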
A subscription is “the glue” between the publisher and the subscriber, so we need to ensure it is in place. Tests here are simple and just verify that each subscriber listens for the proper range of events.
One of the recommended approaches to organising subscriptions is to listen for events globally, instead of in some specific context. This allows us to see all subscriptions in one place and quickly get a grasp of how the whole thing works. Also this way we prevent coupling of different problem domains which might be introduced by on-spot subscriptions.
In this place, we also usually decide if a given event should be handled synchronously, asynchronously or in some other, specific way. This aspect should be covered by tests as well.
Another optional assertion would be to ensure that both the event and the event handler classes exist, which would add more “integration” value to such an otherwise very lightweight test scope. This would also drive further implementation if we decide we want to start by implementing the subscription part first.
On many occasions the subscription is something we just forget about, and its absence might not show up as an obvious bug during the manual testing phase. This is especially true when an event is already emitted, so it just feels right to write a handler for it… and forget about “the glue”. It is usually beneficial to introduce some kind of linter that will verify that none of the handlers we have introduced is left dangling in a void, without anything that can actually call it.
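Such a check can be very small. The sketch below assumes that handlers live in one namespace and that subscriptions are declared in a single hash; `OrderHandlers`, `ORDER_SUBSCRIPTIONS` and `dangling_handlers` are hypothetical names used only for this example.

```ruby
module OrderHandlers
  class SendConfirmation
    def call(event); end
  end

  class ReserveStock
    def call(event); end
  end

  # Written, but never wired to any event.
  class ArchiveOrder
    def call(event); end
  end
end

# All subscriptions declared in one place: event name => handler classes.
ORDER_SUBSCRIPTIONS = {
  new_order_received: [OrderHandlers::SendConfirmation, OrderHandlers::ReserveStock]
}.freeze

# Returns handler classes that no subscription refers to.
def dangling_handlers(namespace, subscriptions)
  defined_handlers = namespace.constants.map { |name| namespace.const_get(name) }.grep(Class)
  defined_handlers - subscriptions.values.flatten
end

dangling = dangling_handlers(OrderHandlers, ORDER_SUBSCRIPTIONS)
```

Run as part of the test suite, a non-empty result would fail the build and point straight at the handler that was never subscribed.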
Subscribers (also referred to as “event handlers”) are classes that implement logic that is executed in reaction to event emission. Such logic can be executed directly in an event handler or can be delegated further to some more specialised entities like service objects. It is worth mentioning that subscribers can act as publishers as well because nothing stops us from emitting more events within an event handler. Also, it is up to the subscriber to decide if it wants to further process a given event, even though it was subscribed to in the first place. All of those responsibilities should be driven by adequate testing.
Controlling flow with events
One thing that might be confusing at the beginning is a slight reduction in application flow visibility. We just no longer see references to all pieces taking part in a given process in one place. This is only partially true, because this information is still visible, just in a different form. You can pretty efficiently work out what is happening in the application by just looking at what events and event handlers you have; a short look at the file tree can tell a lot. You can achieve similar results by investigating subscriptions alone as well.
When it comes to the actual flow, it is best to not assume any particular order of events’ processing. Sometimes selected event handlers do depend on other event handlers in terms of the order of execution though. So what are the options of controlling the flow of execution if we really need to ensure that given steps are realised (events are processed) in a certain order?
Bundle event handlers together
Sometimes it may occur that event handlers can be bundled together and be treated as one. This is only true if both handlers do belong to the same domain.
Let’s say that after an order is created a message should be sent to the customer first, and another message should be sent to the shop manager only if sending the message to the customer succeeded. In such a case, you might want to introduce one handler in the “messaging” domain that subscribes to the “order created” event and ultimately results in emitting two new events: “new order notification sent to customer” and “new order notification sent to shop manager”. Sending the messages could be handled by a service object, a composite or the event handler itself. Such an event handler can be executed synchronously or asynchronously, as it does not make any difference in this case.
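One possible shape of such a bundled handler is sketched below. `SimpleBus`, `NotifyAboutNewOrder` and the `deliver` callable are hypothetical; the event names follow the ones used in the paragraph above.

```ruby
# Hypothetical minimal in-process bus; an illustration, not a real library API.
class SimpleBus
  def initialize
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, handler)
    @subscriptions[event_name] << handler
  end

  def publish(event_name, payload = {})
    @subscriptions[event_name].each { |handler| handler.call(payload) }
  end
end

# Hypothetical handler living in the "messaging" domain.
class NotifyAboutNewOrder
  def initialize(bus, deliver)
    @bus = bus
    @deliver = deliver # callable returning true when the message was sent
  end

  def call(event)
    order_id = event[:order_id]

    return unless @deliver.call(:customer, order_id)
    @bus.publish(:new_order_notification_sent_to_customer, order_id: order_id)

    # Reached only when the customer message went through.
    return unless @deliver.call(:shop_manager, order_id)
    @bus.publish(:new_order_notification_sent_to_shop_manager, order_id: order_id)
  end
end

bus = SimpleBus.new
emitted = []
attempted = []
deliver = lambda do |recipient, _order_id|
  attempted << recipient
  true
end

%i[new_order_notification_sent_to_customer new_order_notification_sent_to_shop_manager].each do |name|
  bus.subscribe(name, ->(_event) { emitted << name })
end
bus.subscribe(:order_created, NotifyAboutNewOrder.new(bus, deliver))

bus.publish(:order_created, order_id: 7)
```

The ordering rule lives inside a single handler, so it holds no matter how the handler itself is scheduled.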
Process events synchronously
This is a very straightforward approach — there is no specialised medium to propagate events as those are just handled in-process and synchronously. To ensure the specific order of handling the behaviour that is communicated by events we need to secure two things — the proper order of emitting events and proper order of subscribing handlers to those events.
While this approach is the easiest one to implement, it can be prone to introducing bugs, especially if not thoroughly tested. For instance, let’s assume that somebody sorted our subscriptions alphabetically. It does not seem to be a problem at the time, but it can cause subtle problems and inefficiencies later at runtime that might be hard to track. A typical case is wanting to ensure that a customer is created in a CRM service only after we confirm that we have successfully sent them a message.
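The CRM scenario can be sketched with a hypothetical synchronous `SimpleBus`. The handlers are made-up lambdas, and signalling a failed delivery by raising is an assumption of this example.

```ruby
# Hypothetical minimal in-process bus; handlers run in subscription order.
class SimpleBus
  def initialize
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, handler)
    @subscriptions[event_name] << handler
  end

  def publish(event_name, payload = {})
    @subscriptions[event_name].each { |handler| handler.call(payload) }
  end
end

bus = SimpleBus.new
log = []
message_delivered = true

send_welcome_message = lambda do |event|
  raise "delivery failed for customer #{event[:customer_id]}" unless message_delivered

  log << :message_sent
end
create_crm_customer = ->(_event) { log << :crm_customer_created }

# Order matters here: the message handler must stay first.
bus.subscribe(:customer_registered, send_welcome_message)
bus.subscribe(:customer_registered, create_crm_customer)

bus.publish(:customer_registered, customer_id: 1)
# log is now [:message_sent, :crm_customer_created]

# When the first handler raises, the handlers after it never run.
message_delivered = false
begin
  bus.publish(:customer_registered, customer_id: 2)
rescue RuntimeError
  log << :publish_aborted
end
```

Swapping the two `subscribe` lines would silently create CRM customers for people who never got their message, which is exactly the kind of bug an innocent reordering can introduce.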
Process events asynchronously
The asynchronous approach is necessary when some event handlers need more time to process the event, which in turn can significantly affect the overall performance if done synchronously. The solution is to delegate event handling to some asynchronous process, e.g. a queue.
To ensure the correct order of processing we need to take care of confirming that a given job was processed successfully, handling errors and retries, maintaining performance not to clog the queue etc. Therefore this measure should be applied only if no other option can be introduced as it leads to a significant amount of complexity.
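To give a feel for the moving parts, here is a deliberately tiny sketch built on Ruby's core `Queue` and a single worker thread. It stands in for a real job queue, and the `process_with_retries` helper, the retry limit of 3 and both handlers are assumptions made for this example.

```ruby
# Runs a handler, retrying on failure; returns true on success.
def process_with_retries(handler, payload, max_attempts)
  attempts = 0
  begin
    attempts += 1
    handler.call(payload)
    true
  rescue StandardError
    retry if attempts < max_attempts
    false
  end
end

jobs = Queue.new # thread-safe queue from Ruby core
processed = []
failed = []

worker = Thread.new do
  loop do
    job = jobs.pop # blocks until a job arrives
    break if job == :stop

    handler, payload = job
    failed << payload unless process_with_retries(handler, payload, 3)
  end
end

calls = 0
flaky_handler = lambda do |payload|
  calls += 1
  raise "temporary outage" if calls < 3 # fails twice, then succeeds

  processed << payload[:order_id]
end
broken_handler = ->(_payload) { raise "always fails" }

# "Publishing" only enqueues the work, so the caller is not blocked.
jobs << [flaky_handler, { order_id: 1 }]
jobs << [broken_handler, { order_id: 2 }]

jobs << :stop
worker.join
```

Even this toy version already needs retries and a place for failed jobs, which hints at the complexity mentioned above.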
Chain of events
The solution that does not share any problems of the approaches presented above is events chaining. Handlers can be executed synchronously or asynchronously without using any additional patterns. If we need to organise a process around some specific order of actions we just need to emit separate events in the event handlers that need to be executed prior to other event handlers. Then the latter should listen for those events to trigger their own behaviour. It is like “hey, I’ll let you know when I am done with my thing so you can do your own”.
In most cases using chained events is the best course of action. It might be challenging to make it work if triggering one event handler depends on more than one other event handler/event to be executed/propagated. There are different approaches to this problem based on retrying event handlers until some condition is met or aggregating events and emitting new ones as a result but many times it is good enough to just rethink the whole problem and organise the process in a different way.
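A chain of this kind can be sketched as follows, again with a hypothetical `SimpleBus` and made-up handlers. The second handler listens for the event emitted by the first one instead of the original event.

```ruby
# Hypothetical minimal in-process bus; an illustration, not a real library API.
class SimpleBus
  def initialize
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, handler)
    @subscriptions[event_name] << handler
  end

  def publish(event_name, payload = {})
    @subscriptions[event_name].each { |handler| handler.call(payload) }
  end
end

bus = SimpleBus.new
steps = []

notify_customer = lambda do |event|
  steps << "customer notified"
  # "I am done with my thing, so you can do your own."
  bus.publish(:new_order_notification_sent_to_customer, order_id: event[:order_id])
end

notify_shop_manager = lambda do |_event|
  steps << "shop manager notified"
end

# The order of these subscribe calls no longer matters.
bus.subscribe(:new_order_notification_sent_to_customer, notify_shop_manager)
bus.subscribe(:order_created, notify_customer)

bus.publish(:order_created, order_id: 5)
```

The dependency between the two steps is expressed by the events themselves, so it survives reordering of subscriptions and a switch to asynchronous handling.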
I have already mentioned a concept of a medium which is used to transport events to propagate them in the system — let’s refer to it as an “events bus”. There are at least four kinds of media that are worth mentioning.
- Synchronous: in-process, handled fully synchronously. Can be effective when we prefer the order of running event handlers to be preserved and when event handlers are lightweight. Also great as a starting point when pub/sub is used as a means for refactoring large, legacy applications.
- Asynchronous: handled in a separate process/thread within the same environment. Non-blocking, usually based on some sort of queue (e.g. Faktory or Sidekiq). Can benefit from all features offered by the queue, like statistics, logging, retries, fine-grained error handling etc. Should be considered as the first choice.
- External: based on a separate service like an external message broker and/or notification service (e.g. AWS SQS / AWS SNS, RabbitMQ). A step toward a solution working in a microservice-oriented architecture. Requires additional integration and may introduce a small performance overhead, so it should be introduced only when necessary. Still, it can be used as a secondary event bus within an application for handling some special cases / integrations.
- Logger: a simple medium where it is the user who acts as a subscriber of events. Such event data can be stored in a file, redirected to STDOUT or even sent to some external service like CloudWatch for further investigation. In some emergencies, such persisted event stores can even be used to restore or revert the state of the system.
It is not uncommon for applications to take advantage of a few if not all kinds of event buses presented above, yet every decision to introduce a new one to the system should be backed by some solid reasoning. This is because choosing one or many specific media of propagating events may introduce numerous consequences.
For instance, some media guarantee at-least-once delivery, so idempotency should be taken into account to handle events delivered more than once. Also, some (especially external) media can be temporarily unavailable, so planning error handling that ensures eventual consistency is critical.
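One common way to cope with redelivery is to remember which events were already handled. The wrapper below is a sketch under two assumptions: every event carries a unique `:event_id` (a hypothetical field), and an in-memory set is good enough, whereas a real system would need persistent storage.

```ruby
require "set"

# Wraps a handler so that redelivered events are processed only once.
# Assumes every event carries a unique :event_id (a hypothetical field).
class IdempotentHandler
  def initialize(handler)
    @handler = handler
    @processed_ids = Set.new # in-memory only; a real system would persist this
  end

  def call(event)
    event_id = event.fetch(:event_id)
    return :skipped if @processed_ids.include?(event_id)

    @handler.call(event)
    @processed_ids << event_id # marked only after the handler succeeded
    :processed
  end
end

charged = []
handler = IdempotentHandler.new(->(event) { charged << event[:order_id] })

event = { event_id: "evt-1", order_id: 10 }
first = handler.call(event)
second = handler.call(event) # simulated redelivery of the same event
```

Marking the event as processed only after the wrapped handler returns means a failed attempt can still be retried later.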
Special uses of Pub/Sub
There are some cases in which Pub/Sub turns out to be really useful while not being at the core of handling cross-domain communication within the application.
Large, monolithic, legacy applications with a low-quality codebase can be really challenging to work with in the first place. This is especially true for developers that are new to the project, e.g. when taking over a project that some other provider developed. In many cases of this kind the code is not tested properly and is very fragile. Therefore, introducing new functionalities to such a house of cards risks destabilising the entire thing. Using Pub/Sub has the benefit of decoupling new functionalities from the existing codebase.
Existing code can be augmented with event emission capabilities which should not affect the original logic at all, while the new behaviour, isolated and properly tested, can be handled within event handlers. Introducing Pub/Sub can also be used to decouple tightly-bound parts of the system, especially if the coupling is introduced by the callbacks pattern.
Introducing auditing/logging of events can be a step that precedes refactoring a monolith. The idea is to identify critical events happening in the whole system and add event emission to each such place found. This way we can easily investigate what is happening in the system by just reviewing production event logs. This can also act as a foundation for further refactoring or introducing new features.
Costs and risks of introducing Pub/Sub
Even though introducing Pub/Sub brings lots of benefits to the table, it comes at some costs and has some consequences. To name the most significant ones:
Requires mindset change
Kind of obvious “problem” but one that might be surprisingly difficult to handle. The change in how the problems are modelled and the necessity to keep proper naming of events, handlers and payload attributes can be hard to get used to for some team members and usually requires a buy-in from the whole team. Fortunately, after a while, the development process involving Pub/Sub becomes very natural.
Reduced visibility of the application flow was already mentioned when discussing ways to control the flow of events: it is no longer explicitly defined in code what other code will be executed in a certain context. At least not in the context itself. Still, those correlations can be seen by investigating subscriptions and/or the defined event handlers. Some, including me, may even find it more convenient.
It is possible to register event handlers just where the event is emitted to further increase the visibility of what is going on, but personally, I am not a fan of this approach, as it further expands responsibilities of the class publishing the event. “The glue” is all over the place.
Inflexibilities in changing payload structure
Rarely a problem if we get the payload right in the first place, but subsequent changes in the event’s payload can require significant effort to update all event consumers / handlers. Adding a field does not pose a problem, but removing/renaming one or changing the type of value it stores may be problematic. The problem might be even more subtle if we use an external bus to propagate events outside of the system. This is also why it is recommended to plan events and their payloads carefully and to maintain an adequate set of both unit and integration tests.
Pub/Sub and Ruby on Rails
Ruby plays an important role in the company I work for, so the examples provided are written in Ruby, using the wisper gem.
Implementing Pub/Sub using Wisper gem
The context of the task is booking hotel rooms that are supplied with keycode-based locks. Whenever a hotel’s property management system registers a new reservation, we want to generate a code for a given room (or rooms). This part works fine, but the problem is that we would like to update another piece of software related to managing guests (GRM — Guests Relationship Management) with the keycode generated, for instance for support reasons.
The problem domains / scopes we can identify in this scenario are: reservations, digital locks and GRM. What we seem to be interested in is the moment when a new keycode is generated for a lock (possibly in the context of a reservation). So the potential event would be something like Locks::CodeGenerated, and it will hold the actual keycode and reservation identifier in its payload.
The code presented below is a sort of over-simplification and takes advantage of a few helpers and base classes that were not included. This was done on purpose to focus only on the actual business value of the solution.
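As a rough, dependency-free illustration of this flow, the scenario could be wired as shown here. This sketch does not use wisper: `TypedBus`, `Reservations::Registered`, `Locks::GenerateCode` and `Grm::StoreKeycode` are hypothetical names invented for it, and only `Locks::CodeGenerated` with its keycode and reservation identifier comes from the description above.

```ruby
# Hypothetical in-process bus that dispatches on the event's class.
class TypedBus
  def initialize
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_class, handler)
    @subscriptions[event_class] << handler
  end

  def publish(event)
    @subscriptions[event.class].each { |handler| handler.call(event) }
  end
end

module Reservations
  Registered = Struct.new(:reservation_id, :room_ids, keyword_init: true)
end

module Locks
  CodeGenerated = Struct.new(:reservation_id, :keycode, keyword_init: true)

  # Generates one keycode per reserved room and announces each of them.
  class GenerateCode
    def initialize(bus, code_source)
      @bus = bus
      @code_source = code_source # callable returning a new keycode
    end

    def call(event)
      event.room_ids.each do
        @bus.publish(CodeGenerated.new(reservation_id: event.reservation_id,
                                       keycode: @code_source.call))
      end
    end
  end
end

module Grm
  # Stand-in for the call that would update the external GRM system.
  class StoreKeycode
    def initialize(guest_records)
      @guest_records = guest_records
    end

    def call(event)
      (@guest_records[event.reservation_id] ||= []) << event.keycode
    end
  end
end

bus = TypedBus.new
guest_records = {}
codes = %w[4821 9035].each # fixed codes keep the example deterministic

# All subscriptions in one place: "the glue" between the domains.
bus.subscribe(Reservations::Registered, Locks::GenerateCode.new(bus, -> { codes.next }))
bus.subscribe(Locks::CodeGenerated, Grm::StoreKeycode.new(guest_records))

bus.publish(Reservations::Registered.new(reservation_id: "R-1", room_ids: [101, 102]))
```

The locks domain never references GRM; removing the second subscription would switch the integration off without touching either side.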
If you are interested in a high level, opinionated wrapper around the concept of publish/subscribe pattern in Ruby on Rails, I do recommend having a look at pubsub_on_rails. It is a robust library answering most of the every-day programmer needs when working with pub/sub and has pretty comprehensive documentation including examples of testing each part of the flow. What is more, it is battle tested on a project that has already processed nearly 40M events.
Other recommendations when using Pub/Sub in RoR
There are a few rules we follow when implementing Pub/Sub using Ruby on Rails framework. These are by no means limited to Pub/Sub but in its context, we find those principles especially beneficial.
- Use “bang!” methods a lot and let stuff fail at runtime the correct way. This is especially useful for tracking problems when payloads of events are not validated and we forget to provide some identifiers when emitting those events. Adhering to this rule pays off the most in the context of ActiveRecord finders.
- Use external system ids, if those are unique — as Pub/Sub excels for integrating many external systems it is useful to use identifiers provided by those external systems even as primary keys of objects we persist locally. Further using them in event payloads will undoubtedly make it easier to audit events that were broadcasted in the system.
- Consider maintaining a clear separation of domains — regardless of the technique used, be it dependency injection, decorators or even CBRA / Packwerk, try to introduce interface segregation for models/objects that might share the same database table but are used in different domain contexts. Asking yourself the following question: “If I remove this whole domain from the application, will it still work?” can help you decide on whether it needs more isolation.
The Publish/Subscribe pattern is often confused with event sourcing or DDD. These are by no means the same thing. Yes, events are very important in both concepts, but ES/DDD/CQRS is so much more that it is not even worth continuing this topic here. Pub/Sub may be treated as a natural step towards ES without introducing the whole shebang, which requires even greater changes in mindset.
This being said, Pub/Sub can be an extremely useful tool for building flexible and decoupled architectures that are subject to scaling. It also facilitates planning development by making it easier to decompose large problems into smaller tasks and promotes a unified language for communication between the product owner and the dev team. Due to this and various other reasons I cannot recommend Pub/Sub highly enough to be investigated as a new element in your application domain context. Also, make sure to check out pubsub_on_rails, a great starting point for introducing Pub/Sub into your app.