PubSub in web: comparison and deep-dive into Bayeux/Faye
Pub(lisher)/Sub(scriber) communication means that one side (the publisher) announces that something (an event) has happened, and other sides (subscribers, or listeners) react to that event.
To understand the usage of this concept better, let’s imagine that we are building a dating app (Dinder). We need to have a web page that shows a ‘You have a new match!’ page when a match is found for our dear single. The match searching is happening on our servers, and we need to notify our app’s users in real time that a potential partner has been found.
To achieve that, we can use Pub/Sub, with our server as the publisher (“Hey, I found someone for you!”) and our app/website as the subscriber (“Tell me when you find someone so that I can show the relevant screen to the user!”).
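To make the two roles concrete, here is a minimal in-process sketch in Ruby. The `TinyBroker` class, the `/matches/42` channel and the payload are all made up for illustration; a real setup would put a network between publisher and subscriber.

```ruby
# Minimal in-process publish/subscribe broker (illustrative only).
class TinyBroker
  def initialize
    # channel name => list of callbacks
    @subscribers = Hash.new { |hash, channel| hash[channel] = [] }
  end

  # Register a block to be called for every message on the channel.
  def subscribe(channel, &callback)
    @subscribers[channel] << callback
  end

  # Deliver the message to every subscriber of the channel and
  # return how many subscribers were notified.
  def publish(channel, message)
    @subscribers[channel].each { |callback| callback.call(message) }
    @subscribers[channel].size
  end
end

broker = TinyBroker.new
screens = []

# The app (subscriber): "tell me when you find someone".
broker.subscribe('/matches/42') do |match|
  screens << "You have a new match: #{match[:name]}!"
end

# The server (publisher): "hey, I found someone for you!".
notified = broker.publish('/matches/42', { name: 'Alex' })

puts screens.first
puts "subscribers notified: #{notified}"
```

The publisher never references the subscriber directly; it only knows the channel name, which is what lets either side be swapped out.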
There are several paths we could choose while implementing this idea. Let’s overview and compare them:
Polling
Regular polling: the client makes a request to the server every X seconds, asking “Is there a match? Is there a match now? And now, is there a match now?” 👈
Benefits of using Polling:
- Would work on almost anything
Drawbacks of using Polling:
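- Wastes network and server resources, since most requests return nothing new
- Not real-time: an update can be delayed by up to the polling interval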
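A small simulation shows where the waste comes from. `FakeMatchServer` and `poll_for_match` are hypothetical stand-ins for a real HTTP endpoint and client; the numbers are arbitrary.

```ruby
# Simulated server: a match becomes available only after several checks.
class FakeMatchServer
  attr_reader :requests_served

  def initialize(ready_after:)
    @ready_after = ready_after # number of empty answers before a match exists
    @requests_served = 0
  end

  # Stand-in for an HTTP "is there a match?" request.
  # Returns nil until a match exists.
  def check_for_match
    @requests_served += 1
    @requests_served > @ready_after ? { name: 'Alex' } : nil
  end
end

# Regular polling: ask, wait a fixed interval, ask again.
def poll_for_match(server, interval:, max_attempts:)
  max_attempts.times do
    match = server.check_for_match
    return match if match

    sleep(interval)
  end
  nil
end

server = FakeMatchServer.new(ready_after: 5)
match = poll_for_match(server, interval: 0.01, max_attempts: 20)

puts "match found: #{match[:name]} after #{server.requests_served} requests"
```

Five of the six requests in this run carried no information at all, and the match could have been sitting on the server for almost a full interval before the client noticed it.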
New Push Notifications API
If our client (subscriber) is a website, we could use the web push notifications API introduced quite recently. This would definitely be overkill for simple event-based communication with the server, but it is possible. Using that technology, our client would be able to subscribe to certain types of notifications, and our publisher (server) would be able to send them.
Benefits:
- Real time data subscription and updates
- You can say “WoHoo, I’m using shiny new stuff!”
Drawbacks:
- This technology requires service workers, which are unsupported (or disabled on purpose) on many weak devices and in many mobile and legacy browsers
- Requires user to approve the annoying ‘This page would like to send you push notifications’ pop-up, which is mostly declined or ignored by the users
Long Polling
Long polling is a polling technique that drastically reduces network overhead by keeping each request open until the publisher has something to return. If the request times out, the consumer initiates a new one and sends it to the publisher, which again holds it and does not release it until new information is available.
Benefits:
- Provides almost real-time data to the subscriber
- Is based on basic AJAX requests, making it widely supported
Drawbacks:
- Information is not really event-based by nature (though most frameworks wrap the base long-polling functionality in an event-like API)
- May require a somewhat more complex implementation in a multi-threaded, multi-process environment (discussed later in this article)
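The hold-and-release behaviour can be sketched with threads instead of real HTTP. `HeldRequestServer` and `long_poll` are hypothetical names, and the timings are chosen only to keep the example fast.

```ruby
# Simulated long-polling endpoint: a request is held open until an event
# arrives or the hold time runs out.
class HeldRequestServer
  def initialize
    @mutex = Mutex.new
    @event_arrived = ConditionVariable.new
    @events = []
  end

  # Called by the publisher side when something happened.
  def publish(event)
    @mutex.synchronize do
      @events << event
      @event_arrived.signal
    end
  end

  # Stand-in for one long-poll request. Blocks for up to hold_seconds and
  # returns the next event, or nil if the request "timed out".
  def wait_for_event(hold_seconds)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + hold_seconds
    @mutex.synchronize do
      while @events.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return nil if remaining <= 0

        @event_arrived.wait(@mutex, remaining)
      end
      @events.shift
    end
  end
end

# Client loop: re-issue the request whenever the previous one timed out.
def long_poll(server, hold_seconds:, max_requests:)
  requests = 0
  max_requests.times do
    requests += 1
    event = server.wait_for_event(hold_seconds)
    return [event, requests] if event
  end
  [nil, requests]
end

server = HeldRequestServer.new
publisher = Thread.new do
  sleep 0.25 # the match is found a little later
  server.publish({ name: 'Alex' })
end

event, requests = long_poll(server, hold_seconds: 0.1, max_requests: 50)
publisher.join

puts "received #{event.inspect} after #{requests} requests"
```

Compared with regular polling, the client receives the event as soon as it is published, and the number of requests depends on the hold time rather than on a short polling interval.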
Web Sockets
We can also choose to implement event-based pub/sub communication using WebSockets, which allow us to connect to a remote socket and then publish/subscribe to events in real time.
Benefits:
- Real time data transportation
- Very easy and straightforward to implement
- Widely supported by backend servers and APIs
Drawbacks:
- Not supported on legacy browsers and devices
- Disabled on purpose on some embedded browsers to reduce the loading time, memory consumption and bundle size.
Keeping in mind these 4 options, I would like to introduce you to the Bayeux protocol:
The Bayeux Protocol
Bayeux is a protocol for transporting messages, primarily over HTTP.
“Primarily” is an important word, since Bayeux supports other transports as well, while falling back to basic HTTP Ajax requests to provide backwards compatibility with legacy browsers (e.g. embedded browsers inside SMS windows on old Android systems 😡).
So, Bayeux is a protocol. There must be something implementing it, right? Right. Implementations include CometD (mostly used from Java) and Faye (available for several languages).
The implementations provide a very simple and reliable API that supports Pub/Sub capabilities between several entities (server2server, server2multiple clients, clients2server, etc).
The interesting bit is that once you add Bayeux-based communication to your project, you can be fairly confident that pub/sub will work for all of your consumers, as most implementations of the protocol include fallback mechanisms for when the newest technology is not available on the current consumer.
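On the wire, Bayeux messages are JSON objects sent in arrays. The sketch below builds three of them with field names from the Bayeux specification; the `clientId` value is made up, and `pick_transport` is a hypothetical helper that only illustrates the fallback idea, not how Faye or CometD actually negotiate.

```ruby
require 'json'

# Handshake request a Bayeux client sends first. It advertises every
# transport the client can use.
handshake = {
  'channel' => '/meta/handshake',
  'version' => '1.0',
  'supportedConnectionTypes' => %w[websocket long-polling callback-polling]
}

# Subscription request. The clientId is issued by the server in its
# handshake response; the value here is invented.
subscribe_message = {
  'channel' => '/meta/subscribe',
  'clientId' => 'example-client-id',
  'subscription' => '/foo'
}

# A published message: the channel is the topic, data is the payload.
publish_message = {
  'channel' => '/foo',
  'clientId' => 'example-client-id',
  'data' => { 'text' => 'Hello world' }
}

# Hypothetical helper: pick the first client-preferred transport that the
# other side also supports.
def pick_transport(client_types, server_types)
  client_types.find { |type| server_types.include?(type) }
end

legacy_server_types = %w[long-polling callback-polling]
chosen = pick_transport(handshake['supportedConnectionTypes'], legacy_server_types)

wire_payload = JSON.generate([subscribe_message, publish_message])
puts wire_payload
puts "transport: #{chosen}"
```

When WebSockets are unavailable on one side, the same messages simply travel over long-polling instead, which is why application code does not need to care about the transport.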
Now that we understand the need, the concept and the solution, let’s jump into the implementation.
Faye
Faye is a great implementation of the Bayeux protocol. It supports Node.js and Ruby on the server, plus JavaScript clients in the browser.
The implementation & documentation are available here: https://faye.jcoglan.com/
The source code is available here: https://github.com/faye/faye
Right, let’s start with our backend.
The official documentation has guides for pretty straightforward scenarios. I would like to cover a scenario in which we add Faye to an existing Ruby on Rails backend (rather than a plain Ruby/Sinatra/Node.js backend).
Assume we have a Rails client-facing API, which all the clients (mobile devices and web-pages) call in order to retrieve data.
Since all the Faye-based communication will go through a certain endpoint (domain.com/faye/…), it would make sense to add a Rails-Faye middleware.
Let’s do that by adding the following dependencies to our project:
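A Gemfile fragment along these lines would cover the gems discussed in this article. The gem names are real, but the exact set and the lack of version constraints are assumptions, so treat it as a sketch rather than a tested dependency list.

```ruby
# Gemfile (fragment, versions intentionally left out)
gem 'faye-rails'   # mounts Faye inside a Rails app as Rack middleware
gem 'eventmachine' # event-processing library used by Faye's Ruby side
gem 'faye-redis'   # Redis-backed engine for Faye
```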
Pay attention to these unusual gems that we added:
- eventmachine — a simple event-processing library for Ruby
- faye-redis — a library used to synchronise events on the backend using Redis in the middle. More on that — later.
Great. Let’s get on with it.
As you remember, we already have an existing Rails app in our example. And in order to handle all Faye requests — we will use a Rails Faye Middleware (Middleware — an entity that will either handle or ignore each and every request, and then either pass it to the next handler or stop the bubbling).
Let’s add that middleware with the default config:
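A minimal sketch of that configuration, following the pattern shown in the faye-rails README. The `Dinder` module name is hypothetical, and the `timeout` value is simply the one used in the README example.

```ruby
# config/application.rb (fragment)
module Dinder # hypothetical application module name
  class Application < Rails::Application
    # Route everything under /faye through the Faye middleware.
    config.middleware.use FayeRails::Middleware, mount: '/faye', timeout: 25
  end
end
```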
This will mount our middleware to the /faye endpoint, meaning all requests to that endpoint will be passed through our FayeRails::Middleware.
In addition to that, this also means that our client-side can connect to the following endpoint and get the same version of Faye client-side javascript library: domain.com/faye/client.js
Speaking of our client: let’s set it up as well. First, let’s add the Faye dependency.
Great. Our HTML file loads the Faye library with a script tag whose source is the /faye/client.js path hosted by our Rails app. This is great for avoiding version mismatches between client and server.
Now let’s add our JS code that subscribes to a certain topic (e.g. the ‘/foo’ topic or, more realistically, a per-user topic such as ‘/matches/<clientId>’; Faye channel names are absolute paths, and their segments may not contain characters such as ‘:’):
Brilliant! We have subscribed to the events on that topic and will react to each of them by alerting an object containing its content.
So here is what we have taken care of:
- Dependencies installation on our backend
- Setting up Faye middleware on the backend
- Consuming dependencies from the frontend
- Subscribing to a topic
- Handling a message in that topic
Now, it is time to publish an event on our backend side using EventMachine:
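A sketch of the publishing side. The helper names, the `/matches/<user id>` channel layout and the payload are hypothetical; the commented usage follows the Ruby client API from Faye's documentation (`Faye::Client#publish` inside an EventMachine reactor) and assumes the Rails app listens on localhost:3000.

```ruby
# Channel for "match found" events of one user (hypothetical layout).
def match_channel(user_id)
  "/matches/#{user_id}"
end

# Publishes through any object that responds to #publish(channel, data),
# such as a Faye::Client.
def publish_match(client, user_id, match_name)
  client.publish(match_channel(user_id), { 'text' => "New match: #{match_name}" })
end

# With the real gems loaded (require 'eventmachine' and require 'faye'),
# the call would run inside an EventMachine reactor, roughly like this:
#
#   EM.run do
#     client = Faye::Client.new('http://localhost:3000/faye')
#     publication = publish_match(client, 42, 'Alex')
#     publication.callback { EM.stop }
#     publication.errback  { |error| warn error.inspect; EM.stop }
#   end
```

Keeping the channel and payload logic in plain methods makes it possible to test them with a fake client, without starting a reactor or a server.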
Awesome, our server has just published its first event, and if we completed all the previous steps, we should see that event alerted on our web page.
So, happy flow done, it works, but it’s time to understand the magic that is going on.
If we think about it, whenever a client makes a request to the server, a thread or a forked process handles that request, right?
So, we have a thread T1 handling our request (holding the client’s connection alive until news comes in).
Now, let’s think about our publisher. Assume we have some background worker on a separate server running advanced computations to find a match for our user. The computation completed and a match has been found. Now, the background worker needs to inform our main API that a match was found, which, in turn, needs to update the end client. But what if we have several instances of the client-facing API?
Our load balancer forwarded the client’s request to instance 1, where Thread1 is holding the client’s request, but our background worker’s request was routed to instance 2 on a completely separate machine. Considering this, how will our client be informed?
By default, Faye keeps published events in the memory of the process that received them. This means an event published on instance 2 stays private to that instance and is available only in its memory. Instance 1 will have absolutely no idea about that event.
So, we are missing something: a syncing mechanism. A central store for events, to which all publishers publish and to which all listeners listen.
That is exactly why we have added the ‘faye-redis’ dependency.
We will be using Redis, an in-memory database, to store our system’s events.
With that central synchronisation entity, our architecture will now look like this:
Our Redis is the central storage for all events, which helps us tackle the multi-instance, multi-threaded environments issue.
BackgroundWorker → FayeAPI on Instance 2 → REDIS ← FayeAPI on Instance 1 ← Client consumer
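The difference between the two setups can be simulated in plain Ruby. `ApiInstance` and `SharedBus` are hypothetical stand-ins for a Faye-enabled API instance and for Redis; nothing here talks to a real Redis server.

```ruby
# One API instance with in-memory delivery: it only knows about
# subscribers connected to this very process.
class ApiInstance
  attr_reader :delivered

  def initialize(shared_bus: nil)
    @delivered = []      # messages pushed to this instance's clients
    @local_channels = [] # channels its connected clients subscribed to
    @shared_bus = shared_bus
    shared_bus&.attach(self)
  end

  def client_subscribes(channel)
    @local_channels << channel
  end

  # Called when a publisher hits this instance.
  def publish(channel, message)
    if @shared_bus
      @shared_bus.broadcast(channel, message) # every instance hears it
    else
      deliver_locally(channel, message)       # only this instance hears it
    end
  end

  def deliver_locally(channel, message)
    @delivered << message if @local_channels.include?(channel)
  end
end

# Stand-in for Redis: fans every published message out to all instances.
class SharedBus
  def initialize
    @instances = []
  end

  def attach(instance)
    @instances << instance
  end

  def broadcast(channel, message)
    @instances.each { |instance| instance.deliver_locally(channel, message) }
  end
end

# Without a shared bus: client on instance 1, worker publishes via instance 2.
instance1 = ApiInstance.new
instance2 = ApiInstance.new
instance1.client_subscribes('/matches/42')
instance2.publish('/matches/42', 'Match found')
without_bus = instance1.delivered.dup

# Same scenario with a shared bus in the middle.
bus = SharedBus.new
instance1 = ApiInstance.new(shared_bus: bus)
instance2 = ApiInstance.new(shared_bus: bus)
instance1.client_subscribes('/matches/42')
instance2.publish('/matches/42', 'Match found')
with_bus = instance1.delivered.dup

puts "without shared bus: #{without_bus.inspect}"
puts "with shared bus:    #{with_bus.inspect}"
```

In the first run the event never leaves instance 2; in the second, the bus forwards it to instance 1, which holds the client’s connection.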
Awesome, that will work. The only change that we need to make on our backend is to add Redis as the engine of Faye. This is achieved by modifying our application.rb file:
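A sketch of that change, assuming faye-rails forwards the `engine` option to Faye’s Rack adapter in the same way the Faye documentation describes for `Faye::RackAdapter`. The host and port are placeholders for your own Redis instance.

```ruby
# config/application.rb (fragment)
config.middleware.use FayeRails::Middleware,
                      mount: '/faye',
                      timeout: 25,
                      engine: {
                        type: Faye::Redis,
                        host: 'localhost', # assumed Redis host
                        port: 6379         # Redis default port
                      }
```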
And that is it! Now our app will receive the ‘Match Found’ event even if we have multiple instances and threads.
You can also add custom controllers, reactors, listeners and more to your backend code. You can subscribe to events on servers, not only on clients. You can implement this stack using completely different languages.
All of that — I am leaving to your imagination and needs. Just make sure you understand the magic behind 🧙👀
Thank you!
Lev Perlman is an Engineering Manager and Education Entrepreneur, working to empower underrepresented and underserved people around the world.
To find out more, ask WhoIsMrPerlman? ️🙏❤️
See you next time! 👋