What do you need to know about #PubSub in order to appreciate the feature called ‘listening’?

Srushtika Neelakantam
4 min readOct 17, 2017


If you have been in the #realtime space for a while, you probably know by now that the pub/sub message architecture is a major player in this league. However, if you are like me and did not know until very recently that pub/sub uses two main ways to filter data, this post is for you. We shall look at these two types and see how a feature called ‘listening’, offered by #deepstream, is a perfect out-of-the-box innovation that solves a problem you never knew existed.

This post is aimed at both newbies and experienced folks. So, here we go!

PubSub, a shorthand for the Publish-Subscribe message architecture, involves a publisher who is responsible for publishing certain data and a subscriber (usually the client) who shows an interest in it and thus receives this data from a publisher. In between these two, there’s an additional entity responsible for routing the right data to the right subscriber. Not all the data published goes to all the subscribers. Think of this entity as the newspaper boy who collects all the published newspapers and sorts them before delivering the right newspaper to the right household (the subscribers).
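To make the roles concrete, here is a minimal sketch of that routing entity. All the names (`Broker`, `subscribe`, `publish`) are hypothetical, not tied to any real library:

```javascript
// A minimal sketch of the routing entity (the "newspaper boy") in pub/sub.
class Broker {
  constructor() {
    this.subscribers = new Map(); // topic -> array of callbacks
  }
  subscribe(topic, callback) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(callback);
  }
  publish(topic, data) {
    // Only subscribers of this topic receive the message; everything else is dropped
    (this.subscribers.get(topic) || []).forEach(cb => cb(data));
  }
}

const broker = new Broker();
const received = [];
broker.subscribe('sports', msg => received.push(msg));

broker.publish('sports', 'match results');   // delivered to our subscriber
broker.publish('politics', 'election news'); // no subscriber, silently dropped
```

The interesting design questions all live in that `publish` method: who decides what the topics are, and how much work the broker does per message. That is exactly what the two filtering styles below differ on.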

This entity could be the server itself or an additional layer between the publisher and the server. This additional helper is called a ‘data provider’, since it is responsible for providing only the required data to the server, which in turn returns it to a client. Of course, building such a data provider involves a great deal of complexity and logic. If the server itself serves as this middle-man, imagine the huge amount of data it must sift through in order to send the correct data to a subscriber, while on the other side it is continuously being swamped with all the data by the publisher(s). Just keep this problem in some corner of your mind for now; we’ll come back to it at the end.

If Wikipedia is to be believed, the filtering job done by this intermediate entity can be performed in one of two ways:

1. Topic-based PubSub — Publisher is the king

In topic-based filtering, the publisher decides to have certain channels and publishes different data to different channels. The subscribers can subscribe only to one or more of these channels, in which case they will receive the corresponding data from the publisher.

It’s like a menu card in a restaurant: you can choose only from the available items. Say I’m a subscriber interested in weather data, which is being published by someone. Assume that this publisher has a separate channel for each continent in the world, but I’m interested only in the weather for Germany, or to be even more precise, just the weather for Berlin. Well, that doesn’t matter, because such a channel doesn’t exist and I’ll have to deal with the whole weather data for Europe.

So much unnecessary data flow, tsk tsk tsk.

2. Content-based PubSub — Subscriber is the king? Not really, though

Now, in content-based filtering, the subscriber chooses to receive custom data of their choice, not limited to some channels offered by the publisher. In fact, in this case the publisher doesn’t really have such channels. The publisher sends out ALL the data it has, and this middle person now has to pick out the data required by each subscriber and send it along accordingly.

Such a waste of man-power, uhh I mean computing power, tsk tsk tsk.
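In code, the difference is that the subscriber hands the broker a predicate over message content instead of a channel name, and the broker pays the filtering cost for every single message. Again, the class and names here are a hypothetical sketch:

```javascript
// Hypothetical content-based broker: subscribers register a predicate over
// the message content, and the broker evaluates every published message
// against every subscription.
class ContentBroker {
  constructor() { this.subscriptions = []; }
  subscribe(predicate, callback) {
    this.subscriptions.push({ predicate, callback });
  }
  publish(data) {
    // The publisher has no channels: every message passes through the broker,
    // which does the filtering work on behalf of each subscriber.
    for (const { predicate, callback } of this.subscriptions) {
      if (predicate(data)) callback(data);
    }
  }
}

const broker = new ContentBroker();
const berlinUpdates = [];
broker.subscribe(msg => msg.city === 'Berlin', msg => berlinUpdates.push(msg));

broker.publish({ city: 'Berlin', temp: 12 }); // matches, delivered
broker.publish({ city: 'Tokyo',  temp: 19 }); // evaluated, then discarded
```

The subscriber gets exactly what they asked for, but the broker burns CPU evaluating predicates against data nobody may ever want.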

Okay, I agree that the ratio of unnecessary data to the data that’s actually required by a client (subscriber) is a bit exaggerated in my explanation, but hey, it’s a possibility.

Current realtime services/APIs/frameworks/servers mostly implement topic-based PubSub. For instance, Ably.io, PubNub, Firebase and Pusher all use topic-based filtering in their PubSub infrastructure, whereas Meteor, interestingly, uses content-based filtering.

So, what now?

Remember the problem I mentioned in the beginning about this middle-person being swamped with data by publishers? What if there were a way to prevent the data providers from sending any data that isn’t required by anyone?

This is exactly what listening does! It tells a data provider to start providing only the data required by a subscriber, and only when some subscriber actually asks for it. Think of the drastically reduced load on the server thanks to this feature. Voila! Problem solved.

The linked tutorial explains the complete awesomeness of the feature; however, here’s a li’l sneak peek of what exactly is happening:

listening in deepstream

deepstream introduces a concept called active data providers, which listen to clients’ subscription requests forwarded by the server, as shown above. And you know what happens after that: a drastic reduction in the flow of data and increased efficiency and speed of the overall realtime system.
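The idea can be sketched with the same toy broker style as before. This is a simplified illustration of the listening pattern, not deepstream’s actual API (whose method names and callbacks differ): the provider registers a topic pattern and is told when the first subscriber for a matching topic arrives, so it produces data only while someone actually wants it.

```javascript
// Simplified sketch of 'listening': providers stay idle until the broker
// tells them a matching subscription exists. All names are hypothetical.
class ListeningBroker {
  constructor() {
    this.subscribers = new Map(); // topic -> callbacks
    this.providers = [];          // { pattern, onDemandChange }
  }
  // A provider registers interest in serving topics matching a pattern
  listen(pattern, onDemandChange) {
    this.providers.push({ pattern, onDemandChange });
  }
  subscribe(topic, callback) {
    const isFirst = !this.subscribers.has(topic);
    if (isFirst) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(callback);
    if (isFirst) {
      // First subscriber for this topic: tell matching providers to start
      for (const p of this.providers) {
        if (p.pattern.test(topic)) p.onDemandChange(topic, true);
      }
    }
  }
  publish(topic, data) {
    (this.subscribers.get(topic) || []).forEach(cb => cb(data));
  }
}

const broker = new ListeningBroker();
const producing = new Set(); // topics the provider is actively computing

// The weather provider sits completely idle until a weather/* subscription appears
broker.listen(/^weather\//, (topic, wanted) => {
  if (wanted) producing.add(topic);
  else producing.delete(topic);
});

// Until this line, producing is empty and no weather data flows anywhere
const updates = [];
broker.subscribe('weather/berlin', u => updates.push(u));
// Now the provider knows to compute exactly one topic: 'weather/berlin'
```

A full implementation would also notify the provider when the last subscriber unsubscribes (the `wanted === false` branch above), so it can stop producing again.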

What are your thoughts about this?


Srushtika Neelakantam

Senior Product Manager for Ably Realtime | Mozilla Tech Speaker and Rep Alumnus