Building a Notification Framework for Microservice-based Application

With Vatsalya SN

Sohom Majumdar
Walmart Global Tech Blog
8 min read · Jun 8, 2021


Photo by Liam Truong on Unsplash

Microservices are a popular design pattern in which a large application is broken down into multiple independent, loosely coupled services that communicate with one another through predefined interfaces. Walmart’s ML Platform is built on the same principle: independent services deployed in a Kubernetes cluster, communicating through REST APIs. As a platform feature, serving user-targeted notifications for events was a priority requirement, so we developed a model framework that any microservice-based application can use to serve notifications.

The High-Level Design

High-Level Design for serving notifications

At a high level, the system should be able to generate and handle notifications under the following rules.

  1. Each service can generate a notification targeted to a user or a set of users independently.
  2. All the notifications targeted towards a user are stored in a Notification Store.
  3. For an online user, the notification message should pop up immediately in the UI. For offline users, the notifications should be available in the notification tray once they log in.
  4. Users can mark notifications as read or delete older notifications.
  5. The system can also choose to purge older notifications if so required.

Along with the required rules, we added some desired features to the system as well:

  1. The system should not put a significant burden on the microservices.
  2. The system should be fast and fairly stable.
  3. Losing notifications is not acceptable, but delivering them slightly late is.

We designed the entire system as a Java-based library that can be imported into any microservice that needs to send notifications. We chose Redis for the notification store and Server-Sent Events (SSE) for sending notifications to the UI client (the user’s browser). We go through each of these components separately in the following sections, and then bring them together to show how they add up to the complete feature.

Backend Implementation

Modelling the notification

A very simple notification structure needs only two fields: the targeted user and the message. This is the structure we started with; as the feature matured, we added further fields to enrich the interface and bundle more information into the data structure. In the end, we formalised this structure for a notification.
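The final field list is not reproduced in this text. As a purely hypothetical sketch, the Java record below keeps the two original fields (userId and message) and adds a few assumed extras (an id, a status from an assumed NotificationStatus enum, and a creation timestamp) to show how such a structure might grow.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.UUID;

// Assumed status values; not taken from the real library.
enum NotificationStatus { UNREAD, READ, DELETED }

// Hypothetical notification shape: userId and message are the article's two
// original fields; id, status and createdAt are illustrative extras.
record Notification(String id, String userId, String message,
                    NotificationStatus status, Instant createdAt) {

    // Compact constructor: reject notifications without a target or a body.
    Notification {
        Objects.requireNonNull(userId, "userId");
        Objects.requireNonNull(message, "message");
    }

    // Factory mirroring the original two-field design; extras get defaults.
    static Notification of(String userId, String message) {
        return new Notification(UUID.randomUUID().toString(), userId, message,
                NotificationStatus.UNREAD, Instant.now());
    }

    // Records are immutable, so a status change returns a modified copy.
    Notification withStatus(NotificationStatus newStatus) {
        return new Notification(id, userId, message, newStatus, createdAt);
    }
}
```

Keeping the record immutable means a notification can be serialized to JSON and shared across threads without defensive copying.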

Storing the Notification

Storage of the notifications in a Notification Store had two major points to consider:

  • Given a notification, it should be easy to push it into the store.
  • Given a user, it should be easy to get their notifications from the store.

As a key-value store, Redis handles both of these tasks with millisecond latencies. It is also resilient to failure and highly scalable, so we chose it as the backing store for the notifications.

Redis also has built-in support for higher-level data types such as lists and maps. We use maps (hashes, in Redis terms) to store the notifications: each unique user ID is a key in Redis, and its value is a hash mapping notification ID to notification JSON. Since Redis can hold roughly 2³² keys per instance and roughly 2³² fields per hash, this layout supports about 2³² users, each with about 2³² potential notifications.

Storage structure inside Redis
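The layout above can be illustrated with a small in-memory stand-in. The class InMemoryNotificationStore below is hypothetical; each method is annotated with the Redis hash command it mirrors (HSET, HGETALL, HDEL), which Jedis exposes as hset, hgetAll and hdel.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the Redis layout:
//   key   = user id
//   value = hash of notification id -> notification JSON
final class InMemoryNotificationStore {
    private final Map<String, Map<String, String>> hashes = new ConcurrentHashMap<>();

    // HSET <userId> <notificationId> <json>
    void put(String userId, String notificationId, String json) {
        hashes.computeIfAbsent(userId, k -> new ConcurrentHashMap<>())
              .put(notificationId, json);
    }

    // HGETALL <userId>: every notification for one user in a single lookup
    Map<String, String> getAll(String userId) {
        return Map.copyOf(hashes.getOrDefault(userId, Map.of()));
    }

    // HDEL <userId> <notificationId>: true if a field was actually removed
    boolean delete(String userId, String notificationId) {
        Map<String, String> hash = hashes.get(userId);
        return hash != null && hash.remove(notificationId) != null;
    }
}
```

Because all of a user's notifications live under one key, filling the notification tray is a single HGETALL on that user's ID.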

Redis handles highly concurrent workloads safely, because each command executes atomically. For production use, Redis Sentinel provides high availability and AOF (append-only file) persistence provides durability. The official documentation (see Redis Sentinel and Redis Persistence) explains how to set both up for a production-grade Redis deployment.

Pushing the notification to the Store

The notification library exposes the NotifierClient interface, which has notifyUsers and notifyGroup methods. To trigger a notification, a microservice invokes notifyUsers with a Notification object and the list of user IDs to send it to. The library also allows creating groups that cluster similar users together (e.g., all users of a particular project, or all users who use GPUs), and microservices can optionally send a notification to an entire group using the notifyGroup method.
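Only the names NotifierClient, notifyUsers, notifyGroup and Notification come from the description above. The parameter types, the one-field Notification placeholder and the in-memory implementation below are assumptions, sketching how group delivery could be a simple fan-out over notifyUsers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.ArrayList;

// Placeholder notification with a single field (assumed, for brevity).
record Notification(String message) {}

// Hypothetical signatures; the real library's types may differ.
interface NotifierClient {
    void notifyUsers(Notification notification, List<String> userIds);
    void notifyGroup(Notification notification, String groupName);
}

// Toy, single-threaded implementation that records deliveries in memory.
final class InMemoryNotifierClient implements NotifierClient {
    private final Map<String, Set<String>> groups = new HashMap<>();
    private final Map<String, List<Notification>> inbox = new HashMap<>();

    // Assumed helper for clustering similar users (e.g. one project's members).
    void addToGroup(String groupName, String userId) {
        groups.computeIfAbsent(groupName, g -> new HashSet<>()).add(userId);
    }

    List<Notification> inboxOf(String userId) {
        return List.copyOf(inbox.getOrDefault(userId, List.of()));
    }

    @Override
    public void notifyUsers(Notification notification, List<String> userIds) {
        for (String userId : userIds) {
            // The real library would write to the user's Redis hash here.
            inbox.computeIfAbsent(userId, u -> new ArrayList<>()).add(notification);
        }
    }

    @Override
    public void notifyGroup(Notification notification, String groupName) {
        // A group notification is a fan-out to the group's current members.
        notifyUsers(notification, List.copyOf(groups.getOrDefault(groupName, Set.of())));
    }
}
```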

Jedis is one of the best-known Java clients for Redis, and it is what the library uses to read and write notifications. Jedis supports advanced features such as Sentinel and connection pooling, which makes it well suited to production servers.

To protect reads and writes from occasional outages, calls to Redis are wrapped with Resilience4j, which provides retries and error handling for temporary glitches.
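The retry behaviour can be sketched without the library itself. SimpleRetry below is a hypothetical, hand-rolled stand-in, not Resilience4j's API: it retries a failing call a fixed number of times with exponential backoff and rethrows the last error if every attempt fails. In Resilience4j, the equivalent is configured through RetryConfig and applied with Retry.decorateSupplier.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hand-rolled stand-in for a Resilience4j-style retry, for illustration only.
final class SimpleRetry {
    private final int maxAttempts;
    private final Duration initialWait;

    SimpleRetry(int maxAttempts, Duration initialWait) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
        this.initialWait = initialWait;
    }

    // Runs the call, retrying on RuntimeException with exponential backoff.
    <T> T call(Supplier<T> redisCall) {
        RuntimeException last = null;
        long waitMillis = initialWait.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return redisCall.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(waitMillis);
                    waitMillis *= 2; // wait 1x, 2x, 4x ... the initial delay
                }
            }
        }
        throw last; // every attempt failed: surface the final error
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during retry backoff", ie);
        }
    }
}
```

In use, the supplier would wrap a single Redis call, for example an hgetAll on the user's key, so that one transient failure does not fail the whole request.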

Frontend Implementation

To enable the frontend, we use an Express server that acts as middleware between the user’s browser and the backend microservices, handling authentication and session management. We piggybacked on this server to read the notifications from Redis and push them to the user.

Pushing the notification to the browser

Server-Sent Events (SSE) is a technology built on top of HTTP. For every user who logs in, we keep a persistent HTTP connection open for the duration of the user session. SSE is a text-based protocol: each event is sent as one or more data: lines and terminated by a blank line (i.e., two newline characters), so the notification JSON is serialized to a string before it is written to the stream. See this tutorial for more on using SSE from JavaScript.
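The framing is easy to get wrong, so here is a minimal Java sketch of it (the article's server is Node.js; SseFormatter is a hypothetical name). It emits optional event: and id: fields, splits the payload into one data: line per line of text, and ends the event with the blank line the protocol requires. The browser rejoins multiple data: lines with newline characters before handing the data to the page.

```java
// Formats one Server-Sent Events message: optional "event:" and "id:" fields,
// one "data:" line per payload line, and a blank line ending the event.
final class SseFormatter {
    static String format(String eventName, String id, String payload) {
        StringBuilder sb = new StringBuilder();
        if (eventName != null) sb.append("event: ").append(eventName).append('\n');
        if (id != null) sb.append("id: ").append(id).append('\n');
        // A raw newline would end the field early, so each payload line
        // (split on CRLF, CR or LF) gets its own data: line.
        for (String line : payload.split("\r\n|\r|\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString(); // the extra newline terminates the event
    }
}
```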

Once the connection is established, we use the Node.js event model to push data to the client as new notifications arrive from Redis. We emit a notification event, which is handled by an event listener registered within the scope of the SSE request handler. The logged-in user’s unique ID is used to match each message from Redis to that user’s connection.

This GitHub code snippet describes how it is accomplished.
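The matching of Redis messages to connections can be sketched as a registry keyed by user ID. The Java version below is a hypothetical analogue of the Node.js handler, not its code: each Consumer<String> stands in for the write side of one open SSE response, and a user with several open streams simply has several entries.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical registry mapping a user id to that user's open SSE streams.
final class SseConnectionRegistry {
    private final Map<String, List<Consumer<String>>> byUser = new ConcurrentHashMap<>();

    // Called when a logged-in user opens a stream; returns a hook to run on close.
    Runnable register(String userId, Consumer<String> sink) {
        byUser.compute(userId, (id, sinks) -> {
            List<Consumer<String>> list = sinks != null ? sinks : new CopyOnWriteArrayList<>();
            list.add(sink);
            return list;
        });
        return () -> byUser.computeIfPresent(userId, (id, sinks) -> {
            sinks.remove(sink);
            return sinks.isEmpty() ? null : sinks; // forget users with no open streams
        });
    }

    // Writes the payload to every stream of the user; false means "offline".
    boolean dispatch(String userId, String ssePayload) {
        List<Consumer<String>> sinks = byUser.get(userId);
        if (sinks == null || sinks.isEmpty()) return false;
        sinks.forEach(sink -> sink.accept(ssePayload));
        return true;
    }
}
```

Returning an unregister hook from register keeps closed streams from accumulating, and a false result from dispatch is the case where the notification simply waits in Redis for the user's next login.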

Receiving the notification in the browser

On the client side, SSE provides the EventSource API, which connects to the server and receives updates from it. When SSE runs over HTTP/1.1, browsers allow only six concurrent connections per domain (counted across all tabs), and each open SSE stream occupies one of them. Since we opened a new connection in each browser tab, users were limited to six open tabs at a time. To circumvent this limitation, we use shared workers (SharedWorker), which let us create one persistent connection in a single worker and access it across browser tabs and iframes. One downside is that shared workers were not supported on Safari and IE at the time of writing, but since the majority of our user base was on Chrome and Firefox, we accepted this trade-off. For users on IE or Safari, we fall back to one SSE connection per tab, which limits them to six tabs.

When a user logs in for the first time, a new shared worker instance is created and attached to the window instance, where it can be accessed from all browser contexts. Web pages then communicate with the shared worker through a MessagePort object and attach an event handler that is called every time the shared worker pushes a message.

The following GitHub snippet has the code for receiving the notifications in the browser.

Showing the notification to the user

Every time a new tab or browser window is opened, the shared worker gets a new port (a MessagePort) for that tab. These ports are kept in an array for each user. Whenever a new notification is generated, it is pushed to all the ports so that every tab stays consistent. When a tab is closed, its beforeunload event fires, and we remove the corresponding port from the array.

To display the notification only when the user is active on the tab, we handle the visibilitychange event exposed by the Page Visibility API. The handler marks the page as non-hidden and then refreshes the Redux store with the notifications from the backend. This re-renders the UI, and the notification is displayed in the snackbar.

Bridging the Backend and the Frontend

Two parts of the system work in tandem to provide the complete notification feature:

  1. A set of backend services that generate the notifications and persist them to Redis.
  2. The UI, which displays the notifications to users, either in real time when they are online or as a list of missed notifications when they come online.

Backend to Frontend — via Redis PubSub

In the real-time scenario, the UI needs to learn about a new notification as soon as it is generated. We use Redis PubSub to create a feedback channel from the backend services to the UI server, which then forwards the notification to the UI client (the user’s browser) over SSE as described above.

When a notification is generated, the library writes it to the user’s key and also publishes a PubSub message, containing the ID of the user whose notifications changed, on a specific channel. The UI server subscribes to this channel and, on receiving a user ID, builds an in-memory map of that user’s notifications. If the user is online, the UI server sends the entire notification JSON map over that user’s SSE connection to render it in their browser.
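The flow above can be simulated in-process. In the hypothetical sketch below, PubSubBridge plays the role of Redis (a hash per user plus PubSub channels; the channel name is assumed) and UiServer plays the Express server. The published message carries only the user ID, and the subscriber re-reads that user's whole hash before pushing it to an online user; offline users are skipped because their notifications remain in the store.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Plays the role of Redis: one hash per user plus PubSub channels.
final class PubSubBridge {
    static final String CHANNEL = "notifications"; // hypothetical channel name

    private final Map<String, Map<String, String>> hashes = new ConcurrentHashMap<>();
    private final Map<String, List<Consumer<String>>> channels = new ConcurrentHashMap<>();

    // SUBSCRIBE <channel>
    void subscribe(String channel, Consumer<String> listener) {
        channels.computeIfAbsent(channel, c -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // Library side: HSET <userId> <notificationId> <json>, then PUBLISH <channel> <userId>
    void writeAndPublish(String userId, String notificationId, String json) {
        hashes.computeIfAbsent(userId, u -> new ConcurrentHashMap<>()).put(notificationId, json);
        channels.getOrDefault(CHANNEL, List.of()).forEach(listener -> listener.accept(userId));
    }

    // HGETALL <userId>
    Map<String, String> readAll(String userId) {
        return Map.copyOf(hashes.getOrDefault(userId, Map.of()));
    }
}

// Plays the role of the Express UI server.
final class UiServer {
    private final PubSubBridge redis;
    private final Set<String> onlineUsers;
    private final BiConsumer<String, Map<String, String>> sendOverSse;

    UiServer(PubSubBridge redis, Set<String> onlineUsers,
             BiConsumer<String, Map<String, String>> sendOverSse) {
        this.redis = redis;
        this.onlineUsers = onlineUsers;
        this.sendOverSse = sendOverSse;
    }

    void start() {
        redis.subscribe(PubSubBridge.CHANNEL, this::onUserChanged);
    }

    // The message is only a user id; re-read the full hash and push it if online.
    private void onUserChanged(String userId) {
        if (!onlineUsers.contains(userId)) return; // offline: the tray loads it at next login
        sendOverSse.accept(userId, redis.readAll(userId));
    }
}
```

Publishing only the user ID keeps PubSub messages small, and because the subscriber always re-reads the hash, each push carries the user's complete current notification state rather than a single delta.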

A crude sequence diagram to demonstrate the flow of an instance of notification from its generation in the backend services to being displayed to the user is given below.

Crude Sequence Diagram for the flow of Notifications

Frontend to Backend — via REST

Once the user responds to a notification (reads it, clicks on it, or deletes it), that information has to flow through to the backend store. We implemented this as a REST endpoint.

The library itself exposes a Java API that takes a notification ID, a user ID, and the status to update, and patches the notification in Redis with the new status. A service can then wrap this API in a REST endpoint or any similar interface (e.g., gRPC).
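A hypothetical version of that call is sketched below; the class name, the updateStatus method and the status values (including CLICKED) are assumptions, and entries are kept as objects instead of JSON for brevity. Against Redis, the same update is a read-modify-write (HGET, change the JSON, HSET), which would need WATCH/MULTI or a Lua script to stay atomic under concurrent updates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a status-update call; not the library's API.
final class NotificationStatusStore {
    enum Status { UNREAD, READ, CLICKED, DELETED } // assumed status values

    record Entry(String message, Status status) {}

    private final Map<String, Map<String, Entry>> byUser = new ConcurrentHashMap<>();

    void add(String userId, String notificationId, String message) {
        byUser.computeIfAbsent(userId, u -> new ConcurrentHashMap<>())
              .put(notificationId, new Entry(message, Status.UNREAD));
    }

    // Patches one notification's status; false if the user or id is unknown.
    boolean updateStatus(String userId, String notificationId, Status newStatus) {
        Map<String, Entry> hash = byUser.get(userId);
        if (hash == null) return false;
        // computeIfPresent makes the in-memory read-modify-write atomic.
        return hash.computeIfPresent(notificationId,
                (id, e) -> new Entry(e.message(), newStatus)) != null;
    }

    Status statusOf(String userId, String notificationId) {
        Entry e = byUser.getOrDefault(userId, Map.of()).get(notificationId);
        return e == null ? null : e.status();
    }
}
```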

Cleaning Up

To clean expired and deleted notifications from Redis, the library uses the Quartz scheduler. To ensure that only a single instance of the cleaner runs at a time, it uses the Redlock algorithm as a distributed locking mechanism.
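Redlock acquires the same lock on several independent Redis masters and treats it as held only if a majority granted it within the lock's time-to-live; releasing deletes the key only where the stored value still matches the caller's random token. The sketch below simulates this with in-memory LockNode objects (hypothetical stand-ins for Redis masters supporting SET key value NX PX and a compare-and-delete release). It is not the library's code, and the clock-drift allowance is an assumed value.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// One simulated Redis master supporting the two operations Redlock needs.
final class LockNode {
    private record Held(String owner, long expiresAtMillis) {}
    private final Map<String, Held> locks = new ConcurrentHashMap<>();

    // Like SET key owner NX PX ttl: succeeds only if the key is absent or expired.
    boolean setNxPx(String key, String owner, long ttlMillis, long nowMillis) {
        Held fresh = new Held(owner, nowMillis + ttlMillis);
        Held result = locks.compute(key, (k, cur) ->
                cur == null || cur.expiresAtMillis() <= nowMillis ? fresh : cur);
        return result == fresh;
    }

    // Like the compare-and-delete release script: only the owner may release.
    void releaseIfOwner(String key, String owner) {
        locks.computeIfPresent(key, (k, cur) -> cur.owner().equals(owner) ? null : cur);
    }
}

// Simplified Redlock: the lock counts only if a majority of nodes grant it in time.
final class Redlock {
    private final List<LockNode> nodes;

    Redlock(List<LockNode> nodes) { this.nodes = List.copyOf(nodes); }

    /** Returns an owner token if the lock was acquired, or null otherwise. */
    String tryLock(String key, long ttlMillis) {
        String owner = UUID.randomUUID().toString(); // random value identifies this holder
        long start = System.currentTimeMillis();
        int acquired = 0;
        for (LockNode node : nodes) {
            if (node.setNxPx(key, owner, ttlMillis, System.currentTimeMillis())) acquired++;
        }
        long elapsed = System.currentTimeMillis() - start;
        long drift = ttlMillis / 100 + 2; // assumed clock-drift allowance
        boolean quorum = acquired >= nodes.size() / 2 + 1;
        if (quorum && elapsed + drift < ttlMillis) return owner;
        unlock(key, owner); // failed: release whatever was partially acquired
        return null;
    }

    void unlock(String key, String owner) {
        for (LockNode node : nodes) node.releaseIfOwner(key, owner);
    }
}
```

A cleaner job would call tryLock before purging and skip the run when it returns null, so at most one instance purges at a time.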

Putting it all together

The entire framework is available as a library at https://github.com/daichi-m/notification4J.
