Designing Bulk Notification System — Part 1

Aryan01
2 min read · Mar 15, 2024


Basic Design

Let's start with the most basic system design for a notification service.

Notification Microservice

  1. It receives all the information it needs to send a notification to a user.
    Other microservices, cron jobs, or scripts can trigger the Notification service.
  2. Depending on the type of notification (here email, SMS, and push notification), the Notification service forwards the request to the respective microservice.
  3. The Notification service needs information about the user it is notifying; for that we use a User Information MySQL database.
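
The three steps above can be sketched roughly as follows. This is only an illustration: `send_email`, `send_sms`, `send_push`, the `USERS` dictionary, and `NotificationRequest` are hypothetical names, and the in-memory dictionary stands in for the User Information MySQL table.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the Email, SMS, and Push microservices.
# A real system would make network calls; here each call is just recorded.
sent = []

def send_email(address, body):
    sent.append(("email", address, body))

def send_sms(phone, body):
    sent.append(("sms", phone, body))

def send_push(device_token, body):
    sent.append(("push", device_token, body))

# Stand-in for the User Information MySQL table, keyed by user id.
USERS = {
    1: {"email": "a@example.com", "phone": "+10000000001", "device_token": "tok-1"},
}

@dataclass
class NotificationRequest:
    user_id: int
    channel: str  # "email", "sms", or "push"
    body: str

def notify(request):
    """Look up the user, then forward to the microservice for the channel."""
    user = USERS[request.user_id]
    if request.channel == "email":
        send_email(user["email"], request.body)
    elif request.channel == "sms":
        send_sms(user["phone"], request.body)
    elif request.channel == "push":
        send_push(user["device_token"], request.body)
    else:
        raise ValueError(f"unknown channel: {request.channel}")

notify(NotificationRequest(user_id=1, channel="sms", body="Your order shipped"))
```

Note that the lookup and the send both happen inside the caller's request, which is where the problems listed next come from.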

Problems

  1. Single point of failure: if the Notification microservice goes down for any reason, every request that callers make to it is lost, and nothing reaches the Email, SMS, and Push microservices.
  2. Server overload: if cron jobs run every day and trigger notifications for all users, they might overload the Notification microservice.
  3. Lost notifications: there is no way to track whether notifications sent from other services were delivered correctly, or to resend the ones that were not.
  4. Difficult to scale.

Scaling

1. Message Queues

  1. As the Notification service receives requests, it adds them to the notification queue, so it no longer has to handle the complexity of calling the Email, SMS, and Push microservices.
  2. Now all the Notification service does is accept the request, fetch the required information from the DB, and add the message to the message queue.
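
A minimal sketch of this enqueue step, assuming an in-process `queue.Queue` in place of a real message broker (the article does not name one). `handle_request` and the `USERS` dictionary are hypothetical.

```python
import queue

# In-process stand-in for the notification queue. A real deployment
# would use a message broker; which one is left open here.
notification_queue = queue.Queue()

# Stand-in for the user lookup against the DB.
USERS = {1: {"email": "a@example.com", "sms": "+10000000001"}}

def handle_request(user_id, channel, body):
    """Accept a request, enrich it with user data, and enqueue it."""
    user = USERS[user_id]
    message = {
        "user_id": user_id,
        "channel": channel,
        "to": user[channel],  # address for the chosen channel
        "body": body,
    }
    notification_queue.put(message)  # no call to Email/SMS/Push here
    return message

handle_request(1, "email", "Welcome!")
```

The service returns as soon as the message is on the queue, so a slow or failing downstream microservice no longer blocks the caller.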

2. Workers

  1. Workers are lightweight functions, Lambda functions, or processes that have only one job, which makes them easy to scale.
  2. They consume from the queue and send to the different microservices.
  3. Because they are lightweight, we can scale them with the number of messages, rather than scaling the entire microservice when traffic spikes.
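
The worker idea might look like the sketch below, which uses threads and an in-process queue purely for illustration. `send` is a hypothetical stand-in for the call to the Email, SMS, or Push microservice, and `run_workers` is an assumed helper whose `count` argument plays the role of the scaling knob.

```python
import queue
import threading

notification_queue = queue.Queue()
delivered = []  # records what the stand-in sender received
delivered_lock = threading.Lock()

def send(channel, message):
    """Hypothetical stand-in for calling the Email, SMS, or Push microservice."""
    with delivered_lock:
        delivered.append((channel, message["body"]))

def worker():
    """Single-purpose worker: take one message, forward it, repeat."""
    while True:
        message = notification_queue.get()
        if message is None:  # sentinel: stop this worker
            notification_queue.task_done()
            break
        send(message["channel"], message)
        notification_queue.task_done()

def run_workers(count):
    """Start `count` workers; raising `count` is how we absorb a spike."""
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    return threads

for i in range(10):
    notification_queue.put({"channel": "push", "body": f"msg {i}"})

threads = run_workers(3)
notification_queue.join()  # wait until every message is processed
for _ in threads:
    notification_queue.put(None)  # one sentinel per worker
for t in threads:
    t.join()
```

Only the number of workers changes with load; the Notification service itself is untouched.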

3. Tracking Notifications

  1. When a worker sends a notification, the send may fail or error out for some reason, so we might still lose notifications.
  2. When the worker is done processing, it can write the result back to the MySQL table (green line: success or delivered; red line: failure).
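
As a rough sketch of the write-back, the example below uses Python's built-in `sqlite3` in place of MySQL so that it runs anywhere. The `notifications` table layout, the status values, and `flaky_send` are assumptions made for illustration.

```python
import sqlite3

# sqlite3 stands in for the MySQL table so the sketch is self-contained.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE notifications ("
    " id INTEGER PRIMARY KEY,"
    " channel TEXT NOT NULL,"
    " body TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)

def flaky_send(channel, body):
    """Hypothetical sender: fails for SMS to simulate a provider error."""
    if channel == "sms":
        raise RuntimeError("provider unavailable")

def process(notification_id):
    """Send one notification and record the outcome in the table."""
    channel, body = db.execute(
        "SELECT channel, body FROM notifications WHERE id = ?",
        (notification_id,),
    ).fetchone()
    try:
        flaky_send(channel, body)
        status = "delivered"
    except Exception:
        status = "failed"
    db.execute(
        "UPDATE notifications SET status = ? WHERE id = ?",
        (status, notification_id),
    )
    db.commit()
    return status

db.execute("INSERT INTO notifications (id, channel, body) VALUES (1, 'email', 'hi')")
db.execute("INSERT INTO notifications (id, channel, body) VALUES (2, 'sms', 'hi')")
process(1)
process(2)
```

After this runs, every notification has a recorded outcome, so failures are visible instead of silently lost.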

4. Relay Service

  1. Failed messages, which are now recorded in the MySQL DB, can be sent back to the message queue for reprocessing.
  2. The relay service pulls notifications that have the status Failed and re-enqueues them in the message queue.
  3. We can also keep a retry count that limits how many times a particular notification can be re-enqueued.
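
One pass of such a relay could be sketched as below. Again `sqlite3` and `queue.Queue` stand in for MySQL and the message queue, and the `retry_count` column, the `queued` status, and `MAX_RETRIES = 3` are assumed choices rather than anything fixed by the design.

```python
import queue
import sqlite3

MAX_RETRIES = 3  # assumed limit; the design does not fix a number

message_queue = queue.Queue()
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE notifications ("
    " id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " retry_count INTEGER NOT NULL DEFAULT 0)"
)
db.executemany(
    "INSERT INTO notifications (id, status, retry_count) VALUES (?, ?, ?)",
    [(1, "failed", 0), (2, "delivered", 0), (3, "failed", 3)],
)

def relay_once():
    """Re-enqueue failed notifications that still have retries left."""
    rows = db.execute(
        "SELECT id FROM notifications"
        " WHERE status = 'failed' AND retry_count < ?",
        (MAX_RETRIES,),
    ).fetchall()
    for (notification_id,) in rows:
        # Mark as queued and count the attempt before handing it back.
        db.execute(
            "UPDATE notifications"
            " SET status = 'queued', retry_count = retry_count + 1"
            " WHERE id = ?",
            (notification_id,),
        )
        message_queue.put(notification_id)
    db.commit()
    return [row[0] for row in rows]

requeued = relay_once()
```

Here notification 1 goes back on the queue, notification 2 is skipped because it was delivered, and notification 3 stays failed because it has already used up its retries.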

In Part 2 we will see how this design also becomes a bottleneck at a scale where millions of notifications are sent, and what further optimizations we can make to avoid that.
