Designing Bulk Notification System — Part 2

Aryan01
Mar 15, 2024


Even after scaling out the Bulk Notification System in Part 1, the current design can still run into trouble during peak hours.

Challenges while Scaling

1. Read/write load on the DB during the peak. The database stores user information as well as the delivery status of every notification, so at peak it can get throttled. The relay service also relies on this DB to re-enqueue failed notifications onto the queue.

2. Because the database has a limited number of I/O connections, we would have to scale it horizontally to meet peak-hour demand. Peaks are infrequent but cannot be ignored, so we end up paying for extra capacity that is needed only during these rare surges.

Re-architecture

Step 1: Prioritize the transactions

Queues

Using queues, we can prioritize the incoming load. Some notifications are more important than others, so lower-priority notifications can wait until the high-priority ones have been sent.
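A minimal sketch of strict-priority dispatch, assuming one FIFO queue per priority level. The `Notification` type, the numeric levels (0 is highest), and the IDs are hypothetical; in production each level would typically be a separate broker queue rather than an in-memory deque.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Notification:
    # Hypothetical fields, for illustration only.
    id: str
    priority: int  # 0 is the highest priority


class PriorityDispatcher:
    """One FIFO queue per priority level; always serve the
    highest-priority non-empty queue first (strict priority)."""

    def __init__(self, levels: int) -> None:
        self.queues: List[Deque[Notification]] = [deque() for _ in range(levels)]

    def enqueue(self, n: Notification) -> None:
        self.queues[n.priority].append(n)

    def dequeue(self) -> Optional[Notification]:
        # Scan from priority 0 downward and pop from the first non-empty queue.
        for q in self.queues:
            if q:
                return q.popleft()
        return None


d = PriorityDispatcher(levels=2)
d.enqueue(Notification("promo-1", priority=1))
d.enqueue(Notification("otp-1", priority=0))
d.enqueue(Notification("promo-2", priority=1))
order = [d.dequeue().id for _ in range(3)]
# order == ["otp-1", "promo-1", "promo-2"]
```

Strict priority is the simplest policy, but a sustained flood of high-priority traffic can starve the lower levels; weighted polling across queues is a common alternative.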

Rate Limiter

A rate limiter ensures that one customer's burst of events does not affect other customers. Each of the queues we added has a configurable rate limit; when a customer crosses that limit, their excess events are routed to a separate queue.
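One way to sketch this is a per-customer token bucket. The token bucket is a common rate-limiting technique assumed here for illustration; the original does not say which algorithm is used. `CustomerRateRouter`, `capacity`, `refill_per_sec`, and the in-memory main/overflow queues are hypothetical, and the clock is injectable so the example is deterministic.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Tuple


@dataclass
class TokenBucket:
    capacity: float
    refill_per_sec: float
    tokens: float
    last: float

    def try_take(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CustomerRateRouter:
    """Hypothetical router: each customer gets its own bucket, so one
    customer's burst cannot use up another customer's allowance."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}
        self.main_queue: Deque[Tuple[str, str]] = deque()
        self.overflow_queue: Deque[Tuple[str, str]] = deque()

    def route(self, customer_id: str, event: str) -> str:
        now = self.clock()
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            # New customers start with a full bucket.
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.capacity, now)
            self.buckets[customer_id] = bucket
        if bucket.try_take(now):
            self.main_queue.append((customer_id, event))
            return "main"
        # Over the limit: park the event in a separate queue instead of dropping it.
        self.overflow_queue.append((customer_id, event))
        return "overflow"


fake_now = [0.0]  # controllable clock so the example is deterministic
router = CustomerRateRouter(capacity=2, refill_per_sec=1.0, clock=lambda: fake_now[0])
burst = [router.route("acme", f"e{i}") for i in range(3)]  # ["main", "main", "overflow"]
other = router.route("globex", "e0")                      # "main": separate bucket
fake_now[0] = 1.0
later = router.route("acme", "e3")                        # "main": one token refilled
```

Routing excess events to an overflow queue, rather than rejecting them, matches the description above: a noisy customer is slowed down without losing notifications or delaying everyone else.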

Step 2: Reduce the DB bottleneck

1. As load increased, the workers were constantly updating the notification entries in MySQL to mark each notification as delivered or not delivered.

2. The relay service was also constantly reading from this cluster and pushing failed notifications back onto the queue for reprocessing.

Kinesis/SQS Queue

We can add a queue so that workers update notification status asynchronously.
Writes that previously went straight to the MySQL DB now pass through the queue and are applied to MySQL asynchronously, so they no longer bottleneck the DB.
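A minimal sketch of this asynchronous status path, assuming `queue.Queue` as an in-process stand-in for Kinesis/SQS and an in-memory `sqlite3` database as a stand-in for MySQL. `report_status`, `status_writer`, the `notifications` table, and the `batch_size`/`flush_interval` parameters are hypothetical, and batching the writes is an assumption rather than something stated in the original design.

```python
import queue
import sqlite3
import threading
from typing import List, Tuple

STOP = object()  # sentinel: tells the writer to flush and exit


def report_status(q: queue.Queue, notification_id: str, delivered: bool) -> None:
    # Called by a notification worker: enqueue the status update and return
    # immediately instead of issuing an UPDATE against the database.
    q.put((notification_id, "delivered" if delivered else "failed"))


def status_writer(q: queue.Queue, conn: sqlite3.Connection,
                  batch_size: int = 100, flush_interval: float = 0.5) -> None:
    # Single consumer that drains the queue and applies updates in batches.
    batch: List[Tuple[str, str]] = []

    def flush() -> None:
        if batch:
            conn.executemany(
                "UPDATE notifications SET status = ? WHERE id = ?",
                [(status, nid) for nid, status in batch],
            )
            conn.commit()
            batch.clear()

    while True:
        try:
            item = q.get(timeout=flush_interval)
        except queue.Empty:
            flush()  # queue is quiet: write whatever is buffered
            continue
        if item is STOP:
            flush()
            return
        batch.append(item)
        if len(batch) >= batch_size:
            flush()


conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE notifications (id TEXT PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO notifications VALUES (?, 'pending')",
                 [("n1",), ("n2",), ("n3",)])
conn.commit()

q: queue.Queue = queue.Queue()
writer = threading.Thread(target=status_writer, args=(q, conn, 2))
writer.start()
for nid, ok in [("n1", True), ("n2", False), ("n3", True)]:
    report_status(q, nid, ok)
q.put(STOP)
writer.join()

rows = dict(conn.execute("SELECT id, status FROM notifications"))
# rows == {"n1": "delivered", "n2": "failed", "n3": "delivered"}
```

The trade-off is that the status stored in the database becomes eventually consistent: for a short window after delivery, a reader may still see the old status until the writer drains the queue.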

