Photo by Glen Carrie on Unsplash

System Design — AWS SNS

Amit Singh Rathore
Dec 5, 2020 · 5 min read

In this blog we will try to build a miniature version of AWS Simple Notification Service.

  1. Problem Statement
  2. Requirement Gathering
  3. Building MVP
  4. Building for Resiliency and Availability

Problem Statement

Simple Notification Service lets a publisher publish messages to a topic. A user subscribes to one or more topics. Whenever a publisher publishes a message to a topic, the topic's subscribers receive it. Publisher and consumer are unaware of each other; they never communicate directly.

We are required to design an SNS-like service that clients all over the world can use to read and write messages.

Requirement Gathering

Message Order — Subscribers should receive messages in the order they were published to the topic.
Grouping by Topic — Subscribers should receive only messages from the topics they have subscribed to.

For retry purposes, a message should be retained for 21 days if it has not been consumed. (AWS SNS keeps retrying for roughly 20 days if the consumer [Lambda or SQS] is not available.)

The application should be able to sustain uneven traffic distribution.
Peak traffic is 50M messages per day.
Average message size is 4 kB.
It should be able to process a considerably high volume of messages.
The solution should be available across multiple locations (Mumbai and Singapore), and the application should keep processing messages if one DC fails.

Building An MVP

[Diagram: basic publish/consume flow — publisher → topic → consumers]

The above diagram shows a basic flow of the system. Publisher publishes message(s) to topic(s) and consumer(s) read/process them.

Actors:

  1. Publisher — One who publishes the message.
  2. Consumer — One who reads messages from a topic (identified by CONSUMER_ID)

APIs:

  1. createTopic(topic_name): →TOPIC_ID
  2. createSubscription(topic_id, consumer_id): →SUBSCRIPTION_ID
  3. publish(TOPIC_ID, Message): → Sends back Ack on successful write
  4. generateMessageId(TOPIC_ID) → MESSAGE_ID
  5. distribute(TOPIC_ID, CONSUMER_ID)
  6. remove(TOPIC_ID, MESSAGE_ID): → Send Ack on successful removal
[Diagram: API interactions, including the Message Position Marker Service]

All APIs are self-explanatory. The Message Position Marker Service helps delete the correct message ID from a topic: instead of deleting the last message written to the topic, it ensures that only the message ID that was last read gets deleted. This preserves message ordering.
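A minimal in-memory sketch of these APIs in Python (the class, method names, and storage are assumptions for illustration, not how SNS is actually implemented):

import uuid
from collections import defaultdict

class MiniSNS:
    """In-memory sketch of the topic/subscription/publish APIs listed above."""

    def __init__(self):
        self.topics = {}                       # TOPIC_ID -> ordered list of (MESSAGE_ID, message)
        self.subscribers = defaultdict(list)   # TOPIC_ID -> [CONSUMER_ID, ...]
        self.position = defaultdict(int)       # (TOPIC_ID, CONSUMER_ID) -> index of next unread message

    def create_topic(self, topic_name):
        topic_id = str(uuid.uuid4())
        self.topics[topic_id] = []
        return topic_id                        # TOPIC_ID

    def create_subscription(self, topic_id, consumer_id):
        self.subscribers[topic_id].append(consumer_id)
        return str(uuid.uuid4())               # SUBSCRIPTION_ID

    def publish(self, topic_id, message):
        message_id = str(uuid.uuid4())         # generateMessageId(TOPIC_ID)
        self.topics[topic_id].append((message_id, message))
        return "ACK"                           # ack on successful write

    def distribute(self, topic_id, consumer_id):
        # Message position marker: remember how far each consumer has read so
        # that only already-read messages are removed, preserving publish order.
        marker = self.position[(topic_id, consumer_id)]
        unread = self.topics[topic_id][marker:]
        self.position[(topic_id, consumer_id)] = len(self.topics[topic_id])
        return unread                          # messages in publish order

A publisher calls create_topic and publish; each consumer calls create_subscription once and then calls distribute repeatedly to drain its unread messages in order.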

Storage Abstraction:

[Diagram: storage abstraction layer]

Adding Resiliency and Availability

Not all APIs/services will see the same usage pattern, so we need to be smart about the number of instances each service needs. The diagram below presents an indicative service replication. Since, in the typical case, one message is consumed by multiple applications, we should run more instances of the distribute service. Similarly, the message ID service is used both while writing and while reading a message, so it should have a higher replica count than the publish and distribute services.
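For illustration, the relative replica counts might look something like this (the numbers are placeholders, not measured values):

# Indicative replica counts per service in one DC; real numbers would
# come from the measured traffic per API.
replicas = {
    "publish": 2,          # write path only
    "distribute": 4,       # one message fans out to many consumers
    "message_id": 6,       # used on both the write and the read path
    "position_marker": 2,
}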

[Diagram: indicative service replication within a single DC]

In the above diagram we created multiple instances of the services in a single location; we will now replicate the whole setup in a different location. This adds resiliency and reduces latency. A publisher in Mumbai hits the DC nearest to it and publishes the message; a user in Singapore hits its nearest DC to read the message. This requires synchronization between the two DCs. We can run the distributed services with a consensus protocol (Raft, Paxos) to maintain a common state across DCs.

[Diagram: multi-DC deployment with cross-DC synchronization]

Data replication between DCs needs to be monitored closely. Inter-region write latency will be high, and that impacts the overall time between write initiation and the acknowledgement of a successful write. Based on the latency requirement we can choose asynchronous, synchronous, or hybrid replication.

In case of asynchronous replication, the ack is returned as soon as the write completes in the nearest DC. This gives the lowest publish latency, but it also lowers data durability. Before the data can be read from a remote location, it has to be replicated to that DC, so distribution will have higher latency.

In case of synchronous replication we have high latency while publishing and low latency on reads. It gives us high data durability, since data must be replicated to the remote DC before a successful publish is acknowledged.

The third approach is a middle path: replicate synchronously to, say, one remote location while the remaining locations replicate lazily. Combined with the access pattern this can work very well: pick the DC that handles the heaviest consumption and include it in the synchronous ack, while the less frequently used DCs are replicated lazily.
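A rough sketch of this hybrid approach, assuming hypothetical local_write and remote_write helpers:

import threading

def hybrid_publish(message, local_write, remote_write, sync_dc, lazy_dcs):
    """Ack after the local write plus one synchronous remote copy,
    then replicate lazily to the remaining DCs."""
    local_write(message)                         # write in the nearest DC
    remote_write(sync_dc, message)               # synchronous: the heaviest-consumption DC
    for dc in lazy_dcs:                          # remaining DCs replicate lazily
        threading.Thread(target=remote_write, args=(dc, message), daemon=True).start()
    return "ACK"                                 # durability: local + one remote copy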

Storage calculation:

peak messages/day × (topic ID + message ID + metadata + message size) × retention days ≈ storage
50M × (64 bits + 64 bits + 256 bytes + 4 kB) × 21 days ≈ 5 TB
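The same estimate as a quick back-of-the-envelope calculation in Python (sizes taken from the requirements above):

peak_msgs_per_day = 50_000_000           # peak traffic from the requirements
per_message_bytes = 8 + 8 + 256 + 4096   # topic ID + message ID + metadata + ~4 kB payload
retention_days = 21

storage_bytes = peak_msgs_per_day * per_message_bytes * retention_days
print(storage_bytes / 1e12)              # ~4.6 TB, i.e. roughly 5 TB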

Based on the available hardware we can come up with the number of machines needed to implement this architecture. If latency is an issue we can go for SSD- or RAM-heavy instances; if we can tolerate some latency we can go with HDDs.

Network bandwidth calculation:

At peak we are writing about 250 GB per day. We need to find the maximum write throughput at any point in the day. To simplify the calculation, I assume the peak throughput is 1.5 times the average. This gives us roughly 35 Mbps.

READ (consumption) bandwidth can be derived similarly. For simplicity I have assumed READ is 6 times WRITE; the actual ratio would be gathered in the requirements phase.

WRITE → 250 GB/day × 8 bits/byte ÷ 86,400 s × 1.5 ≈ 35 Mbps
READ → WRITE × 6 ≈ 200 Mbps
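The bandwidth numbers can be checked the same way:

write_gb_per_day = 250                                   # peak daily write volume
avg_write_mbps = write_gb_per_day * 8 * 1000 / 86_400    # GB/day -> Mbit/s
peak_write_mbps = avg_write_mbps * 1.5                   # assume peak is 1.5x the average
peak_read_mbps = peak_write_mbps * 6                     # assumed read:write ratio

print(round(peak_write_mbps), round(peak_read_mbps))     # ~35 Mbps write, ~208 Mbps (≈200 Mbps) read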

Latency calculation:

ID generation + write to the highest-latency DC + write to disk → publish latency
ID selection + max(remote DC read) + message position marker update → read latency

Based on the above bandwidth requirement, we need to select instances that have the required network capability.

With this exercise we have designed a Notification system.

Reference: https://sre.google/resources/practices-and-processes/distributed-pubsub/
