Fundamentals of Chat System

Aman Saxena
7 min read · Feb 28, 2024

Chat systems are central to digital communication. This article explores chat system design: the architecture, protocols, and features that shape real-time interactions, and the key elements of an effective chat platform.

Scenario

User A <------> Chat System <------> User B

  1. User A and User B create a communication channel with the chat server.
  2. User A sends a message to the chat server.
  3. When the chat server receives the message, it sends an acknowledgment back to user A.
  4. The chat server sends the message to user B. If user B is offline, it stores the message in the database until user B is back online.
  5. User B sends an acknowledgment to the chat server.
  6. The chat server notifies user A that the message has been successfully delivered.
  7. When user B reads the message, the application notifies the chat server.
  8. The chat server notifies user A that user B has read the message.
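
The acknowledgment steps above can be read as a small state machine per message. The following is a minimal sketch under that reading, not a prescribed implementation: the `Status` values, the event names `delivery_ack` and `read_receipt`, and the `advance` function are all hypothetical names chosen for illustration.

```python
from enum import Enum


class Status(Enum):
    SENT = 1        # step 3: the server acknowledged receipt to the sender
    DELIVERED = 2   # steps 5-6: the receiver acknowledged, the sender is notified
    READ = 3        # steps 7-8: the receiver read the message, the sender is notified


# Events the chat server might receive from the receiver's device (assumed names).
TRANSITIONS = {
    (Status.SENT, "delivery_ack"): Status.DELIVERED,
    (Status.DELIVERED, "read_receipt"): Status.READ,
}


def advance(status: Status, event: str) -> Status:
    """Return the next status; events that are not valid in the current state are ignored."""
    return TRANSITIONS.get((status, event), status)


status = Status.SENT
for event in ["delivery_ack", "delivery_ack", "read_receipt"]:
    status = advance(status, event)
print(status.name)  # READ
```

Ignoring invalid events keeps a duplicated acknowledgment from moving a message backwards or skipping a state, which is one simple way to tolerate retries on a flaky connection.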

Topics

  1. Basics
  2. Real-Time Communication Protocol
  3. High-Level Design and Detail
  4. 1-on-1 Chat System Design (next article)
  5. Group Chat System Design (next article)
  6. Online Presence (next article)
  7. Security (next article)

Basics

In a chat system, the clients (the sender and the receiver) do not communicate with each other directly. Instead, each client connects to a chat service that handles the communication between them.

The chat service must support the following functions:

  • Receive messages from other clients
  • Find the right recipients for each message and relay the message to the recipients.
  • If a recipient is not online, hold the messages for that recipient on the server until the recipient is online.

Picture from System Design Interview by Alex Xu
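
The three functions listed above can be sketched in a few lines. This is an in-memory illustration only, with assumed names (`ChatService`, `connect`, `send`): a plain list stands in for each client's network connection and a dictionary of queues stands in for server-side storage.

```python
from collections import defaultdict, deque


class ChatService:
    """In-memory stand-in for the relay; a real service would use sockets and a database."""

    def __init__(self):
        self.online = {}                   # user_id -> list acting as that user's connection
        self.pending = defaultdict(deque)  # user_id -> messages held while the user is offline

    def connect(self, user_id):
        """Mark the user online and flush anything held for them, oldest first."""
        inbox = []
        self.online[user_id] = inbox
        while self.pending[user_id]:
            inbox.append(self.pending[user_id].popleft())
        return inbox

    def disconnect(self, user_id):
        self.online.pop(user_id, None)

    def send(self, sender, recipient, text):
        """Relay to the recipient if online, otherwise hold the message."""
        message = (sender, text)
        if recipient in self.online:
            self.online[recipient].append(message)
        else:
            self.pending[recipient].append(message)


service = ChatService()
service.send("alice", "bob", "hi")          # bob is offline: the message is held
bob_inbox = service.connect("bob")          # bob comes online: the held message is flushed
service.send("alice", "bob", "you there?")  # bob is online: relayed immediately
print(bob_inbox)  # [('alice', 'hi'), ('alice', 'you there?')]
```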

Real-Time Communication Protocol

When a client starts a chat session, it must first establish a connection with the chat service, so the choice of network protocol is an important design decision. The protocol determines how quickly and reliably messages reach the other side, and therefore has a direct effect on performance and user experience.

A few real-time communication techniques:

  1. Polling: The client periodically asks the server whether new messages are available. Depending on the polling frequency, polling can be costly: it consumes server resources to answer a question whose answer is “no” most of the time.
  2. Long Polling: Long polling reduces the waste of plain polling. Instead of answering immediately, the server holds the client’s request open until new data is available or a timeout occurs, and the client then sends a new request right away. This avoids most empty responses and the repeated cost of setting up a connection, parsing headers, and querying for data on every poll. It also improves responsiveness, because the server can reply the moment a message arrives.

[Figure: Pros and Cons of Long Polling]
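
To make the "hold the request until data or a timeout" behaviour of long polling concrete, here is a minimal server-side sketch using Python's `asyncio`. The `Mailbox` class and its method names are hypothetical; a real server would call something like `long_poll` from inside an HTTP request handler.

```python
import asyncio


class Mailbox:
    """Holds pending messages for one client and lets a request wait for them."""

    def __init__(self):
        self._queue = asyncio.Queue()

    def deliver(self, message: str) -> None:
        self._queue.put_nowait(message)

    async def long_poll(self, timeout: float) -> list:
        """Return pending messages, waiting up to `timeout` seconds for the first one."""
        try:
            first = await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            return []  # nothing arrived: the client is expected to re-issue the request
        messages = [first]
        while not self._queue.empty():  # drain whatever else is already waiting
            messages.append(self._queue.get_nowait())
        return messages


async def demo():
    box = Mailbox()
    print(await box.long_poll(timeout=0.05))  # [] (nothing arrived before the timeout)
    box.deliver("hello")
    print(await box.long_poll(timeout=0.05))  # ['hello']


asyncio.run(demo())
```

With plain polling the first call would have returned an empty response immediately; here the request is held until either a message arrives or the timeout expires.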

  3. WebSockets: A WebSocket connection is initiated by the client and is bidirectional and persistent. It starts as an HTTP request and is upgraded to the WebSocket protocol through a defined handshake. Because it runs over ports 80 or 443, the same ports used by HTTP and HTTPS, it generally passes through firewalls. Once the connection is established, either side can send messages at any time.

[Figure: Pros and Cons of WebSockets]
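
The handshake that upgrades an HTTP connection to WebSocket is defined in RFC 6455: the client sends a `Sec-WebSocket-Key` header, and the server proves it understood the request by returning a `Sec-WebSocket-Accept` value derived from that key. The sketch below shows only that server-side computation; the function names are illustrative, and a real server would also validate the other request headers.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build the HTTP 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )


# Sample key from RFC 6455, section 1.3.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this single exchange, the same TCP connection carries WebSocket frames in both directions.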

Why WebSockets Over HTTP Polling

WebSockets suit chat systems better than HTTP polling because the connection is persistent and bidirectional, so there is no need for repeated request-response cycles. This gives real-time, low-latency communication. The connection is also full-duplex: the client and the server can send data at the same time. After the handshake, each message carries a small frame header instead of a full set of HTTP headers, which reduces overhead for continuous, interactive chat. Finally, because WebSockets use the standard web ports, they usually pass through firewalls, and they can run over TLS (wss://) to provide a secure channel.

Alternatives to WebSockets

WebSockets are not the only option for real-time communication in web applications. The right choice depends on your specific requirements and use case. Here are the common options, with WebSockets listed first as the baseline for comparison:

  1. WebSockets: Bidirectional, low-latency communication; widely supported in modern browsers.
  2. Server-Sent Events (SSE): Unidirectional from server to client, using standard HTTP; simpler than WebSockets.
  3. WebRTC (Web Real-Time Communication): Enables real-time peer-to-peer communication for audio, video, and data.
  4. MQTT (Message Queuing Telemetry Transport): Lightweight protocol, ideal for IoT and unreliable networks.
  5. SignalR: Library for ASP.NET, abstracts underlying transport (supports WebSockets, falls back when needed).
  6. Socket.IO: JavaScript library supporting real-time, bidirectional, and event-based communication with fallback mechanisms.
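
As a point of comparison for Server-Sent Events, the wire format is plain text over a long-lived HTTP response with the content type `text/event-stream`. The helper below is a hypothetical sketch of how a server might format one event; only the `id:`, `event:`, and `data:` field names and the blank-line terminator come from the SSE format itself.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Format one Server-Sent Event; a blank line marks the end of the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" line per line of text.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("hello", event="message", event_id="42")))
# 'id: 42\nevent: message\ndata: hello\n\n'
```

Because the stream only flows from server to client, a chat client using SSE would still need ordinary HTTP requests to send its own messages.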

High-Level Design

We usually divide the chat system design into 4 parts:

  1. Stateless Services
  2. Stateful Services
  3. Third-Party integration
  4. Storage

Stateless Services

These are traditional public-facing request/response services, used to manage login, signup, user profiles, etc.

Stateless services, whether monolithic or microservices, sit behind a load balancer. The load balancer’s key function is to route incoming requests to the appropriate service based on the request path. Rather than building all these stateless services from scratch, we can integrate existing services available in the market.
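
Path-based routing at the load balancer can be pictured as a lookup from a URL prefix to a backend. The routes and service names below are made up for illustration; in practice this is usually configuration in the load balancer rather than application code.

```python
# Hypothetical routing table: request path prefix -> backend service.
ROUTES = {
    "/login": "auth-service",
    "/signup": "auth-service",
    "/profile": "user-service",
}


def route(path: str) -> str:
    """Pick the backend whose prefix is the longest match for the request path."""
    matches = [
        prefix for prefix in ROUTES
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        raise LookupError(f"no service registered for {path}")
    return ROUTES[max(matches, key=len)]


print(route("/profile/42"))  # user-service
```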

One service we’ll look at closely is service discovery. Its job is to give clients a list of DNS hostnames of chat servers they can connect to, so that each client connects to a suitable server.

Stateful Services

Within our system, the only stateful service is the chat service. It is stateful because each client maintains a persistent network connection to a chat server. Typically, a client doesn’t switch to another chat server as long as the current server remains available.

Service discovery coordinates closely with the chat service to avoid overloading any single server. We will look at how this coordination works in more detail later.
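
One simple way for service discovery to avoid overloading a chat server is to hand each new client the server with the fewest connections. The sketch below assumes that policy; the `ServiceDiscovery` class, the capacity limit, and the hostnames are hypothetical, and for brevity it returns a single hostname rather than a list.

```python
class ServiceDiscovery:
    """Tracks chat servers and assigns new clients to the least-loaded one."""

    def __init__(self, capacity_per_server: int):
        self.capacity = capacity_per_server
        self.connections = {}  # hostname -> current connection count

    def register(self, hostname: str) -> None:
        self.connections.setdefault(hostname, 0)

    def assign(self) -> str:
        """Return the least-loaded server that still has capacity."""
        candidates = [h for h, n in self.connections.items() if n < self.capacity]
        if not candidates:
            raise RuntimeError("all chat servers are at capacity")
        # Tie-break on hostname so that the choice is deterministic.
        best = min(candidates, key=lambda h: (self.connections[h], h))
        self.connections[best] += 1
        return best


discovery = ServiceDiscovery(capacity_per_server=2)
for host in ["chat-1.example.com", "chat-2.example.com"]:
    discovery.register(host)
print([discovery.assign() for _ in range(3)])
# ['chat-1.example.com', 'chat-2.example.com', 'chat-1.example.com']
```

Once assigned, a client would stay on that server for as long as it remains available, which is what makes the chat service stateful.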

Third-Party Integration

The most important third-party service in a chat system is the notification system, as it is responsible for informing users when there is a new message, even when the app is not running.

Next, we look at storage and how it scales.

Storage

We need to decide which type of database to use: relational or NoSQL. A typical chat system has two types of data.

The first is generic data, such as user profiles, settings, and friends lists. This data should be stored in a reliable relational database. Replication and sharding are common techniques for meeting availability and scalability requirements.

The second is unique to chat systems: chat history data. It is important to understand the read/write pattern.

  1. The amount of data is enormous for chat systems (Facebook Messenger and WhatsApp process 60 billion messages a day).
  2. Only recent chats are accessed frequently. Users do not usually look up old chats.
  3. Although very recent chat history is viewed in most cases, users might use features that require random access to data, such as search, viewing mentions, or jumping to a specific message. The data access layer should support these cases.
  4. The read-to-write ratio is about 1:1 for 1-on-1 chat apps.
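
To get a feel for the scale in point 1, the 60 billion messages a day figure can be converted into an average rate. This is back-of-the-envelope arithmetic only: it assumes traffic is spread evenly over the day, so real peaks would be higher.

```python
MESSAGES_PER_DAY = 60_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

average_writes_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
print(round(average_writes_per_second))  # 694444
```

With a read-to-write ratio of about 1:1, the store would see roughly the same number of reads per second on top of these writes.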

So I think a key-value store is a great fit here because:

  1. Key-value stores allow easy horizontal scaling.
  2. Key-value stores provide very low latency to access data.
  3. Relational databases do not handle the long tail of data well. When the indexes grow large, random access is expensive.
  4. Key-value stores are adopted by other proven reliable chat applications. For example, both Facebook Messenger and Discord use key-value stores.
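
To show how chat history might map onto a key-value model, here is an in-memory sketch. It assumes each message is keyed by `(channel_id, message_id)` and that message IDs increase over time within a channel; the `MessageStore` class and its methods are illustrative and do not correspond to any particular database's API.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """Toy key-value layout: (channel_id, message_id) -> message text."""

    def __init__(self):
        self._ids = defaultdict(list)  # channel_id -> sorted list of message ids
        self._bodies = {}              # (channel_id, message_id) -> text

    def put(self, channel_id: str, message_id: int, text: str) -> None:
        if (channel_id, message_id) not in self._bodies:
            bisect.insort(self._ids[channel_id], message_id)
        self._bodies[(channel_id, message_id)] = text

    def get(self, channel_id: str, message_id: int):
        """Random access by key, e.g. to jump to a specific message."""
        return self._bodies.get((channel_id, message_id))

    def recent(self, channel_id: str, limit: int) -> list:
        """The newest `limit` messages in a channel, oldest first."""
        if limit <= 0:
            return []
        ids = self._ids[channel_id][-limit:]
        return [self._bodies[(channel_id, i)] for i in ids]


store = MessageStore()
for message_id, text in [(1, "hi"), (3, "lunch?"), (2, "hello")]:
    store.put("alice:bob", message_id, text)
print(store.recent("alice:bob", limit=2))  # ['hello', 'lunch?']
print(store.get("alice:bob", 1))           # hi
```

Keeping IDs sorted per channel serves the common case (load the latest messages) cheaply, while the key lookup covers the occasional jump to an older message.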

Putting All Things Together

Detailed Design

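We divide the detailed design into 3 parts. This article covers the WebSocket server; service discovery and message flows follow in the next article.

  1. WebSocket Server
  2. Service Discovery
  3. Message Flows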

WebSocket Server

A single WebSocket server cannot handle the volume of connected devices, so we deploy multiple servers. Each server maintains an open connection for every online user assigned to it. To keep track of which server holds which user’s connection, we introduce a WebSocket manager, which runs on top of a data store cluster, in our case Redis.

With the WebSocket manager sitting on top of the Redis cluster, a message for any user can be routed to the server that holds that user’s connection. This lets the system scale out by adding WebSocket servers as the user base grows.
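
The bookkeeping a WebSocket manager performs can be sketched as a mapping from user to server. In the sketch below a plain dictionary stands in for the Redis cluster, and the class, method names, and server labels are hypothetical.

```python
class WebSocketManager:
    """Tracks which WebSocket server holds each online user's connection."""

    def __init__(self):
        # Plain dict standing in for the Redis cluster mentioned in the text.
        self._server_for_user = {}

    def user_connected(self, user_id: str, server: str) -> None:
        self._server_for_user[user_id] = server

    def user_disconnected(self, user_id: str, server: str) -> None:
        # Only clear the entry if it still points at the server reporting the
        # disconnect; the user may already have reconnected elsewhere.
        if self._server_for_user.get(user_id) == server:
            del self._server_for_user[user_id]

    def locate(self, user_id: str):
        """Return the server to forward a message to, or None if the user is offline."""
        return self._server_for_user.get(user_id)


manager = WebSocketManager()
manager.user_connected("bob", "ws-2")
manager.user_connected("bob", "ws-5")      # bob reconnects through another server
manager.user_disconnected("bob", "ws-2")   # the stale disconnect is ignored
print(manager.locate("bob"))    # ws-5
print(manager.locate("carol"))  # None
```

A `None` result is the point where the chat service would store the message and hand off to the notification system instead of forwarding it.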

Picture taken from Mariia Romaniuk’s article.

We now understand the fundamentals of a chat system. Next, we design the chat system flow.

The next article will cover:

  1. service discovery
  2. message flow
    1. 1-on-1 Chat System Design
    2. Group Chat System Design
    3. Synchronization of Chat
  3. Online Presence of User
  4. Security

Resources Referred:

System Design Interview — An Insider’s Guide, Alex Xu

Amazing Article by Mariia Romaniuk
