Chatting 101: Hello, Goodbye, and Everything in Between

Aman Saxena
11 min read · Mar 14, 2024


In the previous article, we covered the fundamentals of a chat system. In this section, we focus on the flow of the system.

TOPIC

  1. Service Discovery
  2. Message Flow
    1. 1-on-1 Chat System
    2. Group Chat System
  3. Online Presence

SERVICE DISCOVERY

Service discovery in a chat system is crucial for efficiently managing and connecting various components within the system. Here’s a brief description of how service discovery works in the context of a chat system.

Service Discovery in Chat System:

  1. Dynamic Component Registration: As different components of the chat system (such as servers, databases, and message queues) come online or scale horizontally, they register themselves dynamically with a service registry.
  2. Service Registry: A centralized service registry maintains a real-time list of all available components and their locations. This registry acts as a directory for the chat system, allowing components to discover each other.
  3. Load Balancing: Utilize load balancers that interact with the service registry to distribute incoming connections among multiple instances of chat servers. This ensures optimal resource utilization and scalability.
  4. Failover and Redundancy: Service discovery facilitates automatic failover by redirecting traffic to healthy instances if a component fails. This enhances system reliability and minimizes downtime.
  5. Efficient Routing: With service discovery, components can efficiently locate and communicate with each other, optimizing message routing and reducing chat system latency.
  6. Dynamic Updates: Service discovery allows real-time updates as components join or leave the system. This dynamic nature ensures that the chat system seamlessly adapts to changes in its infrastructure.
  7. Consistent Configuration: By centralizing configuration information in the service registry, all components in the chat system can access the same configuration data. This promotes consistency across the system.
  8. Integration with Orchestration Tools: Service discovery often integrates with container orchestration tools like Kubernetes, enabling automatic registration and discovery of services within containerized environments.

In summary, service discovery plays a pivotal role in maintaining the agility and reliability of a chat system. It enables dynamic and efficient communication between components, ensuring seamless scaling, fault tolerance, and overall robustness of the entire system.
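
To make the registry idea concrete, here is a minimal in-memory sketch. It assumes instances re-register periodically as a heartbeat and that an instance is considered unhealthy once its last heartbeat is older than a TTL. The class name ServiceRegistry, the method names, and the 10-second TTL are illustrative assumptions, not part of any real registry product.

```python
import time


class ServiceRegistry:
    """Toy in-memory registry mapping a service name to its live instances."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds   # assumed health window, not a standard value
        self.clock = clock               # injectable so the behaviour can be tested
        self._instances = {}             # name -> {address: last heartbeat time}
        self._cursor = {}                # name -> round-robin position

    def register(self, name, address):
        """Called when an instance comes online, and again as its heartbeat."""
        self._instances.setdefault(name, {})[address] = self.clock()

    def deregister(self, name, address):
        """Called when an instance leaves the system gracefully."""
        self._instances.get(name, {}).pop(address, None)

    def healthy(self, name):
        """Addresses whose last heartbeat is within the TTL, in sorted order."""
        now = self.clock()
        live = self._instances.get(name, {})
        return sorted(a for a, seen in live.items() if now - seen <= self.ttl_seconds)

    def pick(self, name):
        """Round-robin over healthy instances; None if nothing is available."""
        candidates = self.healthy(name)
        if not candidates:
            return None
        i = self._cursor.get(name, 0) % len(candidates)
        self._cursor[name] = i + 1
        return candidates[i]
```

A load balancer would call pick("chat") for each new connection. Instances that stop sending heartbeats drop out of healthy() on their own, which is the failover behaviour described in the list above.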

taken from Alex Xu's book

MESSAGE FLOW

In a chat system, the message flow involves the exchange of messages between users through the underlying architecture. Here’s a high-level description of the typical message flow in a chat system:

  1. User Initialization: Users initiate the chat application, log in, and establish a connection with the chat server using a WebSocket or a similar real-time communication protocol.
  2. Presence Management: The server updates the online/offline status of users in real time. Presence information is broadcast to connected users, allowing them to see the availability of their contacts.
  3. Message Composition: Users compose messages within the chat application, specifying the recipient and the content of the message.
  4. Client-to-Server Communication: When a user sends a message, the chat client communicates with the chat server over the established WebSocket connection. The message is transmitted to the server for processing.
  5. Server Processing: The server receives the message, validates it, and determines the intended recipient. It may also perform security checks and log the message for future reference.
  6. Routing to Recipient: The server routes the message to the WebSocket connection associated with the recipient. If the recipient is offline, the server may store the message for later delivery or employ push notifications to alert the user.
  7. Real-time Update: The recipient’s chat client receives the incoming message in real-time. The user interface is updated to display the new message, and, if necessary, notifications are triggered to alert the user.
  8. Acknowledgment (Optional): The recipient’s client may send an acknowledgment back to the server to confirm successful message reception. This step ensures message delivery and aids in managing message reliability.
  9. Message Storage (Optional): The server may store messages in a database for chat history and retrieval purposes. This step ensures that users can access their chat history even after logging out or restarting the application.
  10. Continuous Communication: The WebSocket connection remains open for continuous communication, allowing users to exchange messages seamlessly.

This message flow represents a basic overview and can be adapted based on specific system requirements, including security measures, additional features, and scalability considerations.

1-ON-1 CHAT & GROUP CHAT SYSTEM DESIGN

Designing a chat system involves several components to ensure real-time communication, scalability, and reliability. Below is a simplified architecture for a basic chat system:

  1. Database: Store user data, including profiles and chat history. Choose a database that supports fast read and write operations, like MongoDB or PostgreSQL.
  2. WebSocket for Real-time Communication: Implement WebSocket for bidirectional, real-time communication between clients and the server. Use libraries like Socket.IO for handling WebSocket connections efficiently.
  3. Message Queues: Implement a message queue system (e.g., RabbitMQ or Kafka) for handling message delivery and ensuring reliability.
    To handle millions of messages, we can use Kafka to reduce the direct load on the database and server, and a retry mechanism lets us handle failed messages.
  4. Server: Implement a server to handle WebSocket connections, manage online/offline status, and route messages between users. Use a load balancer to distribute incoming WebSocket connections across multiple servers for scalability.
  5. REST API: Create a RESTful API for additional functionalities like user search, profile updates, etc.
  6. Security: Implement end-to-end encryption for message security. Use HTTPS for secure data transfer. Validate and sanitize user inputs to prevent security vulnerabilities.
  7. Push Notifications: Integrate push notification services (e.g., Firebase Cloud Messaging or Apple Push Notification Service) to notify users of new messages when the app is in the background.
  8. Client-Side Applications: Develop mobile and web applications for users to send and receive messages. Use frameworks like React, Angular, or Vue for web apps, and Swift/Java/Kotlin for mobile apps.
1-on-1 chat design by Alex Xu

FLOW OF 1-ON-1 CHAT SYSTEM

  1. User A sends a chat message to Chat Server 1.
  2. Chat server 1 obtains a message ID from the ID generator.
  3. Chat server 1 sends the message to the message sync queue.
  4. The message is stored in a key-value store.
  5. a. If User B is online, the message is forwarded to Chat Server 2 where User B is connected.
    b. If User B is offline, a push notification is sent from push notification (PN) servers.
  6. Chat server 2 forwards the message to User B. There is a persistent WebSocket connection between User B and Chat Server 2.
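
The six steps above can be sketched in a single process. This is only an illustration under simplifying assumptions: the ID generator is a plain counter, the sync queue is a deque, the key-value store is a dict, and a Python list stands in for each user's WebSocket connection. ChatBackend and its method names are hypothetical.

```python
import itertools
from collections import deque


class ChatBackend:
    """Single-process stand-in for the chat servers, queue, KV store and PN servers."""

    def __init__(self):
        self._next_id = itertools.count(1)   # stand-in for the ID generator
        self.sync_queue = deque()            # message sync queue
        self.kv_store = {}                   # message_id -> stored message
        self.connections = {}                # user_id -> list acting as that user's socket
        self.push_notifications = []         # (user_id, message_id) handed to PN servers

    def connect(self, user_id):
        """Open a fake persistent connection and return it so the caller can read it."""
        self.connections[user_id] = []
        return self.connections[user_id]

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def send(self, sender, receiver, text):
        """Steps 1-3: accept the message, assign an ID, put it on the sync queue."""
        message = {"message_id": next(self._next_id), "from": sender,
                   "to": receiver, "text": text}
        self.sync_queue.append(message)
        return message["message_id"]

    def drain(self):
        """Steps 4-6: store each queued message, then deliver it or send a push."""
        while self.sync_queue:
            message = self.sync_queue.popleft()
            self.kv_store[message["message_id"]] = message
            socket = self.connections.get(message["to"])
            if socket is not None:
                socket.append(message)       # recipient is online
            else:
                self.push_notifications.append((message["to"], message["message_id"]))
```

Calling send() and then drain() while User B is connected appends the message to B's connection. After disconnect("B"), the same calls record a push notification instead, while the message is still kept in the key-value store.
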
Group Chat design by Alex Xu

FLOW OF GROUP CHAT SYSTEM

  1. User A composes a message in the group chat and initiates the sending process.
  2. The chat server receives the message from User A.
  3. The server creates copies of the message for each member of the group (User B and User C).
  4. The copies are placed in individual message sync queues associated with each recipient. User B’s chat client, connected to the server via a WebSocket or similar protocol, receives a real-time update about the new message in the group chat. Similarly, User C’s chat client receives a real-time update about the new message.
  5. The chat interface for User B is dynamically updated to display the incoming message from User A. Concurrently, the chat interface for User C is updated to show the same message from User A.
  6. The chat clients for User B and User C may send acknowledgment messages back to the server to confirm the successful reception of the message.
  7. Optionally, the server stores the message in the group chat’s message history for future retrieval.
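
Here is a rough sketch of the copy-per-recipient fanout described in steps 3 and 4, assuming each user has one in-memory message sync queue. GroupChat, join, send, and receive are hypothetical names. Copying a message once per member is simple, but the cost grows with group size, so this sketch fits small groups best.

```python
from collections import defaultdict, deque


class GroupChat:
    """Fan a group message out into one sync queue per recipient."""

    def __init__(self):
        self.members = defaultdict(set)     # channel_id -> member user IDs
        self.inboxes = defaultdict(deque)   # user_id -> that user's message sync queue

    def join(self, channel_id, user_id):
        self.members[channel_id].add(user_id)

    def send(self, channel_id, sender, text):
        """Copy the message into every other member's queue; return the recipients."""
        recipients = sorted(self.members[channel_id] - {sender})
        for user_id in recipients:
            # A fresh dict per recipient, so each queue holds its own copy.
            self.inboxes[user_id].append(
                {"channel_id": channel_id, "from": sender, "text": text})
        return recipients

    def receive(self, user_id):
        """Return and clear everything waiting in this user's queue."""
        inbox = self.inboxes[user_id]
        messages = list(inbox)
        inbox.clear()
        return messages
```

Each client only has to read its own queue to stay up to date, which keeps the per-user sync logic simple.
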
basic database schema

DATABASE SCHEMA

Message table for 1-on-1 chat: The primary key is message_id, which determines the message sequence. We cannot rely on created_at for ordering because two messages can be created at the same time.

Message table for Group Chat: The composite primary key is (channel_id, message_id). A primary key uniquely identifies each row in a table; a composite primary key does so with two or more columns together, here channel_id and message_id. Channel and group mean the same thing here. channel_id is the partition key because all queries in a group chat operate within a channel.

Message ID: The message_id determines the order of messages, so IDs must satisfy two requirements: they must be unique, and they should be sortable by time, meaning new rows have higher IDs than old ones. How can we achieve those two guarantees? The first idea that comes to mind is the “auto_increment” keyword in MySQL. However, NoSQL databases usually do not provide such a feature. The second approach is to use a global 64-bit sequence number generator like Snowflake. The final approach is to use a local sequence number generator. Local means IDs are unique only within a group. Local IDs work because maintaining message sequence within a one-on-one channel or a group channel is sufficient. This approach is easier to implement than a global ID generator.
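
A minimal sketch of the local sequence approach, assuming a single process and using a dict as a stand-in for the message table. LocalSequence, store, and history are hypothetical names; a real deployment would need an atomic per-channel counter in the datastore instead of an in-memory one.

```python
from collections import defaultdict


class LocalSequence:
    """Per-channel counter: IDs are unique and increasing within one channel only."""

    def __init__(self):
        self._last = defaultdict(int)   # channel_id -> last issued message_id

    def next_id(self, channel_id):
        self._last[channel_id] += 1
        return self._last[channel_id]


def store(table, seq, channel_id, text):
    """Insert a row under the composite key (channel_id, message_id)."""
    message_id = seq.next_id(channel_id)
    table[(channel_id, message_id)] = text
    return message_id


def history(table, channel_id):
    """Messages of one channel, ordered by message_id rather than by timestamp."""
    keys = sorted(k for k in table if k[0] == channel_id)
    return [table[k] for k in keys]
```

Two channels can both issue message_id 1 without conflict because the channel_id half of the key keeps the rows apart.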

Kafka functioning

QUEUE SYSTEM

A queue system plays a pivotal role in ensuring reliable and asynchronous message processing. When a user sends a message, it’s often beneficial to decouple the sending and receiving processes.

  1. Asynchronous Message Processing: A queue system in a chat system allows for asynchronous processing of messages. When a user sends a message, it is placed in a message queue, enabling the system to handle the message processing independently of the main application flow. This improves responsiveness and overall system efficiency.
  2. Enhanced Scalability: Queue systems contribute to improved scalability in a chat system. By decoupling message processing from the main application logic, the system can efficiently scale horizontally. Additional instances of message processing components can be added to the system to handle increasing message volumes, ensuring responsiveness during peak usage.
  3. Reliable Message Delivery: Message queues ensure reliable message delivery. If a recipient is temporarily unavailable or offline, the message remains in the queue until the recipient becomes available. This guarantees that messages are not lost and are delivered when the user is back online.
  4. Load Balancing: Queue systems enable effective load balancing in a chat system. Incoming messages are distributed across multiple message processing components, preventing bottlenecks and ensuring that the system can handle concurrent message processing efficiently. Load balancing contributes to optimal resource utilization and system performance.
  5. Fault Tolerance and Redundancy: Message queues enhance fault tolerance by providing a mechanism for redundant processing. If a message processing component fails, another instance can take over, ensuring continuous message processing and minimizing disruptions in the chat system. This redundancy improves the overall reliability of the messaging infrastructure.
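
To illustrate decoupling plus retries, the sketch below uses a plain deque as the queue and re-enqueues a message when its handler fails. It is not Kafka or RabbitMQ client code; process_queue, the (message, attempts) tuple layout, and the limit of three attempts are assumptions made for the example.

```python
from collections import deque


def process_queue(queue, handler, max_attempts=3):
    """Drain the queue; retry failed messages, then give up and dead-letter them.

    Each queue entry is a (message, attempts_so_far) tuple.
    """
    delivered, dead_letter = [], []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
            delivered.append(message)
        except Exception:
            if attempts + 1 < max_attempts:
                # Put it at the back so other messages are not blocked behind it.
                queue.append((message, attempts + 1))
            else:
                dead_letter.append(message)
    return delivered, dead_letter
```

The sender only appends to the queue and returns immediately. Delivery, retries, and the final dead-letter decision happen later in the consumer, which is the asynchronous behaviour described above.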

REST API

1. Send a message: This API is used to send a text message from a sender to a receiver by making a POST API call to the /messages API endpoint. Generally, the sender’s and receiver’s IDs are their phone numbers.

sendMessage(sender_ID, receiver_ID, type, text=none, media_object=none, document=none)

sender_ID: unique identifier of the user who sends the message.
receiver_ID: unique identifier of the user who receives the message.
type: represents whether the sender sends a media file or a document (the default message type is text).
text: contains the text that has to be sent as a message.
media_object: is defined based on the type parameter. It represents the media file to be sent.
document: the document file to be sent.
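
As a rough sketch, the request body for this call could be assembled as shown below. The type values "text", "media", and "document", the lowercase JSON field names, and the helper name build_send_message_request are assumptions for illustration; only the /messages endpoint and the parameter list come from the description above.

```python
# Which payload field each (assumed) message type requires.
PAYLOAD_FIELD = {"text": "text", "media": "media_object", "document": "document"}


def build_send_message_request(sender_id, receiver_id, message_type="text",
                               text=None, media_object=None, document=None):
    """Return (method, path, body) for a POST to the /messages endpoint."""
    if message_type not in PAYLOAD_FIELD:
        raise ValueError(f"unknown message type: {message_type!r}")
    payloads = {"text": text, "media_object": media_object, "document": document}
    field = PAYLOAD_FIELD[message_type]
    if payloads[field] is None:
        raise ValueError(f"type {message_type!r} requires the {field!r} field")
    body = {
        "sender_id": sender_id,
        "receiver_id": receiver_id,
        "type": message_type,
        field: payloads[field],   # only the field matching the type is sent
    }
    return "POST", "/messages", body
```

Validating on the client that the payload matches the declared type avoids a round trip for requests the server would reject anyway.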

2. Get a message: Using this API call, users can fetch all unread messages when they come online after being offline.

getMessage(user_id)

user_id: unique identifier representing the user who has to fetch all unread messages.

3. Upload media or document file: We can upload media files via the uploadFile() API by making a POST request to the /v1/media API endpoint. A successful response will return an ID that’s forwarded to the receiver. Let’s say that the maximum file size for media that can be uploaded is 16 MB, while the limit is 100 MB for a document.

uploadFile(file_type, file)

file_type: type of file uploaded via the API call.
file: contains the file being uploaded via the API call.
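
The size limits mentioned above could be enforced with a small check before accepting an upload. This sketch assumes file_type is either "media" or "document" and that 1 MB means 1024 * 1024 bytes; validate_upload is a hypothetical helper, not part of any real API.

```python
MB = 1024 * 1024   # assumption: binary megabytes
MAX_UPLOAD_BYTES = {"media": 16 * MB, "document": 100 * MB}


def validate_upload(file_type, size_bytes):
    """Raise ValueError if the upload should be rejected; otherwise return None."""
    limit = MAX_UPLOAD_BYTES.get(file_type)
    if limit is None:
        raise ValueError(f"unsupported file_type: {file_type!r}")
    if size_bytes > limit:
        raise ValueError(
            f"{file_type} upload of {size_bytes} bytes exceeds {limit} bytes")
```

Running the check before reading the request body lets the server reject oversized files cheaply.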

4. Download a document or media file

downloadFile(user_id, file_id)

user_id: unique identifier of the user who will download a file.
file_id: unique identifier of a file. It’s generated while uploading a file via the uploadFile() API call. The downloadFile() API call downloads the media file through this identifier. The client can find the file_id by providing the file name to the server.

THE FINAL CHAT SYSTEM INTEGRATING BOTH SYSTEM FLOWS

image taken from Mariia Romaniuk’s article

ONLINE PRESENCE

In a chat system, online presence enhances real-time communication. User presence is tracked to indicate whether a user is actively engaged (online) or offline. This status is maintained through a WebSocket connection, allowing instant updates when users join or leave the chat. The system employs a server-side mechanism to monitor users’ connectivity, updating their status dynamically. This information is crucial for users to identify when their counterparts are available for communication, fostering timely interactions. Additionally, it influences features like push notifications, ensuring that users are promptly informed of new messages even when they are not actively using the chat application. Overall, incorporating online presence functionality enriches the user experience by providing insight into the availability of peers in the chat system.

USER LOGIN

Upon establishing a WebSocket connection, user A’s online status and last_active_at timestamp are stored in the Key-Value (KV) store. This information serves as a reference for user A’s presence. The presence indicator reflects user A as “online” upon login, leveraging the stored data. This approach enables real-time services to accurately display the user’s current status and last activity timestamp, facilitating an immediate and accurate representation of user A’s online presence within the system.

USER LOGOUT

During the user logout flow, the online status in the Key-Value (KV) store is updated to “offline.” This ensures accurate representation in the presence indicator, signaling that the user is no longer online. The logout process effectively reflects the user’s real-time status change and updates the system accordingly.

USER DISCONNECTION

To address the challenge of frequent internet disconnections, a heartbeat mechanism is introduced in the design. Periodic heartbeat events sent from the online client to presence servers occur at defined intervals (e.g., every x seconds). If a server receives a heartbeat within the designated timeframe, the user is considered online; otherwise, they are marked as offline. This intelligent approach prevents rapid status fluctuations during brief network interruptions, ensuring a more stable and accurate representation of a user’s online presence. The heartbeat mechanism enhances the system’s resilience and provides a smoother user experience, mitigating the impact of transient network issues.
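
The heartbeat rule can be sketched as a small tracker that remembers when each user was last heard from. The 30-second timeout is an assumed value standing in for the "designated timeframe", and PresenceTracker with its method names is hypothetical. The clock is injectable so the behaviour can be checked without waiting.

```python
import time


class PresenceTracker:
    """Marks a user online only while heartbeats keep arriving within the timeout."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds   # assumed value, tune per product
        self.clock = clock
        self._last_heartbeat = {}                # user_id -> time of latest heartbeat

    def heartbeat(self, user_id):
        """Called each time the client's periodic heartbeat event arrives."""
        self._last_heartbeat[user_id] = self.clock()

    def logout(self, user_id):
        """Explicit logout: forget the user so they read as offline immediately."""
        self._last_heartbeat.pop(user_id, None)

    def status(self, user_id):
        last = self._last_heartbeat.get(user_id)
        if last is not None and self.clock() - last <= self.timeout_seconds:
            return "online"
        return "offline"
```

If the client sends a heartbeat every few seconds and the timeout is several times longer, a short network drop loses only a heartbeat or two and the status never flips.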

ONLINE STATUS FANOUT

In the presence server’s publish-subscribe model, each friend pair (e.g., A-B, A-C, A-D) maintains a dedicated channel. When User A’s online status changes, the event is published to relevant channels (A-B, A-C, A-D). Subscribed friends (User B, C, D) receive real-time updates through WebSocket communication, enabling seamless and efficient online status notifications within the network.
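
A minimal in-process sketch of the per-pair channels follows. It assumes channel names are built from the two user IDs in sorted order, so A-B and B-A resolve to the same channel, and that a callback stands in for the WebSocket push. PresencePubSub and its methods are hypothetical names.

```python
from collections import defaultdict


class PresencePubSub:
    """One channel per friend pair; status changes are published to each pair."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # channel name -> callbacks

    @staticmethod
    def channel(user_a, user_b):
        """Order-independent channel name for a friend pair."""
        return "-".join(sorted((user_a, user_b)))

    def subscribe(self, user, friend, callback):
        """`user` asks to be notified about `friend` via their shared channel."""
        self._subscribers[self.channel(user, friend)].append(callback)

    def publish_status(self, user, friends, status):
        """Publish `user`'s new status to the channel shared with each friend."""
        for friend in friends:
            for callback in self._subscribers.get(self.channel(user, friend), ()):
                callback({"user": user, "status": status})
```

Friends without a subscription simply receive nothing. With one channel per pair, the number of events per status change grows with the size of the friend list.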

CONCLUSION

A well-designed chat system balances real-time communication, scalability, and reliability. Components like WebSocket, message queues, and group management enhance user experience. Security measures, presence indicators, and a thoughtful logout flow contribute to a robust system, ensuring effective and enjoyable online interactions.

I’ve left out message synchronization, including synchronization across different systems, and user roles such as admin in a group chat.

Resources Referred:

System Design Interview — An Insider’s Guide, Alex Xu

Amazing Article by Mariia Romaniuk
