Moving Slow to Move Fast: Real-time Chat Rate Limiting

Whatnot Engineering
7 min read · Feb 27, 2024

Lauren Zhou | Community Software Engineer

Whatnot’s community continues to grow at an incredible pace. Livestreams are the center of interaction between buyers and sellers, and even after the introduction of direct messaging, the livestream chat continues to be the main place where users engage with each other. As we were scaling, sellers, moderators (aka mods), and viewers were struggling to keep up with chat in the fast-paced environment of livestream shopping.

In our largest streams, messages came in so quickly that chat was unreadable. Sellers would miss out on key messages like questions about their products, requests on what to auction off next, or exciting livestream events like raids.

To address the fast-flowing chats, we initially implemented rate limiting that dropped chat messages once the rate of chat passed certain thresholds. These limits went a long way toward keeping livestream chat readable, but they had several drawbacks, mainly stemming from the fact that the limits applied to a livestream chat as a whole rather than per user, leading to issues including:

  • Fairness: One user could repeatedly try to send a chat message and unluckily get rate-limited every time, while another user got to send multiple messages.
  • Unpredictable user experience: It was impossible to predict whether a chat message would be dropped until after a user attempted to send it.
  • Perception of quality: Messages were dropped without any feedback, so users believed their chat messages were dropped due to system malfunctions rather than by design.

To address concerns with the existing rate-limiting, we introduced Slow Mode, a per-user cooldown between chat messages.

Experience

Slow Mode creates a clearer user experience than silently dropping messages and gives sellers control over the pace of their chat. When a seller turns on Slow Mode, they choose the number of seconds that buyers must wait between sending messages.

Architecture

Our livestream chat functions through Live Service, our Elixir backend. We use the Phoenix framework, which at a high level processes messages sequentially while maintaining a continuous state.

When a user joins a livestream, they connect to a Phoenix channel for chat (the chat channel). The user’s socket fetches data from Main Backend, our second backend service, as well as from DynamoDB, and creates an initial state stored in the “socket assigns”, a place for app-specific data maintained through the lifecycle of the connection. The socket assigns is a lightweight place to store and fetch data and may be updated as the socket receives and processes messages.

Imagine a socket assigns with a can_send_messages field: when a user sends a chat message, the channel consults this field to decide whether to broadcast the message.
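The real channel is Elixir/Phoenix, but the shape of this state is easy to sketch. Here is a hypothetical assigns and the check a channel might run on each incoming message (field names are illustrative, not Whatnot’s actual schema):

```python
# Hypothetical shape of the chat channel's socket assigns.
# Field names are illustrative, not Whatnot's actual schema.
assigns = {
    "user_id": 42,
    "livestream_id": 1001,
    "can_send_messages": True,  # permissions fetched on join
    "slow_mode_seconds": 0,     # livestream setting fetched on join
}

def handle_chat_message(assigns, message):
    """Decide whether to broadcast an incoming chat message."""
    if not assigns["can_send_messages"]:
        return "rejected"
    return "broadcast"
```

Because the assigns lives with the socket, this check costs no network round trip.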

For Slow Mode, we store the Slow Mode seconds on the livestream object in DynamoDB. On join, we fetch the livestream information and store the Slow Mode seconds in the socket assigns. This allows us to avoid checking DynamoDB every time a user sends a chat message, both improving the latency of sending a chat message and saving read capacity on DynamoDB.

Seller Settings

What about when a seller updates their Slow Mode settings? If we only fetched the Slow Mode seconds on-join, users who joined the livestream when Slow Mode was off would be able to bypass Slow Mode. Additionally, the client needs to know when Slow Mode changes so that it can render the appropriate UI.

To address this:

  1. Seller client sends message to set the slow mode seconds
  2. Seller socket processes message, validating seller permissions and the value of seconds set
  3. Seller socket broadcasts out an update to all connected users and updates the value in DynamoDB
  4. Connected user sockets update their socket assigns to reflect the new value
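The steps above can be sketched in a language-neutral way. In this Python sketch, `broadcast` and `dynamo_put` are hypothetical stand-ins for Phoenix PubSub and the DynamoDB client; names and validation rules are illustrative:

```python
# Sketch of the seller-side Slow Mode update flow (steps 2-4).
# `broadcast` and `dynamo_put` stand in for Phoenix PubSub and DynamoDB.
def set_slow_mode(assigns, seconds, broadcast, dynamo_put):
    # Step 2: validate seller permissions and the requested value.
    if not assigns.get("is_seller"):
        raise PermissionError("only the seller can change Slow Mode")
    if seconds < 0:
        raise ValueError("slow mode seconds must be non-negative")
    # Step 3: persist the value and fan it out to every connected socket.
    dynamo_put(assigns["livestream_id"], {"slow_mode_seconds": seconds})
    broadcast("slow_mode_updated", {"slow_mode_seconds": seconds})

# Step 4: each connected socket updates its own assigns on receipt.
def on_slow_mode_updated(assigns, payload):
    assigns["slow_mode_seconds"] = payload["slow_mode_seconds"]
    return assigns
```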

Now, buyers fetch the correct info on join and also receive updates in real time. On to the actual limiting mechanism.

Buyer Side Rate-Limiting

To limit how often buyers send messages, we need to know when they last sent a message. In our initial approach to Slow Mode, we relied on the socket assigns to manage the rate-limiting as well. After a user sent a message, the assigns were updated with the last message timestamp. For future messages, we would then validate that slow_mode_seconds had elapsed since the previous message.

  1. slow_mode_seconds is set to 3, so users must wait 3 seconds (3000 ms) between messages
  2. The current unix time is 1707926100 and the socket records that the last message was sent at 1707926095, so 5 seconds have elapsed and the user can send a message
  3. Update last_message_timestamp to the current time
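The core check is a one-liner. A minimal sketch in Python, working in milliseconds (the function name is illustrative):

```python
def can_send(now_ms, last_message_ms, slow_mode_seconds):
    """True if at least slow_mode_seconds have passed since the last message."""
    return now_ms - last_message_ms >= slow_mode_seconds * 1000
```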

Solving For Reconnects

This method is simple to understand and works well so long as a user stays connected. The chat socket connections are on a per-livestream basis, so this method of storing the information in the socket assigns works well to limit the messages a given user sends in a livestream.

However, users in our app can disconnect and reconnect to the chat channel arbitrarily. We need to persist the last_message_timestamp so that users are not able to circumvent Slow Mode by refreshing the tab.

To store this short-lived information, we use Redis. With just a little work on join and on leave, a user’s slow mode information is persisted correctly.

  1. On joining a livestream chat channel, check Redis
  2. Manage the socket assigns last_message_timestamp similarly as before
  3. On disconnecting from the chat channel, store the last_message_timestamp under a key derived from the user ID and livestream ID
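The join/leave handoff above can be sketched as follows. An in-memory dict stands in for Redis, and the key scheme is illustrative, not necessarily the one the service uses:

```python
# In-memory dict standing in for Redis; the key scheme is illustrative.
redis_store = {}

def slow_mode_key(user_id, livestream_id):
    return f"slow_mode:{livestream_id}:{user_id}"

def on_join(assigns, user_id, livestream_id):
    # Step 1: seed the assigns from Redis so reconnects keep the cooldown.
    key = slow_mode_key(user_id, livestream_id)
    assigns["last_message_timestamp"] = redis_store.get(key, 0)
    return assigns

def on_leave(assigns, user_id, livestream_id):
    # Step 3: persist on disconnect so a page refresh can't reset the timer.
    redis_store[slow_mode_key(user_id, livestream_id)] = assigns["last_message_timestamp"]
```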

Now, users can reconnect to channels and they will still be appropriately rate limited. Additionally, this method minimizes trips to Redis: the necessary info (slow_mode_seconds and last_message_timestamp) is fetched on join and then maintained entirely in the socket assigns.

This solves for reconnects, but a user may also open many tabs of the same livestream or join from multiple devices.

In this case, all the connections gain updated information on join, but they don’t synchronize between each other when a message is sent. We could address this similarly to the Slow Mode updates by also making a user’s multiple connections pass messages between each other, but the edge cases kept adding up. For example:

Connections 1 and 2 synchronize by intercepting the new_msg broadcast to update their assigns, but a new connection (connection 3) will fetch stale data on join.

The main problem was that we were storing last_message_timestamp in the socket assigns and only persisting it to Redis in certain cases. Instead, we can use Redis as the source of truth at all times, using a combination of the user ID and livestream ID as the key so that stored values are per-user, per-livestream.
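One common Redis idiom for this kind of always-in-Redis cooldown is SET with the NX and PX options: the send succeeds only if no live cooldown key exists, and the key expires on its own after the cooldown. This is an illustrative pattern, not necessarily the exact commands the service issues. A sketch against a tiny in-memory stand-in:

```python
class FakeRedis:
    """Tiny in-memory stand-in for the one Redis command used below."""
    def __init__(self):
        self.store = {}  # key -> expiry time in ms

    def set_nx_px(self, key, ttl_ms, now_ms):
        # SET key value NX PX ttl_ms: only succeeds if the key is absent.
        expiry = self.store.get(key)
        if expiry is not None and expiry > now_ms:
            return False  # key still live: user is still in cooldown
        self.store[key] = now_ms + ttl_ms
        return True

def try_send(redis, user_id, livestream_id, slow_mode_seconds, now_ms):
    # Redis is the single source of truth, shared by every connection
    # a user has open, so multiple tabs and devices stay in sync.
    key = f"slow_mode:{livestream_id}:{user_id}"
    return redis.set_nx_px(key, slow_mode_seconds * 1000, now_ms)
```

Because every connection consults the same key, reconnects and multi-device sessions need no extra synchronization.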

This approach not only solves both the leave/rejoin issue and the multi-connection synchronization problem, it is also much simpler than the socket assigns-based solution.

Finishing Touches

The last piece needed for slow mode was to offer an informative UI, as dropping messages without any feedback was one of the key issues we set out to address.

Clients needed to render a timer above the chat input telling users how much time was left before they could chat again. Clients can manage this state locally while a user remains connected to a livestream, but we needed to add support for persisting the timing when a user closed and rejoined the stream.

To do this, the server additionally pushes an initial info event and payload after a user connects to the chat channel. This lets clients know the currently set Slow Mode seconds as well as the time until the user can send their next message.
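The payload for that initial event can be derived from the same two values the server already tracks. A minimal sketch, with hypothetical field names:

```python
def initial_info(slow_mode_seconds, last_message_ms, now_ms):
    """Payload pushed to a client right after it joins the chat channel.
    Field names are illustrative, not the actual wire format."""
    elapsed = now_ms - last_message_ms
    remaining = max(slow_mode_seconds * 1000 - elapsed, 0)
    return {
        "slow_mode_seconds": slow_mode_seconds,
        "cooldown_remaining_ms": remaining,
    }
```

The client seeds its countdown timer from cooldown_remaining_ms, then manages the timer locally until the next message or Slow Mode update.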

To accurately render the “Slow Mode is enabled” text and update it in real time, clients follow a similar pattern as we used in Live Service: they get an initial source of truth, then update it when receiving a broadcast that Slow Mode has changed.

What’s Next

Livestream shopping incorporates a number of events happening all at once — users can be bidding, tipping, chatting, answering polls, you name it. While we’ve given sellers and mods more tools to communicate with each other, their buyers, and control their livestream chat, there’s still more to be done to improve the livestream experience and a host of design and technical challenges that come with it.

If that sounds like something that interests you, come join us!
