Keeping Up with the Fans: Scaling for Big Events at Whatnot, with Elixir and Phoenix

Simon Zelazny
Whatnot Engineering
8 min read · Apr 5, 2022

The challenge:

At Whatnot Engineering, we love a challenge. And that’s exactly what we got when our very own John Walters broke the news that Logan Paul (23 million Instagram and YouTube followers) and Ninja (13 million Instagram followers) were going to GO LIVE ON WHATNOT during the holiday season.

This was a challenge indeed, as the expected number of participants in these livestreams was two orders of magnitude higher than our median viewer count. Given we had limited time, we quickly put together a plan to ensure we could handle the increased activity these big events would bring to our platform.

The plan was as follows: 1) figure out how big of a livestream our system can support, and 2) do our best to overcome any limits that we encounter.

An Aside: A general outline of our system architecture

Before we dive into the nitty-gritty — a quick tour of the Whatnot backend stack.

The Whatnot application is powered largely by two services. The bigger one is a monolithic Python/Flask system, exposed via a GraphQL API, which handles our “slowly-moving” data, such as user profiles, inventories, and payments. Internally, we call this service the Main Backend.

Main Backend’s little brother is the Live Service, an Elixir application written with a simple goal in mind: providing fast, reliable, and scalable auctions and chats. It handles our fast-moving data, relying heavily on Phoenix Channels, PubSub, and Horde to communicate and maintain cluster state. Livestreams are modeled as PubSub topics, auctions are modeled as GenServer processes, and giveaways are backed by Redis.
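As a rough illustration of the topic-per-livestream model (the topic naming and the LiveService.PubSub name below are illustrative assumptions, not necessarily our exact identifiers), everything boils down to ordinary Phoenix.PubSub subscriptions and broadcasts:

livestream_id = 123
topic = "livestream:#{livestream_id}"

# Each viewer's channel process subscribes to the livestream's topic...
Phoenix.PubSub.subscribe(LiveService.PubSub, topic)

# ...and a single broadcast fans an auction or chat event out to every subscriber.
Phoenix.PubSub.broadcast(LiveService.PubSub, topic, {:new_bid, %{amount_cents: 500}})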

Both the Main Backend and Live Service rely on Kubernetes to dynamically resize their clusters in order to handle a high absolute volume of traffic. However, the problem before us was on a different axis: while the absolute volume of traffic was sure to go up, most of the increase would fall on one particular livestream, one particular chat topic, and one product giveaway at a time.

This was a situation our system hadn’t encountered in the past, so the best way to find out how it would fare was to bring about that encounter. Enter Load Testing.

How we test

As you recall, our engineering plan was twofold: first, create as large a livestream as our system could manage, and second, optimize the system to ‘unlock’ larger livestream capacity.

We set about putting together a dead-simple load testing tool: a client which would connect to the Live Service websocket, start a livestream, and then repeatedly start a giveaway auction, wait some time, select a giveaway winner, then start another giveaway. In a second concurrent loop, the client would send chat messages to its livestream channel, with a tunable probability per unit of time.

Next, we wrote the counterpart: a client which would join an existing livestream, enter a giveaway if it was ongoing, wait for it to end, and then join the next giveaway. This client would also send out chats with a tunable frequency.
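To give a flavor of the viewer side, here is a simplified sketch of such a client. The LoadTest.Ws module is a hypothetical wrapper standing in for whatever websocket client library handles the actual channel join and push calls:

defmodule LoadTest.Viewer do
  @moduledoc "Sketch of a viewer load-test client; LoadTest.Ws is a hypothetical websocket wrapper."
  use GenServer

  def start_link(livestream_id), do: GenServer.start_link(__MODULE__, livestream_id)

  @impl true
  def init(livestream_id) do
    {:ok, channel} = LoadTest.Ws.join("livestream:#{livestream_id}")
    schedule_chat()
    {:ok, %{channel: channel}}
  end

  # Periodically send a chat message, with a tunable probability per tick.
  @impl true
  def handle_info(:maybe_chat, state) do
    if :rand.uniform() < chat_probability() do
      LoadTest.Ws.push(state.channel, "chat_message", %{body: "hello!"})
    end

    schedule_chat()
    {:noreply, state}
  end

  # Giveaway announcements are forwarded to this process by the websocket wrapper.
  def handle_info({:giveaway_started, giveaway_id}, state) do
    LoadTest.Ws.push(state.channel, "enter_giveaway", %{giveaway_id: giveaway_id})
    {:noreply, state}
  end

  defp schedule_chat, do: Process.send_after(self(), :maybe_chat, 1_000)
  defp chat_probability, do: 0.1
end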

Running these two clients from a local machine was the smallest possible “load test”, but it was enough to verify that the scenarios were actually exercising the relevant parts of the Live Service.

How do you turn two Elixir modules into a full-fledged load-testing system? OTP quite literally provides the answer out of the box: run these clients en masse as workers under a supervisor, and presto, you have a massively parallel load-testing framework.
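Concretely, “out of the box” means plain OTP supervision. Reusing the hypothetical LoadTest.Viewer sketch from above, each loader node might do something like this:

# Start a DynamicSupervisor on the loader node, then fan out N viewer clients.
{:ok, _sup} = DynamicSupervisor.start_link(strategy: :one_for_one, name: LoadTest.Supervisor)

livestream_id = 123

for _ <- 1..50_000 do
  {:ok, _pid} = DynamicSupervisor.start_child(LoadTest.Supervisor, {LoadTest.Viewer, livestream_id})
end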

There are some principles to keep in mind, though:

  1. Make sure that your system-under-test (SUT) is the same size as your target production system. You want to be comparing Ida Reds to Ida Reds, not to some other kind of apples.
  2. Make sure that the machines you use to run the load scenarios are much more powerful than the system under test. In our case, we had four X-sized machines in the SUT and ten X-sized machines in the loader bank, to make sure we were actually hitting limits on the server, not on the load generators.
  3. Make sure you have metrics on every relevant interaction in the system. You will definitely be adding timing metrics on various events on your server, to see which ones shoot up when the system is under load.

After you have a load-testing flow established, just apply the algorithm:

  1. Launch a load of N users.
  2. Does server performance meet requirements? If so, increase N and go to step 1. If not, continue to step 3.
  3. Which part of the system’s response time is spiking? Which internal component is responsible? Can you modify the system so that that component is less stressed?

In our case, we ran into point 3 several times, and each time we were able to alleviate the problem by load shedding, that is: programming the server to drop events when their frequency exceeds a particular pre-set threshold. Here are the issues we came across:

Problems solved by load-shedding

Tracking Presence using Phoenix Tracker

Problem: Presence tracking becomes very slow after X thousand tracked processes in one topic. Process queues start growing on tracker processes.

Analysis: The single CRDT holding the given topic is simply too big. Can we “cheat” somehow? Yes! Since our clients already ignore joined/left events in any moderately large livestream (over 50 users), we are free to stop tracking them on the server level as well.

Solution: when a client joins a channel, we check whether the Tracker list is small enough for us to care about presence tracking. If it is, we register with the tracker. If not, we continue without.

limit = get_presence_cap()
list = Presence.capped_list(topic)
tracked = Enum.count(Map.get(list, topic, %{metas: []}).metas)

# Only register with the tracker while the topic is below the presence cap.
if tracked >= limit do
  :noop
else
  Presence.track(socket, topic, %{ ... })
end

Diff list

Problem: the ‘metas’ diffs are large data structures that are needlessly shipped over the network every time someone joins or leaves the chat. In very large livestreams we don’t display joined/left messages at all, so we can simply cut out the diff broadcast on the server side as well:

intercept ["presence_diff"]

def handle_out("presence_diff", diff, socket) do
  # Only forward presence diffs to clients in small channels; drop them otherwise.
  if socket.assigns.channel_size <= get_max_channel_size_for_diffs() do
    push(socket, "presence_diff", diff)
  end

  {:noreply, socket}
end

This leads to a huge reduction in network traffic at the cost of CPU time spent decoding the ‘intercepted’ presence_diff event.

Chat rate-limiting

Scaling 1:1 chats in an Elixir-based system is not difficult, and the BEAM has proven its worth supporting 1:1 chats and small chat rooms at WhatsApp, League of Legends, Grindr, and others. At Whatnot, most livestreams remain in the ‘small chat room’ category. However, when the chat participant count balloons into the tens of thousands, we run up against a harsh reality: it takes time to send out messages on the network, and the numbers grow surprisingly quickly when the factors involved are large.

In a small chat room of, say, 50 people, broadcasting 10 messages per second results in 500 messages sent out “on the wire”, out of the server. Easy peasy.

But when the chat room is populated by 50 thousand people, those same 10 messages per second become half a million outgoing chat messages, contending for system resources with critical auction events such as bids and purchases.

Here, we applied the cutoff pattern as well: we limit the rate of chats per second per channel, so that chat traffic does not overwhelm the business logic.
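A minimal sketch of that cutoff, assuming an ETS-backed counter (the module name, table name, and threshold are illustrative, not our exact implementation):

defmodule LiveService.ChatRateLimiter do
  @moduledoc "Illustrative per-topic chat rate limiter: counts messages per topic, per second."

  @table :chat_rate_limits
  @max_per_second 10

  # Call once from the application supervisor before serving traffic.
  def init_table do
    :ets.new(@table, [:named_table, :public, write_concurrency: true])
  end

  # Returns true while the current second's count for this topic is within budget.
  # Note: stale {topic, second} keys are never purged here; a real implementation
  # would sweep them periodically.
  def allow?(topic) do
    key = {topic, System.system_time(:second)}
    :ets.update_counter(@table, key, {2, 1}, {key, 0}) <= @max_per_second
  end
end

In the chat channel’s handle_in/3 clause, a message is only broadcast when allow?/1 returns true; anything over the per-channel budget is dropped instead of fanned out.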

Last Resort: Limiting user sockets in the system

As our testing progressed, we gained a clearer view of our system’s limits, including the absolute number of users we can host while still providing stable performance. As a measure of last resort, we put in place one more load-shedding mechanism: an absolute limit on the number of sockets a given node will allow. The business case was easy to make: it’s better to provide N users with a good experience and refuse service to M late stragglers than to crash and burn and deprive all N+M users of any kind of benefit.
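One way to sketch such a cap is to refuse new connections in the socket’s connect/3 callback. The module below is illustrative only: the BEAM process count is a crude proxy for the socket count, and the ceiling and channel module name are made up rather than taken from our production configuration.

defmodule LiveService.UserSocket do
  use Phoenix.Socket

  channel "livestream:*", LiveService.LivestreamChannel

  # Made-up ceiling; the real threshold would come out of load testing.
  @max_processes 900_000

  @impl true
  def connect(_params, socket, _connect_info) do
    # Refuse late stragglers once the node nears its tested capacity, so the
    # users already connected keep getting stable performance.
    if :erlang.system_info(:process_count) < @max_processes do
      {:ok, socket}
    else
      :error
    end
  end

  @impl true
  def id(_socket), do: nil
end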

Problems solved by sharding

We also discovered one issue where contention over internal resources was not a consequence of our system design, as it was with topic-based interactions such as chat messages, but simply a matter of not reaching for the configuration options that the library we were using already provided.

HTTPoison and pooling

Aside from finding the limits of our Websocket-based auction system, our load tests also hit our HTTP API, simulating client traffic. Live Service also uses HTTP to talk to our main API server, for example to fetch a user’s profile information when they log in.

During our testing, we encountered a very interesting failure mode: when the HTTP API cluster was scaled down, overwhelming it with client traffic would cause AWS DynamoDB requests going out from the Live Service to time out.

Spooky action at a distance? Not quite. The problem lies in the way that our HTTP client, HTTPoison (or more precisely Hackney, the underlying Erlang library) pools connections.

By default, Hackney uses a single pool of HTTP ‘client’ workers. When an HTTP request is made, a worker from this pool is checked out, and the HTTP call is performed using the worker’s TCP connection.

It’s not hard to see that if our overloaded API server is holding up all workers in Hackney’s worker pool, our system has no way to make HTTP calls to DynamoDB.

The solution to this conundrum is, of course, not to use Hackney’s default pool, but to create separate, named pools for each external service we talk to, removing the possibility of interference.
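A minimal sketch of such dedicated pools (the pool names, sizes, and URLs are illustrative):

# Started under the application's supervision tree: one named Hackney pool per
# upstream service, so a slow Main Backend cannot starve DynamoDB requests.
children = [
  :hackney_pool.child_spec(:main_backend_pool, timeout: 15_000, max_connections: 200),
  :hackney_pool.child_spec(:dynamodb_pool, timeout: 15_000, max_connections: 50)
]

{:ok, _pid} = Supervisor.start_link(children, strategy: :one_for_one)

# Each request then picks its pool explicitly via the Hackney options.
HTTPoison.get!("https://main-backend.internal/users/123", [], hackney: [pool: :main_backend_pool])
HTTPoison.get!("https://dynamodb.us-east-1.amazonaws.com/", [], hackney: [pool: :dynamodb_pool])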

Summary

We had a lot of fun rising up to meet the technical and business challenges represented by this event. As we huddled on standby in our company Slack channel, both the Logan Paul livestream and the Ninja livestream proceeded without major issues, and provided plenty of fun for both the hosts and the viewers. As the proverbial engineers standing below a bridge of their own design, we were elated to have our efforts pay off and result in a great experience for our users.

We also came away from the whole project with some broader engineering takeaways to guide us in future work:

  • Technical challenges are a great motivator to step up your engineering game and revisit the architecture of your systems;
  • Load-shedding, when permissible from a business perspective, is easy to implement and goes a long way towards maintaining system performance;
  • Load testing is incredibly important when you’re using off-the-shelf software, as the authors might not have anticipated your particular load pattern.

If you are interested in solving problems like these and much more, please reach out; we are hiring!

Simon Zelazny is a Software Engineer at Whatnot
