Feeds with Real-time Signals (Home Feed — Part 2)
Building a Real-time Discovery stack at Whatnot.
Whatnot is a marketplace with fast-changing dynamics and highly engaged users in livestreams. Like any other home feed, ours aims to recommend relevant livestreams to our users. What makes our discovery problem unique is that livestreams are ephemeral content: we can’t recommend yesterday’s livestreams to today’s users, so we need fresh signals. Also, like surfing, our livestreams often have “lulls” between “waves”. We want to promote livestreams at the moment when the most exciting things are happening, especially to new users, so that their first few livestreams show them livestream shopping at its best.
This blog post is a sequel to building Home Feed using GraphQL and Infinite Scroll at Whatnot.
Many technical decisions boil down to a trilemma among speed, recall, and relevance. In building our real-time discovery stack, we put a high priority on speed:
- The speed to ingest events: The time from event generation to signals being ready for retrieval across our stack is under 3 seconds, about as fast as a Tesla Model S Plaid goes from 0 to 60 mph;
- The speed to generate home feed: We target less than 50 milliseconds to retrieve personalized candidates, to give our users the best experience while using the app;
- The speed to iterate and experiment: We want to quickly build features and experiment with different models and UI placements.
Make it Real(time)
First and foremost, we need to know what is happening in the livestreams: status changes, new auctions starting, active chats, giveaways in the show, and so on. These things happen fast and at a massive scale. We adopted an event-driven system to capture and process them asynchronously. We will have another blog post that dives deep into our event bus. For the Discovery use case, we start with the following events in the event bus:
- When a livestream’s status changes, so that we can represent its latest status
- When users perform certain actions such as bids, chats, giveaways, reactions, etc.
- When products are listed or purchased, when auctions start and end, etc.
Events are disjoint datasets, and they are just raw ingredients until they are transformed. We distill events into useful signals and features for retrieval and ranking. To meet our scalability and low-latency requirements, we adopted two different models, each with distinct pros and cons.
“Pull” model
This involves keeping all recent event data in memory by leveraging Rockset’s Converged Index, which indexes every field in a single system that combines a row index, a columnar index, and a search index. Because all fields are indexed, we can pull in disjoint datasets through low-latency joins when generating the Home Feed. In this model, disjoint datasets don’t need to be denormalized when events happen. We can also index batch features through S3/Snowflake integrations and make them accessible for retrieval when we generate the Home Feed.
We’ve built several collections on Rockset to power our real-time discovery stack:
- The latest state and real-time statistics of livestreams and product listings are maintained and updated in real time driven by events.
- Windowed user and livestream metrics at different time granularities.
- Batch features such as various seller scores and embeddings.
This method gives us great flexibility to quickly experiment, build features from multiple real-time datasets, and serve Home Feed generation. The drawback is that joins carry an extra cost at query time. If a query takes too long, users’ Home Feed experience suffers. We need to be mindful of how we structure our data for serving and how we optimize queries to run efficiently:
- Use field mappings to preserve the latest event, which represents the livestream state
- Use rollups to aggregate metrics into minute windows for quick access
- Use hints to explicitly specify the join strategy (hash or lookup) and access path (index filter or column scan)
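As a rough illustration of the rollup idea only (this is not Rockset's API), the sketch below counts events into per-minute buckets at ingest time and then sums a handful of buckets at query time instead of scanning raw events. The names `MinuteRollup`, `record`, and `window_total` are hypothetical, and the in-memory dictionary is a stand-in for a real collection.

```python
from collections import defaultdict


class MinuteRollup:
    """Hypothetical in-memory stand-in for a minute-level rollup collection."""

    def __init__(self):
        # (livestream_id, metric, minute_bucket) -> event count
        self._counts = defaultdict(int)

    def record(self, livestream_id, metric, ts_seconds):
        # Truncate the event timestamp to its minute bucket at ingest time.
        minute = int(ts_seconds) // 60
        self._counts[(livestream_id, metric, minute)] += 1

    def window_total(self, livestream_id, metric, now_seconds, window_minutes):
        # Sum at most `window_minutes` pre-aggregated buckets,
        # including the current (partial) minute.
        current = int(now_seconds) // 60
        return sum(
            self._counts.get((livestream_id, metric, m), 0)
            for m in range(current - window_minutes + 1, current + 1)
        )


rollup = MinuteRollup()
for ts in (0, 10, 65, 130, 400):
    rollup.record("ls_1", "bid", ts)

bids_last_5_min = rollup.window_total("ls_1", "bid", now_seconds=400, window_minutes=5)
```

The query cost is bounded by the number of minute buckets in the window rather than by the raw event volume, which is the property that makes rollups attractive for serving.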
“Push” model
Events are usually side effects or deltas of the state that we want to capture, store, and index. When a certain event happens, this model pushes the change, enriches it, and aggregates it at different granularities and time windows. The denormalized entities flow all the way to indices that home feed retrieval can use directly without joins, or are ingested as a source for the pull model.
- Events are enriched with batch features generated by nightly jobs, or joined with other events through streaming joins.
- Metrics/features are aggregated at different granularities and rolling time windows.
- Finally, the denormalized data is loaded into a sink (such as ElasticSearch, Rockset, etc.) to serve home feed generation.
We used this method to aggregate livestream statistics (such as the distinct count of users performing certain actions) in 5-minute rolling windows and provide them as features for ranking. Because the aggregated data is already denormalized, only a point lookup is needed, which makes the query highly selective and extremely fast.
However, this method requires knowing in advance what needs to be built and output to the sink, which makes it slower to change or to experiment with different signals and configurations. Also, when certain events have a large fan-out, we might build up a backlog of events to process, and the served data might become stale.
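To make the rolling-window aggregation concrete, here is a minimal sketch of a distinct-user count per livestream and action over a 5-minute window, written to a sink on every event so that serving needs only a point lookup. Everything here is an assumption for illustration: the class `RollingDistinctUsers`, the dictionary used as a sink, and the simplification that events arrive in timestamp order. A production pipeline would more likely run in a stream processor.

```python
from collections import defaultdict, deque


class RollingDistinctUsers:
    """Hypothetical push-side aggregator: distinct users per (livestream, action)."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> deque of (ts, user_id)
        self._user_counts = defaultdict(lambda: defaultdict(int))
        self.sink = {}  # stand-in for the serving index

    def _evict(self, key, now):
        # Drop events that fell out of the window (now - window, now].
        events, counts = self._events[key], self._user_counts[key]
        while events and events[0][0] <= now - self.window_seconds:
            _, user_id = events.popleft()
            counts[user_id] -= 1
            if counts[user_id] == 0:
                del counts[user_id]

    def push(self, livestream_id, action, user_id, ts):
        # Assumes events arrive in non-decreasing timestamp order.
        key = (livestream_id, action)
        self._events[key].append((ts, user_id))
        self._user_counts[key][user_id] += 1
        self._evict(key, ts)
        # Write the denormalized aggregate so serving needs no join.
        self.sink[key] = len(self._user_counts[key])

    def lookup(self, livestream_id, action):
        # Point lookup against the sink; the value is as of the last event.
        return self.sink.get((livestream_id, action), 0)


agg = RollingDistinctUsers(window_seconds=300)
agg.push("ls_1", "bid", "u1", ts=0)
agg.push("ls_1", "bid", "u2", ts=100)
agg.push("ls_1", "bid", "u1", ts=200)
distinct_bidders = agg.lookup("ls_1", "bid")
```

Note that in this sketch the sink value only changes when a new event arrives, so a quiet livestream keeps its last count until the next event.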
Revamped Backend
We also revamped our backend to support a unified retrieval interface and moved to a configuration-driven system that allows for rapid development and experimentation.
The new unified retrieval interface decouples the retrieval implementation from the Home Feed and allows the backend to use different implementations at runtime. For example, for livestream retrieval we abstracted a set of parameters (such as user ID, followed categories, followed users, retrieval strategy, etc.) into an interface that is implemented by both ElasticSearch DSL and Rockset Query Lambdas. In normal Home Feed generation, different carousels can use different retrieval indices, and in the event of a service outage or timeout we can switch, or automatically fall back, to the alternative one.
The new configuration-driven system is built on top of feature flags and dynamic config. It lets us quickly change model parameters or reference different implementations as we iterate, and lets us run multiple configurations in parallel for different user groups for A/B testing.
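One way such an interface could look is sketched below. It is not our actual code: `RetrievalParams`, `LivestreamRetriever`, `FallbackRetriever`, and the two stub backends are hypothetical names, and the stubs stand in for real ElasticSearch and Rockset clients.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass(frozen=True)
class RetrievalParams:
    """Backend-agnostic retrieval parameters (field names are illustrative)."""
    user_id: str
    followed_categories: Tuple[str, ...] = ()
    followed_users: Tuple[str, ...] = ()
    strategy: str = "default"


class LivestreamRetriever(Protocol):
    def retrieve(self, params: RetrievalParams) -> List[str]:
        """Return candidate livestream ids."""
        ...


class StaticRetriever:
    """Stub backend that always returns the same candidates."""

    def __init__(self, livestream_ids: List[str]):
        self._ids = list(livestream_ids)

    def retrieve(self, params: RetrievalParams) -> List[str]:
        return list(self._ids)


class TimingOutRetriever:
    """Stub backend that simulates a timeout."""

    def retrieve(self, params: RetrievalParams) -> List[str]:
        raise TimeoutError("simulated backend timeout")


class FallbackRetriever:
    """Tries the primary backend and falls back to the secondary on failure."""

    def __init__(self, primary: LivestreamRetriever, secondary: LivestreamRetriever):
        self._primary = primary
        self._secondary = secondary

    def retrieve(self, params: RetrievalParams) -> List[str]:
        try:
            return self._primary.retrieve(params)
        except (TimeoutError, ConnectionError):
            # Outage or timeout on the primary: serve from the alternative backend.
            return self._secondary.retrieve(params)


retriever = FallbackRetriever(TimingOutRetriever(), StaticRetriever(["ls_42", "ls_7"]))
candidates = retriever.retrieve(RetrievalParams(user_id="u1"))
```

Because the Home Feed only depends on `retrieve`, a carousel can be pointed at a different backend without any change to the feed-generation code.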
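A common way to run multiple configurations in parallel is to hash each user into a stable bucket and map the bucket to a named configuration. The sketch below shows that general technique, not our feature-flag system: the experiment name, the `CONFIGS` table, and its parameters are all made up for illustration.

```python
import hashlib

# Hypothetical configurations; keys and values are illustrative only.
CONFIGS = {
    "control": {"retriever": "elasticsearch", "realtime_weight": 0.3},
    "treatment": {"retriever": "rockset", "realtime_weight": 0.6},
}


def assign_variant(user_id, experiment, treatment_pct):
    """Deterministically place a user into 'treatment' or 'control'."""
    # Hashing experiment + user keeps assignments stable across requests
    # and independent between experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in [0, 100)
    return "treatment" if bucket < treatment_pct else "control"


def config_for(user_id, experiment="home_feed_retrieval", treatment_pct=50):
    """Resolve the configuration a given user should be served with."""
    return CONFIGS[assign_variant(user_id, experiment, treatment_pct)]


user_config = config_for("u1")
```

Changing `treatment_pct` or the values in `CONFIGS` changes behavior without a code deploy, provided the table is loaded from dynamic config rather than hard-coded as it is here.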
Connect all the Dots in Real-time
Our backend now has access to the fast and fresh features and signals provided by both the pull and push models. The next step is to use them, along with our buyer and seller embeddings, to score and retrieve livestream candidates.
The retrieved candidates are sent to our ranking service, which uses user context, seller batch features, real-time livestream features, and real-time user engagement to rank the candidates in different carousels against our objectives when we generate a user’s Home Feed.
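As a toy illustration of combining embeddings with real-time signals, the sketch below blends the cosine similarity between a buyer embedding and each seller embedding with a real-time activity score. The linear blend, the `realtime_weight` parameter, and the field names are assumptions for the example. Our actual ranking models and objectives are not described here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(buyer_embedding, candidates, realtime_weight=0.5):
    """Order candidates by a blend of embedding affinity and real-time activity."""

    def score(candidate):
        affinity = cosine(buyer_embedding, candidate["seller_embedding"])
        # `realtime_activity` is assumed to be pre-normalized to [0, 1].
        return (1 - realtime_weight) * affinity + realtime_weight * candidate["realtime_activity"]

    return sorted(candidates, key=score, reverse=True)


candidates = [
    {"id": "ls_a", "seller_embedding": [1.0, 0.0], "realtime_activity": 0.1},
    {"id": "ls_b", "seller_embedding": [0.0, 1.0], "realtime_activity": 0.9},
    {"id": "ls_c", "seller_embedding": [1.0, 1.0], "realtime_activity": 0.8},
]
ranked_ids = [c["id"] for c in rank([1.0, 0.0], candidates)]
```

With `realtime_weight=0.0` the ordering depends only on embedding affinity, which shows how a busy livestream can overtake a closer embedding match once real-time signals are weighted in.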
Summary: From 0 to 60 mph under 3 secs
With the real-time discovery stack built, we are looking to extend our embedding use cases to real time and integrate more real-time features into ranking. This will be extremely helpful for new users who don’t have batch signals yet, as well as for existing users whose intentions are likely to change between sessions. It’s important to reflect those changes in real time.
Whatnot is moving uncomfortably fast! We built this whole real-time discovery stack in about 3 months, with a lot of new pieces and revamped existing parts. We know that we have a lot of refinement ahead of us, but we are excited to have it running and look forward to unlocking the next chapters of discovery. Come join us!