Using PID controllers to diversify content types on home feed
Yaron Greif | Software Engineer, Homefeed Ranking
Every day millions of Pinners visit the home feed to find inspiration on Pinterest. As a member of the home feed ranking team, it’s my job to not only figure out what relevant Pins to show Pinners but also to make sure that those Pins will help maintain the health of the overall Pinterest ecosystem. For instance, relative to ranking just for relevance, we might display more newly-created Pins to ensure our corpus doesn’t become stale, or more video Pins to surface actionable ideas from creators.
Traditional click-through prediction models are designed to maximize user engagement, but they don’t help achieve those other business objectives. To meet those objectives, the home feed ranking team introduced controllable distribution, a flexible real-time system applied after the traditional ranking layer to control the tradeoff between areas like relevance, freshness, and creator goals by boosting and demoting the ranking scores of content types.
Before controllable distribution, we solved for those business constraints through a large number of special-case solutions in the codebase. The two most common solutions were to simply insert the content we wanted more of approximately every n slots or to move the content up on the feed until a minimum percentage of the content returned was of a particular type.
Those types of solutions were painful for both practical and theoretical reasons.
In practice, these hand-tuned boosts quickly became unmanageable and interfered with each other. Worse, they often stopped working over time, especially when ranking models were updated. We regularly had to delay very promising new ranking models because they broke business constraints.
In theory, controlling content on a per-request basis is undesirable because it prevents personalization. If we show each user the same number of video Pins we can’t show more videos to people who really like to watch videos or vice versa.
Controllable distribution replaces those hard-coded constants with a system where business owners can specify a global target for the percentage of impressions by content type. For example, if the target for video is set to 4% of feed impressions, controllable distribution automatically determines how to achieve that distribution while still respecting Pinner content preferences. Importantly, controllable distribution adjusts the system continuously in real time, so it does not grow stale.
Controllable distribution does this through a system that tracks what percentage of the feed was video in the past and then boosts or demotes the content based on how far video’s share is from the target. The boost is implemented by increasing the ranking system’s score by a scalar that we call a “normalization constant.”
To motivate normalization constants we can formulate the Pinterest ranking setting as an optimization problem subject to constraints imposed by controllable distribution. The normalization constants are then the Lagrange multipliers of that optimization problem.
For every pair of user i and slot j, the system selects Pin X_ij to maximize the ranking score. Controllable distribution adds a constraint that every Pin type t should make up P_t percent of the feed.
The optimization problem then becomes:
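The equation itself is not reproduced in this text. One way to write the problem that is consistent with the definitions above is sketched here. The symbols s_ij(x) (the ranking score of Pin x for user i in slot j), type(x) (the content type of Pin x), and N (the total number of user-slot pairs) are notation assumed for illustration, and P_t is treated as a fraction rather than a percentage.

```latex
\max_{\{X_{ij}\}} \; \sum_{i,j} s_{ij}(X_{ij})
\quad \text{subject to} \quad
\frac{1}{N} \sum_{i,j} \mathbf{1}\!\left[\operatorname{type}(X_{ij}) = t\right] = P_t
\quad \text{for every controlled type } t
```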
The Lagrangian form is then:
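The formula is not reproduced in this text. A plausible form, under assumed notation, is sketched below: s_ij(x) is the ranking score of Pin x for user i in slot j, type(x) is its content type, N is the total number of user-slot pairs, P_t is the target share as a fraction, and λ_t is taken to be 0 for any type that is not controlled. The sign convention is also an assumption, chosen so that a positive λ_t acts as a boost.

```latex
\begin{aligned}
\mathcal{L}(X, \lambda)
  &= \sum_{i,j} s_{ij}(X_{ij})
     + \sum_{t} \lambda_t \left( \sum_{i,j} \mathbf{1}\!\left[\operatorname{type}(X_{ij}) = t\right] - N P_t \right) \\
  &= \sum_{i,j} \left( s_{ij}(X_{ij}) + \lambda_{\operatorname{type}(X_{ij})} \right)
     - N \sum_{t} \lambda_t P_t
\end{aligned}
```

For fixed λ the last term does not depend on which Pins are chosen, so the maximization separates by slot: each slot takes the Pin with the largest score plus the λ of its type.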
The Lagrange multipliers λ are our normalization constants. From an economic perspective, λ for type t is the shadow price, or acceptable opportunity cost, of selecting a Pin of type t: we are willing to give up λ of expected engagement to show a Pin of type t.
The above optimization problem cannot be solved in practice because we don’t know in advance the set of Pins that will be ranked. Without controllable distribution, the solution is instead approximated by greedily selecting the Pin with the highest ranking score for each slot. Since λ for type t is independent of the user and slot, that greedy rule can be updated to select the Pin with the highest sum of ranking score and the normalization constant for its type.
λ for type t is approximated by observing in real time the error g(t) and adjusting λ accordingly.
For instance, in the experiment below we wanted the actual percentage of Pins of a certain type to be 15.5%. It started high, at 20%. When the system saw the content was being over-distributed, it reduced the constant and eventually the percentage converged to about 15.5%.
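The updated greedy rule can be sketched in a few lines of Python. This is a minimal illustration, not the production selection code: the `Candidate` class, the `select_pin` function, the type labels, and the scores are all hypothetical, and the constant is assumed to be added to the score, as the Lagrangian framing suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pin_id: str
    pin_type: str   # illustrative labels such as "video" or "standard"
    score: float    # output of the ranking model


def select_pin(candidates, normalization_constants):
    """Pick the candidate with the highest score plus the constant for its type.

    Types without an entry get 0.0, meaning no boost and no demotion.
    """
    return max(
        candidates,
        key=lambda c: c.score + normalization_constants.get(c.pin_type, 0.0),
    )


candidates = [
    Candidate("a", "standard", 0.80),
    Candidate("b", "video", 0.72),
    Candidate("c", "fresh", 0.65),
]

# With no constants this is the plain greedy rule: highest raw score wins.
print(select_pin(candidates, {}).pin_id)               # a
# A boost of 0.1 on video lifts "b" to 0.82, ahead of "a" at 0.80.
print(select_pin(candidates, {"video": 0.1}).pin_id)   # b
```

Because the constants depend only on content type, the per-slot decision stays a single pass over the candidates.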
We used a PID controller to find the normalization constants. PID controllers are used to control everyday systems like thermostats and cruise control. They have the desirable property that they don’t require a model of the problem space to work. Your thermostat doesn’t need to know whether your window is open to maintain the temperature in your house. Similarly, the distribution of ranking scores can change suddenly. So, it would be very hard to explicitly model the relationship between normalization constants and the distribution of content types.
Instead, PID controllers just use the recent history of errors between the target distribution and the actual distribution to update the normalization constants. These errors are easy to store in practice. The algorithm used is:
Intuitively, the i term is the most important term in the above equation. If there’s an error in how much content we show, we increase our normalization constant in proportion to i. The larger i is, the bigger our update. But if i is too big, we overshoot. So we need the p and d terms to dampen our updates based on how quickly the error term diminishes.
The biggest problem we had with PID controllers is that content types with very large boosts required different p, i and d terms than content types with small boosts. But tuning p, i and d terms by content type is hard to maintain. Instead, we found that if we modified the PID controller to work in log space, the same PID controller worked for all content types.
A big advantage to using an online solution is that it can easily scale to handle A/B testing experiences that don’t have past data. In contrast, offline solutions such as approximating λ using the previous week’s data have to wait for that data to be generated.
Error terms are tracked by having the Counter Service subscribe to a Kafka stream of frontend impressions and then storing the aggregates in RocksDB. The PID controller then reads the history of error terms from the Counter Service and publishes the resulting normalization constants in ZooKeeper to be consumed by the selection algorithm. The PID controller is implemented as an hourly Jenkins job.
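The formula is not reproduced in this text. For reference, the textbook discrete-time PID update has the shape below. Mapping it onto the normalization constant λ_t is an assumption, as are the symbols: e_t[k] is the error for type t at update step k, \hat{P}_t[k] is the observed share, and K_p, K_i, K_d are the p, i and d gains. The original may differ in sign or form.

```latex
\begin{aligned}
e_t[k] &= P_t - \hat{P}_t[k] \\
\lambda_t[k] &= K_p \, e_t[k] + K_i \sum_{m=0}^{k} e_t[m] + K_d \left( e_t[k] - e_t[k-1] \right)
\end{aligned}
```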
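To make the roles of the three terms concrete, here is a small simulation of a textbook PID loop steering a content share from 20% toward 15.5%. It is a sketch under stated assumptions, not the production controller: the `PIDController` class, the gains, and especially `toy_share` (a made-up logistic response of the feed to the constant) are hypothetical. The controller only ever sees the observed share, never the response function.

```python
import math


class PIDController:
    """Textbook positional PID: output = Kp*e + Ki*sum(e) + Kd*(e - e_prev)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.error_sum = 0.0
        self.prev_error = None

    def update(self, target, observed):
        error = target - observed
        self.error_sum += error
        # No derivative contribution on the first step, when there is no history.
        delta = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.error_sum + self.kd * delta


def toy_share(lam, base_share=0.20, sensitivity=8.0):
    """Hypothetical feed response: share of the type given its constant.

    A logistic curve that returns base_share when lam is 0.
    """
    base_logit = math.log(base_share / (1.0 - base_share))
    return 1.0 / (1.0 + math.exp(-(base_logit + sensitivity * lam)))


def simulate(target=0.155, steps=50):
    pid = PIDController(kp=0.1, ki=0.3, kd=0.05)  # illustrative gains
    lam, shares = 0.0, []
    for _ in range(steps):
        observed = toy_share(lam)   # what the impression counters would report
        shares.append(observed)
        lam = pid.update(target, observed)
    return shares, lam


shares, lam = simulate()
print(f"first share {shares[0]:.3f}, last share {shares[-1]:.3f}, lambda {lam:.4f}")
```

In this toy setup the share starts above target, so the accumulated error is negative and the constant settles at a small negative value, which is a demotion.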
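The post does not spell out what working in log space means, so the sketch below shows one possible reading. The assumptions are that the error is measured as log(target) minus log(observed), that the controller output is the log of a multiplicative boost, and that a toy feed scales a type's odds of being shown by that boost. The function name, the gains, and the feed model are all hypothetical.

```python
import math


def run_log_space_pid(base_share, target, kp=0.1, ki=0.3, kd=0.05, steps=60):
    """Drive a toy feed toward `target` with a PID loop that works on logs.

    Returns the last observed share and the final multiplicative boost.
    """
    error_sum, prev_error, log_boost = 0.0, None, 0.0
    observed = base_share
    for _ in range(steps):
        boost = math.exp(log_boost)
        # Toy feed: the type's weight is multiplied by `boost`.
        observed = base_share * boost / (base_share * boost + 1.0 - base_share)
        # Relative error: being off by 2x counts the same at 0.1% as at 10%.
        error = math.log(target) - math.log(observed)
        error_sum += error
        delta = 0.0 if prev_error is None else error - prev_error
        prev_error = error
        log_boost = kp * error + ki * error_sum + kd * delta
    return observed, math.exp(log_boost)


# The same gains are used for a type needing a small boost and one needing a large boost.
small_share, small_boost = run_log_space_pid(base_share=0.10, target=0.12)
large_share, large_boost = run_log_space_pid(base_share=0.001, target=0.05)
print(f"small: share {small_share:.4f}, boost {small_boost:.2f}")
print(f"large: share {large_share:.4f}, boost {large_boost:.2f}")
```

In this toy model both types reach their targets with identical gains, because the loop reacts to relative rather than absolute error. Whether the production system uses this exact formulation is not stated in the post.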
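The error terms themselves are simple to compute from impression counts. The sketch below uses plain in-memory Python as a stand-in for the pipeline described above. It does not use Kafka, RocksDB, or ZooKeeper, and the event shape, function names, and sign convention (target minus observed) are assumptions for illustration.

```python
from collections import Counter


def count_impressions(events):
    """Aggregate impression events into per-type counts.

    `events` is any iterable of dicts with a "pin_type" key, a hypothetical
    stand-in for messages read from the impression stream.
    """
    return Counter(event["pin_type"] for event in events)


def share_errors(counts, targets):
    """Return target share minus observed share for each controlled type."""
    total = sum(counts.values())
    if total == 0:
        # No impressions in the window: report zero error so nothing is adjusted.
        return {pin_type: 0.0 for pin_type in targets}
    return {
        pin_type: target - counts.get(pin_type, 0) / total
        for pin_type, target in targets.items()
    }


# 6 video impressions out of 100, against a 4% target: video is over-distributed.
events = [{"pin_type": "video"}] * 6 + [{"pin_type": "standard"}] * 94
errors = share_errors(count_impressions(events), {"video": 0.04})
print(errors)
```

A periodic job could feed these errors to the controller and publish the resulting constants wherever the selection code reads them.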
How it went
In many ways the solution was easier to implement than expected. It took less time than planned to create a workable PID controller. And the algorithm scaled easily to controlling multiple content types, which was a big worry during design review.
However, we often found we weren’t able to control content outside of a certain range. For instance, we’d be able to target any video percentage between 4% and 12%, but weren’t able to get video below or above that range. These issues were painful and tedious to debug, and they were generally caused by system issues or legacy hard-coded constants.
Controllable distribution is used in production and has already proved to be very useful. It achieved its original goal of letting new models be deployed without the costly delays previously spent reworking them to satisfy business constraints. We have also used controllable distribution to deprecate legacy systems such as “periodic insertion,” simplifying the code and saving lots of engineering time.
We also found new use cases for controllable distribution beyond what was originally intended. For instance, we’ve used controllable distribution to test the impact of different loads of various content types on the system. Below we used controllable distribution to adjust the video load for different A/B groups.
Traditionally, Pinterest and similar sites have spent most of their modeling efforts on retrieval and event prediction. Controllable distribution is just one example of how it’s important to model and modify those ranking scores to deliver the best results for both Pinners and creators.
Pinterest will continue to invest in this post-ranking stage, which we call blending, to deliver better inspirational content to users.
Acknowledgements: We’d like to give special thanks to Ruimin Zhu, Kangnan Li, Nan Zhang, Cosmin Negruseri, and Ludek Cigler for their help on this project.