Is Optimizing for Engagement Changing Us?

Luke Thorburn
Understanding Recommenders
12 min read · Oct 10, 2022


Luke Thorburn, Jonathan Stray, Priyanjana Bengani

Of the many potential harms of recommendation algorithms, perhaps the most unsettling is the idea that they might be changing us — our opinions, preferences, beliefs, and behaviors — in ways we don’t understand. Our concern here is not marketing or propaganda, but the strange ways that automated systems can affect us in their algorithmic pursuit of maximization. For example, many are familiar with the idea of a social media “rabbit hole”, in which the algorithm nudges users towards believing content that is increasingly extreme. One explanation for such rabbit holes is that users with radical beliefs are more likely to engage, and because the recommender has been coded to optimize for engagement, it tries to change the users so that they become more radical.

To date, such influence has been shown to occur in simulations. At a high level, these simulations compare how preferences in a population evolve when people are exposed to a recommender system, with a baseline preference trajectory that occurs “naturally”, without the possibility of influence by the recommender. Under some assumptions, these studies have demonstrated that certain types of recommenders can significantly influence the ways in which preferences evolve over time.

Examples of simulations that compare baseline preferences in a population over time with the changes in those preferences that occur when the population is exposed to a recommender (adapted from here and here).
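To make the structure of such a comparison concrete, here is a minimal sketch in Python. It is a toy model of our own, not the model used in the studies referenced above: a single preference lives on a scale from -1 to 1, and the hypothetical `pull` parameter stands in for a recommender that keeps nudging the user toward an extreme. Setting `pull` to zero gives the "natural" baseline.

```python
def simulate(steps, pull, drift=0.0, start=0.2, extreme=1.0):
    """Return a preference trajectory on a [-1, 1] scale.

    Each step the preference moves by `drift` (the assumed "natural" change)
    plus a fraction `pull` of the remaining distance to `extreme`, which
    stands in for exposure to a recommender. All parameters are illustrative.
    """
    pref = start
    path = [pref]
    for _ in range(steps):
        pref += drift + pull * (extreme - pref)
        pref = max(-1.0, min(1.0, pref))  # keep the preference on the scale
        path.append(pref)
    return path


# Baseline: no recommender influence. Exposed: a small pull at every step.
baseline = simulate(steps=50, pull=0.0)
exposed = simulate(steps=50, pull=0.05)

# The quantity of interest is the gap between the two trajectories.
gap = exposed[-1] - baseline[-1]
print(f"baseline={baseline[-1]:.2f} exposed={exposed[-1]:.2f} gap={gap:.2f}")
```

Even a small per-step pull compounds into a large gap over many steps, which is the basic pattern these simulations look for. Whether real users respond anything like this is exactly the empirical question.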

Simplified models suggest that influence is possible, but is it actually happening? The effects of recommenders are hard to measure, and we leave the complex question of whether social media recommenders are changing political preferences to a future post. But there are sources that suggest influence is already taking place. For example, the following comment, made by a YouTube employee developing reinforcement learning-based recommender systems, seems to describe a type of influence effect.

We actually do observe this really interesting user learning process as well during our experiments … there is this growing impact of improvement during even just a 2–3 week period. We see this move from a neutral-ish improvement towards a bigger gain, where the user is actually also adapting to this new behavior exhibited by the system.

So if influence is already happening, how can we avoid it? We can’t simply tell the algorithm to “prevent all changes” — to do so would prevent opportunities for positive change such as education or new hobbies. Nor can we give a list of every type of preference change we want it to avoid. This problem has been well explored in the context of robotics. Imagine programming a robot to get coffee as fast as possible. You run it, and in its haste it steps on the cat. You add code to tell it not to hurt pets or people, but next time it breaks a vase. You add code penalizing changes in its environment, but then it tries to prevent you from washing the dishes. Analogously, there are countless ways a recommender could influence people in undesirable ways and we cannot articulate them all, nor do we know how to formulate simple general prohibitions.

In this post we tackle the challenges of defining, detecting, and preventing “manipulation” by increasingly intelligent recommender algorithms. We’ll explain how this accidental influence results from a combination of mis-specified objectives and feedback loops, then dig into the ethical challenge of deciding when influence is good (e.g. “education”) versus bad (e.g. “blackmail”). Finally we summarize the best thinking to date on how we might detect undesirable influence, and how to prevent it from happening.

Accidental Influence Through Mis-specified Objectives (and all objectives are mis-specified)

Recommenders, like almost all AI systems, operate by maximizing some objective or reward, usually some form of engagement. This works to the extent that it actually aligns with what users want. But there is a risk that the algorithm might instead learn to change the preferences of users so that they are easier to satisfy. For example, a recommender might learn that outrage reliably produces clicks, and therefore develop strategies to keep people outraged.

Like any rational entity, the algorithm learns how to modify the state of its environment — in this case, the user’s mind — in order to maximize its own reward.

Stuart Russell, Human Compatible (2019)

We’ll use the word “influence” as a value-neutral way to talk about the idea that recommenders might be changing people. This covers a lot of ground: advertisers influence consumers by placing ads in printed newspapers, teachers influence their students through education, and politicians influence citizens by giving speeches and interviews. We think a good definition for talking about influence in recommender systems is:

The act of changing a person’s context-specific distribution of beliefs or behavior.

In this definition, behavior is what people do — their “external state” — and beliefs is used as a catch-all for their “internal state” — any aspect of the individual that isn’t expressed externally, including preferences, opinions, emotions and beliefs of fact. We use the phrase “context-specific distribution …” to acknowledge the fact that internal states, such as preferences, can be uncertain, inconsistent and constructed on the fly.
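One way to make "changing a distribution" measurable is to compare a person's distribution of behavior in a given context before and after some period. The sketch below uses total variation distance, which is our choice of metric for illustration rather than part of the definition, and the topic shares are made-up numbers.

```python
def total_variation(p, q):
    """Total variation distance between two categorical distributions.

    `p` and `q` map categories to probabilities. The result is 0 when the
    distributions are identical and 1 when they do not overlap at all.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical share of one user's clicks by topic, in the same context
# (say, weekday evenings), measured in two different months.
before = {"news": 0.5, "sports": 0.3, "music": 0.2}
after = {"news": 0.7, "sports": 0.2, "music": 0.1}

shift = total_variation(before, after)
print(f"shift in behavior distribution: {shift:.2f}")
```

A number like this only says that the distribution changed. It says nothing about what caused the change or whether the change is a problem, which are the questions the rest of this post is about.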

There are three criteria which, in the context of recommender systems, define the particular type of “accidental” influence we are concerned about here.

First, we are only talking about influence that is unintended. While there’s a lot to say about data-driven persuasion methods such as the psychographic microtargeting used by Cambridge Analytica — both the political implications and the question of their actual effectiveness — in this piece we’re concerned with the influence that is not intended by any of the humans who build or interact with the system. This excludes, for example, claims of platform designers explicitly biasing algorithms in political directions.

Influence can occur as a side effect of the recommendation algorithm pursuing its objective, which is to optimize a measure of value (the value model) by showing items to the user. This optimization pressure creates a strong incentive for the recommender system to take any actions or strategies available to it that increase its measure of value, including influencing its users, even if such influence was never intended by the system’s designer.

We care which means the algorithm used to solve the problem, but we only told it about the ends, so it didn’t know not to “cheat”.

Krueger et al., Hidden Incentives for Auto-Induced Distributional Shift (2020)

Second, we are only talking about influence that requires a feedback loop between the recommender system and the user. By this we mean the recommender changes in response to the user, and the user changes in response to the recommender. This interactive loop is what makes recommender influence different from reading a book, and from non-personalized search engines.

Some types of recommenders cannot produce these feedback loops. For example, the recommenders for Reddit and The Factual do not change their strategy in response to user behavior. But feedback loops can be found, to varying degrees, in most modern recommenders because they are based on engagement prediction, and their predictions are regularly updated using recent user behavior. Feedback loops may turn out to be a particular problem for a new generation of recommenders based on reinforcement learning. In such systems, an “agent” (the recommender) is constantly learning and adjusting its strategy by observing the effect its actions have on the “environment” (which includes the user).

Feedback loops can also arise in more subtle ways, such as through A/B tests. Platform designers may expose two groups of users to different versions of a recommender system, and then adopt the version that most improves their measure of value. If they do this repeatedly, then the recommender will be changing in a way that is informed by the effect it has on users. The designers themselves become the means by which the recommender system changes in response to user behavior, completing the feedback loop. This kind of feedback loop is not personalized to each user, and indeed personalization is not required for influence to occur. A news video channel is an example of a “recommender” that is constantly learning and adapting to a population of users, but not personalizing for each member of the audience. All of these techniques can influence the population as a whole, but don’t have a fine-grained ability to influence individual users.
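The A/B testing loop described above can be sketched in a few lines. Everything here is hypothetical: the `engagement` function is an invented stand-in for what a platform would measure on real users, and `outrage_weight` is an illustrative ranking parameter, not a setting from any real system.

```python
def engagement(outrage_weight):
    """Invented average engagement as a function of one ranking weight."""
    return 1.0 + 0.5 * outrage_weight - 0.25 * outrage_weight ** 2


def run_ab_rounds(weight, rounds, step=0.1):
    """Repeatedly test a slightly higher weight and keep it if the metric improves."""
    history = [weight]
    for _ in range(rounds):
        candidate = weight + step
        # The designers ship the candidate only if it beats the current
        # version on the measured outcome, so user responses steer the system.
        if engagement(candidate) > engagement(weight):
            weight = candidate
        history.append(weight)
    return history


history = run_ab_rounds(weight=0.0, rounds=20)
print(f"weight after 20 rounds: {history[-1]:.1f}")
```

No single step in this loop involves a learning algorithm, yet the deployed system still drifts toward whatever the population responds to. The designers' ship decisions are the update rule.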

Concretely, what would it look like for a recommender to try to influence you? It might simply show you content that it wants you to engage with. In general, all of us tend to become more aligned with the content we are exposed to — whether through experience effects, anchoring, learning new facts, discovering new interests, pressure to conform, or the illusory truth effect. Alternatively, a recommender might show you content you find strongly disagreeable in order to more firmly entrench your current preferences. It might show you articles suggesting that everyone else is interested in a particular topic, to generate FOMO and increase the chance you engage. It might show you a lot of conflicting news accounts to generate reality apathy, then feed you a false conspiracy theory which makes it all make sense. This list is not meant to be exhaustive. There will be many more subtle ways in which a sufficiently capable recommender could influence us, including some that we would not be able to foresee. The above examples simply demonstrate that recommenders could plausibly influence us, if they are able to learn how to do so.

The third and final criterion is that we are only talking about influence that is unethical.

When is Influence Bad?

It can be tempting to claim that all influence is unethical. At the same time, almost everything we think and do is a response to our environment, and in that sense has been influenced by the information to which we are exposed. Some forms of influence (e.g. education, good advice) are usually positive, but others (e.g. gaslighting, blackmail) are not.

If we are to prevent recommenders from influencing us in unethical ways, we need to draw a widely accepted line between ethical and unethical influence. In other words, we need to determine our collective “preference-change preferences”. There are at least two resources that may help.

The first resource is the study of influence, persuasion, manipulation, and related topics, which goes back thousands of years. For example, the Stanford Encyclopedia of Philosophy characterizes manipulation in three ways: as the bypassing of reason, as trickery, or as the application of pressure. Literature on journalism and media ethics highlights particular forms of unethical influence, such as the Werther effect. Jowett & O’Donnell discuss the differences between propaganda and persuasion, arguing that persuasion — unlike propaganda — is “interactive and attempts to satisfy the needs of both persuader and persuadee”. This kind of theory can be used as a basis for defining unethical influence in the context of recommender systems.

The second resource is the pool of existing laws and regulatory frameworks that record our previous judgments about manipulation and influence, and hence set a precedent. In the US, for example, the Federal Trade Commission has rules about truth in advertising and product endorsements, the Federal Communications Commission has rules about consumer deception, and public affairs office-holders within the Department of Defense are required to be truthful. Such regulations do not cover all forms of unethical influence, but a practical approach may be to start by identifying a subset of unethical influence, and then refine the definition over time.

While there is a wealth of previous thinking about unwanted influence, encoding these ideas computationally is a major challenge. More than building each of these existing rules into a system, the challenge is to design systems that can flexibly interpret these principles and extend them to new situations. There is some work using language models to make simple moral judgements, but we are a long way away from nuanced evaluations of the acceptability of influence.

Detecting and Preventing Unwanted Influence

Assuming we can clearly define the types of influence we are concerned about, we are faced with two technical questions: how to detect it, and how to prevent it.


There are two broad approaches to detecting influence in general. The first is to observe how people’s behavior changes over time and, via a model that connects beliefs to behavior, make causal inferences about whether their beliefs are changing due to exposure to the recommender system. The second is to directly ask people what their beliefs (or preferences, or opinions) are, and again try to infer whether such changes are caused by the recommender.

Both approaches have their limitations. Behavior does not always reflect beliefs, and there are multiple incompatible ways of modeling the link between beliefs and behavior. It is not clear which, if any, is correct. Moreover, doing causal inference to determine the effects of recommenders is challenging. The effects of recommender-caused influence may be small, or only affect a small number of people in specific circumstances, and thus be hard to detect.

In practice, it may be more plausible to detect whether a group or population of recommender users is being influenced (such as by becoming more polarized), rather than to detect influence on an individual basis. Alternatively, it may be practical to simply classify certain preference shifts as undesirable, regardless of whether they were caused by the recommender. This removes the challenges associated with causal inference, but means you may take action against changes that were not caused by the recommender, raising other ethical issues.
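As a rough illustration of population-level detection, the sketch below compares how polarized two groups are and uses a permutation test to ask whether the gap could plausibly be chance. The opinion scores are invented, measuring polarization as the standard deviation of opinions is our simplifying assumption, and the result only supports a causal reading if users were randomly assigned to the two groups.

```python
import random
import statistics


def polarization(opinions):
    """Spread of opinions on a [-1, 1] scale; larger means more polarized."""
    return statistics.pstdev(opinions)


def permutation_test(exposed, holdout, trials=2000, seed=0):
    """Return the observed polarization gap and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = polarization(exposed) - polarization(holdout)
    pooled = list(exposed) + list(holdout)
    n = len(exposed)
    hits = 0
    for _ in range(trials):
        # Shuffle group labels and see how often chance alone gives a gap this big.
        rng.shuffle(pooled)
        gap = polarization(pooled[:n]) - polarization(pooled[n:])
        if gap >= observed:
            hits += 1
    return observed, hits / trials


# Invented survey scores for users with and without the recommender.
holdout = [-0.2, -0.1, 0.0, 0.1, 0.2, -0.15, 0.05, 0.15, -0.05, 0.1]
exposed = [-0.9, -0.8, 0.85, 0.9, -0.7, 0.8, -0.85, 0.75, -0.95, 0.9]

observed, p_value = permutation_test(exposed, holdout)
print(f"polarization gap={observed:.2f}, p={p_value:.3f}")
```

With real data the gap would likely be far smaller than in this exaggerated example, which is why small effects that touch only a few users are so hard to detect.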


Say we are clear on the philosophical and ethical questions, have measured the amount of bad influence that is taking place and think we need to reduce it. Or perhaps we want to mitigate the risk of bad influence preemptively. What could we do?

There is a growing literature on how to prevent or mitigate the effects of influence in recommender systems. Broadly, proposals fall into three groups.

The first set of proposals are to avoid giving recommenders the capability to influence. Concretely, this could mean that recommenders get tested in simulated environments, and only those that don’t demonstrate the ability to influence get deployed. It could mean that recommenders are designed to only observe the short term effects of their actions, which would remove their ability to learn to manipulate preferences over the long term. More simply, it could mean that recommenders are just not used in sensitive contexts, such as when Facebook decided to stop recommending political groups or Twitter banned political advertising.

The second set of proposals amount to making influence more difficult. This could be achieved by rewarding the recommender for being uncertain about what it is the user actually wants, which could lead it to be more conservative in its recommendation strategies and avoid pursuing strategies if it is uncertain whether the user would approve of them. (For example, this would effectively stop the recommender from reading too much into our actions when we click occasionally on a clickbait headline.) Another approach would be to mandate a certain level of randomness or noise in the recommendations — perhaps by interleaving a sample of randomly selected posts into users’ feeds — which would make any attempt at influence by the recommender less effective.
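The interleaving idea is simple enough to sketch directly. The function name, the `every` parameter, and the post identifiers below are all hypothetical, and a real system would need to decide what pool the random posts are drawn from.

```python
import random


def interleave_random(ranked, pool, every=4, seed=None):
    """Insert one randomly chosen post from `pool` after every `every` ranked posts."""
    rng = random.Random(seed)
    # Only use posts that are not already in the ranked feed.
    candidates = [post for post in pool if post not in ranked]
    rng.shuffle(candidates)
    feed = []
    for position, post in enumerate(ranked, start=1):
        feed.append(post)
        if position % every == 0 and candidates:
            feed.append(candidates.pop())
    return feed


ranked = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
pool = ["x1", "x2", "x3", "r1"]
feed = interleave_random(ranked, pool, every=4, seed=1)
print(feed)
```

The ranked order is preserved, so the recommender still does most of the work. The random slots simply dilute how tightly it controls what the user sees.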

The third set of proposals explicitly penalize influence, to remove the incentive for recommenders to cause it. Of the three, this is the most difficult direction, because it requires a formal definition of which types of influence are unwanted, and the ability to distinguish between changes caused by the recommender and changes caused by other factors. Moreover, as in the coffee-fetching robot example in the introduction, it may not be possible to articulate every possible form of unwanted influence. A conservative strategy is to define categories of “safe” changes (such as those expected to occur in the absence of the recommender) and to flag all changes that don’t fall into these categories. Alternatively, it may be possible to learn which changes are undesirable in a bottom-up fashion, by asking users. For example, the YouTube recommender could learn to avoid regret, as measured by survey responses. Another approach attempts to ensure the recommender has no incentive to increase its reward by influencing the user.
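A minimal sketch of what penalizing influence could look like is below. It assumes we already have an estimate of the preference shift expected without the recommender, which is the hard part in practice. The function, its parameters, and the numbers are illustrative and do not describe any deployed system.

```python
def penalized_reward(engagement, observed_shift, expected_shift,
                     penalty_weight=2.0, tolerance=0.05):
    """Engagement minus a penalty for preference change beyond the expected baseline.

    Shifts within `tolerance` of the expected (no-recommender) shift are
    treated as "safe" and carry no penalty.
    """
    excess = max(0.0, abs(observed_shift - expected_shift) - tolerance)
    return engagement - penalty_weight * excess


# Two hypothetical strategies evaluated against an expected shift of 0.1.
aggressive = penalized_reward(engagement=1.0, observed_shift=0.4, expected_shift=0.1)
cautious = penalized_reward(engagement=0.8, observed_shift=0.12, expected_shift=0.1)
print(f"aggressive={aggressive:.2f} cautious={cautious:.2f}")
```

Under this objective the strategy with higher raw engagement scores lower, because it moves preferences further than the baseline predicts. How to set the penalty weight and tolerance is a value judgment, not a technical one.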


Inadvertent influence may be an issue not just for recommender systems on large platforms with the potential to steer a population, but also for small platforms, if the influence is sufficiently harmful to even a small number of users. Further, the strange effects of strong optimization may increasingly become an issue as more of the content we are exposed to becomes personalized, and perhaps synthetically generated to maximize an objective (à la GPT-3 or DALL-E 2). To retain human agency, we need to be able to define, detect, and prevent such influence from happening. These are challenging problems and all require significant research attention, but there are promising precedents and directions on which we can build.

Thank you to David Krueger for feedback on this post.

Luke Thorburn was supported in part by UK Research and Innovation [grant number EP/S023356/1], in the UKRI Centre for Doctoral Training in Safe and Trusted Artificial Intelligence, King’s College London.



Luke Thorburn is a PhD candidate in safe and trusted AI at King’s College London.