What if everything was about reward prediction error?

A note on what our lives would look like if we could perceive joy only through our errors in predicting rewards.

Alireza Modirshanechi
The Startup
7 min read · Dec 29, 2020

WARNING: This article is not meant to be scientifically precise. Rather, it is a playful discussion of an imaginary scenario inspired by scientific findings.

Dopamine, enjoyment, and reward prediction error

It is generally believed that dopamine is what makes us feel happy and pleased, and there is scientific evidence for the release of dopamine at the time of accomplishing something or receiving rewards. However, many studies have shown that most of the neurons responsible for the release of dopamine, usually called dopaminergic neurons, are not triggered by the reward itself; rather, they are triggered by the difference between the achieved reward and the expected reward, what is called “reward prediction error” in models of reinforcement learning¹.

It is somewhat funny: no matter how big the reward is, if you expect it, then, as far as these neurons are concerned, there is no release of dopamine. Or even worse, you may accomplish something and get a reward, but because you had a higher expectation, the actual value of the reward gives you a feeling of disappointment and leads to a decrease in dopamine release, due to the inhibition of the dopaminergic neurons. In other words, these neurons are triggered only if the achieved reward is greater than what we expected, i.e. only if the reward prediction error is positive².

I do not intend to talk about neuroscience, biology, or reinforcement learning in this article. Rather, I would like to take the mentioned scientific facts as inspiration, imagine a simplistic world where the reward prediction error is the only reason for joy, and discuss what enjoyment is like in such a world.

“Imagine all the people” enjoying reward prediction errors³

Imagine an abstract, simplistic world where (1) people perceive joy and pleasure only through the release of dopamine, and (2) dopamine release depends only on reward prediction errors. Such a world is quite different from the complex one we live in; in reality, dopamine is neither the only source of our happiness nor of our feeling of success, and reward prediction error is not the only trigger of dopamine. However, thinking about this abstract imaginary world simplifies the problem of “what do we enjoy?” and lets us discuss it with less distraction from other potentially relevant factors.

Following this thought process, we may get some interesting intuitions about our more complex real world. To do so, in what follows, I try to imagine and describe a few features of this simplistic imaginary world.

1. Joy is in the eye of the beholder

When the only thing that matters is the difference between the achieved and the expected rewards, then the expected reward is as important as the achieved reward in quantifying how enjoyable our experiences are. For example, if we increase both the expected and the achieved rewards by the same amount, then the reward prediction error, and hence the corresponding joy (or disappointment), does not change.
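The arithmetic behind this claim can be checked with a minimal sketch. The function name and the numbers below are hypothetical, chosen only for illustration; the only thing taken from the article is the definition of reward prediction error as the achieved reward minus the expected reward.

```python
def reward_prediction_error(achieved: float, expected: float) -> float:
    """Achieved reward minus expected reward."""
    return achieved - expected


# Hypothetical numbers, chosen only for illustration.
baseline = reward_prediction_error(achieved=50.0, expected=40.0)

# Raise both the achieved and the expected reward by the same amount.
shift = 1000.0
shifted = reward_prediction_error(achieved=50.0 + shift, expected=40.0 + shift)

# The shift cancels out, so both errors are identical.
print(baseline, shifted)
```

In this toy setting, the much larger reward produces exactly the same prediction error, and hence the same joy, as the smaller one.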

More intuitively, if you take a person and suddenly increase their income by a huge amount but also significantly increase their expectations about their day-to-day life, then you cannot be sure that you have made their life "happier". To make their life happier, assuming their income plays a role in their happiness, you need to increase their income faster than their expectations rise! By similar reasoning, there is no guarantee that rich people who are used to their luxurious lives enjoy the world more than the average person does.

In summary, in a world where everything is about reward prediction error, everything also takes on a more relative nature.

2. Always being right is boring

If we assume that we enjoy only positive reward prediction errors, then there is no enjoyment in a world without errors. Hence, if there is no uncertainty in our environment, and if we can make objective predictions (i.e. without any subjective bias), then life becomes quite boring. For example, many people would agree that a big part of the joy in winning a game comes from the fact that winning is not certain in advance, and I believe that such certainty would make many of us less motivated to even get involved in that game.

Another interesting, if somewhat counterintuitive, example is the lottery. In general, since you do not expect much from a lottery, losing does not hurt much while winning is magnificent. But what happens if you know with certainty that you will win? In the imaginary world that we are discussing here, there is no feeling of accomplishment at the moment of seeing your card numbers on the TV screen. Yet, because receiving a huge amount of money leads to lots of other activities with their own reward prediction errors, winning a lottery can still be quite fun later on and in the long term! Hence, in our imaginary world, the whole experience is still pleasant. But I doubt that, even in our real and complex world, the moment of announcing a lottery winner gives us any special feeling when the winner has been known to us in advance and with certainty.

In summary, to have some fun and joy or, on the other hand, to have some disappointment and frustration, we need to be wrong, at least once in a while.

3. Pessimism rocks?

If joy is in the eye of the beholder, then, no matter what, we can make our lives joyful by decreasing our expectations. One who always expects the worst-case scenario is the one who always experiences a positive⁴ reward prediction error. In other words, to always enjoy the world, you should always be wrong with your expectations, but wrong in the right way!
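A small sketch can make this concrete. The list of possible rewards is hypothetical and the function name is introduced here only for illustration: someone who expects the worst possible outcome can never be disappointed, while someone who expects the best can never be pleasantly surprised.

```python
def reward_prediction_error(achieved: float, expected: float) -> float:
    """Achieved reward minus expected reward."""
    return achieved - expected


# Hypothetical set of rewards that a situation could deliver.
possible_rewards = [0.0, 20.0, 50.0, 100.0]

# The pessimist expects the worst case, the optimist the best case.
pessimistic_expectation = min(possible_rewards)
optimistic_expectation = max(possible_rewards)

pessimist_errors = [
    reward_prediction_error(r, pessimistic_expectation) for r in possible_rewards
]
optimist_errors = [
    reward_prediction_error(r, optimistic_expectation) for r in possible_rewards
]

print(pessimist_errors)  # never below zero
print(optimist_errors)   # never above zero
```

Note that the pessimist's error is zero, not positive, when the worst case actually happens, which is why "non-negative" is the more precise word.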

However, it is not always easy to pick the worst-case scenario as our expectation, simply because many of our expectations are made unconsciously, particularly in the situations that we encounter quite often. Hence, for many of these situations, if we want to be pessimistic, we need to train ourselves; we need to stop adapting our expectations according to our environment and to stop being realistic. For example, if you are rich, you should stop getting used to being rich! To do so, you should always imagine bad scenarios more than good ones — like some kind of mental training. For example, if you are rich, you should repeatedly imagine huge drops in stock markets; then, even a stable trend looks amazing!

The problem is that if we think of imagination as a simulated version of the future, then each imagined scenario also comes with an imaginary reward prediction error, and to keep our expectations low through this mental training, we need to imagine more negative reward prediction errors than positive ones. Here lies the irony: if we want to enjoy reality more, then we need to suffer more in our minds!⁵ Exactly the opposite holds true for optimism.

In summary, in unfamiliar situations, a pessimistic mindset seems to be the most fortunate one. However, for day-to-day life events, pessimism rocks only if we can stand a catastrophic image of the world in our minds.

So what? (Traditionally called conclusion)

It is obvious that, for example, we can eat the same food many times in our lives and enjoy it every single time. It is also not difficult to imagine (or to remember) enjoying watching a movie multiple times. There are indeed activities that we enjoy even though they perfectly match our expectations, activities like running, biking, or swimming.

It is clear that reward matters on its own, and it is clear that not everything is about reward prediction error. However, as we saw, this very simple assumption can explain our feelings and emotions in a wide range of situations in our day-to-day life. More importantly, because of its simplicity, it is easy to use this assumption to find better and novel insights into "what makes us happy?" and "why does it make us happy?".

In the examples above, I find it fascinating that the necessity of a faster increase in income compared to the increase in expectations, the unintuitive nature of certain lotteries, and the pros and cons of pessimism can all be discussed elaborately in this imaginary world. Thinking about these situations in this imaginary world made me see things from different perspectives; it was like solving math problems while only a small handful of mathematical tools are allowed.

The insights and perspectives resulting from thinking about this imaginary world can become a useful part of our reasoning for making our future decisions; however, we should always be aware that we are not living in that simplistic world!

Acknowledgment

I am grateful to Vasiliki Liakoni, Johanni Brea, Bernd Illing, Marco Lehmann, Mohammad Tinati, and Siavash Dadpour for our fruitful and enjoyable discussions on related topics. I am grateful to Vasiliki Liakoni and Siavash Dadpour for their very helpful feedback on the text.

Footnotes:

¹ The relation of dopaminergic neuron activity and dopamine signals to reward and reward prediction error has been intensively studied in computational and systems neuroscience. An interested reader can take a look at Schultz et al. (1997) to see how this story started, and see for example Kim et al. (2020) and Dabney et al. (2020), two among many other fascinating studies, for some very recent advances on the topic.

² Formally, the reward prediction error is defined as the achieved reward minus the expected reward, where both the achieved and the expected rewards can take any real value — positive or negative. A negative reward value can be seen as a punishment.
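The same definition can be written compactly. The symbols are introduced here only for illustration and do not come from the cited studies: $r$ stands for the achieved reward, $\hat{r}$ for the expected reward, and $\delta$ for the reward prediction error.

```latex
\delta = r - \hat{r}, \qquad r, \hat{r} \in \mathbb{R}
```

In the imaginary world of this article, $\delta > 0$ corresponds to joy and $\delta < 0$ to disappointment.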

³ See the appendix!

⁴ More precisely, it should be a non-negative reward prediction error.

⁵ For example, as a researcher, for every single paper of yours getting accepted, you should imagine a few rejections to balance your expectation!

Appendix!

The quote in the title of the section is from John Lennon’s famous song "Imagine" (1971).

Postdoc at Helmholtz Munich and MPI for Biological Cybernetics; Ph.D. in CS from EPFL; Personal website: https://sites.google.com/view/modirsha