Explore vs Exploit

Miguel Angel Pasalodos

If you had to pick a restaurant, would you go to a place you have been to before? Or one you haven’t tried?

When I have posed this question to friends, the answer has typically been: it depends. Depending on the day, the occasion, or one’s mood, one may choose a new place (i.e. the explore option) or settle for a tried-and-tested one (i.e. the exploit option; I will explain these terms in a bit). Occasionally there are those who are strongly for one of the choices. There are the serial ‘explorers’, who will never settle for anything. And there are the serial ‘exploiters’, who would rather go with the tried-and-tested option than expose themselves to an uncertain one. As with most debates, those at the extremes have a hard time understanding each other.

Choices like these are something we encounter every day, whether it is food, groceries, books, news articles, travel destinations, types of TV shows and movies, or hobbies. The terms explore and exploit come to us from recommendation engines (the algorithms that power book recommendations on Amazon and movie recommendations on Netflix, among others), where to ‘exploit’ is to suggest more of the same, and to ‘explore’ is to recommend something entirely new. We may believe that we divide our time equally between exploring new things and doing more of the same. If this were true, new products introduced into the market would see much higher adoption on a much shorter time scale. But we are creatures of habit. We need to be nudged, told, barraged with advertising, and sometimes we need to be in the right frame of mind to be willing to explore.

Our resistance to exploring is not without reason. My personal experience with exploration concerns bread: I have never been able to settle on a type or brand. As a consequence, I often find myself stumped at the bread aisle in supermarkets, trying to decide between wheat, white, 3/5/7 multi-grain, rye, and pumpernickel, and between the different brands and prices. While I am wasting mind-cycles trying to decide, someone usually walks in, picks a bread, loads it in their cart and walks away. It is often far more difficult to explore than to exploit. One expends too much mental capital trying to decide, depleting resources one might use for the other, more important decisions in life. There is also the anticipated risk of disappointment. What if the new choice turns out to be worse than what one had before? It is often easier to stick to something one has already tried. There are exceptions, like the serial explorers I mentioned before, and also situations where one might be more likely to weigh the options than go with a prior choice (e.g. high-ticket items like plane tickets, where price may trump loyalty). For most trivial tasks, such as reading news, watching a movie, or buying a $10 book, we are likely to prefer things that are in line with past interests. Very soon, one is caught in a web of one’s most common preferences. The world of choices shrinks to one’s history of past choices. We have worked our way into a bubble.

This narrowing down of choices is more severe in the online world than in the physical one. Most online recommendation engines are trained to capture signals of one’s preferences and to suggest recommendations that are exploitative, i.e. similar to past preferences, which is often referred to as being ‘personalized’. In a grocery store, the aisles don’t miraculously change when one walks in to show only the items one has bought before, so one has the opportunity to explore if one wants to. An online experience is less conducive to exploration. True, there is the search box where one can search for whatever one wants, but how does one know what else is out there to search for? My news feed is tuned to topics and news outlets I have read and engaged with in the past. How is the engine that powers recommendations to know that, on a whim, I might be more open to reading the counterpoint even if I don’t agree with it? So also with movies: what if I feel absolutely whimsical, tired of the existing choices, and want to try out a new genre or type that I have never picked before?

A purely explorative recommendation engine may fail in comparison with the exploitative approach. If Amazon were to recommend us a random set of books, we would likely not buy as much as we do now. The same goes for news. Screen real estate is valuable, and most web companies will use it to show whatever one is most likely to click or buy; by that logic, the exploitative model trumps the explorative one. The recent fiasco with fake news recommendations is a classic example of how this exploitative approach can go wrong.
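The two extremes need not be the only options. As an illustration, and not something any particular company is known to use, here is a minimal sketch of a common middle ground usually called epsilon-greedy: most of the time the engine exploits, showing the item with the best click rate so far, but with a small probability (epsilon) it explores by showing a random item. The function name `recommend`, the two dictionaries of counts, and the default epsilon of 0.1 are all assumptions made up for this example.

```python
import random


def recommend(click_counts, show_counts, epsilon=0.1, rng=random):
    """Pick one item: explore with probability epsilon, otherwise exploit.

    click_counts and show_counts are hypothetical dicts mapping an item
    name to how often it was clicked and how often it was shown.
    """
    items = list(show_counts)

    if rng.random() < epsilon:
        # Explore: pick any item uniformly at random, ignoring history.
        return rng.choice(items)

    # Exploit: pick the item with the highest observed click rate.
    def click_rate(item):
        shown = show_counts[item]
        return click_counts.get(item, 0) / shown if shown else 0.0

    return max(items, key=click_rate)


# Example history: thrillers have been clicked most often.
shows = {"thriller": 10, "romance": 10, "documentary": 10}
clicks = {"thriller": 6, "romance": 2, "documentary": 1}

print(recommend(clicks, shows, epsilon=0.1))
```

With epsilon set to 0 this is the purely exploitative engine described above, and with epsilon set to 1 it is the purely explorative one. Anything in between leaves a little room for the whim.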

There is a case to be made for giving users the freedom to explore even if it is not their most common preference. It is a lot like the question we started with — we may often go with a tried-and-tested option, but we know that we always have the choice to go with something entirely new. We may not do it very often, but it is still important to have the choice. Choices have a redeeming quality of freeing us from the narrow confines of our past experiences. In being largely exploitative, online recommendations keep us from having this option, sometimes with disastrous results, for the sake of clicks and dimes.