To Explore or To Exploit: that’s the question

8 min read · Sep 26, 2021

Should I stay or should I go?

Exploration vs. exploitation is a trade-off that’s fundamental to nature: it’s the problem of learning and navigating through an uncertain environment that you do not have complete knowledge of. If you were to ask me the best way to have a great job for the next three decades, I would not have a great answer for you. Given the sheer complexity of the system we are trying to adapt to, and the unpredictability tied to it, there is no golden solution. In contrast, if you were to ask me when the moon would rise tomorrow, I would have a definite answer, an analytical solution, because I have a pretty darn good model of the system behind that phenomenon, and that system is nowhere near as combinatorially complex.

To frame the trade-off in a hunter-gatherer context: we have always faced a tension between staying on a patch of land whose fruit yield we know and depend on (and risking loss should the tree die or a fire sweep through), and moving to a new location that might offer better bounty but also carries the risk of finding barren land or a pack of wolves. This classic trade-off is described in the foraging behavior of animals, as they weigh moving themselves and their tribes from patch to patch.

https://en.wikipedia.org/wiki/Foraging#Learning

But this trade-off at its core is one of risk and knowledge management: the inherent tension between exploring the unknown and exploiting the known is so inescapable and fractal in nature that we rediscover and represent motifs of it in a variety of fields.

Mythology and Psychoanalysis

The chaos and order characterized in Nietzschean philosophy embody this trade-off: there is no perfect solution; it’s a mirage. The only real thing is the tension between the chaos of the unknown and the order of the known, and the interplay that allows you to learn, adapt and grow in an unknown, dynamic environment. Take the story of the Apollonian and the Dionysian, for instance: Apollo is the god of the sun, of rational thinking and order, and appeals to logic, prudence and purity. Dionysus is the god of wine and dance, of irrationality and chaos, and appeals to emotions and instincts.

The Ancient Greeks did not consider the two gods, with their seemingly opposite ideals, to be opposites or rivals; instead, they were oft entwined by nature. For Nietzsche, the world of mind and order on one side, and passion and chaos on the other, formed principles that were fundamental to Greek culture: the Apollonian a dreaming state, full of illusions; and the Dionysian a state of intoxication, representing the liberation of instinct and the dissolution of boundaries. Nietzsche saw their fusion as ideal, symbolizing the chaos inherent in a creative process that moves from one stable order to the next:

I say unto you: one must still have chaos in oneself to be able to give birth to a dancing star. I say unto you: you still have chaos in yourselves.
Friedrich Nietzsche

Drawing a dotted-line analogy to Freud’s psychoanalysis several decades later: the Id alone cannot lead to long-term growth; it is too stochastic, too focused on instant gratification. The Superego alone, with its internalized cultural rules, cannot support the creativity needed for long-term growth; it is too rigid and conservative in a changing environment. Conscious learning happens as the Ego seeks to balance the impulses of the Id against the rigid demands of the Superego: an internal conflict we are all burdened with. As Yin-Yang generalizes: exploration-exploitation is a dualistic monism, a fruitful paradox.

But do we actually behave this way? And if so, do machines modeled to solve human tasks like board games face a similar trade-off in their working?

Reinforcement learning: machine agents and humans

AlphaGo, the AI that defeated 18-time Go champion Lee Sedol, used reinforcement learning behind the scenes: an AI technique in which artificial agents seek to maximize some notion of reward (like points in a game of Go) while navigating an environment. Computing all possible combinations to find the best next move can be computationally intractable (if not downright impossible), and the sheer amount of processing power required makes such brute-force attempts impractical. Instead, reinforcement learning agents nudge the environment (or the game space) and incorporate its feedback into their model of the world. And in this dance with its reality, an agent inevitably has to tune its risk appetite on the exploration-exploitation continuum as it chooses how risky a bet the next move should be. (More: https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#Exploration_and_exploitation)

AlphaGo vs 18-time Go champion Lee Sedol (https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol)
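To make the trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, a textbook toy problem rather than anything AlphaGo itself is known to use. The function name run_epsilon_greedy, its parameters, and the payout probabilities are all assumptions made for illustration. With probability epsilon the agent explores a random arm; otherwise it exploits the arm it currently believes is best.

```python
import random


def run_epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy agent.

    true_means: hidden payout probability of each arm (hypothetical values).
    epsilon:    probability of exploring a random arm on any given step.
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm has been tried
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm, whatever we currently believe.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far
            # (ties go to the lowest-numbered arm).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return total_reward, counts


# A forager that never explores stays on the first (barren) patch forever.
stuck_reward, stuck_counts = run_epsilon_greedy([0.0, 1.0], epsilon=0.0, steps=100)

# A little exploration is enough to discover the fruitful patch.
curious_reward, curious_counts = run_epsilon_greedy([0.0, 1.0], epsilon=0.1, steps=1000)
```

In this toy setup the pure exploiter never leaves the first arm it tried, while the agent that explores a tenth of the time should end up spending most of its pulls on the better arm. Setting epsilon to 1.0 gives the opposite failure: the agent wanders forever and never cashes in on what it has learned.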

Moving from the artificial to the natural, we find that the human brain, just like those of other animals, deals with this trade-off (https://pubmed.ncbi.nlm.nih.gov/17395573/). In fact, emerging evidence hints at neural correlates: specific parts of the brain seem to have evolved to play sides and counterbalance each other. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5825268/)

But what about groups of agents?

Beyond individuals: organizations and societies as learning agents

Clayton Christensen’s (of Harvard Business School) popular theory of disruptive innovation describes why new entrants in a field disrupt old ones: companies fall into the trap of the innovator’s dilemma, where they are not sure whether to continue exploiting their current product lines or to risk cannibalizing their sales by exploring new ones. Ford was on an “exploit-heavy” model until competitive forces from the EV market posed enough of a risk to change its appetite towards cannibalizing an established product line (Mustang) with a potential internal disruptor (Mustang EV) in an attempt to reinvent itself. Large established companies tend to be risk-averse and biased towards exploiting existing revenue streams, and so fall prey to the Dionysian Silicon Valley startup.

Indeed, in organizational theory, an organization that masters balancing the duality of managing today’s business with the innovation and adaptation for the future is called an ambidextrous one.

The reason why it is so difficult for existing firms to capitalize on disruptive innovations is that their processes and their business model that make them good at the existing business actually make them bad at competing for the disruption.
Clayton M. Christensen

Expanding further to societies, here is my boldest claim in this thoughtstream: democracies are learning agents at a societal level. Given any single policy, the population at large falls onto a spectrum from progressives (who are for change) to conservatives (who are for the status quo). This is not to be confused with the ideological positions of Progressive and Conservative, which are often centered around a personal portfolio of policies and values. The constant tension between these cohorts when considering a portfolio of policies is the necessary tension that enables healthy democracies to make risk-appropriate choices, navigate an ever-changing environment, and weed out bad ideas and policies from good ones.

Naturally, this implies there cannot be a perfect solution. A progressive-only strategy will tend to become delusional as it drifts away from being grounded in the knowledge of past exploration, moving faster than reality and departing from it by redefining operating reality itself (in a Baudrillardian sense). A conservative-only strategy will die out because it cannot adapt by exploring new solutions: it continues to rest on past laurels in the face of a changing environment with new adaptive and selective pressures, and it tends to depart from reality in the opposite direction, by discounting the present and treating representations of a past reality as valid representations of today.

Knowledge as Equity, Growth, and Risk Appetite

Implicit across these domains is the natural lifecycle of an organism (or, more generally, a learning agent). That lifecycle can be represented by the equity of operational knowledge the agent possesses at a given point in time, and the compounding of that know-how, along with other factors, could determine the risk appetite it chooses on the explore-exploit trade-off. More on this in perhaps a future thoughtstream, but a quick example to highlight it: in humans, the sex hormones testosterone and estradiol are often linked with risk-taking behavior, and they tend to rise sharply right after puberty (when we’ve had sufficient basic developmental grounding to venture out and take risks for growth).

https://www.bmj.com/content/335/7633/1320/F1

However, they tend to plateau and decline after middle age, potentially reflecting the more risk-averse outlook we take once we are able to exploit the wisdom we’ve acquired (akin to the financial equity we try to accumulate).
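The idea that risk appetite shrinks as knowledge accumulates has a loose analogue in the UCB1 bandit rule, where each option's score is its estimated value plus an exploration bonus that fades the more often that option has been tried. The sketch below is only an illustration of that rule, not a model of hormones or careers; the function names and the sample numbers are assumptions.

```python
import math


def ucb1_scores(counts, estimates):
    """Score each arm as its estimated value plus an exploration bonus.

    counts:    number of times each arm has been tried so far.
    estimates: average reward observed for each arm so far.
    """
    total = sum(counts)
    scores = []
    for n, mean in zip(counts, estimates):
        if n == 0:
            # An arm we know nothing about is always worth one look.
            scores.append(math.inf)
        else:
            # The bonus shrinks as n grows: more knowledge, less appetite for risk.
            bonus = math.sqrt(2.0 * math.log(total) / n)
            scores.append(mean + bonus)
    return scores


def ucb1_choose(counts, estimates):
    """Return the index of the arm with the highest UCB1 score."""
    scores = ucb1_scores(counts, estimates)
    return max(range(len(scores)), key=lambda a: scores[a])


# Early on, a barely-tried option wins on its bonus alone ("youth").
young_choice = ucb1_choose(counts=[100, 2], estimates=[0.6, 0.5])

# Once both options are well known, the better estimate wins ("maturity").
mature_choice = ucb1_choose(counts=[100, 100], estimates=[0.6, 0.5])
```

Here young_choice picks the second, barely-explored option even though its estimate is lower, while mature_choice settles on the first option because both bonuses have shrunk to the same small value.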

Products and companies typically go through a similar life cycle, from the early crawling stages of finding a market and footing, through growth and maturity, to eventual decline, and they typically have different risk appetites at different stages.

https://hbr.org/1965/11/exploit-the-product-life-cycle

Naturally, the stages in a company's lifecycle also make certain CEO profiles more aligned with the company's objectives at a particular stage, and those CEOs in turn adopt different risk profiles against the exploration-exploitation trade-off their organizations face. Aswath Damodaran (NYU Professor of Finance) points this out eloquently:

http://people.stern.nyu.edu/adamodar/pdfiles/country/corporatelifecycleLongX.pdf

Parting thoughts

This generalizable notion of exploration-exploitation as an inevitable trade-off in the face of changing and uncertain environments was terrifying at first, since it seemed to posit that most problems in the real world do not have a perfect solution, and it highlighted the constant struggle that accompanies learning and growing.

However, in a paradoxical way, it is liberating to know that this struggle, and the humility associated with realizing that one cannot know everything there is to know to live perfectly across time, is part of nature itself.

Written by must go faster