What does AlphaGo vs Lee Sedol tell us about the interaction between humans and intelligent systems?

Extrapolating from a few moves of the match to draw a few learnings for a broader business context.

I watched the documentary about the match that opposed AlphaGo and Lee Sedol this weekend. For those who missed it, AlphaGo is an algorithm designed by DeepMind to play the game of Go, and Lee Sedol is one of the best Go players of all time. I think the documentary is a great piece, which tells us a lot about how we humans perceive and react to the decisions made by AI systems.

In this post, I’ll try to extrapolate humans’ reactions during this game to a broader business context. This is just one interpretation but I thought this might be a good occasion to i) reflect on how we apprehend decisions made by AI systems and ii) learn how we can improve ML systems by fostering and learning from their interactions with humans.

We’re Point Nine Capital, a VC firm focused on SaaS, marketplace, Crypto, and AI companies. If you want to be kept informed of our new posts, you can subscribe to our content newsletter.

0. A quick intro to AlphaGo

I will not dive too deep into the science behind AlphaGo. If you want to learn more, here’s the paper.

Yet, for the purpose of this post, four different aspects are important:

  • AlphaGo initially learned from thousands of games played by professional human players.
  • It decides on the next move based on the probability of winning associated with each possible move (estimated using Monte Carlo tree search).
  • AlphaGo improves by playing against itself (using reinforcement learning).
  • AlphaGo uses probabilistic methods, which are based on a representation of the world (a distribution of potential outcomes). As computing power grew exponentially, computers became better than humans at computation by many orders of magnitude. That said, the game of Go is so complex and the array of possible moves so wide that AlphaGo cannot consider all the possible game situations. When AlphaGo started playing against Lee Sedol, there were still “trajectories” or “game situations” that AlphaGo had not experienced, trajectories in which AlphaGo could make suboptimal decisions (more on that below).
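To make the Monte Carlo idea above a bit more concrete, here’s a toy sketch: estimate each candidate move’s probability of winning by running many random playouts from the resulting position, then pick the move with the highest estimate. This is a deliberately simplified illustration on a tiny take-away game (not Go), and it leaves out the tree search, neural networks, and reinforcement learning that AlphaGo actually combines.

```python
import random

def playout(stones, rng):
    """Finish the game with uniformly random moves.

    Players alternately take 1-3 stones; whoever takes the last stone wins.
    Returns 0 if the player to move at the start of the playout wins, else 1.
    """
    player = 0
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player          # this player took the last stone and wins
        player = 1 - player

def win_probability(stones_left, playouts, rng):
    """Estimate our chance of winning once the opponent faces `stones_left`."""
    if stones_left == 0:
        return 1.0                 # our move just took the last stone
    wins = sum(playout(stones_left, rng) == 1 for _ in range(playouts))
    return wins / playouts

def best_move(stones, playouts=3000, seed=0):
    """Pick the move with the highest estimated probability of winning."""
    rng = random.Random(seed)
    scores = {take: win_probability(stones - take, playouts, rng)
              for take in range(1, min(3, stones) + 1)}
    return max(scores, key=scores.get)
```

From a pile of 5 stones, the playouts steer `best_move` toward taking a single stone, leaving the opponent the losing position of 4, purely because that move scores the highest estimated win probability. AlphaGo’s real search is far more sophisticated, but the principle of scoring moves by estimated probability of winning is the same.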

In the end, AlphaGo won the match 4 games to 1. I think this match illustrates both the power of machine learning systems and the power of human intuition. Here’s why.

I. Game 2, Move 37: We don’t trust the machine when decisions appear silly/creative even though they optimize for the end result

In game 2, AlphaGo’s 19th move, move 37 of the whole game, appeared “creative” to the commentators. As DeepMind mentions on their blog, move 37 was “so surprising they overturned hundreds of years of received wisdom”.

Any “normal” human, who would have tried to maximize the number of points in the short to mid term instead of maximizing the probability of winning at the end of the game, would have made a different move. Said differently, AlphaGo favoured a scenario “where it will win by 1 and a half points with 99 percent probability over a situation where it will win by 20 points with 80 percent probability”. Why? Because of its amazing capability to foresee a situation 50 to 60 moves ahead.
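As a hypothetical illustration of that trade-off (the numbers are the ones quoted above; the code is just a sketch, not AlphaGo’s actual logic): a selection rule that maximizes the probability of winning can disagree with one that maximizes the expected point margin.

```python
# Two hypothetical candidate moves, using the probabilities quoted above.
moves = {
    "cautious":   {"p_win": 0.99, "margin": 1.5},   # win by 1.5 points, 99% of the time
    "aggressive": {"p_win": 0.80, "margin": 20.0},  # win by 20 points, 80% of the time
}

# A point-maximizing heuristic (closer to human habit) prefers the big margin...
by_expected_margin = max(moves, key=lambda m: moves[m]["p_win"] * moves[m]["margin"])

# ...while AlphaGo's criterion, the probability of winning, prefers the sure thing.
by_win_probability = max(moves, key=lambda m: moves[m]["p_win"])

print(by_expected_margin)   # aggressive
print(by_win_probability)   # cautious
```

The expected margin of the “aggressive” move (0.80 × 20 = 16 points) dwarfs that of the “cautious” one (0.99 × 1.5 ≈ 1.5 points), yet the win-probability criterion still picks “cautious”, which is exactly the kind of move that looks strange to a human observer.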

Let’s try to apply that in another context. Take an autonomous vehicle foreseeing an imminent accident. What if the system predicted that the probability of avoiding the accident was actually higher if it accelerated rather than stopped? Are we ready to accept such a decision? The question is complex, also because it’s almost impossible to explain the drivers of the decisions made by deep learning systems. Therefore, we might just need to blindly trust algorithms and their decisions.

And, now, what if, despite the machine optimizing for avoiding the accident, it still happened? Are we still ready to blindly trust the algorithm?

I think this is a very interesting example of how machines behave differently from humans while optimizing for a counterintuitive but ultimately more “optimal” result. We’ll likely see more and more such situations in the future as more intelligent systems integrate into our day-to-day environment. It’ll be interesting to understand how our judgment of such decisions evolves. In some cases, we might not pay any attention. For example, we might not wonder how an autopilot for greenhouses decides on the right temperature to set. In others, like the car example, the decision might be highly controversial, and the integration of such systems into our environment all the more difficult.

As an investor, one of the learnings here is that trying to assess our acceptance of decisions made by intelligent systems, and especially our acceptance of their failures, is one of the key aspects when assessing an AI startup. It’s all the more important because it’s a great proxy for the potential speed at which a startup can “go to market”.


Why does it take so much time for autonomous cars to hit the road? Because the cost of a single mistake is gigantic; people can die when a mistake happens. As a seed investor, investing in an AI company with a very high cost of failure is hence riskier because it’ll likely take more time and more capital to go to market. But don’t get me wrong, we’re still ready to take that risk when the payoffs are high :)

If you want to dig deeper into this topic, our friends at Zetta VP wrote a great post called “Product Payoffs in Machine Learning”.

II. Game 4, Move 78: When human creativity defeats the machine’s computational power

After losing 3 games in a row, Lee Sedol came back to the table and defeated AlphaGo. The 78th move of the game is said to be the cornerstone of his victory. Commentators have called it “the divine move” (“Brilliant tesuji”). The designers of AlphaGo called it “the one in ten thousand move” because AlphaGo had calculated that there was a one in ten thousand probability that a human would play it. It’s all the more interesting because this move led AlphaGo to make suboptimal decisions over the following moves. Indeed, AlphaGo’s next ten moves triggered a sharp decrease in its probability of winning the game: it fell from 70% to below 50%, and AlphaGo never managed to climb above 50% again.

Lee Sedol’s strategy during this fourth game was to force an “all or nothing” situation instead of trying to gain points in small increments. His idea was that AlphaGo was superior at making the right moves to optimize for small gains, thanks to its ability to compute very accurately the probability of winning at any point in time, but that it could fail in extreme situations where a single move could swing a much larger number of points.

One way to understand this more broadly is that computers are intrinsically much better than humans at computing probabilities. They can thus make better decisions when it’s about assessing all the different possibilities of an environment. But one way for the human to beat the machine is to change the environment. In this case, it was about playing such an unexpected move that the computer did not even consider it a possibility. This seems to be a good illustration of a situation where creativity defeats computational power.

Interestingly, AlphaGo’s defeat in this game shed some light on some of the flaws of Monte Carlo methods. Now that they’re identified, the designers of AlphaGo can work on a better algorithm that takes into account the flaws discovered thanks to Lee Sedol’s “Brilliant Tesuji”.

III. How human decisions and intelligent systems benefit from each other

These two moves are great illustrations of how humans interact with intelligent systems. They also suggest two interesting learnings that we might want to consider in a broader context:

A. Intelligence Augmentation

Intelligent systems can sometimes make better decisions than humans because their computation capabilities enable them to optimize for an end outcome that humans can most often not foresee. But this becomes all the more interesting in a context where AI agents don’t just make decisions themselves but also help humans make the right decisions. This is what people call “Intelligence Augmentation”.

After Deep Blue defeated Garry Kasparov in 1997, people wondered whether a Human+AI combination could beat both a solo human and a solo machine. In 2005, the experiment was run, and it found that a Human+AI beats the solo human. Not very surprising. But, amazingly, it also found that a Human+AI beats the solo computer.

Nicky Case provides a great explanation in a recent post:

“This is because, contrary to unscientific internet IQ tests on clickbait websites, intelligence is not a single dimension. The “g factor”, also known as “general intelligence”, only accounts for 30–50% of an individual’s performance on different cognitive tasks. So while it is an important dimension, it’s not the only dimension. For example, human grandmasters are good at long-term chess strategy, but poor at seeing ahead for millions of possible moves — while the reverse is true for chess-playing AIs.”

Isn’t this the same story as the one of the 78th move in Game 4 in AlphaGo vs. Lee Sedol?

Nicky Case again:

“And because humans & AIs are strong on different dimensions, together, as a centaur, they can beat out solo humans and computers alike.”

Integrating AI agents into our day-to-day processes could help us improve our decision making in many different contexts and industries. Emergence Capital calls this trend “The Coaching Cloud” (here is their post in case you’re interested).

B. Human intuition helps designing better AI systems

The second learning from AlphaGo vs. Lee Sedol is that human intuition can reveal some of an algorithm’s inherent flaws: not only flaws caused by training on datasets that were not large enough, but also flaws in the design of the algorithm itself.

Lee Sedol shed light on some of the flaws of using Monte Carlo tree search to play Go, flaws that the DeepMind team has probably tackled since then.

C. A quick graph about (AI or Human) Intelligence Augmentation

Here’s a small graph to sum up these two learnings:

Better-designed algorithms lead to better performance. Better performance helps humans make better decisions. Humans who make better decisions can create situations where algorithms fail (the “divine moves”). Learning about these failures helps us design better intelligent systems!

If you’re working on augmenting our (my) or AI system’s intelligence, please send me an email. I’d love to chat!

A big thanks to Christoph Janz and Robin Dechant, who reviewed very (too) early drafts of this post! My English is always clearer after you read my drafts ;)

If you’d like to be notified of our next posts you can subscribe to our newsletter.