The Big Blind: staring down AI

EVR
Professor Rose Luckin’s EDUCATE
4 min read · Mar 27, 2020
[Image: A map showing the progression of AI gaming system milestones]

Last year, a new AI bot called Pluribus, developed jointly by Facebook and Carnegie Mellon University, beat top human poker players at their own game for the first time ever. Carmel Kent, Research Mentor at EDUCATE, asks of AI: ‘have humans been defeated now?’

We have seen AI bots defeat humans before, in chess, Go and even poker. But Pluribus’ victory in a six-player, no-limit Texas Hold’em game signals a whole new development.

This was a multi-player game, requiring reasoning about hidden information and bluffing. It also required very little computing power.

Most AI gaming advances until now have come in two-player, zero-sum games, where one player wins and the other loses, but real life is much more complicated than that. Unlike the strategies used by the developers of Deep Blue and AlphaGo, an AI poker bot cannot rely on heavy calculation over a complete set of information at each point. It needs to use many strategies, such as bluffing and uncovering others’ bluffs, to navigate and win the game.

What the makers of Pluribus have done, more efficiently and cheaply than some of its predecessors, is to evaluate its options only a few moves ahead at a time, rather than search its moves exhaustively to the end of the game. This has made it more adaptable, more efficient and more applicable to real-life situations than, for example, AlphaGo or Deep Blue.
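The idea of looking only a few moves ahead and then estimating, rather than solving all the way to the end of the game, can be sketched as a generic depth-limited search. This is only an illustration of the general principle: Pluribus’ actual search over imperfect-information poker is far more sophisticated, and the `game` interface below (`terminal`, `utility`, `moves`, `play`, `evaluate`) is an assumption of this sketch, not anything from Pluribus itself.

```python
def negamax(state, depth, game):
    """Depth-limited search for a two-player zero-sum game.

    Exhaustive solvers like Deep Blue's search expand positions until the
    game ends. Here, once `depth` reaches zero, we stop expanding and fall
    back on a heuristic estimate of the frontier position instead, which is
    far cheaper. `game` is an assumed interface:
      terminal(s) -> bool      has the game ended?
      utility(s)  -> float     exact value at a terminal state
      moves(s)    -> list      legal moves
      play(s, m)  -> state     resulting state after move m
      evaluate(s) -> float     heuristic value estimate at the frontier
    Values are from the perspective of the player to move.
    """
    if game.terminal(state):
        return game.utility(state)
    if depth == 0:
        return game.evaluate(state)  # estimate instead of expanding further
    # Negamax: my best value is the best of the negated child values.
    return max(-negamax(game.play(state, m), depth - 1, game)
               for m in game.moves(state))
```

Plugging in any toy two-player game (Nim, tic-tac-toe) shows that a shallow `depth` with a decent `evaluate` can stand in for a full search at a fraction of the cost.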

“It is crucial to remember that, until now, AI has only been able to surpass human intelligence in an untransferable, narrow context of very structured games”

Darren Elias, four-time poker world champion, speculated: “The first time the AI wins is the last time the human will ever win”, adding, “I’ve done nothing but play poker since I was 16 years old and dedicated my life to it, so it’s very humbling to be beaten by a machine”.

Are human gamers to be the first to be professionally replaced by AI — and if so, what does this mean for the rest of us?

It is crucial to remember that, until now, AI has only been able to surpass human intelligence in an untransferable, narrow context of very structured games. And despite the pace of technological development, the fundamental questions around AI’s abilities have not changed much in the 50 years since Marvin Lee Minsky, an American AI scientist, predicted in 1970: “In from three to eight years we will have a machine with the general intelligence of an average human being”.

His forecast, as we know, failed to materialise, and since then we have gained a greater understanding of AI’s potential. For us to be better equipped to develop and flourish in a world of AI’s quick wins, we must gain a deeper understanding of the difference between human and artificial intelligence.

We must base our decisions on what we know to be the unfair advantage of being human, the skills that are unique to us, and on how these compare with technology’s unfair advantage. AI achievements such as playing Go, image classification, speech recognition, handwriting transcription and digital assistants are all challenges tailored to AI’s unfair advantage over us: effective search, pattern recognition, automation of repetitive tasks and the manipulation of probabilities.

“when you see champion poker players like Darren Elias beaten by Pluribus what you are really watching is us losing in an unfair game”

In contrast, our unfair human advantages include meta-learning, multi- and interdisciplinary academic intelligence, social and meta-cognitive intelligences, and perceived self-efficacy. So, when you see champion poker players like Darren Elias beaten by Pluribus, what you are really watching is us losing an unfair game: a human playing a machine with an unfair advantage.

Pluribus won because of its ability to bluff using randomised actions. We humans know how to bluff, but our actions can become predictable and our strategy easy for a machine to uncover. Pluribus also used self-play, in which the bot plays against copies of itself, without any human training data. Put simply, its algorithm reflected on its past moves using a mechanism called counterfactual regret minimisation: over time, the probability of choosing moves the algorithm retrospectively regrets is reduced, while moves it regrets not having taken become more likely.
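The core of this idea, regret matching trained by self-play, can be shown on a toy game. The sketch below uses rock-paper-scissors purely as a stand-in: Pluribus applies a much more elaborate Monte Carlo variant of counterfactual regret minimisation over poker’s vast state space, so none of the function names or parameters here come from Pluribus.

```python
import random

N = 3  # actions: 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    return [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]

def strategy_from_regrets(regret_sum):
    """Regret matching: play each action in proportion to its
    accumulated positive regret; uniform if there is none yet."""
    pos = [max(r, 0.0) for r in regret_sum]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / N] * N

def sample(strategy):
    r, cum = random.random(), 0.0
    for i, p in enumerate(strategy):
        cum += p
        if r < cum:
            return i
    return N - 1

def train(iterations=100000):
    """Self-play: the bot plays a copy of itself; both sides track how
    much they regret not having played each alternative action."""
    regrets = [[0.0] * N, [0.0] * N]
    strat_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        moves = [sample(s) for s in strats]
        for p in range(2):
            me, opp = moves[p], moves[1 - p]
            got = payoff(me, opp)
            for a in range(N):
                # regret = what action a would have earned minus what we got
                regrets[p][a] += payoff(a, opp) - got
                strat_sums[p][a] += strats[p][a]
    # The average strategy over all iterations approaches an equilibrium.
    total = sum(strat_sums[0])
    return [s / total for s in strat_sums[0]]
```

Running `train()` yields a strategy close to playing each action one third of the time, the unexploitable equilibrium for rock-paper-scissors, which is exactly the “randomised actions” that make a regret-minimising bluffer so hard to read.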

A perfect human self-regulated learning mechanism.

Perhaps we do have something to learn from AI after all?

Author: Dr Carmel Kent, Senior Research Fellow

Originally published August 2019 by EDUCATE


EVR is an AI consultancy for education and training institutions