Video Games might be the key to understanding Human Learning and getting closer to Human-like AI

Niranjan Rajesh · Bits and Neurons · May 30, 2022

In the quest to solve the ultimate question of “general” intelligence and to build machines that use this Artificial General Intelligence to make our lives easier, Tsividis et al. look for part of the answer in Atari video games.

Some Context

Human-like intelligence in machines, or Artificial General Intelligence (AGI), has been one of the foundational goals of the interdisciplinary field of Cognitive Science. An intuitive approach involves understanding the intricate functioning of the human brain and encapsulating its essence and functionality in deployable algorithms. An alternate, machine-learning approach involves developing ‘models’ of the human mind that are tasked with solving specialised problems (like image classification and speech recognition) and are then optimised towards human levels of performance by feeding vast amounts of data into the networks. Most computer scientists approach the problem the latter way. In the paper titled “Human Learning in Atari”, Tsividis et al. compare humans learning to play Atari games with ML algorithms trained to play the same games. The work promises to shed light on how machine learning currently differs from human learning, which will, in turn, help bring us closer to designing human-like AI agents.

In the past, algorithms have been developed that reach and surpass human-level performance in Atari games. These algorithms use Deep Reinforcement Learning (DRL), a fairly recent combination of AI techniques, to achieve their results. They employ artificial structures called neural networks: collections of neuron-like nodes whose organisation and functioning are inspired by the human brain’s neurons. The ‘deep’ in DRL refers to the large number of layers in the neural network, loosely mirroring the layered structure of the brain, and the ‘reinforcement’ refers to how the network is trained based on outcomes (e.g. rewarded for reaching a target outcome and penalised for the wrong one). Read this article for a vastly more thorough explanation of deep reinforcement learning. Given enough time, these algorithms can reach and exceed human-level performance on complex decision-making problems. That is one of their problems right there: the time it takes to learn.
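To make the ‘reinforcement’ idea a little more concrete, here is a minimal sketch of the kind of value update that DQN-style agents (the family DDQN belongs to) perform. It is purely illustrative: it assumes PyTorch and uses a tiny fully connected network in place of the convolutional networks, replay buffers and target networks that real Atari agents rely on; none of the specifics below come from the paper.

```python
# Minimal sketch of a DQN-style temporal-difference update (illustrative only).
import torch
import torch.nn as nn

# Tiny fully connected Q-network; real Atari agents use conv nets over pixel frames.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 6))  # 6 joystick actions
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
gamma = 0.99  # discount factor for future rewards

def q_update(state, action, reward, next_state, done):
    """Nudge Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    q_value = q_net(state)[action]
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max() * (1.0 - done)
    loss = (q_value - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Dummy tensors standing in for preprocessed game observations.
s, s_next = torch.rand(4), torch.rand(4)
q_update(s, action=2, reward=1.0, next_state=s_next, done=0.0)
```

The agent has to repeat updates like this millions of times, over many hours of simulated play, before the rewards and penalties shape useful behaviour.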

Human Learning vs DRL algorithms

In the particular case of Atari games, DRL algorithms take hundreds of hours of gameplay data to reach human levels of performance, whereas humans are able to learn to play these games in a matter of minutes. This implies a significant difference in how learning takes place in the algorithms compared to humans. The ‘representations’ of knowledge accumulated at each layer of the algorithm’s neural network seem to be quite different from those of humans. Highlighting these differences may be the key to getting closer to agents that learn just as rapidly and generally as humans.

It is also important to note that comparing the raw learning times of humans and ML algorithms would be futile and unfair, as the ML algorithm needs to develop from scratch a visual processing system, a decision-making system and a general understanding of how objects in space and time can be interacted with. Humans, on the other hand, come readily prepared with all of these systems and more before playing a game for the first time. For this reason, comparing learning rates is a fairer way to compare the two agents.
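For illustration, a learning rate here simply means score gained per unit of gameplay experience. The tiny sketch below shows the calculation with completely made-up numbers; they are not results from the paper.

```python
# Illustrative learning-rate comparison: score gained per minute of experience.
# All numbers are invented purely to demonstrate the metric.
def learning_rate(start_score, end_score, minutes_of_play):
    """Average score gained per minute of gameplay experience."""
    return (end_score - start_score) / minutes_of_play

human_rate = learning_rate(0, 1500, minutes_of_play=15)        # a player's first 15 minutes
agent_rate = learning_rate(0, 1500, minutes_of_play=100 * 60)  # an agent's first 100 hours
print(human_rate, agent_rate)  # the human gains far more score per minute of experience
```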

Figure: learning curves for three Atari video games, showing the vastly superior learning rate of a human player compared to a machine learning algorithm (DDQN, Double Deep Q-Network).

Why do humans learn better?

Why is it that humans learn more effectively than the best video game playing machine learning algorithms out there? The potential reasons humans are naturally rapid learners in this context can be condensed to two major hypotheses:

  1. Humans have knowledge about the world they live in, which is useful because the games mostly take place in similar worlds. Holding prior information such as ‘doors can be opened and walked through’, ‘food can be eaten to improve wellbeing’ and ‘sharp objects hurt and reduce wellbeing’ is useful in games that feature these interactions. This information is readily available to humans but not to ML agents.
  2. Humans come equipped with more general priors, such as an awareness that objects are physical entities whose properties can be learned from experience, and an intrinsic curiosity to explore these objects, observe their properties and build theories about how they work. These theories help build a model of the game environment and the interactions within it (a toy sketch of such a theory follows this list). Such priors are not available to ML agents, which start learning from virtually nothing.
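To make the second hypothesis a little more concrete, here is a toy sketch of what learning an ‘intuitive theory’ from just one or two interactions could look like. The representation and the object names are invented for illustration; they are not the models used by Tsividis et al.

```python
# Toy sketch: an "intuitive theory" as beliefs about object types, updated from
# single observed interactions (illustrative only).
intuitive_theory = {}  # object type -> believed effect of touching it

def observe_interaction(obj_type, score_change, lost_life):
    """Update the theory about an object type from one observed interaction."""
    if lost_life:
        intuitive_theory[obj_type] = "avoid"       # e.g. a hostile creature
    elif score_change > 0:
        intuitive_theory[obj_type] = "seek_out"    # e.g. something collectable
    else:
        intuitive_theory[obj_type] = "ignore"

# One or two observations are enough to start planning around an object type.
observe_interaction("bird", score_change=0, lost_life=True)
observe_interaction("fish", score_change=200, lost_life=False)
print(intuitive_theory)  # {'bird': 'avoid', 'fish': 'seek_out'}
```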
Figure: a screenshot of the Atari 2600 game Frostbite (via indieretronews.com).

Verifying these hypotheses

Tsividis et al. went about verifying these hypotheses by manipulating the experimental procedure for playing Frostbite in the following ways:

  1. To test whether specific prior knowledge about objects in the game had an impact on how quickly humans learned, the researchers created a blurred version of the game. In this version, the objects present in the scene cannot be recognised, hiding their apparent properties from the players (a minimal blurring sketch follows this list).
  2. To test whether general priors and the development of a theory-led model of the game environment deserved the credit, the researchers found ways to help players learn these theories more quickly. This involved having players read the instruction manual or watch an expert play the game beforehand. By exposing players to information about the phenomena that occur in the game, the intent was to flesh out the theory-led model of the game in the player’s mind, which should result in increased early-game performance.
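For a sense of what the blurring manipulation might look like in practice, here is a minimal sketch using Pillow. The paper does not specify its exact procedure, so the file name and blur radius below are placeholders.

```python
# Minimal sketch of blurring a game frame so objects become unrecognisable blobs.
from PIL import Image, ImageFilter

frame = Image.open("frostbite_frame.png")                    # placeholder path to a saved screenshot
blurred = frame.filter(ImageFilter.GaussianBlur(radius=8))   # radius chosen arbitrarily for illustration
blurred.save("frostbite_frame_blurred.png")
```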

The results below favour the hypothesis that general priors and the innate development of a theory-led model of the game environment are what drive rapid human learning in video games.

Figure: first-episode Frostbite scores for players who played the normal, blurred, instructions and observation variants of the game. Specific knowledge of game objects contributes less to rapid human learning than the theory learning induced by instructions and observation.

It turns out that possession of specific knowledge of object identities is minimally useful to players. Tsividis et al. attribute this to the fact that an object’s actual in-game interactions may not be clear from its physical-world counterpart. For example, an animal could be either helpful or harmful in the physical world (it can be eaten, or it might attack you). Therefore, if the same animal appears in the video game, the result of an interaction with it is not confidently evident to the player.

Reading the instruction manual and observing an expert play the game beforehand did provide players with significantly higher scores. This suggests that the players were able to use this additional information to improve their theories and build a more accurate mental model of how the game environment works, which they then used to plan their actions and play the game successfully.

In their insightful paper, Tsividis et al. established the significance of human intuitive theories: how they can be generalised from limited prior information and used to plan an approach to the problem at hand. How can this theory-building aid machine learning?

Improving the learning of AI

Experimentation with Atari games has given us a key characteristic that likely plays a part in making human learning so rapid with minimal previous experience. ML algorithms lack what is ingrained in human minds: the drive to explore and create models of the environment in question, which help us generate plans for solving the problem. These strong priors might be the key difference whose absence keeps ML implementations distant from true human-like learning.

How might this be implemented in ML?

The next big step towards Artificial General Intelligence seems to be embedding these strong priors, which drive theory-building about the environment, into the ML algorithms in question.
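One generic way such a prior shows up in ML research is as an intrinsic ‘curiosity’ bonus that rewards the agent for visiting unfamiliar situations. The count-based sketch below is a common illustration of that idea, not the mechanism proposed by Tsividis et al.

```python
# Sketch of a count-based exploration ("curiosity") bonus added to the game reward.
from collections import defaultdict
from math import sqrt

visit_counts = defaultdict(int)
beta = 0.5  # strength of the curiosity bonus

def augmented_reward(state_key, game_reward):
    """Add a novelty bonus that shrinks as a state is revisited."""
    visit_counts[state_key] += 1
    return game_reward + beta / sqrt(visit_counts[state_key])

print(augmented_reward("screen_1", 0.0))  # 0.5 on the first visit
print(augmented_reward("screen_1", 0.0))  # ~0.35 on the second visit
```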

Besides these intuitive priors, understanding of world objects may actually be important to equip general AI with, despite the findings of Tsividis et al. Their results might reflect the fact that video games are a special case in which world objects do not always interact in obvious or predictable ways. These games don’t necessarily require world knowledge from the player to succeed, as games may be more marketable and interesting to play when they are unique and non-traditional. Take the example of Pokémon: it deviates greatly from typical world knowledge, featuring small animal-like creatures with comical abilities, made to battle one another by the humans who capture them. Real-world knowledge of animals will not benefit the performance of players of the game. However, outside video games, in the real world, knowing an object’s properties beforehand will help in solving a problem related to it. Let’s illustrate this with an example application of real-world general AI.

GeneralHouseBot: An example from the near future?

Let us take the example of a robot that helps with household tasks (cleaning rooms, cooking the homeowners’ favourite food, paying their utility bills, etc.) with programmed human-like learning and intelligence (AGI). The housebot is expected to tend to any household tasks that arise when the owner is not at home. For the sake of argument, let’s say that the house receives a sealed, packaged box through the mail for the first time ever. The housebot has no previous experience with any situation involving mail. The housebot is programmed to take care of any tasks that it encounters; what would it do now?

Figure: a cleaning robot (via personalrobots.biz).

The housebot should use the intrinsic curiosity from its programmed priors to approach the box and recognise it as an interactable object. Experimenting with the actions available to it (picking the box up, pushing it around, shaking it, etc.) will help build a model of the box’s properties in the bot’s computational centre. These inferred properties might suggest that there is an object inside the box. This is where world knowledge also plays an important role. Knowing that objects like gifts and sealed mail exist in the world will help the housebot label the object in front of it as such, based on its properties. This will prompt the housebot to carefully unseal the package, pick up its contents and show them to the homeowner, thus completing the task. Without world knowledge embedded in the bot, the contents of the package may not have been handed to the homeowner as intended.
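Heavily simplified, the housebot’s behaviour above boils down to an explore-then-label loop. The sketch below is pure illustration; every function, property and category name is invented.

```python
# Toy sketch of the hypothetical housebot's explore-then-label loop.
world_knowledge = {"sealed_package": {"rigid", "hollow", "rattles"}}  # categories the bot already knows

def probe(obj, action):
    """Stand-in for the bot's sensors: which properties does an action reveal?"""
    return obj.get(action, set())

def explore(obj, actions=("lift", "push", "shake")):
    """Curiosity prior: try every available action and accumulate observed properties."""
    observed = set()
    for action in actions:
        observed |= probe(obj, action)
    return observed

def label(observed):
    """World knowledge: match observed properties against known object categories."""
    for category, properties in world_knowledge.items():
        if properties <= observed:
            return category
    return "unknown_object"

mystery_box = {"lift": {"rigid"}, "push": {"hollow"}, "shake": {"rattles"}}
print(label(explore(mystery_box)))  # -> 'sealed_package', so the bot plans to open it
```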

The combination of human-like priors and world knowledge was key for the housebot to extend its general intelligence to a new, previously unencountered task of dealing with mail. This example was heavily abstracted for the sake of argument but could easily be extended to more realistic applications of AI in the near future. The insights from the paper by Tsividis et al. highlight the importance of human priors that could be transferred to ML algorithms in order to gain human-like general learning abilities.

References:

Tsividis, P. A., Pouncy, T., Xu, J. L., Tenenbaum, J. B., & Gershman, S. J. (2017). Human Learning in Atari. AAAI Spring Symposium Series.
