DeepMind’s Gato: A step towards General AI

Mar 9, 2023

DeepMind is a leading artificial intelligence (AI) research company that was founded in London in 2010. It is best known for AlphaGo, the first AI system to defeat a human champion at the ancient Chinese game of Go, and for AlphaZero, which taught itself chess, shogi, and Go through self-play. Recently, it has come up with a unique model named “Gato”, a significant departure from traditional AI models, which are typically designed to perform specific tasks within narrow domains, using specialized networks.

What’s so special about Gato?

Gato can play video games, answer questions, and caption images, all with a single model. It is a multi-tasking model designed to work across different modalities (e.g., text, images, robot arm movements) and can perform a wide range of tasks, from playing Atari games to stacking blocks with a real robot arm.

Gato works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. One can chat with the model, ask for recommendations, and have it explain those recommendations. In total, it can perform more than 600 tasks.

What is General AI?

General AI, also known as artificial general intelligence (AGI), refers to the hypothetical development of an AI system that can perform any intellectual task that a human can do: one true algorithm that can do it all. This means that a general AI would be capable not only of performing a wide range of tasks, but also of adapting to new tasks and environments that it has not been specifically designed for.

In contrast to specialized AI systems, which are designed to perform specific tasks within narrow domains, a general AI would have broad and flexible capabilities that could be applied across many different domains. This would require a high degree of flexibility, adaptability, and creativity, as well as the ability to learn and reason in a way that is similar to human intelligence.

The development of general AI is considered by many experts to be a major technological challenge, as it requires overcoming a number of fundamental obstacles, including the development of sophisticated machine learning algorithms, the creation of flexible and modular architectures, and the development of novel approaches to reasoning, planning, and problem-solving.

How is Gato able to do it?

To achieve this level of generality, Gato converts every kind of input and output (text, image patches, button presses, joint torques) into one flat sequence of tokens and trains a single transformer network to predict the next token, much as a large language model does. There are no separate task-specific layers: the same weights are used for every task. The transformer's attention mechanism lets the network selectively attend to different parts of the input depending on the context and task at hand.
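To make the idea of "everything becomes a token" more concrete, here is a minimal sketch of how a continuous value such as a joint torque could share one integer vocabulary with text. The vocabulary size, the number of bins, the companding parameters, and the function names are all illustrative assumptions for this example, not details taken from this article or a definitive account of DeepMind's implementation.

```python
import math

# Illustrative assumptions: these constants are chosen for the example only.
TEXT_VOCAB_SIZE = 32_000   # ids [0, 32000) are reserved for text tokens
NUM_BINS = 1_024           # ids [32000, 33024) are reserved for continuous values
MU, M = 100.0, 256.0       # mu-law companding parameters


def mu_law(x):
    """Squash a real value so that large magnitudes are compressed."""
    return math.copysign(math.log(abs(x) * MU + 1.0) / math.log(M * MU + 1.0), x)


def tokenize_continuous(x):
    """Map one real value (e.g. a joint torque) to a single token id."""
    y = max(-1.0, min(1.0, mu_law(x)))                      # clip to [-1, 1]
    bin_index = min(NUM_BINS - 1, int((y + 1.0) / 2.0 * NUM_BINS))
    return TEXT_VOCAB_SIZE + bin_index                      # shift past the text ids


def build_sequence(text_ids, observation, action):
    """Flatten text, observation and action into one token sequence."""
    tokens = list(text_ids)
    tokens += [tokenize_continuous(v) for v in observation]
    tokens += [tokenize_continuous(v) for v in action]
    return tokens


# Hypothetical example: two text tokens, a two-value observation, one action value.
sequence = build_sequence([17, 942], [0.0, 3.5], [-256.0])
print(sequence)
```

Because text ids and control ids occupy separate ranges, one network can read and emit both kinds of token without any modality-specific output layer.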

In addition, Gato leverages advances in large-scale language modelling to generate text outputs when appropriate, such as in chat applications or image captioning. Text is handled like any other modality: it is split into tokens and produced one token at a time by the same network.

The key idea behind Gato is to create a single generalist agent that can learn to perform a wide range of tasks across different modalities, using the same network and weights, and deciding which output to produce based on the context and task requirements. This approach has the potential to significantly reduce the complexity and cost of building and maintaining specialized agents for each task and modality.
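One way to picture "deciding which output to produce" is that the network only ever emits token ids, and the range an id falls into determines how it is interpreted. The sketch below is a hypothetical illustration of that routing step; the id ranges and the `route_token` helper are assumptions made for this example, not part of any published API.

```python
# Illustrative assumptions: id ranges chosen for the example only.
TEXT_VOCAB_SIZE = 32_000    # ids [0, 32000) are treated as text
NUM_CONTROL_BINS = 1_024    # ids [32000, 33024) are treated as control values


def route_token(token_id):
    """Interpret one output token according to the id range it falls in."""
    if 0 <= token_id < TEXT_VOCAB_SIZE:
        return ("text", token_id)
    if TEXT_VOCAB_SIZE <= token_id < TEXT_VOCAB_SIZE + NUM_CONTROL_BINS:
        bin_index = token_id - TEXT_VOCAB_SIZE
        # Use the centre of the bin, mapped back to the interval [-1, 1].
        value = (bin_index + 0.5) / NUM_CONTROL_BINS * 2.0 - 1.0
        return ("control", value)
    raise ValueError(f"token id {token_id} is outside the known ranges")


# The same stream of model outputs can mix words and motor commands.
for token in [42, 32_000, 32_512]:
    print(route_token(token))
```

In a chat context the model would mostly emit ids from the text range, while in a robot-control context it would mostly emit ids from the control range; the surrounding software simply decodes whatever arrives.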

How good is it at these tasks?

Gato reaches at least half of the expert's score on about 450 of its 600 tasks, and matches the expert on about a quarter of them, which is mind-blowing. What is even more amazing is that it does not need 600 different techniques to solve these tasks. There is just one generalist AI that does it all.
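The arithmetic behind these figures is simple: 450 of 600 tasks is 75%, and a quarter of 600 is 150 tasks. The short sketch below shows how such a summary could be computed from per-task scores expressed as a fraction of the expert's score. The scores in `toy_scores` are made up for illustration and are not Gato's actual results; the helper name is also hypothetical.

```python
def fraction_at_or_above(scores, threshold):
    """Share of tasks whose expert-relative score meets the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Figures quoted in the article.
print(450 / 600)        # share of tasks at half the expert score or better
print(600 // 4)         # number of tasks in "a quarter" of 600

# Made-up scores for eight imaginary tasks (1.0 means "as good as the expert").
toy_scores = [0.2, 0.55, 0.7, 1.0, 1.1, 0.5, 0.9, 0.4]
half_expert = fraction_at_or_above(toy_scores, 0.5)
full_expert = fraction_at_or_above(toy_scores, 1.0)
print(half_expert, full_expert)
```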

In conclusion, DeepMind’s Gato represents a significant step forward in the development of artificial general intelligence. By enabling a single network to perform a wide range of tasks across different modalities, Gato has the potential to reduce the complexity and cost of building specialized agents for each task and modality.

While the development of general AI remains a significant challenge, models like Gato provide exciting new avenues for research and development in this field. As we continue to explore the possibilities of AI, models like Gato will play a critical role in shaping the future of intelligent machines.

Written by Rukaiya Bano