Introducing “Ghost in the Minecraft”: A Generally Capable Agent in Minecraft

Jifeng Dai
May 27, 2023

From AlphaGo in the game of Go to AlphaStar in StarCraft II, and then OpenAI Five in Dota 2, breakthroughs have been made in the research of super-intelligent agents in increasingly realistic and open virtual environments. Now, our multi-talented AI agent “Ghost in the Minecraft” (GITM) is capable of mastering the game “Minecraft”!

In the best-selling game “Minecraft,” we can engage in various activities such as survival, exploration, and creation, all closely simulating the real world. Many renowned research teams, including DeepMind and OpenAI, have devoted themselves to the study of AI agents, hoping to find answers to the real world within the game.

We propose a multi-talented AI agent called Ghost in the Minecraft (GITM) that can autonomously learn and solve tasks. GITM not only outperforms all previous agents in “Minecraft” but also significantly reduces training costs. This research marks an important step toward Artificial General Intelligence (AGI). AGI research aims to develop intelligent agents capable of perceiving, understanding, and interacting with open-world environments, which could lead to significant advances in fields such as robotics and autonomous driving.

GITM unlocks all 262 items in the Overworld technology tree of “Minecraft,” whereas all previous agents combined, including those from OpenAI and DeepMind, unlocked only 78. On the standard “ObtainDiamond” task, it raises the success rate by 47.5 percentage points (from 20% with OpenAI’s VPT method to 67.5% with GITM). Moreover, the agent requires only two days of training on a single CPU node, and the number of environment interaction steps drops to one ten-thousandth of what previous methods need. By comparison, OpenAI’s VPT requires 6,480 GPU days of training and DeepMind’s DreamerV3 requires 17 GPU days.

Project Page: https://github.com/OpenGVLab/GITM

AI is now capable of survival, exploration, and creation in an open world, just like humans!

The multi-talented AI agent “Ghost in the Minecraft” (GITM) masters the game “Minecraft” in survival mode, starting from scratch and acquiring all items in the Overworld, including mining diamonds and crafting enchanted books!

“Ghost in the Minecraft” (GITM)
Successfully crafting enchanted books, the highest-level products in the Overworld’s technology tree.
Mining diamonds is no longer a challenge.
GITM handles various terrains, environments, day-night cycles, and even encounters with monsters with ease.

Why “Minecraft”?

In current AI research, there is an increasing focus on developing AI agents with versatile abilities. These agents are expected to master a wide range of skills, adapt to changing environments, and approach human-level ability on complex problems.

In the global best-selling game “Minecraft,” players survive, explore, and create in ways that closely simulate the real world, which makes the game something like a miniature version of it. Researchers aim to develop an AI agent capable of overcoming all technical challenges within “Minecraft,” thereby moving toward an Artificial General Intelligence that can autonomously learn and master skills in the real world.

However, AI agents in “Minecraft” face an interesting instance of Moravec’s paradox:

Tasks that are relatively difficult for humans, such as playing chess, are relatively easy for AI, whereas tasks that humans find simple in open worlds like “Minecraft,” such as interacting with the environment, planning, and making decisions, pose enormous challenges for AI.

GITM successfully breaks free from the limitations of this paradox and achieves significant breakthroughs in complex and realistic environments. This opens up new possibilities for advancing AI technology and building more general AI agents.

How powerful is GITM?

Broad task coverage: GITM achieves 100% task coverage of the technical challenges within the Overworld of “Minecraft,” unlocking the complete technology tree of 262 items. In contrast, all previous agents combined covered only about 30% (78 items).

High task success rate: In the highly anticipated “ObtainDiamond” task, GITM achieves a success rate of 67.5%, an improvement of 47.5 percentage points over the previous best result (20% with OpenAI’s VPT).
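The coverage and success-rate figures quoted in this post can be checked with a few lines of arithmetic. The snippet below is only a sanity check on the published numbers (262 items, 78 items, 20%, 67.5%); the variable names are ours.

```python
# Figures quoted in this post.
total_items = 262      # items in the Overworld technology tree
previous_items = 78    # items unlocked by all previous agents combined
vpt_success = 20.0     # ObtainDiamond success rate of VPT, in percent
gitm_success = 67.5    # ObtainDiamond success rate of GITM, in percent

# Share of the technology tree covered by previous agents.
previous_coverage = 100.0 * previous_items / total_items

# Absolute gain in percentage points versus relative gain as a ratio.
gain_points = gitm_success - vpt_success
gain_ratio = gitm_success / vpt_success

print(f"previous coverage: {previous_coverage:.1f}%")
print(f"gain: {gain_points:.1f} percentage points ({gain_ratio:.3f}x)")
```

The difference between 20% and 67.5% is an absolute gain in percentage points; expressed as a ratio, GITM succeeds a little over three times as often as VPT on this task.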

Exceptional training efficiency: Surprisingly, GITM achieves new heights in training efficiency. The number of environment interaction steps required is only one ten-thousandth of previous methods, and it can be trained in just two days using a single CPU node. This is a tremendous improvement compared to the 6,480 GPU days required by OpenAI’s VPT or the 17 GPU days required by DeepMind’s DreamerV3.

How was GITM built?

Traditional RL agents face difficulties in mapping highly complex tasks to low-level keyboard and mouse operations.

GITM breaks away from the traditional RL-based architecture and adopts a new paradigm with a large language model (LLM) as the core of the intelligent agent.

GITM consists primarily of three parts: LLM Decomposer, LLM Planner, and LLM Interface. It gradually decomposes the complex goal into sub-goals and structured actions until reaching the lowest level of keyboard and mouse operations:

  • LLM Decomposer utilizes external knowledge, such as game knowledge databases on the Internet, to decompose the complex goal into simpler sub-goals.
  • LLM Planner plans a series of structured actions for each sub-goal and adjusts the planning based on feedback. It can also improve itself by continuously summarizing successful experiences.
  • LLM Interface executes structured actions using low-level keyboard and mouse operations and obtains observations during interaction with the environment.
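The three-stage flow above can be sketched in a few dozen lines. This is a toy illustration under loud assumptions, not GITM’s actual code: no LLM is called, the `RECIPES` table stands in for external game knowledge, and every name (`decompose`, `Planner`, `Interface`, `run_agent`) is hypothetical. It only shows how a goal is broken into ordered sub-goals, how each sub-goal becomes structured actions, and how execution feedback flows back to the planner.

```python
from typing import Dict, List, Optional, Set, Tuple

Action = Tuple[str, str]  # (verb, target), e.g. ("craft", "planks")

# Hypothetical, simplified stand-in for external game knowledge:
# each item maps to the items it directly requires.
RECIPES: Dict[str, List[str]] = {
    "wooden_pickaxe": ["planks", "stick"],
    "planks": ["log"],
    "stick": ["planks"],
    "log": [],
}

def decompose(goal: str, recipes: Dict[str, List[str]]) -> List[str]:
    """Decomposer stub: list sub-goals with prerequisites first."""
    ordered: List[str] = []
    def visit(item: str) -> None:
        if item in ordered:
            return
        for dep in recipes.get(item, []):
            visit(dep)
        ordered.append(item)
    visit(goal)
    return ordered

class Planner:
    """Planner stub: turns a sub-goal into structured actions."""
    def __init__(self, recipes: Dict[str, List[str]]) -> None:
        self.recipes = recipes
        self.memory: Dict[str, List[Action]] = {}  # plans that succeeded

    def plan(self, sub_goal: str, feedback: Optional[str] = None) -> List[Action]:
        # A real planner would pass the feedback to the LLM; this stub only
        # uses it to skip the cached plan and build a fresh one.
        if feedback is None and sub_goal in self.memory:
            return self.memory[sub_goal]
        verb = "craft" if self.recipes.get(sub_goal) else "mine"
        return [(verb, sub_goal)]

    def remember(self, sub_goal: str, plan: List[Action]) -> None:
        self.memory[sub_goal] = plan

class Interface:
    """Interface stub: executes actions against a toy inventory."""
    def __init__(self, recipes: Dict[str, List[str]]) -> None:
        self.recipes = recipes
        self.inventory: Set[str] = set()

    def execute(self, action: Action) -> Tuple[bool, str]:
        verb, target = action
        if verb == "mine":
            self.inventory.add(target)
            return True, f"mined {target}"
        if verb == "craft":
            missing = [d for d in self.recipes[target] if d not in self.inventory]
            if missing:
                return False, f"missing {missing}"
            self.inventory.add(target)
            return True, f"crafted {target}"
        return False, f"unknown action {verb}"

def run_agent(goal: str, max_attempts: int = 3) -> Tuple[bool, Set[str]]:
    planner, interface = Planner(RECIPES), Interface(RECIPES)
    for sub_goal in decompose(goal, RECIPES):
        feedback: Optional[str] = None
        for _ in range(max_attempts):
            plan = planner.plan(sub_goal, feedback)
            ok, observation = True, ""
            for action in plan:
                ok, observation = interface.execute(action)
                if not ok:
                    break
            if ok:
                planner.remember(sub_goal, plan)  # summarize the success
                break
            feedback = observation  # replan with the failure message
        else:
            return False, interface.inventory  # all attempts failed
    return True, interface.inventory

success, inventory = run_agent("wooden_pickaxe")
print(success, sorted(inventory))
```

In this sketch the inventory dictionary plays the role of the game environment; in the real system the Interface would issue keyboard and mouse operations and return game observations instead.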

Advanced Applications of GITM

GITM can also be applied to more complex tasks within “Minecraft,” such as building a shelter, farmland, and an iron golem for survival, creating redstone circuits for automated devices, and building a Nether portal to enter the Nether. These tasks demonstrate the capabilities and scalability of GITM, enabling the agent to survive and explore more advanced worlds within “Minecraft.”
