Introducing AI2-THOR 2.0

Bridging the Reality Gap in Simulated Environments with Game Design

Winson Han
Jul 19, 2019

Training AI systems to visually interact in the real world is challenging and expensive; building robots, and the environments to let them explore and learn (and break things!), is beyond the reach of most researchers interested in investigating computer vision tasks in realistic spaces. A large variety of maneuvers and tasks are not possible for any currently available robot without significant additional development, and setting up and maintaining multiple environments in which to test and refine robot behavior is usually not scalable due to the space, time, and money required.

To solve this problem, the PRIOR team at AI2 built AI2-THOR, an open-source visual AI platform that provides near-photorealistic environments complete with actionable objects. AI2-THOR is built on the Unity 3D game engine, which enables physical simulation of objects and scenes, and it provides a Python API for interacting directly with Unity.

AI2-THOR Framework in the Unity 3D Game Engine

There are many choices when it comes to picking the simulation framework best suited to your particular task’s needs. One option might be a framework that uses techniques like photogrammetry to create highly detailed scans of actual houses and objects. Another might be a framework that imports large 3D model databases to procedurally populate a virtual environment. These options are great for creating very large, realistic environments ideal for more straightforward tasks like navigation or object recognition. However, they often lack more in-depth, object-focused interactions beyond basic physics collisions (if physics is simulated at all).

AI2-THOR 2.0

True to its name (The House Of inteRactions), our newest 2.0 release of the AI2-THOR Framework introduces complex physical properties and object state changes that give agents a deep level of interaction with the virtual environment around them. We wanted agents not only to exist in a simulated environment but to truly change it: to observe common-sense cause and effect, and to have a playground to run around in instead of just a room full of static, unchanging walls and objects.

A Focus on Interactions

The environments released in version 2.0 are filled with hundreds of actionable objects, each with its own set of interactions:

Examples of different states and interactions of objects in THOR

Drawers, cabinets, refrigerators and shower curtains can be opened and closed. Appliances like toasters and microwaves can be turned on and used to change the temperature of other objects or even cook them. Water faucets and showerheads can spew water that puts out lit candles. Light switches control the ceiling lights in the room. Beds can be tidied up if the covers are messy. Laptop and phone screens break if they are dropped onto a hard surface. Basketballs bounce differently depending on whether they land on a tiled floor or a soft carpet. Everything from large chairs to tiny spoons has a realistic mass and respects the laws of inertia and friction.

All of these are the sorts of interactions that sound mundane with regard to everyday human life but are rarely simulated realistically in a virtual training environment for computer vision. Causal state changes on a scale like that in AI2-THOR 2.0 open up a new world of possibilities for learning by interaction.
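To make the idea of causal state changes a little more concrete, here is a toy Python sketch. It is not the AI2-THOR API and does not talk to Unity; the `SimObject` class and the `open_object`, `toggle_on`, and `drop` helpers are hypothetical names invented for illustration. It only shows the general pattern of one action changing the state of another object, such as a microwave cooking what is inside it or a phone breaking on a hard floor.

```python
from dataclasses import dataclass, field


@dataclass
class SimObject:
    """Hypothetical stand-in for an actionable object (not the AI2-THOR API)."""
    name: str
    openable: bool = False
    toggleable: bool = False
    breakable: bool = False
    cookable: bool = False
    is_open: bool = False
    is_on: bool = False
    is_broken: bool = False
    is_cooked: bool = False
    contents: list = field(default_factory=list)


def open_object(obj: SimObject) -> bool:
    """Open the object if it supports opening; report whether it worked."""
    if not obj.openable:
        return False
    obj.is_open = True
    return True


def toggle_on(appliance: SimObject) -> bool:
    """Turn an appliance on; any cookable item inside becomes cooked."""
    if not appliance.toggleable:
        return False
    appliance.is_on = True
    for item in appliance.contents:
        if item.cookable:
            item.is_cooked = True
    return True


def drop(obj: SimObject, surface_is_hard: bool) -> bool:
    """Drop an object; breakable objects break on hard surfaces only."""
    if obj.breakable and surface_is_hard:
        obj.is_broken = True
    return obj.is_broken


# Usage: cook a potato in a microwave, then drop a phone on a tiled floor.
microwave = SimObject("Microwave", openable=True, toggleable=True)
potato = SimObject("Potato", cookable=True)
microwave.contents.append(potato)
toggle_on(microwave)

phone = SimObject("CellPhone", breakable=True)
drop(phone, surface_is_hard=True)
```

In this sketch the agent never sets `is_cooked` or `is_broken` directly; those states change only as a consequence of other actions, which is the kind of cause and effect an agent can learn from by interacting.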

Game Design and AI, a Surprising Partnership

I mentioned above that it is rare to find such deep interaction systems in computer vision frameworks. This can lead to situations where an environment is as vast as an ocean but only as deep as a puddle.

Instead of making that puddle even wider, the PRIOR computer vision team at AI2 intends to make it as deep as an ocean by integrating features that create more depth. But what sort of features can expand on the interactions provided in THOR? While rare in software specialized for computer vision, features like an item spawn system, breakable objects, openable containers, cookable food, and destructible environments are all quite common in video game design. These sorts of interactions are perfect for creating a sandbox that agents can interact with, as they are also used to create games that allow humans deep interaction in a virtual world.

Basketballs have realistic changes in bounciness depending on the surface they impact.

The Unity game engine is a powerful tool that allows realistic replications of real-life phenomena, but virtual worlds like the ones in our framework can struggle to replicate reality flawlessly. That “reality gap” when going from a simulation to the real world is a challenge that we want to take on. One approach the PRIOR team has taken is to use game design philosophies and techniques to build interactions in our framework. Since Unity is first and foremost made to create games, it is natural that a game design perspective can help with creating features and interactions, even if THOR itself is not a game at all.

By integrating expansive game-like systems into the THOR framework, we can design deep interactions reminiscent of those in modern video games. Modern games make extensive use of new AI technologies. Character behavior and vast procedural scenes in your favorite games feel more real and expansive than ever before thanks to AI-assisted tools. The world of modern gaming has made many advances with the help of AI, so isn’t it about time that AI had some help from games?

Check out our demo, and try out the AI2-THOR 2.0 framework.

