AI2-THOR is AI2's open-source interactive environment for training and testing embodied AI. We’re pleased to announce the 2.7.0 release of AI2-THOR, which contains several performance enhancements that can dramatically reduce training time. This release improves the IPC system between Unity and Python, switches to a faster serialization/deserialization format, and adds new actions that provide better control of the metadata. Dig into the details below, or jump to our TL;DR at the bottom to grab the update command.
AI2-THOR uses the Unity game engine to simulate the environments we create. For Python to communicate with Unity, a server is created; the Unity Player process connects to it and reports the state of the environment to the Python process.
With the latest release, the FIFOServer backend replaces the legacy HTTPServer (WSGIServer)/JSON backend. To see why this is significant, it helps to understand the relationship between the Unity and Python components of AI2-THOR. When the AI2-THOR Controller is launched, a server is launched so that Unity can communicate camera frames (depth, RGB, segmentation, etc.) and metadata about the scene after the agent performs an action.
Once an action has completed, a component within Unity collects the RGB frame from the camera, along with metadata about each object and agent within the scene. In the legacy backend, the metadata is then serialized to JSON (the RGB frame itself is not encoded), and the entire payload is sent to Python over HTTP. During performance analysis, both the JSON serialization/deserialization and the socket IO were identified as bottlenecks. To address these bottlenecks, the serialization format was switched from JSON to MsgPack, and the WSGIServer was replaced with a named pipe server along with a purpose-built protocol to handle the payload.
MsgPack was chosen for several reasons: extremely fast serialization/deserialization in both Python and C# (Unity), robust and mature libraries, and a schema-less format. Because MsgPack is schema-less, the migration from JSON was easy to validate: both formats generate identical data structures, which could be compared against each other during development. Using MsgPack, we found that the serialized metadata size was reduced by 50%, serialization time by 40%, and deserialization time by 60%.
Named pipes were chosen for similar reasons, but the primary reason was speed. With small payloads (< 128 bytes), we observed performance reaching 100k messages per second (~10μs per message). During testing of the WSGIServer, we could only achieve around 1k messages per second (~1 ms per message) with a similarly sized payload. Overall, depending on the scene and the type of action being performed, we have observed a 1.5x to 2x increase in FPS just by switching to the FIFOServer.
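The purpose-built protocol itself is not described in this post, so the following is only a minimal sketch of one common way to send discrete messages over a pipe: each payload is preceded by a fixed-size length header. The 4-byte header layout and the function names are assumptions made for illustration, not the FIFOServer's actual wire format. To stay within the Python standard library, the sketch uses an anonymous `os.pipe()` in place of a named pipe and `json` in place of MsgPack.

```python
import json
import os
import struct

# Assumed frame layout: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!I")


def write_message(fd: int, payload: bytes) -> None:
    """Write one frame: the length header followed by the payload bytes."""
    data = HEADER.pack(len(payload)) + payload
    while data:
        written = os.write(fd, data)  # os.write may write fewer bytes than given
        data = data[written:]


def read_exact(fd: int, size: int) -> bytes:
    """Read exactly `size` bytes, looping because os.read may return fewer."""
    chunks = []
    remaining = size
    while remaining:
        chunk = os.read(fd, remaining)
        if not chunk:
            raise EOFError("pipe closed before the full frame arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_message(fd: int) -> bytes:
    """Read one frame: parse the header, then read that many payload bytes."""
    (length,) = HEADER.unpack(read_exact(fd, HEADER.size))
    return read_exact(fd, length)


def round_trip(metadata: dict) -> dict:
    """Send a small metadata dict through a pipe and decode it on the other end."""
    read_fd, write_fd = os.pipe()
    try:
        # json stands in for MsgPack here; the payload must fit in the pipe buffer
        # because the writer and reader run sequentially in this sketch.
        write_message(write_fd, json.dumps(metadata).encode("utf-8"))
        return json.loads(read_message(read_fd).decode("utf-8"))
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

A length prefix lets the reader know exactly how many bytes belong to each message, so no delimiter scanning or HTTP parsing is needed. In a real setup the two ends would live in separate processes and the payload would be produced by a MsgPack library rather than `json`.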
After each action is performed in AI2-THOR, an enormous amount of metadata is generated about the scene. Each object has the following information collected:
Once collected for each object in the scene, these properties are serialized and sent over the pipe to the Python controller for AI2-THOR. (For more details on the meaning of any of the properties, please consult the documentation.)
For tasks such as PointNav or ObjectNav, you may only care about zero or one object in the scene during an episode. To better support this use case, a new action was added: SetObjectFilter.
This limits the metadata to only include objects that have been explicitly specified. To remove the filter, simply call:
We have observed increases of 50% in FPS when using this filter, but this will vary depending on the number of objects in the scene and the types of actions being performed.
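As a rough sketch of how the filter might be used from Python, the helpers below build the arguments for the two actions and model the filter's effect on the object list. The `objectIds` parameter name and the `ResetObjectFilter` action name are assumptions based on the AI2-THOR documentation rather than on anything stated above, and the object ids are made up; please verify them against the documentation for your version.

```python
def set_object_filter(object_ids):
    """Build step arguments that restrict metadata to the given object ids.

    Assumed usage with an existing Controller: controller.step(**set_object_filter(ids))
    """
    return {"action": "SetObjectFilter", "objectIds": list(object_ids)}


def reset_object_filter():
    """Build step arguments that remove the filter (action name is an assumption)."""
    return {"action": "ResetObjectFilter"}


def filter_objects(objects, object_ids):
    """Local model of the effect: keep only objects whose id was requested."""
    wanted = set(object_ids)
    return [obj for obj in objects if obj["objectId"] in wanted]


# Hypothetical scene metadata with made-up object ids.
scene_objects = [{"objectId": "Mug|1"}, {"objectId": "Sofa|2"}, {"objectId": "Lamp|3"}]
kept = filter_objects(scene_objects, ["Mug|1"])
```

Passing an empty list corresponds to the PointNav case, where no object metadata is needed at all.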
A common action type in AI2-THOR retrieves state about the environment without manipulating it. An example is GetReachablePositions, which queries the environment for all the locations in a scene that an agent can move to. Actions such as these shouldn’t require the scene to be fully rendered or all the metadata regenerated, since nothing has changed as a result of the action. Prior to the new FastActionEmit feature, this type of action was much slower than it should have been. With FastActionEmit, only a small metadata patch is sent to the Python process instead of an RGB frame plus the full metadata payload. In the case of GetReachablePositions, we observed a 2.5x increase in FPS just by enabling FastActionEmit. With 2.7.0, FastActionEmit is enabled by default. We are still rolling this out to the remaining actions that can benefit from it, so look forward to many other actions speeding up.
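The idea of a metadata patch can be illustrated with a small model: the receiver keeps the last full metadata it saw and overlays only the fields that the query action changed. How AI2-THOR applies the patch internally is not described here, so the function and the field names below are hypothetical.

```python
import copy


def apply_patch(last_metadata: dict, patch: dict) -> dict:
    """Return the last full metadata with the patch's top-level fields overlaid."""
    merged = copy.deepcopy(last_metadata)  # leave the cached metadata untouched
    merged.update(patch)
    return merged


# Hypothetical full metadata from the previous, fully rendered step.
last_metadata = {
    "lastAction": "MoveAhead",
    "actionReturn": None,
    "objects": [{"objectId": "Mug|1"}, {"objectId": "Sofa|2"}],
}

# Hypothetical patch for a read-only query: only the changed fields are sent.
patch = {
    "lastAction": "GetReachablePositions",
    "actionReturn": [{"x": 0.0, "y": 0.9, "z": 0.25}],
}

current = apply_patch(last_metadata, patch)
```

Because the scene did not change, the object list and the previous frame can be reused as they are, which is why the patch can be so much smaller than a full payload.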
AllenAct is a modular learning framework that uses AI2-THOR for a number of different tasks, including PointNav and ObjectNav. We use the ObjectNav implementation in AllenAct to benchmark the new release of AI2-THOR. For this purpose, we use a single AWS p2.8xlarge machine with 8 GPUs and 16 CPU cores, running 60 instances of AI2-THOR at a frame resolution of 400x300. With the old WSGI interface, we can train ObjectNav in RoboTHOR at roughly 220 FPS and reach an SPL of 0.08 in about 9 hours of training. With the new FIFOServer and SetObjectFilter, we see a dramatic speed increase to 600 FPS, a 2.7x speedup, enabling us to reach an SPL of 0.15 in 9 hours of training.
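To put these benchmark figures in perspective, the arithmetic below works out the speedup and the approximate number of frames processed in 9 hours. It assumes that the quoted FPS values are totals across all 60 instances and that the rate stays constant for the whole run; both are simplifications.

```python
instances = 60
old_fps, new_fps = 220, 600  # reported training speeds, assumed to be aggregate
hours = 9

speedup = new_fps / old_fps                   # about 2.73, quoted as 2.7x
old_frames = old_fps * 3600 * hours           # frames seen in 9 hours at the old rate
new_frames = new_fps * 3600 * hours           # frames seen in 9 hours at the new rate
old_fps_per_instance = old_fps / instances    # about 3.7 FPS per instance
new_fps_per_instance = new_fps / instances    # 10 FPS per instance

print(f"speedup: {speedup:.2f}x")
print(f"frames in {hours} h: {old_frames:,} -> {new_frames:,}")
```

Under these assumptions, the agent sees roughly 7.1 million frames in 9 hours with the old backend and roughly 19.4 million with the new one.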
AI2-THOR is faster than ever with the 2.7.0 release and provides several ways for you to dramatically reduce your training time. Installation is easy:
pip install ai2thor --upgrade