Paper Implementation of ‘Using Unity to Help Solve Intelligence’: Code Walkthrough

AurelianTactics
Jan 27, 2022

In this post I’ll detail my code, process, and next steps for implementing the DeepMind paper “Using Unity to Help Solve Intelligence.” See my last post and the diagram below for summaries of what I tried to implement.

From the paper: https://arxiv.org/abs/2011.09294

Code Walkthrough

Starting outside the black box in the diagram above, we have the purple ‘Agent’ box. This is represented by the Python client example in the repo. This file starts a container based on the Docker image generated by the repo’s Dockerfile. The key parts:
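Here’s a minimal sketch of what that client does. The image name, port, and settings are placeholders, and the repo’s actual example may use slightly different dm_env_rpc helpers:

```python
import docker
import grpc

from dm_env_rpc.v1 import connection as dm_env_rpc_connection
from dm_env_rpc.v1 import dm_env_adaptor
from dm_env_rpc.v1 import dm_env_rpc_pb2

_IMAGE = 'black_box_unity_env:latest'  # hypothetical image name
_PORT = 9001                           # hypothetical port for the Unity gRPC server

# Start a container from the image built by the repo's Dockerfile.
docker_client = docker.from_env()
container = docker_client.containers.run(
    _IMAGE, detach=True, ports={f'{_PORT}/tcp': _PORT})

# Connect to the gRPC server running inside the container.
channel = grpc.insecure_channel(f'localhost:{_PORT}')
connection = dm_env_rpc_connection.Connection(channel)

# Ask the server to create a world, then join it; the adaptor wraps the
# connection in the standard dm_env interface.
world_name = connection.send(dm_env_rpc_pb2.CreateWorldRequest()).world_name
specs = connection.send(
    dm_env_rpc_pb2.JoinWorldRequest(world_name=world_name)).specs
env = dm_env_adaptor.DmEnvAdaptor(connection, specs)
```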

With the env object you can connect any agent or learner that uses the dm_env format and train a Reinforcement Learning (RL) agent, for example something like Acme.
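From there, the usual dm_env loop applies. As a sketch, with agent standing in for whatever actor you plug in:

```python
# Standard dm_env episode loop: works with any agent that consumes
# dm_env TimeSteps ('agent' here is a hypothetical actor, e.g. from Acme).
timestep = env.reset()
while not timestep.last():
    action = agent.select_action(timestep.observation)
    timestep = env.step(action)
```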

The black box itself is built into the image by building the Dockerfile. The Unity editor produces a stand-alone executable, which I put in the app directory. The Dockerfile builds the image from the contents of the app directory.

Moving inside the black box, let’s start with the communication layer. dm_env_rpc handles the gRPC connection. The repo is well documented and easy to work with: it contains guides, overviews, implementation instructions, and a Python example. dm_env_rpc provides the tools to turn the Docker container into the kind of standard env that RL libraries are used to interfacing with.
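Once connected (assuming the env object from the earlier sketch), the container really does look like any other dm_env environment:

```python
# The adaptor exposes the usual dm_env spec methods.
print(env.observation_spec())
print(env.action_spec())
```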

The rest of the code is inside the Unity editor. The code modified in the editor was built into the executable in the app directory, which was then built into the Docker image. Based on the dm_env_rpc repo, I compiled the proto files into C# code and added them to the Scripts directory (DmEnvRpc.cs, DmEnvRpcGrpc.cs). The file DmEnvRpcImpl.cs implements the functionality from those proto files: it takes the incoming requests from the Python client, adds them to the RequestQueue (see the Session layer in the diagram), and then awaits a response from the environment. The key part of the code is the stream handler, which awaits agentSession.HandleEnvironmentRequest() for each incoming request and writes the response back to the client.

Up next is the Session layer. I’m not sure if I structured this code correctly between the RequestQueue, WorldTimeManager, and AgentSession. RequestQueue.cs takes the request, unpacks the information, and adds it to an internal queue. Then, as described above, the communication layer waits on the response from the agentSession.HandleEnvironmentRequest() method. In AgentSession.cs, the AgentSession grabs the next item in the RequestQueue. Depending on the request type, AgentSession sends information to or receives information from the interface layer (the Avatar and Tasks in the diagram above).

Here’s an example of handling a Create World request. In this request type, the client asks the server to create the environment and sends in some configuration settings.
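For context, here’s roughly what that request looks like from the Python side. The setting names below are made up, but each setting travels as a Tensor proto keyed by name:

```python
from dm_env_rpc.v1 import dm_env_rpc_pb2
from dm_env_rpc.v1 import tensor_utils

# Configuration settings are packed into Tensor protos keyed by name.
# The setting names here are hypothetical.
request = dm_env_rpc_pb2.CreateWorldRequest(settings={
    'seed': tensor_utils.pack_tensor(42),
    'move_speed': tensor_utils.pack_tensor(2.0),
})
response = connection.send(request)  # blocks until the CreateWorldResponse
```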

AgentSession takes in the request, reads from the Tensor, and in this example tells the Interface layer’s Task to start the environment with these configuration settings (using the StartEnv() method). Each request has its own response type.

Moving from the Session layer to the Interface layer, we have Avatar.cs and BlackBoxRLTask.cs (renamed from the diagram’s Task due to naming conflicts). The Task handles the reward and episode-done information, and exposes functions AgentSession can call for starting and resetting the environment. The Avatar handles observations and moving things in the environment: it has functions for applying actions received from the communication layer and for gathering observations to send back out. The Avatar and Task then interact with BasicController.cs to move the objects around the game world (in this case the blue box and green spheres in the image below). BasicController.cs is lightly modified from the Unity ML-Agents repo.

https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#basic

My Process

When I turn a research paper into code, I like to re-read the paper with that purpose in mind. I try to make a rough outline of the steps that need to be taken, make notes on particular areas of interest or concern, and make a list of questions about things I don’t understand.

In this case, the paper made things a bit easier by including some example code: a Docker container, a dm_env_rpc client, and some scraps of Unity code. I also looked online to see if there were any other implementations or repos using this setup. Some helpful examples were the dm_env_rpc repo, the Alchemy repo, and the DeepMind memory tasks. However, none of the examples contained the internal C# Unity code; they shipped only the Dockerized black boxes.

The Windows Prototype

The paper breaks itself into three main areas: the interface layer, the session layer, and the communication layer. I started with what I had experience with, making a Unity environment on Windows, which meant focusing on the interface and session layers. My example env is the Basic environment from the Unity ML-Agents package with some modifications. In this example, the agent can either go a short distance left for a small reward or a longer distance right for a large reward.

I made C# scripts, added a bunch of comments with ideas and to-dos, and tried to fit the various scripts together as best I could. I put these in their own namespace with the hope of eventually turning this into a stand-alone package.

After setting up the rough framework, I began learning gRPC. While I was somewhat familiar with the protobuf format for sending data, I was not familiar with using gRPC to send protobufs between various applications. I spent some time working through the official tutorials in Python, C#, and Unity and found an additional Unity tutorial that was helpful.

The key task here was getting a Python client to talk to a Unity server. Creating the Python client was relatively straightforward: there were plenty of examples, and I only needed a bare-bones client for testing, since at some point I would integrate dm_env_rpc to provide the advanced functionality. I was able to get a Python client to Python server, a C# client to C# server, a Unity client to Unity server, and a Unity client to Python server up relatively easily. The first challenging part was setting up a dynamic Unity server.
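A bare-bones test client only takes a few lines. The env_pb2 stubs and the Step RPC below are hypothetical stand-ins for whatever simple .proto the prototype defines:

```python
import grpc

# Hypothetical stubs generated by protoc from a simple prototype .proto
# that defines an Env service with a Step RPC.
import env_pb2
import env_pb2_grpc

channel = grpc.insecure_channel('localhost:50051')
stub = env_pb2_grpc.EnvStub(channel)

# Send one action to the Unity server and read back the reward.
reply = stub.Step(env_pb2.StepRequest(action=1))
print(reply.reward)
```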

Part of the challenge is that the gRPC package for Unity is experimental. It is unclear if Unity and gRPC will work well together in the future, and none of the examples I found used a Unity server the way this paper required. The Unity server had to take in messages from the client, modify something about itself, and render that change: in this case, resetting the environment, moving the agent left and right, or updating a UI element. I eventually narrowed the bug down to the rendering step: whenever my code reached one, the gRPC message handling would fail.

Eventually I found that, through a combination of async messages with a Unity plug-in, a custom repo to push event handling to the main thread, and the use of Coroutines (Unity’s own way of doing asynchronous code), I was able to get the Unity server to respond to the gRPC messages. That milestone is the messy branch labelled windows-simple-grpc. If you start the Unity env in the editor, you can connect a simple Python client to send gRPC messages to configure the starting settings for the env and move the agent left and right.

The other challenge was that I didn’t fully understand gRPC’s bidirectional streaming. I think I have a better grasp on it now. The way the bidirectional stream works, I believe, is that each request from the client receives exactly one response from the server: requests and responses are one-to-one. When I first started, I had imagined the stream as a pipe with requests coming in and responses going out in a decoupled manner, with both sides listening and reacting to requests and responses without a one-to-one pairing and with delays allowed in between. When I tried to code it that way, I was unsuccessful. Either way presents its own challenges when extending this architecture to multi-agent environments, where multiple client connections are allowed into the server.
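In dm_env_rpc terms, everything flows through a single bidirectional Process stream of EnvironmentRequest and EnvironmentResponse messages. A rough client-side sketch of the one-to-one pattern (the address is a placeholder):

```python
import grpc
from dm_env_rpc.v1 import dm_env_rpc_pb2
from dm_env_rpc.v1 import dm_env_rpc_pb2_grpc

channel = grpc.insecure_channel('localhost:9001')  # placeholder address
stub = dm_env_rpc_pb2_grpc.EnvironmentStub(channel)

def requests():
    # Each EnvironmentRequest wraps exactly one concrete request type.
    yield dm_env_rpc_pb2.EnvironmentRequest(step=dm_env_rpc_pb2.StepRequest())

# The stream stays open, but each request is answered by exactly one
# response, in order; the two sides are not decoupled.
for response in stub.Process(requests()):
    print(response.WhichOneof('payload'))
```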

The Linux Full Implementation

With the Windows prototype working, that left two main pieces: upgrading the communication layer from simple gRPC to dm_env_rpc, and Dockerizing the environment as a container. Luckily, the transition from Windows to Linux was smooth for my prototype.

One issue I had was when I tried to compile the dm_env_rpc proto files and add them to the Unity env. The proto file uses an import which was not recognized by the gRPC package for Unity. I removed that import and the one status-message type that relies on it, and put it on my to-do list for future improvements.

Other than that, the integration with dm_env_rpc was straightforward. The dm_env_rpc repo comes with useful documentation and an implementation of catch in PyGame, which made integrating the Python side straightforward. I rewrote AgentSession.cs, Avatar.cs, BlackBoxRLTask.cs, and DmEnvRpcImpl.cs to work with the dm_env_rpc-specific protos and request/response structure.

The main struggle for me was trying to pack and unpack the Tensor proto messages correctly. It took me far too long to figure out how to use the generated C# classes to get data out of the proto requests and to package it into the proto responses. dm_env_rpc has a tensor utilities file that does this in Python, but writing the C# version was very time consuming, partly because I didn’t read the Python example closely enough and partly due to my limited C# knowledge. Eventually I wrote a C# script to help pack and unpack tensors, though it still needs to be filled out.
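For reference, the Python side is just a couple of calls to tensor_utils; my C# helper aims to mirror this pack/unpack behavior:

```python
import numpy as np
from dm_env_rpc.v1 import tensor_utils

# Pack a numpy array into a dm_env_rpc Tensor proto...
tensor_proto = tensor_utils.pack_tensor(
    np.array([1.0, 2.0], dtype=np.float32))

# ...and unpack a Tensor proto back into a numpy array.
array = tensor_utils.unpack_tensor(tensor_proto)
```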

At that point, the server was up and running and responding to the Python client. Up next was Dockerizing the Unity env. Building the environment was simple, and getting the client to interact with the built executable was no issue. It took a few hours to work out exactly what I was looking for in the Docker container, how to run the Unity env headless, and how to nail down the Dockerfile syntax. With this part, as with the packing and unpacking of the Tensors, I wasted too much time thinking I was just a small syntax change away from fixing the issue. It was only when I stepped away from the keyboard, came up with a plan, and tested my assumptions that I was able to make progress.

Next Steps

There’s a lot of work to be done to turn this from a rough prototype into a usable piece of code. I’m going to take a little time to step back and organize my thoughts on the project. If I continue, I’m going to try to break down the key parts and maybe enlist some help. If anyone is interested, let me know. See the repo’s README for the kinds of things that need to be improved.
