Build a reinforcement learning environment using Unity ML-Agents

An introductory tutorial on how to build a physics-based 3D volleyball environment for reinforcement learning.

Joy Zhang
Coder One
7 min read · Sep 2, 2021

This article is part 2 of the series ‘A hands-on introduction to deep reinforcement learning using Unity ML-Agents’. It’s also suitable for anyone new to Unity interested in using ML-Agents for their own reinforcement learning project.

Sections

Part 1: Getting started with Unity ML-Agents
Part 2: Building a volleyball reinforcement learning environment (this post)
Part 3: Design reinforcement learning agents using Unity ML-Agents
Part 4: Training an agent using PPO
Part 5: Training agents using Self-Play

Recap and overview

In my previous post, I went over how to set up ML-Agents and train an agent.

In this article, I’ll walk through how to build a 3D physics-based volleyball environment in Unity. We’ll use this environment later to train agents that can successfully play volleyball using deep reinforcement learning.

Setting up the court

  1. Download or clone the starter project from this repo.
  2. Open Unity Hub and go to Projects > Add.
  3. Select the ‘ultimate-volleyball-starter’ project folder. You might see some warning messages in the Console but they are safe to ignore for now.
  4. From the Project tab in Unity, navigate to Assets > Scenes.
  5. Load the Volleyball.unity scene.
  6. In the Project tab go to Assets > Prefabs and drag the VolleyballArea.prefab object into the scene.
  7. Save the project.

If you click Play ▶️ above the Scene viewer you’ll notice some weird things happening because we haven’t added any physics or logic to define how the game objects should interact yet. We’ll do that in the next section.

Setting up the environment

⚠ Before we start, open the VolleyballArea prefab (Project panel > Assets > Prefabs). We’ll make our edits to the base prefab, so that they are reflected in all instances of this prefab. This will come in handy later when we duplicate our environment multiple times for parallel training.

Volleyball

Make our volleyball subject to Unity’s physics engine:

  1. In the Hierarchy panel, expand the VolleyballArea object and select the Volleyball.
  2. From the Inspector panel, set the tag to ball.
  3. Click Add Component > Rigidbody.
  4. Set mass = 3, drag = 1 and angular drag = 1. Feel free to play around with these values. A heavier ball will make the environment ‘harder’.

Add ‘bounciness’ to our ball:

  1. Add a Sphere Collider component.
  2. Set Radius to 0.15.
  3. From the Project panel, go to Assets > Materials > Physic Materials.
  4. Drag Bouncy.physicMaterial into the 'Material' slot.
  5. You can double-click Bouncy.physicMaterial to change the 'bounciness'.
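
For reference, the Inspector steps above can also be expressed in a script. This is only a sketch: BallPhysicsSetup is a hypothetical helper that is not part of the starter project, the bounciness value is an assumption, and the property names are those of the Unity versions available when this article was written (newer releases may rename some of them).

```csharp
using UnityEngine;

// Hypothetical helper (not in the starter project): applies the same
// settings as the Inspector steps when attached to the Volleyball object.
public class BallPhysicsSetup : MonoBehaviour
{
    void Awake()
    {
        // Reuse an existing Rigidbody if present, otherwise add one.
        var rb = GetComponent<Rigidbody>();
        if (rb == null) rb = gameObject.AddComponent<Rigidbody>();
        rb.mass = 3f;
        rb.drag = 1f;
        rb.angularDrag = 1f;

        // Same for the Sphere Collider.
        var sphere = GetComponent<SphereCollider>();
        if (sphere == null) sphere = gameObject.AddComponent<SphereCollider>();
        sphere.radius = 0.15f;

        // Assumed value: the Bouncy material in the project may use another one.
        var bouncy = new PhysicMaterial("BouncySketch");
        bouncy.bounciness = 0.8f;
        sphere.material = bouncy;
    }
}
```

Setting these values in the Inspector, as the tutorial does, is usually more convenient because they are stored with the prefab and can be tweaked without recompiling.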

Both blue and purple agent cubes have already been set up for you in a similar way to the Volleyball.

Ground

  1. Select the Ground game object.
  2. From the Inspector panel, set the tag to walkableSurface. This is used later to check whether or not the agent is 'grounded' for its jump action.
  3. Add a Box Collider component. This is used to register collisions with other game objects that have a Rigidbody component. Without it, they will just fall through the ground.

Goals

Goals are represented by a thin layer on top of the ground.

  1. Expand the BluePlayArea and PurplePlayArea parent objects.
  2. Add a Box Collider to both the BlueGoal and PurpleGoal game objects.
  3. Check the ‘Is Trigger’ box for both goals.

When a collider is set as a trigger, it no longer takes part in physical collisions: other objects pass straight through it, and Unity only reports that they overlapped. So even though the goals sit on top of the ground, the agents are actually standing on the Ground collider we created earlier.

Marking the goals as triggers lets us use the OnTriggerEnter method later to detect when the ball has entered a goal’s collider.

Net

  1. Select the Net game object within VolleyballNet.
  2. Add a Box Collider.
  3. Click the ‘Edit Collider’ icon.
  4. Click and drag the bottom node of the green collider so that it covers the entire height of the net. Feel free to play around with the thickness. The intention here is to create a physical ‘blocker’ that will prevent the ball from going under or around the net.

💡 Some shortcuts: Alt+click to rotate, middle-click to pan, scroll the mouse wheel to zoom in/out.

Boundaries

There are three invisible boundaries:

  • OuterBoundaries (checks for ball going out of bounds)
  • BlueBoundary (checks for ball going into the blue side of court)
  • PurpleBoundary (checks for ball going into the purple side of court)

Colliders, tags, and triggers for these boundaries have already been set up for you.

Scripting the environment

In this section, we’ll add scripts that define the environment behavior (e.g. what happens when the ball hits the floor or when the episode starts).

VolleyballSettings.cs

Our first script will simply hold some constants that we’ll reuse throughout the project.

  1. Go back to the Volleyball Scene and select the VolleyballSettings game object.
  2. In the Inspector, you’ll see a Script component attached. Double click the VolleyballSettings script to open it in your IDE of choice.
  3. You should see the following:
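
Roughly speaking, the script is a plain MonoBehaviour that exposes public fields. The sketch below is illustrative only: the field names and values are assumptions and may differ from the file in the starter project, which remains the reference.

```csharp
using UnityEngine;

// Illustrative sketch of a settings holder; names and values are assumed.
public class VolleyballSettings : MonoBehaviour
{
    // Agent movement constants (assumed values).
    public float agentRunSpeed = 1.5f;
    public float agentJumpHeight = 2.75f;
    public float agentJumpVelocity = 777f;
    public float agentJumpVelocityMaxChange = 10f;

    // Slows down strafing and backward movement (assumed).
    public float speedReductionFactor = 0.75f;

    // Downward force applied while falling so jumps feel less floaty (assumed).
    public float fallingForce = 150f;

    // Materials used to recolor the ground when a goal is scored (assumed).
    public Material blueGoalMaterial;
    public Material purpleGoalMaterial;
    public Material defaultMaterial;
}
```

Because the fields are public, Unity serializes them and shows them in the Inspector, where they can be changed without editing the script.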

Note: there is also a ProjectSettingsOverride.cs script provided. This contains additional default settings related to time-stepping and resolving physics.

Go back to the Unity editor and select the VolleyballSettings game object. You should see that these variables are available in the Inspector panel.

VolleyballController.cs

This script is attached to the Volleyball game object and lets us detect when the ball has hit our boundary or goal trigger.

  1. Open the VolleyballController.cs script attached to the Volleyball.
  2. At the start of our VolleyballController : MonoBehaviour class (above the Start() method), declare the variables:

  3. Save the script.
  4. In the Unity editor, click the Volleyball game object.
  5. Drag the PurpleGoal game object into the Purple Goal slot in the Inspector.
  6. Drag the BlueGoal game object into the Blue Goal slot in the Inspector.

This will allow us to access their components (such as their colliders) later.
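
The declarations from step 2 could look like the sketch below. The purpleGoal and blueGoal fields are named to match the Purple Goal and Blue Goal slots mentioned above; the other field names are assumptions, and VolleyballEnvController is the environment script provided with the starter project.

```csharp
using UnityEngine;

public class VolleyballController : MonoBehaviour
{
    // Assigned in Start(); hidden because it is not meant to be set by hand.
    [HideInInspector]
    public VolleyballEnvController envController;

    // Public fields show up as the "Purple Goal" and "Blue Goal" slots.
    public GameObject purpleGoal;
    public GameObject blueGoal;

    // Cached in Start() (assumed names).
    Collider purpleGoalCollider;
    Collider blueGoalCollider;
}
```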

Start()

Unity calls this method once, just before the first frame update after the script is enabled. It will:

  1. Fetch the PurpleGoal and BlueGoal Colliders themselves (the components that register physics-based collisions) using the GetComponent<Collider> method.
  2. Store a reference to the environment controller on the parent VolleyballArea game object in a variable ‘envController’ for easier reference later.

Copy these statements into the Start() method:
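
A minimal sketch of those statements, assuming the fields are named envController, purpleGoal, blueGoal, purpleGoalCollider and blueGoalCollider, and that the VolleyballEnvController script sits on the parent VolleyballArea object:

```csharp
void Start()
{
    // Look up the environment controller on this object or one of its parents.
    envController = GetComponentInParent<VolleyballEnvController>();

    // Cache the goal colliders so they can be compared against later.
    purpleGoalCollider = purpleGoal.GetComponent<Collider>();
    blueGoalCollider = blueGoal.GetComponent<Collider>();
}
```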

OnTriggerEnter(Collider other)

Unity calls this method when the ball enters a collider that is marked as a trigger.

Some scenarios to detect are:

  1. Ball hits the floor/goals
  2. Ball goes out of bounds
  3. Ball is hit over the net (to encourage volleying for training later)

This method will detect each scenario and pass this information to envController (which we'll add in the next section). Copy the following block into this method:
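
One way this could look is sketched below. The tag strings and the Event enum members are assumptions made for illustration, so check them against the tags and enum defined in the starter project. The goals are matched against the colliders cached in Start() rather than by tag.

```csharp
void OnTriggerEnter(Collider other)
{
    if (other.CompareTag("boundary"))            // assumed tag name
    {
        // Ball went out of bounds.
        envController.ResolveEvent(Event.HitOutOfBounds);
    }
    else if (other.CompareTag("blueBoundary"))   // assumed tag name
    {
        // Ball crossed into the blue side of the court.
        envController.ResolveEvent(Event.HitIntoBlueArea);
    }
    else if (other.CompareTag("purpleBoundary")) // assumed tag name
    {
        // Ball crossed into the purple side of the court.
        envController.ResolveEvent(Event.HitIntoPurpleArea);
    }
    else if (other == blueGoalCollider)
    {
        // Ball landed on the blue side's floor.
        envController.ResolveEvent(Event.HitBlueGoal);
    }
    else if (other == purpleGoalCollider)
    {
        // Ball landed on the purple side's floor.
        envController.ResolveEvent(Event.HitPurpleGoal);
    }
}
```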

VolleyballEnvController.cs

This script holds all the main logic for the environment: the max steps it should run for, how the ball and agents should spawn, when the episode should end, how rewards should be assigned, etc.

In the sample skeleton script, some variables and helper methods are already provided:

  • Start() — fetch the components and objects we'll need for later
  • UpdateLastHitter() — keeps track of which agent was last in control of the ball
  • GoalScoredSwapGroundMaterial() — changes the color of the ground (helps us visualise which agent scored)

FixedUpdate()

Unity calls this method once per fixed physics time step, independently of the rendering frame rate (the step is set to fixedDeltaTime = 0.02 seconds in ProjectSettingsOverride.cs).

We’ll use it to cap the number of updates (i.e. ‘steps’) the environment takes before we interrupt the episode (e.g. if the ball gets stuck somewhere).

Add the following to void FixedUpdate():
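
A sketch of such a step counter, assuming the skeleton script exposes fields named resetTimer, MaxEnvironmentSteps, blueAgent and purpleAgent (these names are assumptions). EpisodeInterrupted() is the ML-Agents Agent method for ending an episode that was cut short rather than finished by the agent.

```csharp
void FixedUpdate()
{
    // One environment step per fixed physics update.
    resetTimer += 1;

    // A cap of 0 or less is treated as "no limit".
    if (MaxEnvironmentSteps > 0 && resetTimer >= MaxEnvironmentSteps)
    {
        // Tell both agents the episode was cut off, then start over.
        blueAgent.EpisodeInterrupted();
        purpleAgent.EpisodeInterrupted();
        ResetScene();
    }
}
```

With a 0.02 second step, the environment advances 50 steps per simulated second, so a hypothetical cap of 5000 steps would interrupt an episode after 100 simulated seconds.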

ResetScene()

This controls the starting spawn behavior.

Our goal is to learn a model that allows our agent to return the ball from its side of the court no matter where the ball is sent. To help with training, we’ll randomise the starting conditions of the agents and ball within some reasonable boundaries:
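
The randomisation could be sketched as follows. Every field name (resetTimer, lastHitter, AgentsList, ball, ballRb), the Team.Default value and all numeric ranges are assumptions chosen for illustration; the court is assumed to be laid out with the net along z = 0 in local coordinates.

```csharp
public void ResetScene()
{
    resetTimer = 0;
    lastHitter = Team.Default; // nobody has touched the ball yet

    foreach (var agent in AgentsList)
    {
        // Random spawn offset and facing direction (illustrative ranges).
        var randomPosX = Random.Range(-2f, 2f);
        var randomPosZ = Random.Range(-2f, 2f);
        var randomPosY = Random.Range(0.5f, 3.75f);
        var randomRot = Random.Range(-45f, 45f);

        agent.transform.localPosition = new Vector3(randomPosX, randomPosY, randomPosZ);
        agent.transform.eulerAngles = new Vector3(0f, randomRot, 0f);
        agent.GetComponent<Rigidbody>().velocity = Vector3.zero;
    }

    // Drop the ball above a randomly chosen side of the court.
    var side = Random.value < 0.5f ? -1f : 1f;
    ball.transform.localPosition =
        new Vector3(Random.Range(-1.5f, 1.5f), Random.Range(9f, 11f), side * 4f);
    ballRb.velocity = Vector3.zero;
    ballRb.angularVelocity = Vector3.zero;
}
```

Zeroing the velocities matters: without it, the ball and agents would carry momentum from the previous episode into the new one.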

ResolveEvent()

This method will resolve the scenarios we defined earlier in VolleyballController.cs.

We can use this method to assign rewards in different ways to encourage different types of behavior. In general, it’s good practice to keep rewards within [-1, 1].

To keep it simple, our goal for now is to train agents that can bounce the ball back and forth and keep the ball in play. We’ll assign a reward of +1 each time an agent hits the ball over the net using the AddReward(1f) method in the corresponding scenario:

We won’t assign any rewards for now if a goal is scored or the ball is hit out of bounds. If either of these scenarios happens, we’ll just end the episode. Add the following code block to the sections indicated by the // end episode comment.

Here’s what ResolveEvent should look like:
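
A sketch of the method under the assumptions used so far: the Event and Team enum members and the blueAgent, purpleAgent and lastHitter fields are illustrative names, while AddReward and EndEpisode are methods of the ML-Agents Agent class.

```csharp
public void ResolveEvent(Event triggerEvent)
{
    switch (triggerEvent)
    {
        case Event.HitOutOfBounds:
        case Event.HitBlueGoal:
        case Event.HitPurpleGoal:
            // end episode: no reward, just reset for the next rally
            blueAgent.EndEpisode();
            purpleAgent.EndEpisode();
            ResetScene();
            break;

        case Event.HitIntoBlueArea:
            // Purple hit the ball over the net.
            if (lastHitter == Team.Purple)
            {
                purpleAgent.AddReward(1f);
            }
            break;

        case Event.HitIntoPurpleArea:
            // Blue hit the ball over the net.
            if (lastHitter == Team.Blue)
            {
                blueAgent.AddReward(1f);
            }
            break;
    }
}
```

Checking lastHitter before rewarding means an agent is only credited when it was the one that actually sent the ball across.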

Now when you click Play ▶️ you should see the environment working correctly: the ball is affected by gravity, the agents can stand on the ground, and the episode resets when the ball hits the floor.

Wrap-up

You should now have a volleyball environment ready for our agents to train in. It will assign our agents rewards to encourage a certain type of behavior (volleying the ball back and forth).

In the next part, we’ll design our agents, giving them actions to choose from and a way to observe their environment.

If you have any feedback or questions, please let me know!
