A hands-on introduction to deep reinforcement learning using Unity ML-Agents

Get started with deep reinforcement learning by training agents to play volleyball.

Joy Zhang

Published in

Coder One

3 min read · Aug 26, 2021

Purpose

If you’re new to reinforcement learning (RL), there are some great introductory courses out there.

But if you’re anything like me, you might prefer a ‘learning by doing’ approach. With hands-on experience upfront, it may be easier for you to grasp the theory and math behind the algorithms later.

In this series, I’ll walk you through how to use Unity ML-Agents to build a volleyball environment and train agents to play in it using deep RL. For a bit of fun and extra incentive, you’ll be able to submit your trained agent to the Ultimate Volleyball leaderboard and have it compete against other agents.

Volleyball agents trained using deep reinforcement learning

Why ML-Agents?

ML-Agents is an add-on for Unity (a game development platform).

It lets us design a complex physics-rich environment without needing to build any of the physics simulation logic ourselves. It also lets us experiment with state-of-the-art RL algorithms without having to set up any boilerplate code or install additional libraries. The nice graphics and interface are a plus.

A (very brief) overview of reinforcement learning

Let’s use volleyball as an example. Our players (agents) initially know nothing about how to play volleyball. They’ll start out taking actions completely at random. Through trial and error, they’ll realise:

When they hit the ball and it goes over the net, they sometimes score points (positive feedback) ✔️
When they let the ball hit the floor, they lose a point (negative feedback) ❌

By continuing to do things that lead to positive outcomes, the agents will eventually learn to hit the ball over the net whenever it’s on their side of the court.

Reinforcement learning is a subfield of machine learning in which an ‘agent’ (the volleyball player) learns which sequences of actions to take (hitting the ball over the net) in a given state of its environment (the volleyball game) in order to maximize its reward (scoring points).
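To make the trial-and-error idea concrete, here is a minimal sketch in plain Python. It is not how ML-Agents trains agents (that uses deep RL with neural networks); it is a toy, one-step stand-in that keeps a running average reward per state and action and picks actions epsilon-greedily. The action names, the `step` function, and the 50% scoring chance are all assumptions made up for illustration.

```python
import random

ACTIONS = ["stay", "hit"]  # hypothetical action set, for illustration only


def step(ball_on_my_side, action, rng):
    """Toy one-step environment: return the reward for an action in a state."""
    if not ball_on_my_side:
        return 0.0  # ball is on the other side: no feedback either way
    if action == "hit":
        # Hitting sends the ball over the net; it only sometimes scores.
        # The 50% chance is an assumed number, not taken from the game.
        return 1.0 if rng.random() < 0.5 else 0.0
    return -1.0  # the ball hits the floor: lose a point


def train(episodes=5000, epsilon=0.1, seed=0):
    """Estimate the average reward of each (state, action) pair by trial and error."""
    rng = random.Random(seed)
    value = {(s, a): 0.0 for s in (False, True) for a in ACTIONS}
    count = {key: 0 for key in value}
    for _ in range(episodes):
        state = rng.random() < 0.5  # is the ball on our side this round?
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore: try a random action
        else:
            # exploit: pick the action with the best estimate so far
            action = max(ACTIONS, key=lambda a: value[(state, a)])
        reward = step(state, action, rng)
        count[(state, action)] += 1
        # Incremental update of the running average reward.
        value[(state, action)] += (reward - value[(state, action)]) / count[(state, action)]
    return value


if __name__ == "__main__":
    for key, estimate in sorted(train().items()):
        print(key, round(estimate, 2))
```

After training, the estimate for hitting when the ball is on the agent’s side ends up higher than the estimate for staying put, so the greedy choice becomes ‘hit’. The real volleyball agents face long sequences of states and continuous physics, which is why the rest of this series relies on deep RL rather than a lookup table like this one.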

This can be illustrated more formally as: