Early Explorations of Learning Agents in Unreal Engine
With the still staggering revenue of Fortnite to pull from, Epic Games continues to invest in their flagship product for developers: Unreal Engine. Every fortnight they announce some new, exciting feature to add to the rich ecosystem. At the very end of March this year, they introduced Learning Agents, a proper reinforcement learning interface for training bots directly in the engine!
On a personal and professional level, this is a huge deal, as I fall squarely into the intersection of AI researcher and Unreal Engine user. Prior to this I had been working on a similar plugin of my own, but I kept hitting walls in the massive codebase that is UE5. Regardless, here are my findings from tinkering with Learning Agents for the few days it has been available.
Before we begin, a quick disclaimer: there is virtually no documentation for this code, so everything here is the result of trial, error, and assumption. Nothing here should be considered best practice, but it will get you to a point where one or more agents are learning using PPO.
Update: The project is now available as a public GitHub repository.
Installation
This plugin is currently only available by building the ue5-main branch of the engine from source. There are plenty of good tutorials on how to do this, but I’ll give a quick outline here as well. Keep in mind that a source-built engine is way bigger than the one distributed through the launcher; this one sits at a whopping 219GB(!)
- Get access to the Unreal Engine source code repository
- git clone -b ue5-main https://github.com/EpicGames/UnrealEngine.git
- Run Setup.bat
- Run GenerateProjectFiles.bat
- Open the solution file (UE5.sln) in Visual Studio 2022
- Set the configuration to Development Editor, and the platform to Win64
- Build the engine
This takes a while, but once it’s done, you should be able to launch the engine and create a new project. The engine reports version number 5.3, but that does not necessarily mean the plugin will be stable in that release.
Project Setup
I chose to use the third-person example project with starter content, as it comes with ready-made characters that we can use as avatars for our agents. If you have done everything right up to this point, the Learning Agents plugin should be available under Edit>Plugins in the editor. Enable it to get access to all the relevant classes.
When working like this, I find it convenient to walk around as a character myself and watch how the agents fare. BP_ThirdPersonCharacter, found under Content>ThirdPerson>Blueprints, is perfect for this. To get our actual learning agents, I use a neat little trick: duplicate this blueprint and change its parent class to Character. This gives us good-looking characters with a movement component and animations out of the box. This new blueprint will be referred to as AgentCharacter.
The Classes of Learning Agents
For this exploratory project, we stick to blueprints. From the plugin, the following classes have been exposed:
- LearningAgentsType
- LearningAgentsController
- LearningAgentsImitationTrainer
- LearningAgentsRecorder
- LearningAgentsTrainer
- LearningAgentsDataStorage
To get an initial training session to work, I used two of these: LearningAgentsType and LearningAgentsTrainer.
LearningAgentsType
All the classes, except for DataStorage, inherit from the SceneComponent class. Because of this, I created a generic actor blueprint called LearningManager and attached a blueprint subclass of LearningAgentsType to its DefaultSceneRoot. This component appears to be responsible for observation and action handling, as it provides the following four events: Setup Observations, Set Observations, Setup Actions, and Get Actions.
First, we define the observation and action spaces of this environment. For this example, we keep it very simple: the actions represent movement along the global X and Y axes, while the observation is the position of our agent. This is by no means a solid foundation for intelligence to emerge, but it is very simple to set up and lets us test the system.
As the figures above show, we set up the spaces by adding features one element at a time. Keep in mind that these events run only once, prior to any learning. Their purpose is to define the memory structures that will be sent back and forth between Python and Unreal.
Every data type made available as a feature for the observation or action space is represented by a class such as FloatAction in the plugin. These classes are responsible for translating values between Unreal and PyTorch (the framework our neural networks are built in).
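To make that translation concrete, here is a tiny PyTorch illustration of what our spaces boil down to on the Python side. This is not the plugin’s actual code, just the general shape of it:

```python
import torch

# Illustration only: every feature ultimately becomes part of a flat float vector.
# Our single position observation is three floats per agent, and the two movement
# actions come back as two floats per agent.
observation = torch.tensor([[412.0, -87.5, 92.0]])  # one agent's (x, y, z) position
policy = torch.nn.Linear(3, 2)                      # stand-in for the real actor network
action = policy(observation)                        # two floats: movement along X and Y
print(action)
```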
We store references to these observation and action objects as variables, since they will soon be used to actually set the observation values or to execute actions within our environment. So far we have covered two out of four events in this class. The other two will be addressed shortly, but first we’ll have to connect this all to the game world.
The Learning Manager
The LearningManager is an actor I created and placed in the game world; it is responsible for initialization and for connecting everything together. The LearningAgentsType (from here on referred to as LAType) is attached as a component to this actor, and the LearningAgentsTrainer (LATrainer) is attached to the LAType.
To simplify the example, I have simply placed a few AgentCharacters around the game world. In the LAType, the LATrainer, and the LearningManager, the variable AgentClass points to this class: AgentCharacter. These will be the avatars that act and perceive in our world.
As we launch the game, every AgentCharacter is added to the LAType as an agent. Practically speaking, this means that our learning system will be able to refer to these objects when reading observations and writing actions. After this, we run the setup function for the agents, which in turn calls both of the events we implemented earlier, namely Setup Observations and Setup Actions. The TypeSettings and NetworkSettings structs are used with default values for now.
NOTE: The maximum number of agents that can be added is defined under Class Defaults in the LAType class. The default value is 1; for my experiments I set it to 8. This number can be greater than the number of agents that are added before training.
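Summarized as pseudocode, the startup flow looks roughly like this. The real implementation is Blueprint nodes, so every name below is a descriptive stand-in rather than the actual API:

```python
# Rough pseudocode of the LearningManager's BeginPlay flow (names are stand-ins).
def begin_play(world, la_type, type_settings, network_settings):
    # Register every AgentCharacter in the level as an agent (up to the max agent count).
    for character in world.get_all_actors_of_class("AgentCharacter"):
        la_type.add_agent(character)
    # Run the agent setup, which fires our Setup Observations and Setup Actions events.
    la_type.setup_agents(type_settings, network_settings)
```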
LearningAgentsTrainer
The final class we cover is the trainer class, LATrainer. It is responsible for the variables related to training, whereas the LAType covers the variables related to agent behavior. More specifically, this means that it handles rewards and completions. With this, we complete the standard quadruple from reinforcement learning: state, action, reward, and done.
The setup section of this class is more or less structurally identical to how observations and actions were set up in LAType. I have yet to dive into how weights and completion modes are used. According to a comment in the source code, the setup events for LATrainer and LAType provide a reference to self simply to make your code a bit prettier.
Tying Everything Together
At this point, we have the individual elements set up, but our agents will not move, and there is no training loop. To resolve this, we begin by initializing the trainer. Note that the execution flow in this figure picks up where the agent setup left off, in the BeginPlay implementation of the LearningManager.
Once again, we use settings structs with default values. Note that the Agent Type Override isn’t needed, as the trainer will by default use its parent component as the agent type. At this point, if we run the project, training will start, but two major pieces are still missing for it to actually work: connecting observations, actions, rewards, and completions to our game, and the training loop itself.
First, we set the observation for each agent. As you will see, many of the events in the plugin provide a list of agent IDs. This is because the system is designed for training multiple agents at once. Typically, these implementations should process the agents one by one, as demonstrated in these examples. To set the value of our observation feature (position), we use the object reference stored from when we defined the space. This object provides a setter function, which also takes an agent ID, further demonstrating how the system keeps track of multiple agents.
Since actions are determined by the actor network in the learning algorithm, we don’t set them from the game, but rather read out their values and apply them to the agents. In this example, we add basic movement based on the two actions. Note that the float actions can give negative values as well, enabling movement in both directions on each axis.
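In pseudocode, the per-agent pattern for both events looks roughly like this. The setter and getter names below are hypothetical stand-ins for the Blueprint nodes the feature objects provide:

```python
def set_observations(agent_ids, get_agent_location, position_observation):
    # Write each agent's world position into the position feature we stored earlier.
    for agent_id in agent_ids:
        position_observation.set_position(agent_id, get_agent_location(agent_id))

def get_actions(agent_ids, move_x_action, move_y_action, apply_movement):
    # Read the two float actions (which can be negative) and apply them as planar movement.
    for agent_id in agent_ids:
        move_x = move_x_action.get_float(agent_id)
        move_y = move_y_action.get_float(agent_id)
        apply_movement(agent_id, (move_x, move_y, 0.0))
```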
Rewards and completions are handled in the exact same way as observations. The figure above is a crude example, where the reward on each time step is random and the episodes end randomly. In my own experiments I define the reward based on the distance to a certain location in the map, and the completions are time-based. Those implementations are not presented here, mainly because they are very straightforward to implement but take up far more physical space in the blueprint, making them impractical for this demonstration. The way they interface with the learning system is identical, however.
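As a minimal sketch of that scheme, something along these lines works: the reward is the negative distance to a goal (so getting closer is rewarded) and episodes end after a fixed number of steps. The goal location, step limit, and helper names are all my own assumptions:

```python
import math

GOAL = (1000.0, 500.0, 0.0)   # assumed target location in world coordinates
MAX_EPISODE_STEPS = 600       # assumed episode length in ticks

def set_rewards(agent_ids, get_agent_location, reward_feature):
    # Negative distance to the goal, so moving closer yields a higher reward.
    for agent_id in agent_ids:
        distance = math.dist(get_agent_location(agent_id), GOAL)
        reward_feature.set_reward(agent_id, -distance)

def set_completions(agent_ids, steps_alive, completion_feature):
    # Time-based completion: end the episode after a fixed number of steps.
    for agent_id in agent_ids:
        completion_feature.set_completion(agent_id, steps_alive[agent_id] >= MAX_EPISODE_STEPS)
```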
The Training Loop
This segment is by far the one that I am least confident in, as it feels a bit dirty. The first sin is my decision to put it on tick in the LearningManager. This does keep the loop firmly within the main game thread, but with all the moving parts involved, I will be sleeping with one eye open. The main concern, however, is that I call a sequence of important functions in a specific order. When researching this, I assumed Iterate Training would call everything for me. Despite all of this, it all seems to work well at this point.
The loop will be demonstrated in three parts, all part of the Tick event in LearningManager. The first part is simply a check that training has started. As the system spends some time on setup, this is not necessarily true in the first few ticks, hence the check before we continue.
The second part handles all calls to LAType, where the policy (actor network) is evaluated to decide which actions the agents will take next. The respective encoding and decoding calls for observations and actions invoke the events we defined earlier to interface with the game itself.
The third and final part of our loop is to update the variables for rewards and completions. Then, to commit everything and sync with the Python side of things, we call Iterate Training.
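Condensed into pseudocode, the whole tick looks something like this. These are descriptive labels for the Blueprint calls, not the actual node names:

```python
def on_tick(la_type, la_trainer):
    # Part 1: training needs a few ticks to finish its setup.
    if not la_trainer.is_training():
        return
    # Part 2: run the observation -> policy -> action pipeline on the LAType.
    la_type.encode_observations()   # triggers our Set Observations event
    la_type.evaluate_policy()       # the actor network picks the next actions
    la_type.decode_actions()        # triggers our Get Actions event
    # Part 3: update rewards and completions, then commit the step to Python.
    la_trainer.set_rewards()
    la_trainer.set_completions()
    la_trainer.iterate_training()
```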
Running Training
Unless you (or I) have forgotten anything by now, pressing play in the editor should start training. The setup will take a while, but when it’s ready, you’ll see regular printing in the output log. The Python scripts where the training happens can be found at the following location:
UnrealEngine\Engine\Plugins\Experimental\LearningAgents\Content\Python
Logging
Watching the agents run around is satisfying, but if we want to properly track progress, we need logging. There is a boolean in the training config called UseTensorBoard, which works. However, UE5 ships with its own Python environment, which can be found at the following location:
UnrealEngine\Engine\Binaries\ThirdParty\Python3\Win64
To install new modules into this environment (which is the one used by the plugin), launch a terminal in this directory and run:
python3 -m pip install tensorboard wandb
This should be enough to make the optional TensorBoard logging work. However, note that this will generate TensorBoard output in the LearningAgents plugin directory, so if you want to change this, specify a new path in the train_ppo.py script where TensorBoard is initialized. I like to use Weights & Biases (wandb) for my projects, so I installed that as well, and setting it up was very straightforward.
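For illustration, here is the kind of logging setup involved. The log directory, project name, and metric names are my own choices, and where exactly these lines belong in train_ppo.py depends on how the script is organized:

```python
from torch.utils.tensorboard import SummaryWriter
import wandb

# Redirect TensorBoard output away from the plugin directory.
writer = SummaryWriter(log_dir="D:/rl_logs/learning_agents")

# One wandb run per training session.
wandb.init(project="learning-agents-exploration")

# Then, wherever the training loop reports its statistics:
# writer.add_scalar("reward/average", average_reward, iteration)
# wandb.log({"reward/average": average_reward}, step=iteration)
```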
Conclusion
That concludes my initial exploration of this exciting new plugin. So far things have been surprisingly stable and intuitive (unless I have completely misunderstood the intended use of everything). I hope this is useful for those who want to do RL in Unreal.
My next steps will be to explore the other classes, starting with the LearningAgentsController, which could very well be a missing link in this whole scheme. Furthermore, I want to construct a proper training environment with a basic multi-agent shooter game. Once that is established, there are multiple directions to go, such as deploying trained agents, optimizing training by packaging the project, or trying out imitation learning, which seems to be a heavy focus of this plugin.
Do not hesitate to reach out to me for questions, feedback, or if you just want to talk unreal engine and AI! My email is jonathan@riddlebit.net