Generating image segmentation datasets with Unreal Engine 4

Deep learning is great. Image segmentation using neural networks is awesome. Creating datasets to train those networks is awful. It’s not just selecting which class an image belongs to or tracing a simple rectangle around an object: you need to go pixel by pixel to create the best dataset, and doing that by hand is tedious. The easy solution is to use existing datasets, but they are not very diverse and are scarce if you want to work on something other than road scenes.

An example from a real-world segmentation dataset available here:

You don’t absolutely need real-world images labelled by hand to work on image segmentation. Simulated data works great, and you can then feed your real-world images to an already trained network instead of starting from scratch. 3D game engines are more accessible than ever and are an easy starting point for building virtual worlds. Any computer that trains neural networks on a GPU has more than enough horsepower to do this.

Warning: this is a rough work in progress and won’t answer all your questions about UE4, but it will give you an idea of what can be done. Expect to google a lot while following this guide and to become somewhat at ease with UE4; I played around for two weeks before feeling that I somewhat understood Unreal.

Accurate representation of the first time I opened the Unreal Editor.

Install and launch UE4

We’ll use Unreal Engine 4 for one main reason: it’s the Raspberry Pi of 3D engines. The community around it is huge, and you can google almost any question and find something. Unity and Lumberyard/CryEngine are popular alternatives. These ideas could work in any of these engines, so feel free to use the one that fits you and your existing coding abilities best. I tried Lumberyard at first, but being a complete newbie with game engines made it too hard for me.

Once installed, create a new blueprint third person project using the templates. This will give you a working starting point: map, character, camera, etc.

Procedural generation using tiles

We’ll use semi-random generation based on tiles (like the Diablo games); it’s easy and fast. Fully random generation is a lot harder if you want to create images that look like the real world. It is possible, though: Minecraft is a notable example of great random generation that creates lifelike worlds (if we forget the everything-is-blocks part).
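The tile logic itself lives in a Blueprint, but the core idea can be sketched in a few lines of Python. The tile names and pool below are invented for illustration; in practice each entry would be a prefab map section you built in the editor:

```python
import random

# Hypothetical tile pool: each name stands for a prefab square map section.
TILE_POOL = ["forest", "rocks", "grass", "pond", "path"]

def generate_tile_map(width, height, seed=None):
    """Return a width x height grid of randomly chosen tile names."""
    rng = random.Random(seed)
    return [[rng.choice(TILE_POOL) for _ in range(width)]
            for _ in range(height)]

# Every load (or fixed seed, for reproducibility) gives a different layout.
grid = generate_tile_map(4, 4, seed=42)
for row in grid:
    print(row)
```

The engine version does the same thing, except each chosen tile is spawned as an actor at the matching world position instead of printed.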

This section is based on this excellent tutorial by Pub Games. Watch it right now; what follows will make much more sense afterwards.

My blueprint based on this concept is available here:

Now that you understand the basics, a big part of your success will depend on the assets you find or create to build the maps. This part takes a lot of time if you want something that looks like the scenes you’re trying to train for. Fortunately, there are a lot of existing free and commercial assets for Unreal that can help you. For this test I used 3 sources:

Also, there is no need for all elements to stay inside their square: letting them go over the edges creates more natural-looking scenes and avoids the grid-like look this method can produce if you keep every asset inside its zone.

At the end of this step, you will get a new map each time you load and the ability to navigate around using the existing third person character.

You’re now a God, learn to accept it.

Material switching

Materials in 3D engines are the high-tech paint applied to the polygon meshes that make up the actors in the scene (actors being any elements on the map, not just characters). Switching these materials will help us generate segmentation masks: the goal is to replace every material on the map with a flat color, with all elements of the same class sharing the same color.

  • Build flat materials alternatives (one color per class)
  • Use Unlit shading for the material (and don’t forget to switch viewmode to unlit before taking the labels screenshot)
  • Assign a uniform color to illumination
  • Materials with transparent parts will be somewhat harder to make. You must keep the transparency so the real view and the label view stay pixel-perfect aligned. I found the simplest way was usually to duplicate the existing material, switch it to unlit shading, and force a uniform color over the default texture.
  • Set actor tag names for all actors on each of your tiles
  • Create a data structure / data table that lists all actor tags, material indexes, default materials and flat materials
  • Create events that select all actors by group name and switch their material
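To make the data table idea concrete, here is a Python sketch of the same structure. All class names, colors and material names below are invented for illustration; the real thing is a UE4 data table and the switch itself happens in a Blueprint:

```python
# Hypothetical label palette: one flat, unlit color per class.
CLASS_COLORS = {
    "tree":     (0, 128, 0),
    "rock":     (128, 128, 128),
    "building": (255, 0, 0),
    "road":     (64, 64, 64),
}

# Hypothetical equivalent of the data table: one row per actor tag,
# mapping it to its material slot, default material and flat material.
MATERIAL_TABLE = [
    # (actor_tag, material_index, default_material, flat_material)
    ("oak_tree", 0, "M_Oak",  "M_Flat_tree"),
    ("boulder",  0, "M_Rock", "M_Flat_rock"),
]

def flat_material_for(actor_tag):
    """Look up the flat (label) material assigned to an actor tag."""
    for tag, _index, _default, flat in MATERIAL_TABLE:
        if tag == actor_tag:
            return flat
    raise KeyError(actor_tag)
```

The switch event is then just a loop over this table: select all actors carrying each tag and assign them the flat (or default) material at the listed index.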

Important! Disable everything that moves (wind, motion blur, temporal anti-aliasing, etc.). Your scenes must be completely static with this method, since the screenshots are taken one after the other in the next section; the pair will not line up if anything has moved between the two.

This guide will save you a lot of time when creating the data table:

My blueprint for the switcher itself is available here:

When the event is called, it loops over the whole data table and assigns the new material to every actor, giving the E key the following effect:

View using the default materials
The same view switched to flat materials

Automatic navigation and screenshots

Now the last Unreal step is to generate screenshots without human intervention. My current system is simple:

  • Your character spawns at the center of the map
  • The character moves in a random outward spiral
  • Every X ticks, we take a screenshot of the scene with the default materials, then with the flat materials
  • Every Y screenshots, we rebuild the map and start over
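The outward spiral can be sketched outside the engine. Here is an assumed square-spiral waypoint generator in Python; the step size and coordinate conventions are arbitrary, and the actual movement is done in a Blueprint:

```python
def spiral_waypoints(n, step=100.0):
    """Return n (x, y) points walking an outward square spiral from the
    origin, with legs of length 1, 1, 2, 2, 3, 3, ... steps."""
    x = y = 0.0
    dx, dy = step, 0.0      # current heading
    leg = 1                 # current leg length, in steps
    steps_left = 1
    legs_done = 0
    points = []
    for _ in range(n):
        points.append((x, y))
        x += dx
        y += dy
        steps_left -= 1
        if steps_left == 0:
            dx, dy = -dy, dx        # turn 90 degrees counter-clockwise
            legs_done += 1
            if legs_done % 2 == 0:
                leg += 1            # lengthen the leg every two turns
            steps_left = leg
    return points
```

Each waypoint is where a screenshot pair would be taken; spacing the steps wider than the camera’s field of view reduces overlap between images.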

My blueprint is available here:

The screenshot sequence is the tricky part; I used delays to make sure all changes are done before taking the screenshots. This could certainly be optimized. Teleporting to random locations instead of moving around would also save time.

Your screenshot folder will now be full of images and their corresponding label maps.

Now that’s the good stuff.

Prepare images and maps for usage

I use a simple Python script that takes a folder full of screenshots as input and does the final folder/name/class organization. You can find it here:

This step will change a lot based on your needs for the final structure of the dataset, feel free to adapt.

Train your network

I am currently testing a Keras implementation of Segnet built by ykamikawa available here:

I adapted it to work on this dataset and trained it on the 8 classes the dataset contains. It’s working great so far, with 90%+ accuracy.
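Before training, the flat-color label screenshots have to be converted into per-pixel class indices. A minimal sketch, assuming a hypothetical color-to-class mapping (your colors will be whatever you picked for the flat materials, and exact color matches require anti-aliasing to be off, as noted earlier):

```python
import numpy as np

# Hypothetical palette: must match the unlit flat materials exactly.
COLOR_TO_CLASS = {
    (0, 128, 0):     0,  # vegetation
    (128, 128, 128): 1,  # rock
    (255, 0, 0):     2,  # building
}

def label_image_to_class_map(rgb):
    """Convert an HxWx3 flat-color label image into an HxW array of
    class indices, one boolean mask per palette entry."""
    height, width, _ = rgb.shape
    out = np.zeros((height, width), dtype=np.int64)
    for color, class_index in COLOR_TO_CLASS.items():
        mask = np.all(rgb == np.array(color, dtype=rgb.dtype), axis=-1)
        out[mask] = class_index
    return out
```

The resulting index map can then be one-hot encoded into the (height, width, n_classes) target tensor that SegNet-style models expect.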

First row is the test image, each following row is the prediction for each of the 8 categories

What’s next

A lot of work still has to be done on this technique to make the images as realistic as possible for real use cases. Still, being able to produce a near-infinite amount of training data is a huge time saver when working on image segmentation.

Some ideas that I will work on based on this experience:

  • Finding a way to take both screenshots at the same time to allow movement like wind or moving vehicles
  • Teleporting instead of moving around
  • Switching from tiles to Minecraft-like generation
  • Having variation on the actors (textures, colors, etc.) to be more life-like
  • Optimizing the material switcher to get rid of the select-all-actors-by-tag step on every switch
  • Working on data augmentation capabilities