Simulating and training neural creatures to reach a target

Scott Sauers
4 min read · Apr 10, 2024


Let’s make some bugs and give them neural networks as brains.

We’ll give them a reward when they are closer to the finish line.
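A minimal sketch of this reward, assuming fitness is simply the straight-line distance to the target (lower is better, which matches the fitness plots later on):

```python
import numpy as np

def fitness(bug_pos: np.ndarray, target_pos: np.ndarray) -> float:
    """Distance from bug to target; lower is better."""
    return float(np.linalg.norm(bug_pos - target_pos))

def reward(bug_pos: np.ndarray, target_pos: np.ndarray) -> float:
    """Reward grows as the bug gets closer (negated distance)."""
    return -fitness(bug_pos, target_pos)
```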

We can see some bugs move noisily or sub-optimally, behavior we wouldn’t see from a simple “take a step toward the target” algorithm.

Let’s make the bugs a bit smarter and then make the challenge harder by having the finish line move in a circle:
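A circular finish line is easy to parameterize; the center, radius, and angular speed below are placeholder values, not the ones used in the videos:

```python
import numpy as np

def target_position(t: float, center=(0.0, 0.0), radius=5.0, omega=0.1):
    """The target circles `center` at angular speed `omega` (radians/step)."""
    cx, cy = center
    return np.array([cx + radius * np.cos(omega * t),
                     cy + radius * np.sin(omega * t)])
```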

They do a pretty good job! Each bug’s movement decision is made by a neural network that takes the bug’s current position and the target’s position as input and outputs a step direction and size. Notice how they quickly learn not to just naively move toward the target. Simply stepping toward the target is sub-optimal because the target is faster than they are, so they would never reach it.
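As a sketch, a bug brain with that interface could be a small PyTorch module like this (the hidden width is an assumption; only the inputs and outputs are fixed by the description above):

```python
import torch
import torch.nn as nn

class BugBrain(nn.Module):
    """Maps (bug x, bug y, target x, target y) to a 2D step vector,
    whose direction and length are the bug's movement decision."""
    def __init__(self, hidden: int = 32):  # hidden width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, bug_pos: torch.Tensor, target_pos: torch.Tensor):
        return self.net(torch.cat([bug_pos, target_pos], dim=-1))
```

Each simulation step then moves the bug by the network’s output: `bug_pos = bug_pos + brain(bug_pos, target_pos)`.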

However, if we decrease the model depth (remove layers from the neural network), the bugs can’t figure out how to be near the finish line. (Let’s also add a fitness function to track progress over time.)
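One simple way to track progress, assuming the population is stored as an (N, 2) array of positions (the names here are illustrative):

```python
import numpy as np

def population_fitness(bug_positions: np.ndarray, target_pos: np.ndarray):
    """Mean and variance of distance to the target across all bugs."""
    dists = np.linalg.norm(bug_positions - target_pos, axis=1)
    return dists.mean(), dists.var()
```

Logging this pair every step gives the fitness and variance curves discussed next.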

Each bug is a simple feedforward neural network with one hidden layer.

The fitness graph (lower is better) shows a bit of a cyclical pattern:

When fitness increases, so does the variance in fitness. The bugs don’t interact with each other: it’s big risk (moving out of the safe center), big reward (being close to the target when it passes by).

What will happen if we make the model very small (so it cannot learn properly), but also make the target impossibly fast?

We see the model fails to learn a good strategy, though even this basic approach seems to improve fitness slightly over time.
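The update rule isn’t described in detail here; assuming a plain gradient-based setup where the loss is the post-step distance to the target, one training step might look like:

```python
import torch

def train_step(brain, optimizer, bug_pos, target_pos):
    """One hypothetical update: take a step, then minimize the
    resulting distance to the target."""
    step = brain(bug_pos, target_pos)
    loss = torch.linalg.norm(bug_pos + step - target_pos)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

with, say, `optimizer = torch.optim.Adam(brain.parameters(), lr=1e-2)`.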

We can also try crazier fitness functions:
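Purely as a hypothetical example of a “crazier” choice, an all-or-nothing fitness that rewards only actual contact with the target might look like:

```python
import numpy as np

def hit_fitness(bug_pos: np.ndarray, target_pos: np.ndarray,
                hit_radius: float = 0.5) -> float:
    """Hypothetical all-or-nothing fitness: reward only on contact."""
    return 1.0 if np.linalg.norm(bug_pos - target_pos) < hit_radius else 0.0
```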

As training continues, the bugs learn to arrange themselves in advance so the target hits them: they anticipate where it will be next along its path. Not all of them can succeed all the time, so variance increases steeply when this happens:

This is an emergent behavior that occurs only with large enough networks.

Now, we will:

  • Normalize the variance so it doesn’t explode relative to the fitness
  • Define a fitness function that increases as performance improves (so higher is better), to be more intuitive
  • Make the bugs smarter (increase the neurons per layer and number of layers in each bug)
  • Train for longer
  • Use 2D convolution (one possible grid encoding is sketched after this list)
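A 2D convolution needs a spatial input, so one option (an assumption, not necessarily what was done here) is to rasterize the bug and target onto a small two-channel occupancy grid and convolve over that:

```python
import torch
import torch.nn as nn

class ConvBugBrain(nn.Module):
    """Sketch of a convolutional bug brain. The 2-channel 16x16 grid
    (one channel for the bug, one for the target) is an assumption."""
    def __init__(self, grid: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(8 * grid * grid, 2)  # 2D step output

    def forward(self, grid_img: torch.Tensor):
        # grid_img: (batch, 2, grid, grid) occupancy image
        return self.head(self.conv(grid_img))
```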

Here are the results:

As you can see, the fitness increases quickly at first and then levels off (though more slowly than an equally sized network without the 2D convolution). The 2D-convolution networks also seem to make fewer risky movements.

Let’s go further and train for even longer with even smarter bugs, dropping the 2D convolution since it slowed training:

Wow! The bugs develop a more complex strategy for staying close to their goal. The finish line moves faster than they can, on average, so it’s impossible to “win” all the time. Instead, they form circles to ensure they are close when the finish line passes near them. They are still improving even near step 5000.

This might be getting too easy for these bugs. They’re able to develop a strategy easily because they can move alongside the target’s circular path for a while and accumulate fitness. If we make the finish-line target move much faster, it should be harder for them to develop a strategy.

This is very interesting. At first, it looks hopeless. But after 4,000 training steps (100,000 steps total), the 25 bugs exhibit an emergent strategy. They realize that by moving away from the center, they can be nearer to the target when it passes by. This is stable for some time, though there is variance among bugs.

This is a difficult strategy to learn because it requires sacrificing fitness temporarily by moving out of the middle even when the target is far away.

Here is the same video at normal speed:

Each bug has three fully connected (linear) layers with ReLU activations, and dropout is applied after the first two activations. Each step updates 4,950 parameters, so there are 49.5 million updates in total over this video.
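A sketch matching that description; the hidden widths and dropout probability are assumptions, since only the layer count and the activation/dropout placement are stated:

```python
import torch.nn as nn

class FinalBugBrain(nn.Module):
    """Three linear layers; ReLU then dropout after the first two."""
    def __init__(self, hidden: int = 64, p: float = 0.1):  # assumed values
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 2),
        )

    def forward(self, x):
        return self.net(x)
```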

We can conclude:

  • Model setup can influence which strategies are adopted, not just training efficiency
  • Emergent behavior occurs at large network sizes
  • Neural networks without memory can still “plan” for the future
