What I Learned Training Digital Spiders to Crawl

A while ago, I watched a video of an AI by Google DeepMind that taught itself how to run and walk. As I was watching it, I knew that the AI wasn’t consciously doing anything — it was only optimizing based on a loose set of rules — and this amazed me.
After I discovered Unity ML Agents, I got really excited about seeing for myself how an algorithm could accomplish this, so I ran a few of my own experiments with customized models to see what would happen — and got completely blown away by the results.
In this article, I’ll talk about what I’ve realized about how machines learn and show you all the crazy things my own computer came up with.
How Machines Learn
The process of learning is difficult. As humans, we acquire a skill in two steps. First, we gather knowledge about the skill: we understand what it is, why it’s necessary, and how we might go about learning it. Only with this information can we move on to the next step, which is practicing the skill until we’re consistently successful.
This process of learning is effective but ultimately constrained.
Everything we learn is based on our biology. An infant can’t learn calculus because it requires a degree of attention they’re not yet capable of. An athlete training for the Olympics can’t run 24/7 because they would eventually die of exhaustion.
But if they were able to, they would be exponentially more successful.
Computers don’t have this problem. A machine can run 24/7 and can think orders of magnitude faster than a human can. OpenAI Five can compress hundreds of years’ worth of Dota 2 games into a few weeks of training, and AlphaGo can play itself millions of times in that same time frame. However, these AIs do all this without even understanding what they’re doing.
Put simply, they’re just performing calculations.
If we look closely at how OpenAI and Google DeepMind “learn”, they essentially break down human learning into factors that can be calculated:
- Reward. The notion that getting closer to your goal is good. Humans measure progress during learning, and so do machines. Just as you looked forward to your parents taking you out for pizza after a good report card, the machine wants a high reward value.
- Action. Moving an arm. These are vectors describing the changes of objects in an environment. Actions will affect state.
- State. Standing, sitting. These are vectors describing the object’s qualities in an environment. State will result in a reward.
- Policy. If there is a car, probably move. These are tendencies that can be learned by a neural network. A policy decides the next action based on the current state.
- Q-value. Sustainability is better than spontaneity. A Q-value function takes in the current state and the current action to calculate the discounted long-term value of the action-state pair. Valuable states are ones that will also result in good future states.
- Discount. Value now is better than value in the future. A discount that is applied to the Q-value will make sure an immediate high-reward state is prioritized by decreasing the value of a more distant state. As a result, the AI would rather maximize reward earlier than later.
These factors can be combined in many ways in different algorithms to iteratively “learn” a task. Thus, for the last two decades, the focus has essentially been on making these algorithms as effective as possible.
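To make this concrete, here’s a minimal, purely illustrative sketch of how these pieces fit together in tabular Q-learning, one of the simplest reinforcement learning algorithms. This is not Unity’s or OpenAI’s code; the states, actions, and hyperparameters are made up for the example.

```python
import random
from collections import defaultdict

N_ACTIONS = 2                           # toy action space
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

Q = defaultdict(float)                  # Q-value table: (state, action) -> long-term value

def policy(state):
    """Choose the next action from the current state (epsilon-greedy)."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)                      # explore
    return max(range(N_ACTIONS), key=lambda a: Q[(state, a)])   # exploit what we know

def update(state, action, reward, next_state):
    """One learning step: nudge Q toward reward plus discounted future value."""
    best_next = max(Q[(next_state, a)] for a in range(N_ACTIONS))
    target = reward + GAMMA * best_next    # discount makes later reward worth less
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

Run this update over millions of (state, action, reward, next state) samples and the policy gradually favours actions with high long-term value. The algorithm below is far more sophisticated, but the same ingredients sit at its core.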
The algorithm used by Unity ML-Agents is PPO (Proximal Policy Optimization), which is also the state-of-the-art algorithm OpenAI uses at the moment.
What’s important about this algorithm is that it does a very good job of helping the machine learn, but it doesn’t exactly model human learning. AI doesn’t yet have the ability to extrapolate information about skills in a generalizable, intelligent way, but by modelling a “practice makes perfect” approach with computers, we can sidestep that need entirely.
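If you’re curious, the core trick in PPO is a “clipped” objective that keeps each policy update small. Here’s a rough NumPy sketch of that objective, not OpenAI’s or Unity’s implementation; the log-probabilities and advantage estimates are assumed to be produced elsewhere by a full training loop.

```python
import numpy as np

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO's clipped surrogate objective (to be maximized).

    new_log_probs / old_log_probs: log-probabilities of the actions taken,
    under the current and previous policies. advantages: how much better
    each action turned out than expected. All 1-D arrays, assumed to come
    from a full trainer.
    """
    ratio = np.exp(new_log_probs - old_log_probs)  # pi_new / pi_old for each action
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the element-wise minimum stops the policy from changing too much at once.
    return np.mean(np.minimum(unclipped, clipped))
```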
The AI doesn’t need to know what it’s doing; it just needs to do something that fits the rules.
In this way, machines have long surpassed us at learning.
The Cool Part
For my experiments, I made crawlers, virtual “spiders” with different numbers of evenly distributed legs, and trained them to crawl. Each one had two objectives: always be moving towards the right, and never let the middle ball hit the floor.
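To give a sense of how those two objectives become numbers the AI can optimize, here’s a rough, hypothetical sketch of a per-step reward along those lines. It is not the actual ML-Agents Crawler reward; the variable names and weights are mine.

```python
def crawler_reward(velocity_x, body_height, min_height=0.3,
                   speed_weight=1.0, fall_penalty=1.0):
    """Toy per-step reward for the crawler (all names and numbers are illustrative).

    velocity_x: the body's rightward speed this step.
    body_height: height of the central ball above the floor.
    Returns (reward, done) for one simulation step.
    """
    reward = speed_weight * velocity_x   # objective 1: keep moving to the right
    done = body_height < min_height      # objective 2: the middle ball must not hit the floor
    if done:
        reward -= fall_penalty           # falling ends the episode with a penalty
    return reward, done
```

Maximize the sum of these rewards over an episode, and “always move right without falling” is all the AI ever really knows about the task.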
I started off with a 5-legged one. Here’s what it came up with:
When this crawler finished training, I was really curious to see the result. I had imagined that the best way for it to walk would be to use 2 or 3 legs to launch itself forward, then use the others to catch itself, but what it learned was different.
What stood out to me about this method was that almost all the time, two of the “hind legs” would slip forward after being launched by the one in the back, maintaining balance. The remaining two legs would then inch the body forward a little.
During the training process, the AI likely learned that with a largely asymmetrical body, one of the most important things was balance, and it sought out the easiest way to maintain it.
I thought this was awesome, so I did one with 6 legs:
When this one finished training, I remember being really anxious about whether or not the “crawling” would be sustainable. A 6-legged body should have removed the asymmetry from my hypothesis about the 5-legged crawler, and I’d imagined it would inch itself forward like a centipede.
The main innovation here was the periodic 3-legged swing: the crawler swings one side of its body forward, pulls the other side back up, and then swings again. It technically does complete the two objectives, and it makes clever use of physics in doing so.
The AI probably figured out very quickly that momentum was the fastest way to move forward. This probably wouldn’t be realistic for actual spiders, though, given the energy a real animal would expend against friction, a cost that isn’t modelled here.
I did another one with 7 legs:
This crawler takes the 6-legged solution up a notch. It’s not at all what you would expect out of a “crawler”; it’s basically just spinning forward.
One remarkable feature I noticed was that on every rotation, the crawler would stick one leg up and flail it around to generate momentum. Thinking about it, the AI could have arrived at this solution through random experimentation after getting stuck in a position with no remaining velocity. It would have been easier to just generate momentum than to find another way to propel itself forward.
At this point, the “crawling” was getting crazier and crazier, but I tried one more, and I think you’ll get a kick out of the 8-legged one.
This crawler took 3 million steps to train, over 36 hours on my modest laptop, and when I saw what it was doing, I let out an audible “wtf”.
Real spiders have eight legs, and as far as I know, spiders don’t crawl best while doing cartwheels.
I know, I know. Obviously, I haven’t modelled everything about the animal to make it realistic, but this “crawling method” seemingly came out of nowhere.
After a little more careful observation, I realized the mechanism behind what the crawler was actually doing was fascinating. After lifting itself up with a powerful hind-leg thrust, it was able to use the resulting momentum to keep rolling forward. The crawler had actually learned to balance itself in a cartwheel position. If it was leaning towards one side, it would adjust for that by moving some legs toward that side. And even when it completely lost balance, it had learned to keep the sphere away from the ground as long as possible by jumping.
This was insane.
After doing a bit of research, I then found this:
This is Cebrennus rechenbergi, a spider that lives in the flat Moroccan desert. And it does cartwheels to escape its predators.
Scary, but awesome at the same time.
The AI basically came up with a solution that evolution actually produced.
Conclusion
So reinforcement learning is currently at a point where agents can learn to walk, run, and crawl remarkably well, purely through practice.
At the end of the day, machines still lack the ability to understand and strategize like humans, yet they can already complete specialized tasks much better than us.
One thing about the 8-legged crawler that I didn’t mention was that the training process was not steady at all. For the first 2 million training steps, it could barely cover any distance, but shortly after that, the reward grew explosively: it “discovered” how to do cartwheels sustainably, which let it travel several times farther.
When computers learn to fundamentally understand tasks (whatever that may entail), we could see an even more explosive change in the way these algorithms, and potentially even the world, will work.
Takeaways:
- Machines can be designed to learn better than human beings.
- There may be a vast number of biologically feasible solutions that evolution has never discovered.
- Reinforcement Learning will almost certainly be a major cause of the Singularity.
Thanks so much for reading! Follow me for more articles like this in the future!
