Evolving Intelligence in Virtual Animals

Clayton Harting
Published in Artificialis · Jan 20, 2022
Photo by Luis Enrique Ibarra on Unsplash

A speculative approach to evolving a weak general intelligence, written from the perspective of someone interested in evolutionary biology, machine learning, and animals in general.

Intro

The goal of this project is to create an AI that’s capable of solving a wide range of simple tasks without large amounts of dedicated time spent retraining for each task. The AI should also be able to learn these tasks completely unsupervised. A good analogy is that I want to create essentially a virtual lizard — something capable of understanding its environment, performing simple tasks in that environment (like catching food), and recognizing and responding to others appropriately (fighting, threatening, or mating). This isn’t the most formal goal, but that’s okay for reasons I will explain later in this post.

Background

Real life examples

As I hinted in the intro, we already have examples of this kind of intelligence in the real world. In fact, we have examples of much more sophisticated (but similar) intelligences as well: animals. Humans are an obvious example; elephants, dolphins, and wolves are other classic examples of intelligent creatures.

However, there’s an issue with following these real life examples: it took millions of years for them to evolve. I don’t know about you, but I don’t have that kind of time on my hands.

Fortunately, we can improve on the efficiency of such a system. Real-life evolution has no goal beyond propagating life, something baked into its very definition. We do have a goal, and we can leverage that difference by setting up environments that select for the traits we want. First, we can skip all the boring Precambrian stages of life by giving our creatures fully functional bodies, with eyes and the ability to move, right off the bat. Second, rather than waiting millions of years for the right conditions to appear, we can construct the very environments that led intelligence to evolve.

That brings up an important question: what are “the right conditions” for intelligent life to evolve?

Looking at real creatures, there seem to be a few different scenarios that produced the kind of intelligence we’re looking for. The first is pack hunting: intelligence and sociality appear to have evolved several times in carnivores that must cooperate to catch their food. Wolves are a good example, since a successful hunt requires several individuals working together. For that to be possible, they must be intelligent enough not only to communicate, but to understand what needs to be said. In other words, coordination requires an understanding of both the environment and each other.

However, it’s not only carnivores that evolved intelligence. Elephants are another great example, and unlike wolves, they clearly didn’t evolve as pack hunters. Instead, their intelligence may have evolved in response to food and water sources that are patchily distributed, separated by large distances, and available only in certain seasons.

Now that we have examples to follow, how will we know when we have a similar kind of intelligence? One trait shared by our best real-life examples of general intelligence is sociality: the smartest creatures generally cooperate with members of their own species, and sometimes even with members of other intelligent species. Both of the examples above are highly social.

Why hasn’t this goal been accomplished before?

So, why don’t we already have at least a weak general AI? Surely, with so many examples produced by directionless evolution, we’d be able to intentionally create at least one?

Part of the issue, as I see it, is that we’re approaching the problem from the wrong angle. Machine learning research essentially tries to create mathematically perfect AI with an intentional, manually defined structure. Yet our only existing examples of general intelligence (living creatures) are agents that evolved without any predefined structure, in a messy environment full of similar agents to compete and cooperate with.

That last point, I believe, is worth special attention: every real-life intelligence that has evolved, evolved to be social. Every cutting-edge AI we’ve created has been created as an individual, alone, without any peers to interact with. To me, that’s a glaring difference.

What tools do we have?

Now, I’ve been talking about some very intelligent animals for a while, so I want to bring us back down to earth: I don’t think this approach, with the tools we have today, will create something at the level of elephants, dolphins, or wolves. However, designing the simulation to attempt such a high goal will hopefully let us reach a more modest one: something comparable to a lizard.

I’ve also been talking about evolution a lot, which, in the context of AI, brings to mind evolutionary algorithms. These algorithms work much like real evolution, with two key differences. First, time is chunked into generations: all agents are evaluated for a set period of time, then all reproduce and die, and their children repeat the cycle. Second, which creatures reproduce, and which become “mating” pairs, is decided by an algorithm external to the creatures themselves.
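
To make that structure concrete, here is a minimal sketch of a generational loop in Python. Everything here is illustrative: evaluate, crossover, and mutate stand in for whatever fitness measure and genome operators a real implementation would use.

    import random

    def generational_ea(population, evaluate, crossover, mutate,
                        generations, eval_frames=10_000):
        """Minimal generational loop: evaluate everyone, pick parents, replace all."""
        for _ in range(generations):
            # Every agent is scored over the same fixed evaluation window.
            fitness = {agent: evaluate(agent, eval_frames) for agent in population}
            # Selection and pairing are decided outside the agents themselves.
            ranked = sorted(population, key=lambda a: fitness[a], reverse=True)
            parents = ranked[:max(2, len(ranked) // 2)]
            # The whole generation dies; its children replace it wholesale.
            population = [mutate(crossover(*random.sample(parents, 2)))
                          for _ in range(len(population))]
        return population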

These evolutionary algorithms have strong advantages, including high efficiency compared to the approach I will describe next. They are also the only mainstream machine learning approach I know of that maintains a population of distinct agents, rather than training a single agent either alone or against copies of itself.

However, I believe they would ultimately limit the agents’ intelligence. A generational algorithm removes the need for mate selection (an important driver of sociality) and forces all agents to be exactly the same age at all times (again, neutering sociality). For agents that are supposed to learn individually over time, this makes the evolution of teaching unlikely to the point of impossibility. It also puts a hard cap on an agent’s lifespan (the length of the evaluation), so agents have only a fixed window in which to learn and then apply their experience. All of this places an unnecessary ceiling on how smart the agents can evolve to be.

Another approach is simulating an environment in real time. carykh on YouTube has a good example of what this may look like in his Evolvio series. Before I continue: I mean no disrespect to that project. It’s one of the things that inspired me and got me interested in this field, and Evolvio had a similar goal to this project. However, I believe it failed to evolve intelligent agents for a number of reasons, including overly abundant food in the environment and, in part because agents lacked the ability to learn at an individual level, a weak brain model. In the terms I define below, Evolvio had a weak environment and a weak framework. I’m using it as an example because it demonstrates real-time evolution very well: agents have variable lifespans, new agents are born at variable times, agents decide when they want to mate and reproduce, and children always appear near their parents, allowing family structure to evolve (a good jumping-off point for social structures like packs).

This method has clear advantages, but it also has a glaring weakness: inefficiency. Traits are selected for only indirectly, and evolution is slow in general. We’re back to waiting millions of years for even the most basic traits to evolve.

So, I believe a blend of the two will be a very powerful tool: initially run the agents through a traditional, generational evolutionary algorithm; then, once a sufficient level of intelligence is reached, place the final generation of agents in a real-time evolutionary simulation and allow them to continue evolving.

This speeds up the start, removing most of the inefficiency of real-time simulations, and also removes the intelligence ceiling of the traditional evolutionary algorithm.

With all that background done, it’s time for some formalization.

Formalization

Let’s start with a baseline: what do we need for this simulation?

  1. A defined list of qualities to select for
  2. A framework capable of expressing all of those qualities, given the right configuration (preferably with minimal computation)
  3. An environment that will correctly select for those qualities in individual instances of that framework
  4. An algorithm for selecting agents against that environment that will not limit their development
    (note: this algorithm may interfere with the environment, which is how it could end up limiting the agents)

We can improve this list a bit. Because we have real-life examples of general intelligence, and we know more or less how they arose, we can skip #1 and go straight to #3. In other words, if we assume similar environments lead to similar outcomes, and we have an example of the outcome we want along with the environment that produced it, then we don’t need to describe the exact outcome we want in perfect detail.

That eliminates a difficult item from the list, but it turns out we also need to add one. Evolution is a slow, iterative process in which each step towards a trait should itself be beneficial (see: the lack of sharks with laser eyes and other awesome, non-existent traits that would be dead weight until they finished evolving), so we must be mindful to constantly encourage development towards the traits we want. This is a concept I call easing, and it can be implemented either with a boundless fitness function that scores development towards the desired trait, or by slowly making the environment itself more challenging to live in over time, as sketched below.
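
As a tiny illustration of both flavors of easing, here are two hypothetical pieces: an unbounded fitness score that rewards partial progress, and a difficulty schedule that slowly tightens the environment. All the names and numbers are placeholders.

    def fitness(food_eaten, distance_closed_on_prey):
        """Fitness-side easing: unbounded, and it rewards partial progress
        (closing in on prey) as well as the final goal (eating)."""
        return 10 * food_eaten + 0.01 * distance_closed_on_prey

    def food_density(generation, start=1.0, floor=0.1, decay=0.995):
        """Environment-side easing: food grows scarcer each generation,
        so agents must keep improving just to maintain the same score."""
        return max(floor, start * decay ** generation)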

We should also refine #3. It could be the case that we have the perfect environment and the perfect framework, yet the agents lack the ability to fully interact with the world: perhaps they can’t communicate with each other, or they have no way of manipulating objects. In that case, intelligence will also fail to fully evolve. This requirement is essentially “agents should have the means to fully interact with the world”, but I believe it can be folded into the environment requirement more elegantly.

As such, our list of requirements becomes:

  1. A framework capable of expressing all the qualities we want, given the right configuration (preferably with minimal computation).
  2. An environment that will both allow these qualities to be fully expressed and that will correctly select for those qualities in individual instances of that framework (agents).
  3. An algorithm to select for agents against that environment without limiting their development.
  4. Enough easing built into the algorithm, the environment, or both.

My plan for a proof of concept

Framework:

For the agents’ brains, I will use a recurrent neural network with additional inputs and outputs that let it navigate, construct, and label nodes on a hidden graph unique to the individual. Being recurrent, the network gives agents a form of short-term memory, while the graph acts as long-term memory, enabling agents to store and recall past experiences.
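
Here is a rough sketch of how that hidden graph might look. The interface (read the current node’s label, rewrite it, grow a new node, or step to a neighbour) is my own guess at a minimal set of memory operations the RNN’s extra outputs could drive.

    import numpy as np

    class GraphMemory:
        """Per-individual long-term memory the brain builds and walks at runtime."""
        def __init__(self, label_size):
            self.labels = {0: np.zeros(label_size)}  # node id -> label vector
            self.edges = {0: []}                     # node id -> neighbour ids
            self.current = 0                         # node the brain "stands on"

        def read(self):
            """Fed to the RNN as extra input each frame."""
            return self.labels[self.current]

        def write(self, label):
            """Driven by the RNN's extra outputs: relabel the current node."""
            self.labels[self.current] = label

        def grow(self, label):
            """Construct a new node linked to the current one, then move onto it."""
            new = max(self.labels) + 1
            self.labels[new] = label
            self.edges[new] = [self.current]
            self.edges[self.current].append(new)
            self.current = new

        def move(self, choice):
            """Navigate to one of the current node's neighbours."""
            neighbours = self.edges[self.current]
            if neighbours:
                self.current = neighbours[int(choice) % len(neighbours)]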

Environment:

The creatures’ environment will be a simple 2D world with “landmark” objects that act somewhat like bushes: they have no collision, but they block sight. The world will also have a sound system. Any object can create a sound wave, which the environment tracks as it grows at a linear rate. The loudness of a given wave is determined by its current radius: as the radius increases, loudness decreases exponentially. To “hear” at a given point on a given physics frame, all sound waves that have crossed that point on that frame are considered, and the final loudness is the sum of what each wave’s loudness would be at that point (not its current loudness). I am also considering giving each wave a “frequency” value to separate sounds into channels, making it easier for agents to evolve an interpretation of sound.
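
A sketch of that hearing rule, under my reading of it: a wave contributes on the frame its expanding front sweeps past the listener, and its contribution depends on the listener’s distance from the source. SPEED and FALLOFF are made-up constants, and the wave fields are hypothetical names.

    import math

    SPEED = 5.0     # radius growth per physics frame (linear expansion)
    FALLOFF = 0.05  # decay constant for the exponential loudness falloff

    def hear(point, waves):
        """Sum the loudness of every wavefront that crossed `point` this frame."""
        total = 0.0
        for wave in waves:
            d = math.dist(wave.origin, point)
            # The front swept from (radius - SPEED) to radius during this frame.
            if wave.radius - SPEED < d <= wave.radius:
                # Loudness the wave *would have* at this point, not at its
                # current radius: exponential falloff with distance.
                total += wave.initial_loudness * math.exp(-FALLOFF * d)
        return total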

As for the bodies the agents will use to interact with the environment: each creature will be a circle with two “eyes”, each composed of 5 raycasts spread evenly over a given FOV. They will also have two “ears” on opposite sides of the circle, plus the ability to create sound. For actions, the creatures will be able to move, turn, signal that they’re ready to reproduce, and create sound at any loudness between a minimum and maximum bound.
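
Spreading 5 rays evenly over a field of view is straightforward; in the sketch below, the eye offsets and FOV are arbitrary choices of mine, not part of the plan.

    import math

    def eye_rays(centre_angle, fov, n_rays=5):
        """Directions (in radians) of n rays spread evenly across one eye's FOV."""
        start = centre_angle - fov / 2
        step = fov / (n_rays - 1)
        return [start + i * step for i in range(n_rays)]

    # Example: two eyes angled off the creature's facing direction.
    facing = 0.0
    left_eye  = eye_rays(facing + math.radians(30), fov=math.radians(60))
    right_eye = eye_rays(facing - math.radians(30), fov=math.radians(60))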

If two creatures signal that they’re ready to reproduce while within 3 creature radii of each other, they will both get a one-time fitness boost; once a creature has received this boost, it will never receive it again. Hopefully this will encourage creatures to evolve the instinct to reproduce without causing them to ignore food.

Speaking of which, the environment will also contain food objects: smaller circles that increase an agent’s fitness when collided with. Food objects emit sound waves at a set frequency and loudness. How food objects spawn depends on which “challenge” is chosen for a particular run.

Challenge A — the “wolf” challenge

In this challenge, boids are added to the world. These boids are smaller than the agents but much faster, and they avoid the agents as if they were obstacles. If an agent manages to get close enough to a boid, the boid turns into 3 food objects. With this challenge, I hope to see pack hunting evolve.

Challenge B — the “elephant” challenge

In this challenge, the worldspace is truly huge. Several locations are chosen at random to become feeding locations, and these locations shift slightly every generation, so agents can’t simply encode the coordinates in their genomes. At these locations, food respawns at long, randomly chosen intervals, and despawns if not eaten within a short time of spawning. With this challenge, I hope to see generational memories “evolve” alongside the creatures’ brains, as older generations teach newer generations where food can be found.
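
The per-generation shift could be as simple as jittering each feeding location a little, so that coordinates memorized in the genome go stale. The jitter size and world bounds below are placeholders.

    import random

    def shift_feeding_locations(locations, jitter=5.0, world_size=10_000.0):
        """Nudge every feeding location slightly at the start of each generation."""
        def clamp(v):
            return min(world_size, max(0.0, v))
        return [(clamp(x + random.uniform(-jitter, jitter)),
                 clamp(y + random.uniform(-jitter, jitter)))
                for x, y in locations]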

Bonus Challenge — the curiosity challenge

This one is not based on a specific animal, but on encouraging a trait seen in many intelligent animals: curiosity. Curiosity seems to be a trait specifically selected for, rather than a natural by-product of environments that otherwise select for intelligence, so I wanted to include a dedicated challenge to test whether that’s the case.

Additionally, I believe this challenge may select for teaching, and potentially even lying. Curiosity is about discovering information; teaching and lying are about sharing information with others, or hiding it from them.

In this challenge, food objects will have various random properties: sound frequency and loudness, color, size, maximum number of respawns (1–3), time between respawns, despawn timer, and edibility (beneficial, neutral, or harmful). A food object’s despawn timer starts counting down as soon as it spawns; if the timer reaches zero, the object acts as if it had been eaten. Food objects spawn in clumps of 3–5, and all objects in the same clump share the same properties. The properties and locations of clumps change every generation, and if an entire clump runs out of respawns, a new random clump spawns. Lastly, agents will have a limit on how often they can eat: after eating one food object, a timer counts down before that agent can eat again. This prevents agents from hogging a good food supply, or dying from accidentally eating too much harmful food.
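
Since every object in a clump shares its properties, a clump is naturally a single record plus a spawn count. A possible shape for it, with all value ranges invented for illustration:

    import random
    from dataclasses import dataclass

    @dataclass
    class FoodClump:
        """One clump of 3-5 food objects; every object shares these properties."""
        frequency: float      # sound frequency of the objects' emissions
        loudness: float
        color: tuple
        size: float
        respawns_left: int    # 1-3 respawns before the clump is exhausted
        respawn_delay: float  # time between respawns
        despawn_time: float   # acts as eaten if it survives this long uneaten
        edibility: int        # +1 beneficial, 0 neutral, -1 harmful

    def random_clump():
        return FoodClump(
            frequency=random.uniform(0.1, 1.0),
            loudness=random.uniform(1.0, 10.0),
            color=tuple(random.random() for _ in range(3)),
            size=random.uniform(0.5, 2.0),
            respawns_left=random.randint(1, 3),
            respawn_delay=random.uniform(5.0, 60.0),
            despawn_time=random.uniform(5.0, 30.0),
            edibility=random.choice((1, 0, -1)),
        )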

Algorithm:

As discussed earlier, in order to maintain efficiency without placing an upper limit on the agents’ intelligence and sociality, I plan to start with a traditional EA, then transition to a real-time evolutionary simulation once the agents begin to approach the intelligence cap the EA imposes.

Since the core of the framework is a neural net, I plan on using NEAT-style reproduction. For the first phase of the simulation, I will use an unmodified NEAT, giving each agent around 10,000 physics frames to be evaluated in its environment.
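
If I use the neat-python package, phase one is close to its textbook usage. The sketch below assumes a config file and a run_in_environment helper that actually steps the simulation; both are stand-ins.

    import neat

    def eval_genomes(genomes, config):
        for genome_id, genome in genomes:
            net = neat.nn.RecurrentNetwork.create(genome, config)
            # run_in_environment is hypothetical: it would step the 2D world
            # for ~10,000 physics frames and return the agent's score.
            genome.fitness = run_in_environment(net, frames=10_000)

    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         "neat_config.ini")
    population = neat.Population(config)
    best = population.run(eval_genomes, 300)  # e.g. 300 generations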

During the second phase, I will continue to use NEAT-style crossover, but new agents will only be produced when two nearby agents signal that they’re ready to reproduce. At that point, 1–3 new agents are produced, depending on the parents’ current fitness. Each agent’s fitness score will count down every physics frame, repurposing it as a “time left to live” counter; this way, agents are still encouraged to find food.
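
One frame of that second phase might look like the following. Here world, ready_pairs, and neat_crossover are all hypothetical names; the point is that fitness now ticks down as a lifespan, and reproduction is initiated by the agents.

    BROOD_THRESHOLD = 2_000  # illustrative: combined fitness needed per extra child

    def realtime_step(world):
        """One physics frame of phase two; fitness doubles as time left to live."""
        for agent in list(world.agents):
            agent.act(world)
            agent.fitness -= 1              # ticks down every frame...
            if agent.fitness <= 0:
                world.agents.remove(agent)  # ...so agents must keep finding food
        # New agents appear only when two nearby agents both signal readiness.
        for a, b in world.ready_pairs(max_dist=3 * world.agent_radius):
            combined = a.fitness + b.fitness
            children = 1 + min(2, int(combined // BROOD_THRESHOLD))  # 1-3 children
            for _ in range(children):
                world.spawn(neat_crossover(a, b), near=a.position)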

Since fitness no longer acts as a selection score in the second phase, creatures will no longer be incentivized to reproduce exactly once; instead, they will be incentivized to reproduce whenever it makes sense, whether that’s many times in their life or still only once.

Easing:

In this setup, easing is built into the challenges and the fitness function: as agents get more intelligent, they will be able to collect more and more food in the same period of time, increasing their fitness without bound. And as agents become more social, those with similar genes will be able to get more and more food through the cooperation of their relatives.

Further Plans

Evolving intelligent control of evolved 3D animats with built-in behavior routines

You might say this seems too simplistic to evolve any kind of intelligence, and you’d probably be right. This is just a proof of concept; I’m hoping to see very simple cooperation between agents and not much else. The real simulation will come later, built on reimplementations of two existing algorithms: one to evolve fully functional 3D bodies and one to evolve plants for the environment. The plant algorithm is likely not necessary, and I may skip it in favor of hand-crafted plant models.

The algorithm I chose for body evolution has a benefit beyond evolving advanced 3D body plans: it also evolves basic physical behaviors. Compare how, in animals, the brain sends a simple signal to the brainstem to make the creature walk, or a stronger signal to make it run. The control structure this algorithm evolves functions much like a brainstem: it receives a signal to walk, and it handles coordinating all of the body’s muscles so the body actually walks. As such, I will call the evolved control structure the brainstem. Its presence is a huge advantage because it allows a separation of concerns: the brain can focus entirely on processing information and planning, while the brainstem handles the details of how walking and other movements actually work.

With that in mind, my ultimate plan is to first run the body evolution as normal. Then, when it’s done, I will run a simulation similar to my proof of concept, but in 3D and using the evolved bodies instead of circles. In each creature, the brain’s action output nodes will be linked to the brainstem’s behavior control nodes (called sigma nodes in the algorithm’s paper).

I also have plans to try another framework, the core of which is that each agent’s genome encodes a CPPN. The CPPN is used to generate several neural networks: a 2D convolutional network to preprocess visual information; a recurrent network to preprocess audio; a 3D convolutional network to preprocess touch; a recurrent network that takes all the preprocessed sensory information (plus any miscellaneous senses that skip preprocessing) and outputs a “state” vector; a network that maps a state vector to a single “reward” value; and a recurrent network that maps a state vector to an action. That final network would be trained over an agent’s lifetime, using the reward network as the signal for reinforcement learning.
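
The decoding step would work roughly like HyperNEAT: a single CPPN is queried once per connection to produce the weights of each sub-network, with an extra input telling it which network it is generating. Below is a toy, runnable version of just that idea, using a fixed-topology stand-in for an evolved CPPN; all sizes are illustrative.

    import numpy as np

    def cppn(coords, genome):
        """Toy stand-in for an evolved CPPN: a tiny MLP mapping a coordinate
        vector to a single connection weight."""
        h = np.tanh(genome["w1"] @ coords + genome["b1"])
        return np.tanh(genome["w2"] @ h + genome["b2"])[0]

    def weights_from(genome, shape, net_id):
        """Fill one sub-network's weight matrix by querying the CPPN per entry;
        net_id tells the CPPN which of the brain's networks it is generating."""
        rows, cols = shape
        w = np.zeros(shape)
        for i in range(rows):
            for j in range(cols):
                w[i, j] = cppn(np.array([i / rows, j / cols, net_id]), genome)
        return w

    rng = np.random.default_rng(0)
    genome = {"w1": rng.normal(size=(8, 3)), "b1": rng.normal(size=8),
              "w2": rng.normal(size=(1, 8)), "b2": rng.normal(size=1)}

    state_net  = weights_from(genome, (16, 32), net_id=0.0)  # senses -> state vector
    reward_net = weights_from(genome, (1, 16),  net_id=1.0)  # state -> scalar reward
    policy_net = weights_from(genome, (8, 16),  net_id=2.0)  # state -> action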

Links to papers and algorithms mentioned:

Creature body evolution:
Paper: http://z.cs.utexas.edu/users/ai-lab/downloadPublication.php?filename=http://nn.cs.utexas.edu/downloads/papers/lessin.alife14.pdf&pubid=127450

Video: http://youtu.be/fyVr7gdGEPE

Author’s Page: https://real.itu.dk/people/former-members/dan-lessin/

Plant evolution:
Blog Post: https://jobtalle.com/evolving_lindenmayer_systems.html

Paper: https://jobtalle.com/posts/2020_10_20/EvolvingLSystems.pdf

Source: https://github.com/jobtalle/LGen
