Machine Learning in 100 Lines of JavaScript

Sergey Piterman
Published in Outco · 12 min read · Sep 17, 2018

And 3 Simple Steps

Life uses a genetic algorithm.

With machine learning being all the rage these days, I thought I’d write a post where I show a step-by-step implementation of a really simple algorithm that illustrates some of the basics.

If you’re a beginner who’s interested in dipping your toes into the field then hopefully this will shed some light on how a lot of these algorithms work and will demystify what goes on under the hood. You don’t need to know advanced math to follow along, just have some intuition about probability and some basic coding ability. My goal is to convey some of the powerful principles that many practical machine learning algorithms actually use.

So, without further ado, the algorithm I’ll be talking about today is a type of genetic algorithm, so named because it mimics the “algorithm” life uses for evolution. Having majored in Biology in college, I find it exciting to see so many applications of all the evolutionary theory I learned.

To understand how genetic algorithms work, there are three things we need to model:

  1. The Fitness Function
  2. Natural Selection
  3. Random Mutation

Let’s dive right in.

The Fitness Function

Camouflage is an important trait in the animal kingdom. It’s used by both predators and prey, and can be surprisingly sophisticated. So much so that, to a lot of people, it looks like these creatures had to have been designed.

The reason camouflage exists is because it’s an effective survival tool. It helps organisms survive long enough to pass on their genes, either by helping them to avoid being eaten, or by helping them to eat.

The algorithm I’ll show you today models how some really simple “organisms” can evolve to blend in with their “environment”.

So suppose there is some environment with some arbitrary background color and some organisms living on it. To start, these organisms will all have random colors, and will just be represented by circles.

Whether they are trying to avoid predators, or sneak up on prey, these organisms will need to blend in with their environment. Those that are closer to the color of their environment will tend to fare better and those that stick out more will fare worse.

Fitness is a term used to describe how well suited an organism is to its environment, and it’s what Darwin was referring to when he wrote “Survival of the Fittest”. In machine learning we have something called a fitness function. It can take in a variety of inputs, but what it computes is a number that describes how good an entity is at achieving a certain goal.

The fitness function is the invisible judge of what makes a “good” organism and what makes a “bad” one.

Here our fitness function is simple. Every organism has its own color, represented as an RGB value, and that color differs from the background color, which also has an RGB value. So the output of our fitness function is just the sum of the absolute differences between an organism’s RGB values and the environment’s.

The more an organism’s color differs from the environment’s, the larger this value becomes. The maximum it could be is 765 (since the environment RGB values could be (0, 0, 0) and the organism’s could be (255, 255, 255)). And the minimum it could be is 0 if its color perfectly matches the environment’s.

Because a higher value actually means an organism sticks out more, I’m going to change the name from “fitness” to “weakness”, since an organism will want to minimize this value.
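As a quick worked example (with made-up numbers): if the environment’s color is (34, 139, 34) and an organism’s is (200, 50, 100), its weakness is |34 - 200| + |139 - 50| + |34 - 100| = 166 + 89 + 66 = 321.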

So let’s define our organisms:

class Organism {
  constructor(r, g, b) {
    this.r = r;
    this.g = g;
    this.b = b;
  }
}

And our environment:

class Environment {
  constructor(r, g, b, startingOrganisms) {
    this.r = r;
    this.g = g;
    this.b = b;
    this.organisms = [];
    this.weaknesses = [];
    this.startingOrganisms = startingOrganisms;
    this.addOrganisms(this.startingOrganisms);
  }

  // Adds a number of organisms with random colors
  addOrganisms(num) {
    while (num > 0) {
      let newOrganism = new Organism(randomValue(256),
                                     randomValue(256),
                                     randomValue(256));
      this.organisms.push(newOrganism);
      this.weaknesses.push(this.calculateWeakness(newOrganism));
      num -= 1;
    }
  }

  // Calculates weakness for a given organism
  calculateWeakness(organism) {
    return (Math.abs(this.r - organism.r) +
            Math.abs(this.g - organism.g) +
            Math.abs(this.b - organism.b));
  }

  // (The replaceOrganism method, shown further down, also belongs to this class)
}

The addOrganisms method fills the population with organisms that have random RGB values, and the calculateWeakness method just adds up the absolute differences between the environment’s RGB values and an organism’s.
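Both methods rely on a randomValue helper that isn’t shown in this excerpt. Judging from how it’s used (randomValue(256) to produce a random color channel), a minimal sketch of it would look something like this:

// Minimal sketch of the randomValue helper assumed above:
// returns a random integer from 0 up to (but not including) max.
function randomValue(max) {
  return Math.floor(Math.random() * max);
}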

Natural Selection

The second piece of the puzzle is applying pressure from this fitness function in the form of natural selection. The fitness function has to somehow affect an organism’s ability to survive and pass on its genes. For example, organisms with a lower weakness stand out less, so they have a lower chance of being spotted by predators and killed.

Natural Selection is the “invisible hand” of the system. It is the outside pressure that guides organisms toward being better adapted to their environment.

We’ll also be applying this fitness function over multiple generations of organisms. I’ll talk about what that looks like later on, but just to give some sense of scale, some of my simulations ran for over 20,000 generations (though each generation was only around 10ms long; more on that later).

There are many ways to implement a generation, but the way I decided on involves removing only a single individual from the population per generation, and replacing it with a new one.

But the decision of WHICH individual to remove is key.

To make this work, we want individuals with a higher weakness to have a greater chance of being killed off. In other words, the more an individual stands out from the background color, the more likely it should be to be removed from the population.

The way I decided to do this was with some interval math.

Imagine you have a rectangle that is broken up into 5 segments of different lengths.

If you were to throw a dart at the rectangle, you’d expect the longest segment to have the highest chance of being hit, simply because it’s the longest.

Now let’s add some color.

The segment lengths are proportional to how far off an organism’s color is from the background’s. The greater the difference, the longer the segment.

Now let’s add some numbers to define those ranges more clearly:

If we choose a random number between 0 and 400, that number has the greatest chance of falling between 150 and 300. But wherever it falls, let’s remove the corresponding organism.

The algorithm for this part is pretty straightforward.

  1. We generate a random number in the total range (0–400)
  2. Find the sub-range that contains that random number and remove that organism.

We can do this efficiently using a modified binary search, but since the number of ranges is small, we can just iterate from the first range until we find a range that contains the random number we’re looking for.

Let’s take a look at the code for natural selection (replaceOrganism is another method on the Environment class, while the two helpers after it are standalone functions):

replaceOrganism() {
  let e = this;
  // 1) Select a random organism to reproduce
  let randomIndex = randomValue(this.organisms.length);
  let parentOrganism = this.organisms[randomIndex];
  let newOrganism = parentOrganism.reproduce();

  // 2) Use intervals to weight individuals by weakness
  let intervals = createIntervals(this.weaknesses);

  // Sum of all weaknesses
  let intervalRange = intervals[intervals.length - 1][1];
  let randomOrganismToReplace = randomValue(intervalRange);

  // 3) Remove a random individual, weighted by weakness
  intervals.forEach((interval, index) => {
    if (isWithinInterval(interval, randomOrganismToReplace)) {
      e.organisms[index] = newOrganism;
      e.weaknesses[index] = e.calculateWeakness(newOrganism);
    }
  });
}

// Detect if a number falls within a range
function isWithinInterval(interval, num) {
  return interval[0] <= num && num < interval[1];
}

// Creates intervals from an array of ints
function createIntervals(arr) {
  let runningTotal = 0;
  let result = arr.map(int => {
    let newTotal = runningTotal + int;
    let interval = [runningTotal, newTotal];
    runningTotal = newTotal;
    return interval;
  });
  return result;
}
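To see how the helpers fit together, here’s a quick sanity check using made-up weakness values that sum to 400, like the rectangle example from earlier:

// Hypothetical weakness values, just for illustration
let weaknesses = [70, 80, 150, 60, 40];
let intervals = createIntervals(weaknesses);
// intervals is [[0, 70], [70, 150], [150, 300], [300, 360], [360, 400]]

// A "dart" that lands on 237 falls in the third (longest) segment
intervals.forEach((interval, index) => {
  if (isWithinInterval(interval, 237)) {
    console.log(index); // 2 -> the organism with weakness 150 would be replaced
  }
});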

We need to replace the organism we’re removing if we want to keep the population constant. Because reproduction with our creatures is asexual, we only have to select a single parent. We also have to decide WHEN we do the reproduction round: before or after we remove an individual. If we do it before, there’s a chance the individual we remove will pass on its genes.

In my code I opted for simplicity. I chose a random parent to have an offspring that would end up replacing whichever organism we remove later on. I also made it so all organisms have an equal probability of passing on their genes, even the ones with higher weaknesses.

But it needn’t have been this way. If I had done the replacement after the removal, it would have lowered the odds of “bad” genes being passed on, and the population likely would have evolved more quickly. Or we could have used some other measure of fitness (the inverse of weakness, perhaps) to increase the probability of certain organisms passing on their genes.

Because at the end of the day, all we’re doing is weighting the probabilities of some organisms being selected for removal over others. We could easily do something similar for reproduction.
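As a rough sketch of that idea (this is a variation I’m describing, not part of the original code), parent selection could be weighted by a fitness score like the inverse of weakness, reusing the same interval trick:

// Hypothetical variation: weight parent selection by (765 - weakness),
// so organisms that blend in better reproduce more often.
function pickParentIndex(weaknesses) {
  let fitnesses = weaknesses.map(w => 765 - w);
  let intervals = createIntervals(fitnesses);
  let total = intervals[intervals.length - 1][1];
  let dart = randomValue(total);
  return intervals.findIndex(interval => isWithinInterval(interval, dart));
}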

But the power of this system is that even by just weighting which organisms have a higher chance of being removed, the population will evolve over time to become more fit (i.e. closer to the background color).

Mutation

There is one final piece missing, though, and that’s mutation.

Without random mutations, i.e. slight variations between parent and child generations, we would not necessarily get greater fitness over time.

Instead, what would happen is that the fitness level would eventually stabilize at some essentially arbitrary point. Just by random chance, one of the original organisms would take over. All that needs to happen is for it to have a single offspring, and its chances of passing on its genes to the next generation are then doubled. Repeat that process a few more times and you get the possibility of an individual with lower fitness taking over the entire population.

But even if the fittest individual from the start survived and took over, the population would stagnate, and would not continue to evolve over time.

This isn’t what we see in nature. In nature we see gradual change over time, and in fact we’ve even used it to our own benefit, artificially selecting tiny variations in traits we like and weeding out the ones we don’t.

To model this behavior, we need to make it so that when an organism “reproduces”, the new organism it creates isn’t an EXACT copy. We want its RGB values to differ by some random, small amount from its parent’s. Sometimes a parent might create an exact copy, but usually the offspring will be either slightly more or slightly less fit than its parent.

If you allow your mutations to be very large, the offspring can differ considerably from the parent generation, which isn’t necessarily something that you want. A large mutation rate makes it harder to reach stability in your population, and even if an organism reached some optimal fitness level, odds are its offspring would erase those beneficial traits in the next generation. So trying to “force” evolution with a high mutation rate doesn’t really work and usually leads to chaos.

Mutation is a subtle force, but a powerful one over long enough timespans. Because even if an organism with lower fitness manages to take over early on by chance, over time the population will drift towards optimality. It’s like the erosive force of water that carved out the Grand Canyon.

Slow and steady, the blind watchmaker does his work.

And in our example it doesn’t matter what the starting point is: the population will trend towards blending in with the environment. It doesn’t matter whether the initial differences between organism and environment were small or large. They will always converge to being the same color.

In fact, if you changed the environment’s background color, the population of organisms would start to drift towards that new color instead. And you wouldn’t have to change a single line of code!

Here’s what all this looks like as methods on the Organism class:

reproduce() {
  // Prevents negative values from appearing
  let newR = Math.max(this.r + this.mutateTrait(), 0);
  let newG = Math.max(this.g + this.mutateTrait(), 0);
  let newB = Math.max(this.b + this.mutateTrait(), 0);
  return new Organism(newR, newG, newB);
}

mutateTrait() {
  // mutationSize (the maximum size of a single mutation) is assumed to be
  // set on the organism elsewhere in the full code
  let multiplier = Math.random() - 0.5;
  if (multiplier < 0) {
    return randomValue(this.mutationSize) * -1;
  } else {
    return randomValue(this.mutationSize);
  }
}

Observations

Putting it all together, there are some key observations we can make.

In the code I’ve attached below, there are a few parameters to play around with.

The first is the generation time, which is just the number of milliseconds between cycles, where a single cycle involves removing an organism and generating a new mutant. You’ll want this to be relatively small, since it takes quite a few generations to see significant changes (again, some simulations I ran took 20,000 generations to reach optimality).
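To make that concrete, the driver loop can be as simple as a timer that calls replaceOrganism over and over (the constant names and values below are my own placeholders, not necessarily what the original repo uses):

// Illustrative driver loop; names and values are placeholders
const GENERATION_TIME = 10; // ms between generations
const environment = new Environment(34, 139, 34, 50); // background color + 50 starting organisms

setInterval(() => {
  environment.replaceOrganism(); // one generation: remove one organism, add one mutant
}, GENERATION_TIME);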

The second parameter is the population size. This is how many organisms are in the environment at any given time, and in this model that number is held constant. Generally speaking the larger the population, the more time it takes to reach some kind of optimum because it takes longer for “good genes” to take over the entire population.

The last parameter is the mutation rate. This one is key because it dictates the fate of the population. Make it too large, and the population never seems to stabilize and can’t trend toward anything. Make it too small and the population won’t evolve enough and the fitness level will get stuck. In my model there are also some quirks because of the way the math works out with flooring, so anything below 2 has this problem of getting stuck. But choose the right mutation rate and the whole population should eventually all match the background color.

One particularly interesting observation/hypothesis is that if the mutation rate is somewhere in the middle, the population will enter a dynamic equilibrium, where it seems to hover around a certain average weakness value. And it can be computed from this formula:

(number_of_traits * mutation_rate_per_trait)/2

In this case the number of traits is always 3 (R, G, and B), and each one has the same mutation rate, so the average weakness is 1.5 times the mutation rate, with greater mutation rates producing larger deviations from that average.
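For example, assuming that formula holds, with 3 traits and a mutation size of 10 per trait the population should hover around an average weakness of roughly (3 * 10) / 2 = 15.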

Try it out for yourself, and let me know what you think!

At its core, a lot of what machine learning boils down to is optimizing (maximizing or minimizing) some value against some outside constraints, rewarding any changes that move you closer to that optimum and punishing any that move you away from it.

My goal with this project down the line is to build a live visualization of the organisms reproducing and mutating, and to put a frontend on top of it so people can play around with the parameters in a more user-friendly way. This is just one of my first steps into the field, but hopefully I’ll be able to work on a lot more exciting ML projects in the future!

And the beauty of all this to me is how it illustrates randomness producing something that appears intelligently designed.

Here’s the original code on GitHub, in just over 100 lines of JavaScript.

All this was made possible because of things I learned at Outco. Though machine learning isn’t part of our curriculum, a lot of the fundamental algorithms and data structures needed to get started in the field are.

If you’re interested in the program, check us out at Outco.io. New classes starting every month!
