
EVOLVE YOUR MATRICES!

Lavish Chauhan
9 min read · Dec 29, 2022


Yes, we will literally evolve our matrices in this post, using something called genetic algorithms!

The objective of this article is to familiarise you with the basics of genetic algorithms and then apply what we've learned to an interesting optimization problem, introducing you to another fun yet powerful class of optimization algorithms. This article assumes familiarity with Python and basic programming. If the coding parts get confusing, you can skip them and jump directly to the graphs at the end. Let's get started!

Introduction

Genetic algorithms are used to solve optimization problems and serve the same purpose as optimizers you may already know, such as SGD or Adam, though unlike those they don't require gradients. They belong to the larger class of evolutionary algorithms, which includes other techniques like swarm intelligence and differential evolution, but that’s for another day.

Genetic algorithms are inspired by natural selection, and hence you would come across familiar biological terms.

Let us first define the basic terms that will help us understand the common structure of all genetic algorithms:

  • Generation — Group of genetically related organisms constituting a single step in the line of descent
  • Population — A subset of the generation that can interbreed
  • Mutation — Variations in the gene that can be transmitted to the next generation
  • Crossing — Exchange of chromosomes between parents
Basic steps of a genetic algorithm
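The skeleton shared by most genetic algorithms can be sketched in a few lines of Python. This toy example is mine, not the article's grid setup: it maximizes the number of 1s in a bit string (the classic "OneMax" problem) and shows the generation, selection, crossover, and mutation steps in one loop:

```python
import random

random.seed(0)

# Toy GA, illustrative only: maximize the number of 1s in a 20-bit string.
N_BITS, POP_SIZE, N_GEN = 20, 30, 60

def fitness(ind):
    return sum(ind)

def crossover(a, b):
    # single-point crossover: left part of a, right part of b
    cut = random.randint(1, N_BITS - 1)
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.05):
    # flip each bit with a small probability
    return [1 - g if random.random() < rate else g for g in ind]

# 1. initial population of random individuals
population = [[random.randint(0, 1) for _ in range(N_BITS)]
              for _ in range(POP_SIZE)]

for _ in range(N_GEN):
    # 2. selection: keep the fittest 30% as parents
    parents = sorted(population, key=fitness, reverse=True)[:POP_SIZE * 3 // 10]
    # 3 & 4. crossover + mutation produce the next generation
    population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best))  # approaches N_BITS after a few dozen generations
```

The grid problem below follows exactly this loop; only the representation of an individual and the fitness function change.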

Now that we are clear on the basic terminology, let's see how we can use a genetic algorithm to solve an optimization problem.

Problem statement

You are given an n×m grid; some of the cells have heights and others don’t. Your task is to fill the empty cells so that the whole terrain represented by this grid is as “plain” (flat) as possible.

A sample 5x5 grid

Mathematically, let's first define the cost between two adjacent cells of heights h1 and h2 as:

cost(h1, h2) = 2^|h1 - h2|

Now the “unevenness” of an n×m grid is defined as the sum of this cost over all pairs of adjacent cells; this is the quantity we have to minimize. For example, two neighbouring cells of heights 3 and 5 contribute 2^|3-5| = 4 to the sum.

All grid values must be integers between 0 and 50 when you fill them.

Left: An even grid ( more fit individual ), Right: An uneven grid ( less fit individual )

Solutions

Simple solution: A simple solution that comes to mind would be to take the average of all non-zero cells and fill it into the cells with zero height. At the end we’ll compare our genetic algorithm against this logical baseline and see if it learns to perform better.
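For completeness, a minimal sketch of this averaging baseline (my own code, assuming 0 marks an empty cell, as everywhere else in this article):

```python
# Fill every empty (zero) cell with the rounded average of the filled cells.
def average_fill(grid):
    filled = [h for row in grid for h in row if h != 0]
    avg = round(sum(filled) / len(filled))
    return [[h if h != 0 else avg for h in row] for row in grid]

print(average_fill([[10, 0], [0, 20]]))  # [[10, 15], [15, 20]]
```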

Possible solution using a genetic algorithm: Let's see what our previously defined terms mean here:

  • Generation — Current iteration
  • Population — A collection of grids that have some pre-given cells fixed, while the other cells can take any value
Blue cells represent fixed cells
  • Mutation — This is where genetic algorithms get really interesting: you can mutate your population however you want, since a mutation is by definition just a small change in an individual. To keep things simple, I will increase/decrease the value of some non-fixed cells by a random amount. Make sure not to make major changes while mutating, otherwise your population might not converge (see the graphs at the end)
The green cells underwent mutation
  • Crossing — Here again, things can be really flexible, as you can define for yourself how your individuals are crossed. I will simply cross my grids by putting halves of the two parents together
Blue parts of both grids combined during the crossing

One thing to keep in mind: offspring differ from their parents only in the cells that had zero height in the initial grid; we are not allowed to alter the cells that already have some height.

Figure explaining genetic algorithm solution ( blue cells represent fixed cells ), the loop is run for num_gen iterations

We keep track of every grid formed in this whole process and pick the fittest one, which is what we set out to find.

Time to put our ideas into code. Let's create a function for each of the tasks defined above.

First, let's write a function that generates a random grid (with some empty cells):

import math
import random

# generate a random grid
# n, m -> grid dimensions
# k -> % of grid you want to cover with non-zero cells

def gen_test(n, m, k):
    grid = [[0 for _ in range(m)] for _ in range(n)]

    # times gives the number of cells to be covered (k% of n*m)
    times = n * m * k // 100

    while times > 0:

        # randomly choose a cell
        x = random.randint(0, n - 1)
        y = random.randint(0, m - 1)

        # skip the cell if it, or any of its neighbours, is already filled
        if grid[x][y] != 0:
            continue
        if x - 1 >= 0 and grid[x - 1][y] != 0:
            continue
        if y - 1 >= 0 and grid[x][y - 1] != 0:
            continue
        if x + 1 < n and grid[x + 1][y] != 0:
            continue
        if y + 1 < m and grid[x][y + 1] != 0:
            continue

        grid[x][y] = random.randint(1, 50)
        times -= 1
    return grid

We’ll also need a function that evaluates how fit an individual (grid) is:

# Gives the log2 score ("unevenness") of a particular grid
# the lower the score, the flatter the grid

def log_score(grid):
    score = 0
    n = len(grid)
    m = len(grid[0])

    for i in range(n):
        for j in range(m):
            # the top-left cell has no left or upper neighbour
            if i == 0 and j == 0:
                continue
            # top row: only a left neighbour
            if i == 0:
                # cost = 2^abs(h1 - h2)
                score += 2 ** abs(grid[i][j] - grid[i][j - 1])
            # left column: only an upper neighbour
            elif j == 0:
                score += 2 ** abs(grid[i][j] - grid[i - 1][j])
            else:
                score += 2 ** abs(grid[i][j] - grid[i][j - 1])
                score += 2 ** abs(grid[i][j] - grid[i - 1][j])
    return math.log2(score)
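As a quick sanity check of the scoring (a hand-worked example of mine): a perfectly flat 2×2 grid has four adjacent pairs, each costing 2^0 = 1, so the sum is 4 and the log score is log2(4) = 2.

```python
import math

# Hand-check of the scoring formula on a flat 2x2 grid,
# computed directly from the pair costs (independent of log_score)
grid = [[3, 3], [3, 3]]
pairs = [((0, 0), (0, 1)), ((0, 0), (1, 0)),
         ((0, 1), (1, 1)), ((1, 0), (1, 1))]
total = sum(2 ** abs(grid[x1][y1] - grid[x2][y2])
            for (x1, y1), (x2, y2) in pairs)
print(math.log2(total))  # 2.0
```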

Now, let's create a function that crosses two individuals and mutates the resultant offspring:


# Produces an offspring of two individuals by crossing and mutating
# parent1 -> first grid
# parent2 -> second grid
# pro -> probability of mutation of a single grid cell
# val -> maximum value by which a cell can change due to mutation
# fixed -> boolean grid marking the fixed cells of the parent grids

def cross_and_mutate(parent1, parent2, pro, val, fixed):

    n = len(parent1)
    m = len(parent1[0])

    # initializing the offspring, same size as the parents
    offspring = [[0 for _ in range(m)] for _ in range(n)]
    c = n * m
    for i in range(n):
        for j in range(m):

            # if the cell is fixed, the offspring is the same as the parents
            if fixed[i][j]:
                offspring[i][j] = parent1[i][j]

            # else fill it
            else:

                # first half of the cells come from parent 1
                if c > n * m / 2:
                    offspring[i][j] = parent1[i][j]

                # the other half from parent 2
                else:
                    offspring[i][j] = parent2[i][j]

                # mutate the cell with probability pro
                if random.uniform(0, 1) <= pro:

                    # add to the cell value 50% of the time
                    # and subtract from it the rest 50% of the time
                    if random.randint(0, 1) == 0:
                        offspring[i][j] += random.randint(1, val)
                    else:
                        offspring[i][j] -= random.randint(1, val)

                    # clip values (they might exceed the limits after mutation)
                    if offspring[i][j] < 0:
                        offspring[i][j] = 0
                    elif offspring[i][j] > 50:
                        offspring[i][j] = 50
            c -= 1
    return offspring

Now we have all the building-block functions and are ready to write the main simulation function.

The structure I have used relies on a utility function (simulation_util) that takes in the population of the current iteration and returns the population of the next iteration by crossing the fittest individuals; the simulation then runs this process as many times as we want in a while loop.

# utility function for the simulation function
# takes one generation as input and outputs the next generation after crossing
# mem -> members of the ith generation
# rest of the arguments have been defined before

def simulation_util(mem, fixed, frac, pro, val):
    global best_grid, best_score

    population_size = len(mem)

    # number of "fittest" individuals to be used for crossing
    top = int(frac * population_size)

    # a minimum of 2 individuals is required for crossing
    if top < 2:
        top = 2

    # stores (score, index) tuples of individuals
    tops = []

    for i in range(population_size):

        # if the score is the best so far, store the grid
        # remember, the lower the score the better
        sc = log_score(mem[i])

        if sc < best_score:
            best_grid = mem[i]
            best_score = sc

        tops.append((sc, i))

    # find the fittest individuals
    tops.sort()

    # list to store the (i+1)th population
    new_mem = []

    # cross the fittest individuals population_size times
    while population_size > 0:

        index_parent1 = random.randint(0, top - 1)
        index_parent2 = random.randint(0, top - 1)

        # parents cannot be the same
        if index_parent1 == index_parent2:
            continue

        # cross and add the offspring to the new members
        new_mem.append(cross_and_mutate(mem[tops[index_parent1][1]],
                                        mem[tops[index_parent2][1]], pro, val, fixed))
        population_size -= 1

    return new_mem
# simulating the evolution
# grid -> original grid in which we have to fill the empty cells
# fixed -> boolean grid marking the pre-filled (fixed) cells
# population_size -> number of grids in a particular generation
# frac -> top frac fraction of the population is crossed to make the next generation
# num_gen -> number of generations for which we want to run the simulation
# pro -> probability of mutation of a single grid cell (see cross_and_mutate)
# val -> maximum value by which a cell can change due to mutation (see cross_and_mutate)

def simulation(grid, fixed, population_size, frac, num_gen, pro, val):
    global best_grid, best_score
    best_grid = None
    best_score = float('inf')

    n = len(grid)
    m = len(grid[0])

    # list of grids storing a particular generation
    mem = []

    # generating the first generation
    for _ in range(population_size):

        # deep copy, so each individual gets its own grid
        new_grid = [row[:] for row in grid]

        for i in range(n):
            for j in range(m):

                if grid[i][j] == 0:
                    new_grid[i][j] = random.randint(1, 50)

        mem.append(new_grid)

    # run the simulation for num_gen generations
    while num_gen > 0:
        mem = simulation_util(mem, fixed, frac, pro, val)
        num_gen -= 1
    return best_grid

Now, for the current run, the parameters I have chosen are:

  • Grid parameters — 50×50 grid with 30% of cells filled initially, the rest empty (N = 50, M = 50, K = 30)
  • Population parameters — 100 individuals in each population, with the top 10% reproducing to make the next generation, run over 3000 iterations (POPULATION_SIZE = 100, FRAC = 0.10, NUM_GEN = 3000)
  • Mutation parameters — 5% chance that the value in a grid cell changes by up to 2 (PRO = 0.05, VAL = 2)

The best (minimum) score we get is 30.9338, which beats the 32.4152 achieved by the simple averaging approach discussed above! Feel free to play with the parameters and find which ones give the best grid.

Fun graphs

Below, I’ve plotted the average loss of each population, along with the variance of the loss within each population, over the iterations.

These results tell us two things:

  1. Our algorithm, together with this choice of hyper-parameters, converges
  2. The variance of the population stays almost constant over time, which tells us that the individuals are as unique as their 1000th ancestors even though they carry just a fraction of their DNA :)

Let's also see what happens when you do erratic mutations

PRO = 0.05, VAL = 30: 5% chance that the value in a grid cell changes by up to 30

or no mutations at all

PRO = 0: slower loss curve; notice all individuals becoming copies of each other (0 variance)

or let the whole population breed (survival of the fittest … and the unfittest)

FRAC = 1

This problem was taken from CodeChef (there exist far better solutions for it than the genetic-algorithm one discussed here; try to beat my score there ;) ): https://www.codechef.com/problems/LAND

Feel free to go through other resources to read more about genetic algorithms. My suggestions would be :

  1. https://www.geeksforgeeks.org/genetic-algorithms/
  2. https://www.youtube.com/watch?v=EOaPb9wrgDY
  3. https://medium.com/python-in-plain-english/implementing-particle-swarm-optimization-from-scratch-191bc93e48d4
