Genetic Algorithm (GA) with R Package Rgenoud

Puneeth Venugopal
Data Science - With Live Case Studies
6 min readJun 14, 2018

This document will help us understand how the Genetic Algorithm works internally using the R package Rgenoud. Below you will see the internal mechanism of the algorithm, that is partially covered with simple illustration, also the simple working code in R for two problems.

Introduction

Genetic algorithm is designed based on human genetic evolution i.e having one generations of humans evolved from the best genomes(population) of the previous generations. Similarly, GA produces multiple populations(solutions) in each generation to a given problem and picks the best solutions to generate the solutions for the next generation.

Graphical Representation

What Happens Inside GA?

-GA algorithm produces random solution in the first generations if there is no seed values (starting solutions) are provided

- Best solutions, with least or most return value based on the nature of optimization, are picked on which genetic operators is applied to produce a new solution as part of the second generation

- GA produces more unique random solutions in the second generation

- This process continues until the most optimal solutions is reached or the generation hard limit is reached

What Are Genetic Operators?

GA has two basic genetic operators which are Cross Over and Mutation.

Cross Over: Two parent solutions are selected and their attributes are swapped to produce modified child solutions.

Mutation: A parent solution is picked and altered to produce a better child solution.

RGenoud in total has 9 genetic operations which are different forms of the basic cross over and mutation. User can choose to set the weightage for each of the 9 operators as (P1=20, P2=15….P9=10). Higher the weightage higher the usage of the operator in creating new solutions.

Example 1:

GA for continuous data

GA is a very strong and robust algorithm to solve a linear equation. Business problem can be reduced to a form of an equation on which we run the GA algorithm to find the exact values of the variables which produces the maximum or minimum possible results.

Below is an example of the business problem which has been reduced to a form an equation.

The code below runs GA algorithm to determine the values of x1, x2 and x3 such that the value of y is maximum.

setwd("D:/Puneeth Workshop/personal/Personal Workshop/Genetic Algorithm Paper")library("rgenoud")##---
# Using GA to solve an equation with floating points.
##---
#Define the objective functionfunc <- function(sol){#GA provides the solution/population as a vector of length 3
x1 <- sol[1]
x2 <- sol[2]
x3 <- sol[3]
# Equation
y=3*x1 + tan(x2) - cos(x3)
#Return the value to GA
return(y)
}# Set the boundary values for each parameter X1,X2 and X3
# X1 should be between value 3 and 5. x2 between 2 and 8.
mat <- matrix(c(3,5,2,8,1,4),3,2,byrow = TRUE)
# Run the GA Algorithm
GA <- genoud(func,nvars = 3,max = TRUE,pop.size = 100,max.generations = 100,wait.generations = 10,Domains = mat,boundary.enforcement = 2,print.level = 2)
# Maximum possible solution of the equations
GA$value
# The parameters of the solutions
GA$par

Objective Function (func): Simple user defined function where solution is carried out

nvars: Number of variables passed into the objective function

max: Define if the solution is a maximization or a minimization problem

pop.size : Number of population or solution within each generation

max.generations: Max generation produced

domains: sets the boundary limit for each variable

boundary.enforcement: ensure variable values are within defined boundary limit

Below graphs shows how the return function converges to the maximum possible over each GA generation.

Example 2:

GA for discrete data with Constrains

One of the most integral feature of GA is to solve a problem by simulating multiple possible scenarios of the problem and select the best possible solution also apply constrains by using penalty concept. Any solution to a business problem must meet business constrains to make it practical and consumable. In each generation, GA algorithm identifies good solutions that does not meet constrains and applies a penalty to make it a bad solution.

Graphical Representations

Below is an example of a warehouse problem. We need to identify the best stocking scenarios to achieve least total cost that also meet physical warehouse constrains.

Consider we have 10 products each having different volumes. All 10 products needs to be stocked across 4 different warehouses each having different capacity constraints and different costs per unit volume.

Product Volume table.

Warehouse capacity cost table.

In an ideal situation, the algorithm would stock all 10 products in warehouse W1 having least cost per unit volume. However, that would not meet the capacity constraint of the warehouse. Hence, we apply a penalty to the solutions that do not meet the constraints. This ensures that the final solution we arrive at is the best possible solution having met all the applied constrains.

Below is the code

#GA to solve discrete data to simulate multiple scenarios and use constrains and penalty.library("rgenoud")
#Create the tables
Prod <- data.frame(prod=c('P1','P2','P3','P4','P5','P6','P7','P8','P9','P10'),Vol=c(5,10,15,5,20,15,10,20,5,5))WH <- data.frame(WH=c('1','2','3','4'),WH.Cost=c(5,10,12,15),Capacity=c(25,25,25,55))func <- function(sol){

#Add the product-warehouse allignment by GA to the product table
Prod <- cbind(Prod,sol)

#Obtain the warehouse costs based on the stocking allginments
Prod <- merge(Prod,WH,by.x="sol",by.y="WH")

#Determine the total warehouse stocking costs
Prod$cost <- Prod$Vol * Prod$WH.Cost

#Total stocking Cost
Tot.Cost <- sum(Prod$cost)

#Check for Warehouse Capacity
Cap <- aggregate(Prod$Vol,by=list(Prod$sol),FUN=sum)
Cap$Group.1 <- as.factor(Cap$Group.1)

#Apply Penalty if the capacity constrains are not met
Tot.Cost <- ifelse(Cap[Cap$Group.1=='1',2] > 25 || Cap[Cap$Group.1=='2',2] > 25 || Cap[Cap$Group.1=='3',2] > 25 || Cap[Cap$Group.1=='4',2] > 55,Tot.Cost * 1000,Tot.Cost)
#Return Value
return(Tot.Cost)
}#End of objective function
# Define Boundary Matrix
# Each product should be stocked across any of the 4 warehouses
mat <- matrix(rep(c(1,4),10),10,2,byrow = TRUE)
GA <- genoud(func,nvars = 10,max = FALSE,pop.size = 500,max.generations = 100,wait.generations = 10,Domains = mat,boundary.enforcement = 2,data.type.int = TRUE)
#data.type.int will ensure GA produces only integers between the set domains ie 1 & 4
# Maximum possible solution of the equations
GA$value
# The parameters of the solutions
GA$par

Stocking Scenario simulated by the algorithm.

We observe that algorithm has produced the best possible by stocking upto max capacity in warehouses with lower stocking costs i.e W1, W2, W3 and stocked the remaining products in W4 as it has higher stocking costs. Also solution and has met the capacity constrain.

Conclusion

GA is one of the simplest yet powerful algorithm that can be used for diverse nature of problems. Since it simulates large number of solutions it can be a highly computational task for some problems with large datasets and require powerful computing systems.

--

--