Genetic algorithm for Sharpe Ratio maximizing within a universe of 70.000 funds.
Motivation
As many of you know, portfolio optimization is one of the biggest challenges in financial markets and asset management. This article shows how is possible to find enough good combination of assets and weights in a reasonable computational time for a universe of 70,000 funds with one-year trading history by implementing evolutionary computing techniques. In contrast to other works I have found where the assets are static and the evolutionary approach focuses on finding the best combination of weights for that portfolio; this approach deals with a much more complex problem, finding the combination of assets and weights within a universe of 70,000 funds.
Evolutive Computing
Evolutionary Computing is a field of computer science that uses stochastic search processes based on natural evolution to solve optimization problems in areas such as planification, design, etc.
Genetic Algorithms
Genetic algorithms would be the most widely used variant within evolutionary computing. John H. Holland introduced genetic algorithms as we know them today in 1962 [Holland 1962, Holland 1965]. Holland later summarized this work in his book Adaption in Natural and Artificial Systems. Years later, David E. Goldberg made them popular, by giving them a practical approach in his book Genetic Algorithms in Search, Optimization and Machine Learning.
Sharpe Ratio
The Sharpe Ratio measures the excess annualized return per unit of risk for a zero investment strategy. By risk, I mean the annualised standard deviation of returns, and by excess return, I mean returns over the riskless rate.
This will be our fitness function to maximize.
¿How Sharpe Ratio is calculated?
Coding our genetic algorithm
Individual
Okay, let’s go by parts. We already know the equation to calculate the Sharpe Ratio that will be our fitness function to be maximized, but how do we use it in our genetic algorithm?
The first thing I will do is define the class Individual that will represent a solution from the set of explored solutions.
The asset universe will be common to all individuals. For this, I decided to add the asset set as a class attribute. In this way, all instances of individuals will have access to the asset universe.
These assets universe will be a data frame where the columns represent the investment fund, and the rows will be its closing prices from 2020–01–03 to 2021–12–11. This DataFrame will be clean, homogenized and bias-free.
On the other hand, an individual receives two vectors as initialization parameters, one of them will represent the assets from the universe through indexes, and the other one will be a vector of weights of the same length, which will represent the weight of each asset within the individual’s portfolio. Since the universe of assets is huge, I decided to limit the number of assets that individuals’ portfolios can hold by restricting them to a maximum of 20 assets and a minimum of 1.
Once the structure of an individual is defined, it is necessary to add more operations that help us achieve the objectives of the use case.
Calculating mean returns
Calculating the risk
Calculating our fitness function (Sharpe Ratio)
In this case our risk-free rate will be 0.
Individual selection
I will use proportional selection with elitist ranking and migratory replacement for this case.
“Migratory replacement” is something I have come up with to add diversity to the solution set by adding new individuals. As the search space is so large, it is necessary to add much diversity to our population in order to avoid falling into local minimums.
This type of selection assigns a probability to each individual in order to be selected according to its fitness function. This allows the best-adapted individuals to have more possibilities of being selected and therefore to be more present in the next generation. In comparison, the worst adapted individuals have fewer possibilities of being selected.
On the other hand, as it is elite driven, it chooses the best adapted n% and adds them to the next generation to ensure that the best individuals spread their genes.
Another percentage is allocated to a migratory replacement, where new individuals are incorporated into the selection, thus favoring greater diversity and therefore greater exploration.
Crossover method
Between generations, the number of assets of each individual will be variable. For this, I will use modulus 20 such that 20 will be the maximum number of assets an individual can have while the minimum will be 1.
My crossover method continuously prevents the resulting portfolios from falling into minimums or maximums (1 or 20 assets). For example, if two parents have 14 and 17 assets respectively when working in modulus 20, their offspring will have 11 assets, while if two parents have 2 and 3 assets respectively, their offspring will have 5 of those assets.
The offspring also inherit the weights from the parents; in this way, I make sure that weights never sum more than 1. If two parents have the same asset and this has been selected for the offspring, this asset is not inherited twice, but the weights are summed and inherited only once. In this way we avoid the offspring having several repeated assets.
When an offspring inherits the assets from his parents, he also inherits the weights following the following equation:
OffspringA = α ∗ Parent1+ (1 −α) ∗ Parent2
OffspringB = (1 −α) ∗ Parent1+ α ∗ Parent2
Where α is a random number between 0 and 1.
Thus some individuals may have few assets with a large percentage of accumulated capital.
Mutation method
In mutation operation I replace a random number of assets from the individual with assets from the universe of assets; in that way, I add diversity to the population. The probability of mutation will be variable in each generation such that in some generations, exploration will be rewarded while others will focus on exploiting the current solution subspace. If the mutated individuals do not have a better degree of adaptation than the best individual, they are discarded.
Evolutive approach
This part contains the core of our genetic algorithm. It defines the steps in which the previously defined methods will be executed until the stop condition is met.
Putting all together
If you want to try the entire code yourself or follow along, go to my published codebase on GitHub:
References & inspiration
Carmona, Enrique & Fernández, Severino. (2020). Fundamentos de la Computación Evolutiva.
Ivan Grindin (2021). Learning Genetic Algorithms with Python:Empower the Performance of Machine Learning and Artificial Intelligence Models with the Capabilities of a Powerful Search Algorithm.
Holland, John H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence.
David E. Goldberg. (1989) Genetic Algorithms in Search, Optimization, and Machine Learning.
Pankaj Sinha. Et al. (2015) Algorithm of construction of optimum portfolio of stocks using genetic algorithm.
Emanuele Stomeo. Et al. (2019). Genetic Algorithm Application to Portfolio Optimisation.