Thank you!
Reinforcement learning algorithms model the problem as a Markov Decision Process and try to maximize the total reward. That means that at each step the agent takes an action, obtains a reward, and uses that feedback to improve its behaviour.
There are different algorithms, like Q-Learning or Actor-Critic methods. What differs most from genetic algorithms is that RL algorithms typically update a function estimator at every step: Q-Learning, for example, updates the value of Q(s, a) after each step, and the agent changes its behaviour to maximize the reward based on those estimates and approximations. They also have a rich mathematical background that, in some cases, proves the algorithm will converge to a particular solution.
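To make that per-step update concrete, here is a minimal sketch of the tabular Q-Learning rule; the hyperparameter values and the state/action encoding are placeholders for illustration, not the ones from this project:

```python
import random
from collections import defaultdict

# Illustrative hyperparameters, not the project's settings.
ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor
EPSILON = 0.1  # exploration rate

Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state, actions):
    # Epsilon-greedy: explore occasionally, otherwise pick the best-known action.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    # One-step Q-Learning update: move Q(s, a) towards the observed reward
    # plus the discounted value of the best action in the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```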
Genetic algorithms, on the other hand, are more random, much like natural evolution: improvements often come from unpredictable gene mutations. As I showed, they can still produce very interesting results.
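For contrast, here is a toy sketch of the kind of random mutation a genetic algorithm relies on; the gene encoding and mutation rate are made up for illustration and are not the ones used in the project:

```python
import random

MUTATION_RATE = 0.05  # illustrative value, not the project's setting

def mutate(genome):
    # Randomly perturb each gene with a small probability; there is no
    # per-step feedback signal, only random variation plus selection later.
    return [
        gene + random.gauss(0.0, 0.1) if random.random() < MUTATION_RATE else gene
        for gene in genome
    ]
```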
In the next post I will talk about the RL techniques I used in this same project, if you want to read more about it!
