Norman Di Palo
Jul 20, 2017 · 1 min read

Thank you!

Reinforcement learning algorithms model the problem as a Markov Decision Process and try to maximize the total reward. That means that at each step the agent chooses an action and obtains a reward, and based on that it tries to improve its behaviour.

There are different algorithms, like Q Learning or Actor Critic methods. What mostly sets them apart from genetic algorithms is that RL algorithms tend to update some function estimator at each step. Q Learning, for example, updates the value of Q(s,a) at each step, and the agent then changes its behaviour to maximize the reward according to those predictions and approximations. They also have a rich mathematical background, which in some cases proves that the algorithm will converge to a particular solution.
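To make the per-step update concrete, here is a minimal sketch of the standard tabular Q Learning rule. It is not the code from my project: the function name `q_learning_update`, the state and action labels, and the values of `alpha` (learning rate) and `gamma` (discount factor) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q Learning update to q[(state, action)] in place."""
    # Value of the best action available in the next state (greedy estimate).
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: observed reward plus discounted estimate of future reward.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen (state, action) pairs start at 0.
q = defaultdict(float)
actions = ["left", "right"]
new_value = q_learning_update(q, "s0", "right", 1.0, "s1", actions,
                              alpha=0.5, gamma=0.9)
```

Each call changes a single entry of the table, which is the "update at each step" idea: the estimate of Q(s,a) drifts toward the reward actually observed, and the behaviour follows those estimates.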

Genetic algorithms, on the other hand, are more random, as you can see, much like evolution in nature: improvements often come from unpredictable gene mutations. As I showed, they can produce very interesting results.
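For comparison, one generation of a very simple genetic algorithm could look roughly like the sketch below. This is an illustration only, not the exact procedure from the post: the names `mutate` and `evolve`, the Gaussian mutation, the mutation rate and the "keep the best parents" selection are all assumptions.

```python
import random


def mutate(genome, rate=0.1, scale=0.5, rng=random):
    """Return a copy of genome where each gene is randomly perturbed
    with probability `rate` (Gaussian noise with std `scale`)."""
    return [g + rng.gauss(0.0, scale) if rng.random() < rate else g
            for g in genome]


def evolve(population, fitness, n_parents=2, rng=random):
    """Build the next generation: keep the fittest genomes unchanged and
    fill the remaining slots with mutated copies of them."""
    parents = sorted(population, key=fitness, reverse=True)[:n_parents]
    n_children = len(population) - n_parents
    children = [mutate(rng.choice(parents), rng=rng) for _ in range(n_children)]
    return parents + children
```

Note that nothing here estimates a value function. The only feedback is the final `fitness` score of each genome, and progress relies on random mutations that happen to help.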

In the next post I will talk about the RL techniques that I used in this same project, if you want to read more about it!



    deep learning x robots. twitter: @normandipalo