Machine Learning: Reinforcement Learning — Game Theory

Michele Cavaioni
Machine Learning bites
2 min read · Feb 7, 2017

So far we have been dealing with a world where an agent, following a certain policy, takes certain actions, but acts in a vacuum, with no interaction with the outside world.

In game theory, agents evaluate how their decisions interact with the decisions of others to lead to different outcomes.

In this environment we are now dealing with multiple agents, instead of a single one.

Game theory can be defined as the mathematics of conflicts.

In game theory the policy becomes a strategy: a mapping from all possible states to actions, with respect to one of the players of the game.

In a 2-player game (players A and B), player A’s goal is to choose a strategy that maximizes his/her reward, while player B chooses the one that minimizes player A’s outcome.
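To make this concrete, here is a minimal Python sketch (the payoff matrix is made up for illustration, not taken from the lecture). Entry [i][j] is player A’s reward when A plays row i and B plays column j; in a zero-sum game, B’s reward is simply the negative of it.

```python
# Minimal sketch with a made-up zero-sum payoff matrix:
# payoff_A[i][j] = reward to A when A plays row i and B plays column j.
payoff_A = [
    [3, 1],   # A's strategy 0
    [4, 2],   # A's strategy 1
]

# Player A assumes B will answer with the worst case for A (the row minimum),
# then picks the row that maximizes that worst case: the "maximin" value.
maximin = max(min(row) for row in payoff_A)

# Player B assumes A will answer with the best case for A (the column maximum),
# then picks the column that minimizes it: the "minimax" value.
minimax = min(max(col) for col in zip(*payoff_A))

print("A's maximin:", maximin)   # 2
print("B's minimax:", minimax)   # 2 -> the two values coincide here
```

For this matrix the two values coincide, which is exactly the situation of the first bullet below: a pure strategy (row 1 for A, column 1 for B) is optimal for both players.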

Below are different games, described by their characteristics:

  • In a 2-player, zero-sum (meaning that the sum of rewards is constant) deterministic game of perfect information, “minimax” (one side minimizes the maximum) is the same as “maximin” (the other side maximizes the minimum), and there is always an optimal pure strategy for each player.
  • In a 2-player, zero-sum non-deterministic game of perfect information, von Neumann’s theorem shows that the minimax solution is the same as the Nash equilibrium (defined below).
  • As we relax our restrictions and move to a 2-player, zero-sum, non-deterministic game of hidden information, we can see that the minimax theorem no longer holds, as the strategy of one player depends on what the other does (see the sketch after this list).
  • Finally, the prisoner’s dilemma is the case of a 2-player, non-zero-sum, non-deterministic game of hidden information. In this game there is an equilibrium point, defined by the intersection of the dominating strategies of the two players. This equilibrium is called the Nash equilibrium.
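Here is a small follow-up sketch for the third bullet (again with a made-up but standard payoff matrix, a matching-pennies-style game, which is zero-sum with hidden information): once each player must move without seeing the other’s choice, the pure-strategy maximin and minimax values can differ.

```python
# Minimal sketch: a matching-pennies-style zero-sum game with hidden information.
# payoff_A[i][j] = reward to A when A plays row i and B plays column j.
payoff_A = [
    [ 1, -1],
    [-1,  1],
]

maximin = max(min(row) for row in payoff_A)           # A's best guaranteed value
minimax = min(max(col) for col in zip(*payoff_A))     # B's best cap on A's value

print("A's maximin:", maximin)   # -1
print("B's minimax:", minimax)   #  1 -> values differ: no optimal pure strategy
```

Because each player’s best pure choice depends on what the other secretly picks, the two values no longer agree, which is what the third bullet above is pointing at.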

Nash equilibrium:

Given a set of strategies, if I randomly choose one of the players and give that player a chance to switch their strategy, they will have no reason to do it, as there is no strategy that gives a better outcome, assuming that the other player’s strategy stays the same.
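A short Python sketch can make this definition operational. The payoff numbers below are the usual textbook prisoner’s-dilemma values (an assumption on my part, not numbers from the post); the code enumerates all joint strategies and keeps those where neither player gains by switching alone.

```python
# Minimal sketch: find the Nash equilibrium of a prisoner's-dilemma-style game
# by checking every joint strategy for profitable unilateral deviations.
# payoffs[(a, b)] = (reward to A, reward to B); 'C' = cooperate, 'D' = defect.
payoffs = {
    ('C', 'C'): (-1, -1),
    ('C', 'D'): (-9,  0),
    ('D', 'C'): ( 0, -9),
    ('D', 'D'): (-6, -6),
}
strategies = ['C', 'D']

def is_nash(a, b):
    # Neither player can improve their own reward by switching alone.
    a_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in strategies)
    b_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in strategies)
    return a_ok and b_ok

for a in strategies:
    for b in strategies:
        if is_nash(a, b):
            print("Nash equilibrium:", (a, b))   # -> ('D', 'D'): both defect
```

Defecting dominates cooperating for both players, so the only strategy profile that survives the check is mutual defection, exactly the intersection of dominating strategies mentioned above.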

This blog has been inspired by the lectures in Udacity’s Machine Learning Nanodegree (http://www.udacity.com).
