Morality without God? Selfishness, Economics and Game Theory, Part Three

How can we use Game Theory to help us understand where morals come from?

DiplodocusCoffeeSpot
Equilibrium Thoughts

--

The question I want to address here is whether morality/doing the “right thing”/cooperating can occur in equilibrium. In other words, is there a way to get cooperative behaviour in a world where agents are completely selfish, choose actions that serve only themselves, and face no third party who can dish out punishments (credible or otherwise)?

In this piece, I will use the game-theoretic framework I have set up elsewhere and extend it to games that are repeated over time. Interest in repeated games is natural; we live in a world where we bump into the same people frequently. The ideas here are directly inspired by/taken from the work of Ken Binmore, who has worked hard to provide “scientific” foundations for moral behaviour.

The key idea is the following: in a world with repeated interactions, even incredibly selfish agents will find it beneficial to cooperate if they value the future enough. Therefore, the moral codes we have today could be seen as the result of non-cooperative equilibria of repeated games that generate “cooperative” outcomes. In other words, you don’t need a third party threatening to punish crimes; you only need to care about the future to create a “cooperative”/“good” equilibrium.

The Prisoner’s Dilemma: A Recap

If you have read my other piece on the prisoner’s dilemma you can skip this section. If you are only reading this piece, then you will need the material in this section. I will describe the solution to the one-shot version of this game again. For more detail see my other post.

We can solve this game by checking whether a specific configuration of actions is stable, i.e. whether any player wants to unilaterally shift to a different action given what the other player is doing. The payoffs for this game are shown below:

Prisoner’s dilemma payoff matrix:

          H          L
H      (8, 8)     (0, 10)
L     (10, 0)     (4, 4)

The actions for each player are H (high) or L (low). In each cell there is a pair of numbers: the first component is player one’s payoff, the second is player two’s payoff. The row labels correspond to player one’s move and the column headers to player two’s move. If we want to see the payoffs from player one choosing L and player two picking H, we look at row L, column H and see that player one gets 10 and player two gets 0. The numbers represent the preference ranking of each outcome for the players, not necessarily monetary payouts. We will solve this game using Nash equilibrium.

The easiest way to apply Nash equilibrium is to go through all possible outcomes and check whether anyone wants to unilaterally change their action; if so, that outcome is not an equilibrium. If no one wants to change their mind, we have found an equilibrium. If we start with (H,H) (where the first component is player one’s move and the second is player two’s), then each player gets 8.

(H,H) is not stable and player 1 wants to move to L

However, player 1 would like to switch to L and get 10 instead of 8. This is enough to rule out (H,H) as an equilibrium.

If we have outcome (H,L), then player 1 would like to switch to L (and get 4 instead of 0), so (H,L) is not a stable outcome either; by symmetry, the same argument rules out (L,H). If we then look at (L,L), we see that no one wants to change their mind. So (L,L) is the unique Nash equilibrium and the prediction of this game.
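If you like to see this deviation check done mechanically, here is a minimal Python sketch (the payoff numbers are those from the matrix above; the function and variable names are just my own illustrative choices) that walks through every action profile and flags the ones where neither player can gain by a unilateral switch.

```python
from itertools import product

# Payoffs from the matrix above: payoffs[(p1_action, p2_action)] = (p1_payoff, p2_payoff).
payoffs = {
    ("H", "H"): (8, 8),
    ("H", "L"): (0, 10),
    ("L", "H"): (10, 0),
    ("L", "L"): (4, 4),
}
actions = ["H", "L"]

def is_stable(profile):
    """True if no player can gain by unilaterally switching their own action."""
    a1, a2 = profile
    u1, u2 = payoffs[profile]
    if any(payoffs[(d, a2)][0] > u1 for d in actions if d != a1):
        return False  # player 1 has a profitable deviation
    if any(payoffs[(a1, d)][1] > u2 for d in actions if d != a2):
        return False  # player 2 has a profitable deviation
    return True

for profile in product(actions, actions):
    print(profile, "equilibrium" if is_stable(profile) else "not stable")
# Only ('L', 'L') survives the check.
```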

So, that is the solution to the static game. What is the problem here? Both players could achieve a higher payoff than the (4,4) of the equilibrium if they both played H. However, in the absence of a court or an enforceable contract, there is no way to guarantee that either player would adhere to such an agreement.

Let’s think of this game in a harsher light: each player refuses to cooperate with the other and guarantee a good payoff for both. The temptation to screw the opponent over and grab the payoff of 10 is too strong for these agents. In other words, they are incredibly selfish and maybe even “immoral”. As a result of their selfishness they both miss out on the payoff of 8 and are left with a payoff of 4.

However, the one-shot version is not that realistic. What if this game were repeated a large number of times? Might agents cooperate if they know they will face the same player multiple times?

The answer can be found in the folk theorem. It turns out that essentially any outcome in which every player gets at least the payoff they could guarantee themselves can be sustained as an equilibrium of an infinitely repeated game, provided the players are patient enough. We do not need to go into the brutal details. Let’s focus on a specific strategy and ask whether it could sustain cooperation. I will work with a type of strategy called Grim Trigger. Here’s how it works: any deviation by your opponent from the desired action triggers punishment forever; as long as your opponent plays the desired action, you do not punish and instead reciprocate.

Return to the prisoner’s dilemma. We would like to sustain the cooperative outcome (H,H). Punishment means pushing your opponent down to the lowest payoff possible, which here means playing L. However, before we can delve into how this strategy works, we need to think about how payoffs are computed in a repeated game.

Discounting

Thinking about repeated games requires us to consider how time affects the value of payoffs. To do so, we introduce a discount factor, δ, a positive number that is less than one. This factor expresses impatience. If I give you a payoff of 5 today you will value it at 5. If, however, I give you this payoff tomorrow, it will be worth less from today’s perspective, since you are impatient. But exactly how much less? I simply have to ask you how much I need to give you today to make you indifferent between receiving that amount today and the payoff of 5 tomorrow. For example, you might tell me that a payoff of 5 tomorrow is worth 4.5 today. So, the discount factor is set such that:

δ5=4.5

In this case your discount factor is

δ=4.5/5=0.9

If you were to receive a payment of 1 two days from now (on day 3), you would discount that amount by δ²: from the perspective of day 2 the value of 1 on day 3 is δ, and from the perspective of day 1 it is valued at δ×δ = δ². If I gave you a stream of payoffs of 5 for three days, the discounted stream (the value of the payoffs from today’s perspective) would be:

5+δ5+δ²5=5(1+δ+δ²)
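As a quick sanity check on the arithmetic, here is the same three-day stream in Python (δ = 0.9 is just the illustrative value from the example above):

```python
# Value today of getting 5 now, 5 tomorrow and 5 the day after, with δ = 0.9.
delta = 0.9
stream = [5, 5, 5]

discounted = sum(payoff * delta**t for t, payoff in enumerate(stream))
print(discounted)                   # 5*(1 + 0.9 + 0.81) = 13.55
print(5 * (1 + delta + delta**2))   # the factored form gives the same number
```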

We are now in a position to consider a repeated prisoners’ dilemma game.

Repeat Offender: Repeated Prisoner’s Dilemma

What if the prisoner’s dilemma were repeated for an infinite number of periods (you can think of this as an uncertain number of periods, because you do not know when you will die)? Let us focus on the aforementioned Grim Trigger strategy: play H forever unless your opponent plays L, in which case you punish by playing L forever. Is it possible to sustain the equilibrium where each player plays H forever? It turns out that this boils down to how large your discount factor is.

The players have a choice between two paths of play. On one path, we both play H forever. On the other, I cheat: I play L, get 10 for one period, and then get punished forever, with my opponent playing L from then on; in response the best I can do is play L as well. So, if player 1 were to deviate, the path of play would look something like this:

Player 1: H,L,L,L,L,L,L,…

Player 2: H,H,L,L,L,L,L,…

The deviation occurs in period 2; player 2 does not know this is going to happen, so he is still playing H, but he then observes the deviation and punishes forever. From period 3 onwards, player 1’s best response to this punishment is to keep playing L. To sustain cooperative behaviour we need to make sure the payoff from the above deviation is lower than the payoff from playing H forever.
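To make these paths concrete, here is a small simulation sketch, with my own (purely illustrative) encoding of the strategies: player 1 cheats in period 2 and both players otherwise follow Grim Trigger.

```python
# Payoffs from the matrix above: (player 1's payoff, player 2's payoff).
payoffs = {("H", "H"): (8, 8), ("H", "L"): (0, 10),
           ("L", "H"): (10, 0), ("L", "L"): (4, 4)}

def grim_trigger(opponent_history):
    """Play H until the opponent has ever played L, then play L forever."""
    return "L" if "L" in opponent_history else "H"

history1, history2 = [], []
for t in range(1, 8):  # periods 1 to 7
    # Player 1 cheats in period 2; from then on L is also their best response
    # to the eternal punishment that follows.
    a1 = "L" if t >= 2 else grim_trigger(history2)
    a2 = grim_trigger(history1)
    history1.append(a1)
    history2.append(a2)

print("Player 1:", ",".join(history1))  # H,L,L,L,L,L,L
print("Player 2:", ",".join(history2))  # H,H,L,L,L,L,L
print("Player 1's payoff stream:",
      [payoffs[(a1, a2)][0] for a1, a2 in zip(history1, history2)])
# [8, 10, 4, 4, 4, 4, 4] -- exactly the stream that gets discounted below.
```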

Payoff from H forever: 8+δ8+δ²8+δ³8+δ⁴8+δ⁵8+δ⁶8+…=8(1+δ+δ²+δ³+δ⁴+δ⁵+δ⁶+…)

Which can be shown to equal (this is the sum of a geometric series: if S=1+δ+δ²+…, then S-δS=1, so S=1/(1-δ))

8/(1-δ)

since δ<1. The payoff from deviating is

8+δ10+δ²4+δ³4+δ⁴4+δ⁵4+δ⁶4+…=8+δ10+4(δ²+δ³+δ⁴+δ⁵+δ⁶+…)

which is equal to (with some creative factoring)

8+δ10+4δ²(1+δ+δ²+δ³+δ⁴+δ⁵+δ⁶+….)

Which simplifies to (where we use the formula for an infinite geometric series on the expression in parentheses)

8+δ10+4δ²/(1-δ)
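To get a feel for these two expressions, plug in δ = 0.5 (a value chosen purely for illustration). Cooperating forever is then worth

8/(1-0.5)=16

while deviating is worth

8+0.5×10+4×0.5²/(1-0.5)=8+5+2=15

so at δ = 0.5 the one-off gain of 10 is not worth the endless punishment that follows.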

To ensure cooperation holds we need to make sure that:

8/(1-δ)>8+δ10+4δ²/(1-δ)

where the left hand side is the payoff from playing H forever and the right hand side is the deviation payoff. Multiplying both sides by (1-δ), which is positive, and rearranging gives 6δ²-2δ>0, i.e. 2δ(3δ-1)>0, which holds exactly when

δ>1/3
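If you want to double-check this cutoff numerically, here is a short Python sketch (the grid of δ values is arbitrary; the two payoff formulas are the expressions derived above):

```python
def cooperate_forever(delta):
    # 8 in every period, discounted: 8/(1-δ)
    return 8 / (1 - delta)

def deviate_once(delta):
    # 8 in period 1, 10 in the deviation period, then 4 forever: 8 + δ10 + 4δ²/(1-δ)
    return 8 + 10 * delta + 4 * delta**2 / (1 - delta)

for delta in [0.2, 0.3, 0.4, 0.9]:
    print(f"δ = {delta}: cooperation beats deviation? "
          f"{cooperate_forever(delta) > deviate_once(delta)}")
# True only for the values above 1/3 (0.4 and 0.9), as the inequality predicts.
```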

If δ>1/3 then players will want to cooperate forever if each player uses Grim Trigger strategies. What does this all mean?

Cooperation as Equilibrium

In a game where, in a one-shot interaction, the players would like to screw each other over, repetition over many periods lets us sustain cooperative behaviour without having to force the players to follow some moral code. The “moral behaviour” of cooperating occurs naturally if players value the future enough, i.e. have δ>1/3.

This suggests that good behaviour can be enforced without any third party, without a court system, without governments, and even without the presence of an omnipotent being that punishes bad behaviour. The further implication: the morals we have now are just codifications of these equilibrium phenomena. In other words, we have written these rules down so that we can quickly teach our kids these lessons without them having to learn for themselves how to achieve higher payoffs in the long run. And if you think about it, all we are really teaching people is to value the future, i.e. to realize that their discount factor is large enough for cooperation to pay.
