Did Ergodicity Economics and the Copenhagen Experiment Really Falsify Expected Utility Theory?

Adam Goldstein
Jan 29, 2020


Around six months ago, Oliver Hulme and his co-authors published an interesting paper on arXiv. Hulme summarized their work, which came to be known as the “Copenhagen experiment”, in this Twitter thread. The experiment supposedly vindicated Ole Peters’ and the LML group’s research into what they consider a new branch of economics based on a statistical physics concept known as “ergodicity”. Nassim Taleb went so far as to post on Twitter, “Yuuge vindication for Ole Peters and people who doubt psycholophasters of ‘utility’”.

Peters recently published an influential article about Ergodicity Economics (EE) in Nature Physics, where he described the goals of the Copenhagen experiment like this:

“A positive result — people changing behaviour in response to the dynamics — would corroborate ergodicity economics and falsify expected utility theory (insofar as experiments falsify models)… Expected utility theory predicts that people are insensitive to changes in the dynamics. People may have wildly different utility functions, which would be reflected in wildly different best-fit values of η, but the dynamic setting should make no difference. Utility functions are supposedly psychological or even neurological properties. They indicate personality types — risk seekers and scaredy cats.”

(Note: this Medium article assumes the reader is already familiar with EE and the Copenhagen experiment. Please check out some of the links provided above if you want more background.)

I’ll show in what follows that Peters’ claims about Expected Utility Theory (EUT), and what it predicts in dynamic settings like the Copenhagen experiment, are simply not true.

Let’s first consider the single-period gamble shown in Figure 1 below. The person starts off with $4 and is offered the choice of either accepting or rejecting a single gamble that pays off $5 (win) with probability p and $3 (loss) with probability 1-p. We’ll assume for now that the person’s utility function can be modeled as logarithmic.

Figure 1: Single-period gamble

Since this is a single-period gamble, there’s no distinction between “multiplicative” and “additive” dynamics yet. The gamble can either be viewed as [+1 with probability p, -1 with probability (1-p)] or [+25% with probability p,-25% with probability (1-p)].

Question: for what value of p is the person indifferent between accepting and rejecting this gamble? To figure this out, we need to solve for p such that the expected utility of the gamble payoff is equal to the utility of initial wealth: p*Ln(5) + (1-p)*Ln(3) = Ln(4). Rearranging gives p = (Ln(4) - Ln(3)) / (Ln(5) - Ln(3)) = 0.5632.
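For anyone who wants to check the arithmetic programmatically, here's a minimal Python sketch (the variable names are just for this illustration; the same root-finding mindset will be useful in the multi-period cases below):

```python
import math

# Single-period gamble: start at w0 = 4, win pays $5, loss pays $3, log utility.
# Indifference: p*Ln(5) + (1-p)*Ln(3) = Ln(4), which is linear in p.
p = (math.log(4) - math.log(3)) / (math.log(5) - math.log(3))
print(round(p, 4))  # 0.5632
```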

Once we expand the game to two or more periods, the distinction between multiplicative and additive dynamics starts to matter. Figure 2 shows the two-period additive dynamics case.

Figure 2: Additive dynamics, two periods

In this scenario, there are potentially two gamble choices the person is asked to make. If the first gamble is accepted from known initial wealth $4, then he/she will be asked to make a second choice based on the outcome of the first gamble; if the first gamble is rejected then the game is over and there’s no second choice to make.

So what’s the optimal strategy for a two-period game like this according to EUT? You’re supposed to maximize expected utility of final wealth, taking into account both stages of the game tree shown in Figure 2. The standard technique to solve such problems is called “dynamic programming”. It was invented by Richard Bellman in the 1950s and has been actively applied to economics and finance ever since Paul Samuelson and Jan Mossin first wrote about it in the late 1960s.

The basic idea behind dynamic programming is to start at the end and work backwards (“backward recursion” or “backward induction”). For a specific example, let’s use the win probability p=0.5632 derived above for the single-period case. Since p=0.5632 causes indifference between accept/reject when w=4 in a single-period scenario, and log utility becomes less averse to a fixed ±$1 gamble as wealth rises (decreasing absolute risk aversion), the same additive gamble will be accepted in the second period at higher wealth w=5 (shown in green, first outcome was win) and rejected at lower wealth w=3 (shown in red, first outcome was loss). We can therefore compute expected utilities in the second period as follows: U1(5)=p*Ln(6)+(1-p)*Ln(4) = 1.6147 at w=5, and U1(3)=Ln(3) = 1.0986 at w=3.

These values U1(5) and U1(3) are known as “derived” utility values. Note that derived utility U1(5)=1.6147 is greater than final utility U(5)=Ln(5) = 1.6094 at the same wealth level w=5. The accept/reject choice in the first period can now be made using the derived utilities calculated for the second period: accept if p*U1(5) + (1-p)*U1(3) > U(4), reject otherwise. The first gamble is accepted since 1.3893>1.3863.
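Here's a short Python sketch of that backward recursion for the two-period additive game (an illustrative sketch only; the variable names are mine, and p = 0.5632 is the single-period indifference probability from above):

```python
import math

p = 0.5632           # single-period indifference probability from above
U = math.log         # terminal (final-wealth) utility

def derived_utility(w):
    """Second-period derived utility: accept the +/-$1 gamble only if it beats stopping."""
    accept = p * U(w + 1) + (1 - p) * U(w - 1)
    return max(accept, U(w))

U1_win, U1_lose = derived_utility(5), derived_utility(3)
print(U1_win, U1_lose)                         # ~1.6147 (accepted) and ~1.0986 (rejected, = Ln(3))

# First-period choice at w = 4: accept if expected derived utility exceeds U(4).
print(p * U1_win + (1 - p) * U1_lose, U(4))    # ~1.3893 vs ~1.3863, so accept
```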

Notice something interesting here: the person was indifferent between accept and reject from initial state w=4 in the single-period case, but they’re no longer indifferent for the same gamble offered in a two-period context. In other words, the presence of a second period with additive dynamics causes the person to become less risk averse in the first period compared to the single-period case.

It’s also instructive to derive the two-period indifference probability, where we solve for the value of p that causes the person to be indifferent between accept and reject in the first period at w=4. Using the previously derived equations for U1(5) and U1(3), we can solve for p such that p*U1(5) + (1-p)*U1(3) = U(4). Substitution yields the quadratic equation p*[p*Ln(6) + (1-p)*Ln(4)] + (1-p)*Ln(3) = Ln(4) and one positive solution, p=0.5592.
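The quadratic is easy enough to solve by hand, but here's the equivalent Python sketch in case you want to vary the numbers (the coefficients just collect the Ln terms from the equation above):

```python
import math

ln6, ln4, ln3 = math.log(6), math.log(4), math.log(3)

# p*[p*Ln(6) + (1-p)*Ln(4)] + (1-p)*Ln(3) = Ln(4) collects into a*p^2 + b*p + c = 0:
a, b, c = ln6 - ln4, ln4 - ln3, ln3 - ln4
p2 = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # the positive root
print(round(p2, 4))  # ~0.559
```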

We can now quantify how much less risk averse the person has become in the two-period additive game using what I’ll call the “derived risk aversion parameter”, in analogy with the “derived utility” described above. Log utility is a special case of a more general class of utility functions known as Constant Relative Risk Aversion (CRRA): CRRA(w,eta) = (w^(1-eta)-1)/(1-eta), where eta is the risk aversion parameter. When eta=1, l’Hôpital’s rule can be used to show that CRRA(w,1)=Ln(w); when eta=0, CRRA(w,0)=w-1, which is linear utility.

“Derived eta” is found by solving for eta such that p*CRRA(w0+1,eta) + (1-p)*CRRA(w0-1,eta) = CRRA(w0,eta), with w0=4 and the indifference probability p=0.5592 calculated above. The solution in this case is eta=0.937, which means that the person’s first-period risk aversion parameter has decreased from eta=1 to eta=0.937 due to the inclusion of the second period with additive dynamics.
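Here's a sketch of that solve in Python using SciPy's brentq root finder (the bracketing interval and the tolerance used to treat eta=1 as the log limit are my own choices for this illustration):

```python
import math
from scipy.optimize import brentq

def crra(w, eta):
    """CRRA utility; the eta -> 1 limit is Ln(w) (l'Hopital)."""
    if abs(eta - 1.0) < 1e-9:
        return math.log(w)
    return (w ** (1.0 - eta) - 1.0) / (1.0 - eta)

p, w0 = 0.5592, 4   # two-period indifference probability and initial wealth from above

def gap(eta):
    # Accept-minus-reject expected utility for the single-period +/-$1 gamble at w0.
    return p * crra(w0 + 1, eta) + (1 - p) * crra(w0 - 1, eta) - crra(w0, eta)

print(round(brentq(gap, 0.0, 1.5), 3))   # ~0.937
```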

Let’s now go through this same analysis for the two-period multiplicative case shown in Figure 3 below.

Figure 3: Multiplicative dynamics, two periods

This two-period game can be solved with dynamic programming just like the additive one above, but it turns out that’s unnecessary in this case. As first derived by Mossin in 1968, when the utility function is CRRA(w,eta) and the multi-period dynamics are multiplicative, the optimal multi-period EUT decision rule is “myopic”. That means we can analyze the multi-period case just like the single-period one; i.e. maximizing expected utility of final-period wealth is equivalent to maximizing expected utility of next-period wealth.

We can confirm that’s the case by plugging the single-period indifference probability p = 0.5632 calculated earlier into the two expected utility formulas for the two second-period states: U1(5) = p*Ln(6.25) + (1-p)*Ln(3.75) = 1.6095 = Ln(5), and U1(3) = p*Ln(3.75) + (1-p)*Ln(2.25) = 1.0986 = Ln(3). As you can see, the same probability p = 0.5632 that causes accept/reject indifference in the first period also causes accept/reject indifference in the two second-period states, so we’ve confirmed that the derived utility function stays constant from one period to the next for the case of CRRA utility and multiplicative dynamics.
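Here's the corresponding Python check for the multiplicative tree (again just an illustrative sketch; 1.25 and 0.75 are the ±25% win/loss factors):

```python
import math

p = 0.5632           # single-period indifference probability from above
U = math.log

def derived_utility_mult(w):
    """Second-period derived utility under multiplicative +/-25% dynamics."""
    accept = p * U(1.25 * w) + (1 - p) * U(0.75 * w)
    return max(accept, U(w))

# With this p the person is (to within rounding) indifferent at every wealth level,
# so the derived utility function just reproduces Ln(w):
print(derived_utility_mult(5), U(5))   # ~1.6095 vs ~1.6094
print(derived_utility_mult(3), U(3))   # both ~1.0986
```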

Thus, unlike the case of additive dynamics where the person’s first-period “derived” risk aversion parameter decreased from eta=1 to eta=0.937 due to the inclusion of the second period, when we use multiplicative dynamics the person’s first-period risk aversion parameter remains eta=1 no matter how many periods there are.

The methodology described above can be extended to any number of periods N, so we can now investigate how much further the first-period risk aversion parameter declines for additive dynamics as N becomes large. One interesting issue that crops up when N≥w0 is that the game must end (current and all future gambles are rejected) if wealth ever hits the lower limit w=1. That’s because U(0) = Ln(0) = -inf. The top end of the binomial tree, however, can grow without bound as N -> inf.

I wrote some Matlab code to implement the dynamic programming algorithm. Here’s a plot showing “derived eta” as N varies between 1 and 100. The parameters correspond to the additive dynamics analyzed above, where D=1 corresponds to the +1 (win) / -1 (loss) gamble and W0=4 represents initial wealth. For N=100 the first-round derived eta is 0.578.
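My actual implementation was in Matlab, but here's a rough Python sketch of the same backward-induction idea for readers who want to experiment. It follows the same convention as the two-period example above (rejecting a gamble ends the game at current wealth), finds the first-round indifference probability by root-finding, and then converts it into a derived eta. The brentq brackets are convenient choices for these particular parameters, and exact values will depend on the modeling details, so treat this as illustrative rather than a reproduction of the plotted numbers.

```python
import math
from scipy.optimize import brentq

W0, D = 4, 1          # initial wealth and additive gamble size (+D win / -D loss)

def crra(w, eta):
    """CRRA utility; the eta -> 1 limit is Ln(w)."""
    if abs(eta - 1.0) < 1e-9:
        return math.log(w)
    return (w ** (1.0 - eta) - 1.0) / (1.0 - eta)

def first_round_gap(p, n):
    """Accept-minus-reject value of the first gamble in an n-period additive game.

    Backward induction over the binomial wealth tree: rejecting ends the game with
    utility Ln(w); accepting moves wealth to w+D or w-D and the choice repeats.
    Accepting at w = 1 would risk Ln(0) = -inf, so it is never taken.
    """
    # Terminal utilities over every wealth level reachable after n gambles.
    V = {w: math.log(w) for w in range(max(W0 - n * D, 1), W0 + n * D + 1, D)}
    for k in range(1, n):   # sweep backwards; after this sweep k decisions remain
        V_new = {}
        for w in range(max(W0 - (n - k) * D, 1), W0 + (n - k) * D + 1, D):
            reject = math.log(w)
            accept = (p * V[w + D] + (1 - p) * V[w - D]) if w - D >= 1 else float("-inf")
            V_new[w] = max(reject, accept)
        V = V_new
    return p * V[W0 + D] + (1 - p) * V[W0 - D] - math.log(W0)

def derived_eta(n):
    """First-round 'derived eta' for an n-period additive game."""
    p_star = brentq(lambda p: first_round_gap(p, n), 0.50, 0.58)   # indifference probability
    return brentq(lambda eta: p_star * crra(W0 + D, eta)
                  + (1 - p_star) * crra(W0 - D, eta) - crra(W0, eta), 0.0, 1.5)

for n in (1, 2, 5, 10):
    print(n, round(derived_eta(n), 3))   # n=1 gives ~1.0 and n=2 gives ~0.937, as above;
                                         # larger n can be compared against the plot
```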

Interestingly, first-round derived eta becomes even lower (0.484) when w0 is reduced from 4 to 2 as shown below.

Conclusion

Hopefully it’s now clear that Peters’ claim that “Expected utility theory predicts that people are insensitive to changes in the dynamics” is incorrect. EUT predicts that people become significantly less risk averse when the dynamics change from multiplicative to additive, particularly in the early periods. Therefore, EUT has certainly not been “falsified” by the Copenhagen experiment.

It might be pointed out that the Copenhagen experiment appears to show an even more dramatic decrease in risk aversion than the results shown above. However, that experiment has some features that complicate the analysis. One key complication is that gamble outcomes are not revealed to the test subjects until the end, and only 10 out of 300 outcomes are randomly sampled to generate the final payment to the subjects. Also, resampling is done if the partial sum becomes negative or exceeds an upper threshold. What effect these complications have on the optimal strategy is unclear at this point.

I’ve been told that Hulme’s group is planning on running some new tests with an updated design, and I hope to be involved in the planning of the new experiments. I do think his and his co-authors’ work is quite interesting, and I must say that Oliver has been very open to constructive criticism and feedback on his experiment, so I thank him for that. Finally, I’d like to thank Ilari Lehti (@IlariLehti) for some useful discussions on this topic.


Adam Goldstein

Former integrated circuit designer who’s now a full-time value investor and quant finance practitioner. BS, MS and PhD all in Electrical Engineering.