Shuffling Expectations: Life Doesn’t Play by the Rules

Analyzing the risks of applying game-derived statistical ideas in the real world.

Nihal CJ Shah
6 min read · May 26, 2024

Roulette, Blackjack, Poker — games that run rampant in casinos and college towns. Up and down the Vegas strip they have taken over, even making their way into household games. These games, governed by precise, specific rules, are a dream for statisticians. The art of card counting and the science of probability are treasured by gamblers and mathematicians alike. Household poker games become hives of these practices, with many players keeping a running calculation of the odds. And in the same way that statistics spills into the wagers and games, the games spill into statistics.

Sit in on any high school statistics class. Amidst the chattering students and possibly uninterested teachers, you might observe an interesting yet unsurprising phenomenon: games take precedence as examples. Whether it's a coin flip, poker odds, dice, or roulette, games are consistently used by teachers to explain probabilities and statistical averages. Not only are games closed environments, they are also effective models for showing how probabilities stabilize over the long run.

The fact of the matter is that casino games are perfect for demonstrating statistical techniques. In a fair game, probability and averages take over. This is indisputable, and part of the reason understanding probability and statistics is so important. Barring unfortunate gambling circumstances (apologies to my readers who have lost money on roulette at Caesars Palace), all gamblers can rely on statistics to make judgments about their bets.

The Problem with Generalization

Statistics hold steady when it comes to gambling. Averages, Gaussians, and other statistical techniques? All fair game. The same cannot always be said of real-world situations. Not only do the models get more and more complicated, but the world is not a closed environment. Outside the sphere of gambling and games, outside the sphere of controlled systems, results become increasingly unpredictable.

Note:
This is similar in nature to the "Three-Body Problem" of classical mechanics. In a system of two bodies with gravitational pull, the orbits of the two bodies have a simple solution that can be generalized. However, when the number of bodies is increased to three, the system becomes extremely complex, and there is no general formula that solves it.

The Ludic Fallacy

Nassim Nicholas Taleb introduced the term "Ludic Fallacy" in his book "The Black Swan" to describe exactly this erroneous transfer of game-based statistics to real-world situations. The philosophical book explores the widespread statistical error of overgeneralization and the adverse consequences that can arise from such mistakes.

Taleb argues that these generalizations extend beyond simplistic probabilities. They delve much deeper than the rudimentary middle-school probability rules, into the space of Gaussian Statistics.

For readers who aren't familiar with the foundations of elementary statistics, the large majority of statistical "rules" are rooted in the laws of probability, and they hold only when their fundamental assumptions hold. In this sense, statistical predictions and models often fail because they take for granted that those assumptions survive in an uncontrolled environment. Take the example of Gaussian statistics. The very notion of the bell curve leans heavily on small skews and short tails; furthermore, the assumption of symmetry is crucial to Gaussian performance. Even if you look past the Gaussian, most other stable distribution models have heavy tails, and many imply infinite variance.
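To make the tail assumption concrete, here is a minimal sketch (my own illustration, not Taleb's; the Student-t with 3 degrees of freedom and the cutoff of 5 are arbitrary choices) comparing how likely an extreme move is under a Gaussian versus a heavier-tailed distribution.

from scipy.stats import norm, t

# Tail probability of an observation beyond 5 under each distribution
p_gaussian = norm.sf(5)     # standard normal survival function
p_heavy = t.sf(5, df=3)     # Student-t with 3 degrees of freedom

# The Gaussian calls such an event essentially impossible;
# the heavy-tailed model treats it as rare but entirely plausible
print(f"Gaussian tail probability:  {p_gaussian:.2e}")
print(f"Student-t tail probability: {p_heavy:.2e}")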

The 2008 Financial Crisis

The most commonly cited example of the Ludic Fallacy in action is the 2008 Financial Crisis, a financial and statistical disaster of enormous proportions and the largest fallout since the Great Depression. Its effects can still be felt in the country to this day.

So, with all our statistical, mathematical, and probabilistic knowledge, how did some of the most intelligent, well-studied people on the planet fail to predict the disaster?

The Li-Gaussian Copula

During this period, many quantitative analysts relied on models like the Li-Gaussian Copula, which produced seemingly accurate predictions and samplings (at least until the financial disaster hit). The Li-Gaussian Copula in particular was widely adopted in the pre-crisis period as a model for pricing and risk management of financial derivatives. It was especially popular among quantitative analysts and traders, who used it to model the dependence structure of different financial assets and to price complex derivatives such as credit default swaps and collateralized debt obligations.

The model was widely used because it was able to capture the complex dependence structure of financial assets, which was seen as a major advantage over other models that relied on simpler dependence structures. But it still failed to predict the crisis.

Li-Gaussian Explained

Given a random vector X (a collection of related events) with continuous marginals (the individual probability distributions of each component), the copula describes the dependence between all the components of X, separately from those marginals.

How does it work? To get an intuition, imagine you're at a dinner with a group of friends. You notice that when one friend leaves, a few others tend to leave as well. Gaussian copulas assume that these events, and the dependence of one event on another, follow a clear pattern, like your two close friends leaving the party around the same time.

The Li-Gaussian comes in when your friend Li turns out to be really good at detecting those dependencies: when one friend sneezes, he knows another will too. It's almost as if he has a special formula that seems to work all of the time…

The Math Explanation (A note for the mathematically minded)

U = (F₁(X₁), F₂(X₂), …, Fₙ(Xₙ)), where F₁, …, Fₙ are the marginal cumulative distribution functions.

C(u₁, u₂, …, uₙ) = P(U₁ ≤ u₁, U₂ ≤ u₂, …, Uₙ ≤ uₙ) is the distribution function of the copula, i.e. the joint distribution of the uniform components of U.
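For the Gaussian copula specifically, the dependence is encoded by a correlation matrix: each uniform is mapped back to a standard normal quantile and the joint probability is evaluated under a multivariate normal. Here is a minimal two-variable sketch of that recipe (my own illustration using scipy; the 0.3 correlation is an arbitrary assumption).

from scipy.stats import norm, multivariate_normal

def gaussian_copula_cdf(u1, u2, rho):
    # Map the uniform marginals back to standard normal quantiles
    z = [norm.ppf(u1), norm.ppf(u2)]
    # Evaluate the joint probability under a bivariate normal with correlation rho
    return multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]]).cdf(z)

# Probability that both variables land in the bottom 5% of their own distributions
print(gaussian_copula_cdf(0.05, 0.05, rho=0.3))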

The Relation

When it comes to the Gaussian copula, we're looking at the distribution of dependencies amongst the possible events. If you don't understand the math or are simply too lazy to follow, this must all sound like a load of bupkis. But let's think back to the topic of gambling, specifically poker. The random vector X can refer to the hands the players hold. In a simple poker game ignoring complex strategies and bluffing, the cards dealt have a dependency structure amongst themselves. The dependencies of different hands follow an eerily similar pattern to the dependencies modeled by financial copulas! The only difference? While in a closed poker game the dependence can be calculated with absolute certainty, this simplicity does not apply in the real world.

Even in a topic as complex as financial analytics, the ludic fallacy holds. We cannot escape the fallibilities of falsely imbued Gaussian certainty.
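One way to see why that certainty is misplaced is to compare joint extremes under Gaussian dependence with a heavier-tailed alternative. The simulation below is my own sketch, not part of the original analysis; the 0.6 correlation and the Student-t with 3 degrees of freedom are illustrative assumptions. It counts how often two correlated series crash together.

import numpy as np

rng = np.random.default_rng(0)
n, rho, df = 1_000_000, 0.6, 3

# Correlated Gaussian pairs
g = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)

# Correlated Student-t pairs with the same correlation but heavier tails
w = rng.chisquare(df, size=n) / df
t_pairs = g / np.sqrt(w)[:, None]

def joint_crash_rate(x, q=0.01):
    # Fraction of draws where both components fall below their own 1% quantile
    lo = np.quantile(x, q, axis=0)
    return np.mean((x[:, 0] < lo[0]) & (x[:, 1] < lo[1]))

# Gaussian dependence sees joint crashes far less often than the heavy-tailed world
print("Gaussian joint-crash rate: ", joint_crash_rate(g))
print("Student-t joint-crash rate:", joint_crash_rate(t_pairs))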

The Error in Gaussian Copulas — An Extension

To demonstrate the failure of the Li-Gaussian in 2008, we can perform a simplified analysis comparing what the model expects against what the market actually did.

The code below illustrates the fall-off during the 2008 financial year. Fitting a Gaussian multivariate copula to pre-2008 returns for the S&P 500 and NASDAQ tickers, then sampling from it, shows the general market behavior the model considers plausible; comparing those samples with the actual returns shows how far reality strayed from the model.

import pandas as pd
import yfinance as yf
from sklearn.preprocessing import StandardScaler
from copulas.multivariate import GaussianMultivariate

# Download historical data for 2000-2009
start_date = '2000-01-01'
end_date = '2009-12-31'
tickers = ['^GSPC', '^IXIC'] # S&P 500 and NASDAQ Tickers

# Pull from the adjusted closing prices
data = yf.download(tickers, start=start_date, end=end_date)['Adj Close']

# Calculate returns
returns = data.pct_change().dropna()

# Split the data into a training set (2000-2007) and a test set (2008-2009)
train_end_date = '2008-01-01'
train_returns = returns[returns.index <= train_end_date]
test_returns = returns[returns.index > train_end_date]

# Standardize the returns
scaler = StandardScaler()
standardized_train_returns = pd.DataFrame(scaler.fit_transform(train_returns), columns=train_returns.columns, index=train_returns.index)

# Fit a Gaussian copula to the training data
copula = GaussianMultivariate()
copula.fit(standardized_train_returns)

# Generate samples from the copula for the test period
samples = copula.sample(len(test_returns))

# Convert samples back to original scale
samples = pd.DataFrame(scaler.inverse_transform(samples), columns=test_returns.columns)

# Compare the actual returns with the calculated copula samples
print("Actual returns:")
print(test_returns.head())
print("\nCopula samples:")
print(samples.head())
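A quick way to quantify the fall-off, beyond eyeballing the heads of the two tables, is to compare extremes. For instance (a small follow-on sketch, assuming the variables from the script above are still in scope):

# Worst single-day returns: the real 2008-2009 drawdowns are far more violent
# than anything the pre-2008 copula tends to produce
print("\nWorst actual daily returns (2008-2009):")
print(test_returns.min())
print("\nWorst daily returns in the copula samples:")
print(samples.min())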
