Is Random Truly Random?

Serendipity or Structure

M.Hamxa
Image generated by author using Leonardo.AI

Random numbers, as the name suggests, seem trivial enough to ignore. What good could they be when we already have carefully crafted distributions that produce striking symmetries?

Every time you send an email or access your bank account, you rely on random numbers to protect your online activity. They’re used to generate encryption keys, making unauthorised access incredibly difficult.

Weather patterns, financial markets, and drug interactions can also be studied by creating computer simulations that rely on random numbers to mimic real-world conditions.

However, the quest for ‘genuine randomness’ is not an easy pursuit. Scientists and mathematicians have long struggled to create truly random numbers.

In 1955, the RAND Corporation published a book of a million random digits generated by an electronic circuit that supposedly produced unpredictable numbers. But the digits displayed subtle patterns; they were not purely random.

John von Neumann, mathematician and physicist, also tried his luck with a random number algorithm (the middle-square method), but he knew the numbers it produced were not really random: they repeated after enough iterations, and the same seed always produced the same sequence.

He later suggested that a deterministic process, such as a computer program, can never produce an indeterminate outcome.

Despite decades of refinement, the random numbers in use today are still called ‘pseudo-random’ numbers: deterministic sequences that merely imitate randomness.

Some popular PRNGs are:

Lagged Fibonacci & friends:

Lagged Fibonacci and related generators improve the properties of the random numbers by using more than one integer as the internal state. Generally, a lagged Fibonacci generator forms a new integer X_i using

X_i = X_(i−p) ⊙ X_(i−q) (mod m)

where p and q are lags (integer constants with p < q), m is the modulus, and ⊙ is some arithmetic operation, such as +, − or ×.
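As a rough sketch, here is an additive lagged Fibonacci generator in Python. The lags p = 24 and q = 55 are a classic pair; the seeding helper is purely illustrative, not production-quality:

# Additive lagged Fibonacci generator: X_i = (X_(i-p) + X_(i-q)) mod m
class LaggedFibonacci:
    def __init__(self, seed, p=24, q=55, m=2**32):
        self.p, self.q, self.m = p, q, m
        # Fill the internal state with q integers derived from the seed
        # via a simple mixing step (illustrative only).
        state, x = [], seed
        for _ in range(q):
            x = (x * 1103515245 + 12345) % m
            state.append(x)
        self.state = state

    def next(self):
        # Combine the values p and q steps back to form the new value.
        x = (self.state[-self.p] + self.state[-self.q]) % self.m
        self.state.append(x)
        self.state.pop(0)  # keep only the most recent q values
        return x

rng = LaggedFibonacci(seed=42)
print([rng.next() for _ in range(5)])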

Strengths:

  • Can have very long periods, making them less predictable.
  • Suitable for applications where speed is important.

Weaknesses:

  • Sensitive to the choice of initial seed values, which can affect the quality of the generated sequence.
  • May exhibit correlations which can be a problem for certain applications.
  • Not considered cryptographically secure.

Linear Congruential Generator:

It yields a sequence of pseudo-randomised numbers calculated with a discontinuous piecewise linear equation: X_(n+1) = (a · X_n + c) mod m, where a is the multiplier, c the increment and m the modulus.
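A minimal sketch, using multiplier and increment constants popularised by Numerical Recipes (any other well-tested choice of a, c and m would serve):

# Linear congruential generator: X_(n+1) = (a * X_n + c) mod m
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m  # scale the integer state to a float in [0, 1)

gen = lcg(seed=2024)
print([round(next(gen), 4) for _ in range(5)])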

Strengths:

  • Straightforward to implement.

Weaknesses:

  • Often have shorter periods than other random number generators, making them more predictable.
  • Not considered cryptographically secure due to predictability issues.

Mersenne Twister:

The Mersenne Twister was developed in 1997 by professors Makoto Matsumoto and Takuji Nishimura of Hiroshima University.

MT19937 is a very powerful generator and is used by many widely distributed mathematical software packages: it powers Python’s built-in random module, and NumPy’s legacy np.random interface is built on it as well.

You can use functions like np.random.random, np.random.randint, np.random.normal, and many others to generate random numbers from different distributions in Python.
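For instance (a quick sketch using NumPy’s legacy global-state API; newer code would typically use np.random.default_rng()):

import numpy as np

np.random.seed(42)                       # same seed -> same sequence
print(np.random.random(3))               # uniform floats in [0, 1)
print(np.random.randint(0, 10, size=3))  # integers in [0, 10)
print(np.random.normal(0, 1, size=3))    # draws from a standard normal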

Encryption schemes also use pseudo-random generators, but they employ more sophisticated systems. One such scheme is RSA-2048 encryption, commonly used for online transactions.

RSA requires the generation of two large prime numbers, typically hundreds of digits long, to create its public and private keys. Its security relies on the mathematical difficulty of factoring the product of those two primes.
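As a rough sketch of that first step, here is prime generation with the Miller–Rabin probabilistic test. The bit size is small compared with real keys, and Python’s random module is not cryptographically secure, so treat this as an illustration only; real key generation uses vetted cryptographic libraries and OS entropy:

import random

def is_probable_prime(n, rounds=40):
    # Miller-Rabin probabilistic primality test.
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    # Write n - 1 as d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def random_prime(bits):
    while True:
        # Set the top bit (right size) and the bottom bit (odd number)
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

p, q = random_prime(512), random_prime(512)
n = p * q  # the RSA modulus; real keys use 2048 bits or more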

For current classical computers, factoring the 2048-bit modulus used in RSA is impractical; it would take billions of years even with the most powerful machines.

RSA’s security doesn’t stem from techniques that are inherently unpredictable; it is based on the mathematical difficulty of solving a specific problem: factoring the product of two large primes.

Quantum computers pose a potential future threat to RSA; certain algorithms like Shor’s algorithm can factor large numbers significantly faster than classical computers.

Just as we wrap our heads around pseudo-random numbers, another problem arises.

The Central Limit Theorem emerges to reveal a hidden order within their apparent disorder. For independent sequences of random numbers, sample averages tend to cluster around a predictable bell-shaped curve (the normal distribution) as the sample size grows large enough.

D Wells, CC BY-SA 4.0

The Central Limit Theorem (CLT) applies not just to uniform random numbers but to other distributions as well, such as the Bernoulli distribution.

Let’s generate a large population of random numbers in Python, take several samples, and record each sample’s average.

import numpy as np
import matplotlib.pyplot as plt

# Define a function to generate a large population of random numbers
def generate_population(size):
    return np.random.rand(size)

# Generate a large population of random numbers
population = generate_population(100000)

# Take many samples and calculate their means
num_samples = 500
sample_size = 100
sample_means = []
for i in range(num_samples):
    sample = np.random.choice(population, size=sample_size)
    sample_mean = np.mean(sample)
    sample_means.append(sample_mean)

# Plot the distribution of sample means
plt.hist(sample_means, bins=20, density=True)

# Plot the normal distribution with the calculated mean and standard deviation
mean = np.mean(sample_means)
std = np.std(sample_means)
x = np.linspace(mean - 3*std, mean + 3*std, 100)
y = np.exp(-(x - mean)**2 / (2*std**2)) / np.sqrt(2*np.pi*std**2)
plt.plot(x, y, label="Normal distribution")

# Add labels and title
plt.xlabel("Sample mean")
plt.ylabel("Density")
plt.title("Distribution of sample means from a large population")
plt.legend()

# Show the plot
plt.grid(True)
plt.show()

The sample means tend to fit a bell curve.

Let us consider another example in which we take a binomial distribution (each draw being a sum of many Bernoulli trials, so the CLT applies directly), with sufficiently large parameters, and normalize draws from it.

import numpy as np
import matplotlib.pyplot as plt

# Parameters for the binomial distribution (a sum of Bernoulli trials)
n = 1000   # Number of Bernoulli trials per draw
p = 0.5    # Probability of success

# Generate samples from Bernoulli distribution
bernoulli_samples = np.random.binomial(n, p, size=10000)

# Calculate mean and variance for normalization
mean = n * p
variance = n * p * (1 - p)

# Normalize Bernoulli samples to approximate Normal distribution
normalized_samples = (bernoulli_samples - mean) / np.sqrt(variance)

# Plot the histogram of normalized samples
plt.hist(normalized_samples, bins=50, density=True)

# Plot the Normal distribution with the same mean and variance
plt.plot(np.linspace(-4, 4, 100),
         np.exp(-0.5 * np.linspace(-4, 4, 100)**2) / np.sqrt(2 * np.pi),
         label="Normal Distribution")

plt.xlabel("Value")
plt.ylabel("Density")
plt.title("Approximating Normal Distribution with Bernoulli")
plt.legend()
plt.show()

Not surprisingly, it also tends to follow a normal distribution.

Why can this be a problem?

It affects the efficiency of Monte Carlo experiments.

Monte Carlo methods are computational algorithms that rely on repeated random sampling to obtain numerical results; the underlying concept is to use randomness to solve problems.

Monte Carlo methods are used in simulating particle interactions, modeling material properties and fluid dynamics, designing drug candidates, studying protein folding, and understanding chemical reactions.
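The classic toy example is estimating π: sprinkle random points over the unit square and count how many land inside the inscribed quarter circle:

import numpy as np

N = 1_000_000
# Random points in the unit square [0, 1) x [0, 1)
x = np.random.rand(N)
y = np.random.rand(N)
# The fraction inside the quarter circle of radius 1 approximates pi/4
inside = (x**2 + y**2 <= 1.0).sum()
print(4 * inside / N)  # close to 3.14159...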

For Monte Carlo to work, we need three criteria:

  1. A large number of random data points (N)
  2. Uncorrelated data
  3. Uniformly distributed (unbiased) sampling

A fun fact: one generator that was used heavily in the ’70s and early ’80s is RANDU. It was regarded as a good random number generator, but it was later found not to be random at all. This cast doubt on the Monte Carlo simulations that were conducted with those data sets.
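RANDU is a linear congruential generator with a = 65539, c = 0 and m = 2^31, and its flaw is easy to reproduce: every consecutive triple of outputs obeys a fixed linear relation, which forces the points onto just 15 parallel planes in 3D space:

# RANDU: X_(n+1) = 65539 * X_n mod 2^31
def randu(seed, n):
    x, out = seed, []
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x)
    return out

xs = randu(seed=1, n=1000)
# The infamous defect: X_(k+2) - 6*X_(k+1) + 9*X_k = 0 (mod 2^31)
print(all((xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % 2**31 == 0
          for k in range(len(xs) - 2)))  # True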

The error in Monte Carlo experiments scales as

Error ∝ σ / √N

where σ is the standard deviation of the quantity being sampled and N is the number of random data points. This 1/√N behaviour is due to the Central Limit Theorem.

Variance reduction methods and Bayesian techniques can be employed to reduce the error, other than simply increasing the number of data points.
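The 1/√N behaviour is easy to see empirically. In this sketch we repeatedly estimate the mean of a uniform distribution (true value 0.5) and watch the spread of the estimates shrink:

import numpy as np

# A 100x increase in N should shrink the spread by about 10x
for N in [100, 10_000, 1_000_000]:
    estimates = [np.random.rand(N).mean() for _ in range(100)]
    print(N, np.std(estimates))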

Monte Carlo’s dependence on random data points can best be understood through an example: calculating a definite integral.

To determine the area of an irregular two-dimensional shape enclosed within a larger, regular shape, a Monte Carlo integration approach can be employed.

Diagram created by the author

A large number of random points (N) are sprinkled over the area. The area of the irregular shape is then

(number of points inside the irregular shape / N) × area of the known shape

Definite integrals are analogous to this: they are just the area under a curve.

I have calculated the definite integral

∫₀^π sin(x) dx

using the Monte Carlo technique.

I enclosed the required curve inside a rectangle that spans from −1 to 2 on the y-axis and from −4 to 4 on the x-axis.

Diagram created by the author
# Using Monte Carlo simulation
import numpy as np

N = 1000000
count = 0
for i in range(N):
    # Random point in the bounding rectangle: x in (-4, 4), y in (-1, 2)
    x = np.random.uniform(-4, 4)
    y = np.random.uniform(-1, 2)
    # Count points lying under the curve y = sin(x) on [0, pi]
    if 0 <= x <= np.pi and 0 < y <= np.sin(x):
        count = count + 1

# Bounding rectangle area: 8 (width) * 3 (height) = 24
area = (count / N) * 24
print(area)

The answer you get is approximately 1.999, which is quite close to 2 (the exact answer).

We took 1,000,000 data points to get this satisfactory result.

Why use Monte Carlo instead of other numerical methods?

Monte Carlo’s cost grows only linearly with the number of samples, regardless of dimension, while grid-based numerical methods exhibit cost that grows exponentially with the number of dimensions.

Let’s assume a numerical method evaluates the integrand at 10 points per dimension for a 12-dimensional integral. It would then need on the order of 10**12 function calls. A Monte Carlo estimate of the same integral might take a mere 12 million samples.

Monte Carlo wins by a large margin for many high-dimensional problems.

Use of Quantum Random Number Generators:

QRNGs harness the inherent randomness of quantum phenomena, like photon polarization or electron spin, resulting in truly unpredictable and non-repeating sequences.

Typically, a QRNG leverages photonics. A photon encounters a beam splitter and enters a quantum superposition of being both transmitted and reflected. One of two detectors then randomly registers the photon, generating a random bit.
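Conceptually, each photon yields one bit. A classical sketch of the logic (note that this uses a PRNG as a stand-in, so it only mimics, and cannot reproduce, genuine quantum randomness):

import random

def beam_splitter_bit():
    # Stand-in for the quantum measurement: a real QRNG gets this 50/50
    # outcome from which detector fires; here a classical PRNG mimics it.
    return 1 if random.random() < 0.5 else 0

bits = [beam_splitter_bit() for _ in range(16)]
print("".join(str(b) for b in bits))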

Convergence to a normal-like distribution might still occur, but the approach to normality might be slower and less well-defined.

The Central Limit Theorem relies on the assumption that the individual random variables being summed are independent. In quantum mechanics, however, the properties of entangled particles remain correlated even when physically separated.

QRNGs can provide a level of randomness unmatched by classical methods, which makes them ideal for cryptography, Monte Carlo simulations, information security and other scenarios where predictability can be fatal.
