# Benford’s Law

### What is Benford’s Law?

Most people (if they have ever thought about it) assume that a number (e.g. 1,000 or 57 or 999 or 23,486,171,840,111,538) is just as likely to start with a 1 as it is to start with a 9.

Benford’s Law is an observation that the frequency of leading digits in many real-life sets of numerical data is not evenly distributed. In fact, small leading digits turn up far more often than large ones.

About 30% of the time, the leading digit is a 1. About 5% of the time it is a 9.
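Those percentages come from the standard statement of Benford’s Law: the expected share of values with leading digit d is log10(1 + 1/d). Here is a minimal sketch that tabulates it; the function name `benford_probability` is my own choice for illustration.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Expected share for every possible leading digit
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The nine shares add up to 1, because the sum telescopes to log10(10).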

### How can I create a Benford’s Law Distributed Dataset?

I wanted to test this, so I generated 100,000 random numbers (between 1 and 10,000) using Python, as follows:

    import random

    # A list to store the generated random numbers
    number_set = []

    # Generate 100,000 random numbers
    for x in range(100000):
        # Pick numbers between 1 and 10,000 (randint includes both endpoints)
        number_set.append(random.randint(1, 10000))

Now extract all the leading digits:

    # A list to store the leading digits
    first_digit_set = []

    # A function to get the leading digit
    def get_leading_digit(number):
        # Convert the number to a string, take the first character,
        # then convert back to an integer and return the value
        return int(str(number)[:1])

    for d in number_set:
        first_digit_set.append(get_leading_digit(d))

Now show the results:

    # Count how often each digit from 1 to 9 appears as a leading digit
    for i in range(1, 10):
        print("There are " + str(first_digit_set.count(i)) + " leading " + str(i) + "'s")

    There are 33513 leading 1's
    There are 33181 leading 2's
    There are 33140 leading 3's
    There are 33707 leading 4's
    There are 33461 leading 5's
    There are 33133 leading 6's
    There are 33286 leading 7's
    There are 33419 leading 8's
    There are 33170 leading 9's

The numbers are evenly distributed!!

One of two things has happened:

1: A genius mathematician defined a law that is wrong (hint: it’s not this one)

OR

2: I have done something wrong

A: It turns out that the `random` module in Python’s standard library generates numbers with an even (uniform) distribution. Remember, Benford’s Law is an observation that the frequency of leading digits in many **real-life** sets of numerical data is not evenly distributed.
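To see why the uniform draw could never have worked, here is a small sketch that counts the leading digits of every integer from 1 to 10,000 exactly, with no randomness involved. For contrast it does the same for the first 1,000 powers of 2, a sequence I picked for illustration because it spans many orders of magnitude (it is not part of the experiment above). The helper name `leading_digit` is also my own.

```python
from collections import Counter

def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])

# Every integer a uniform draw between 1 and 10,000 could return
uniform_counts = Counter(leading_digit(n) for n in range(1, 10001))

# 2**0 up to 2**999: values that range over roughly 300 orders of magnitude
power_counts = Counter(leading_digit(2 ** k) for k in range(1000))

print("digit  1..10000  powers of 2")
for d in range(1, 10):
    print(f"{d:>5}  {uniform_counts[d]:>8}  {power_counts[d]:>11}")
```

In the range 1 to 10,000 each digit leads 1,111 numbers (1,112 for the digit 1, thanks to 10,000 itself), so a uniform pick has to give even counts. Among the powers of 2, 301 of the 1,000 values start with a 1, which is the roughly 30% share Benford’s Law describes.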

So…

### How can you generate data with a pre-defined distribution (using Python 3)?

How can you generate data with a Benford’s Law distribution?

Well, since Python 3.6 the `random` module has had a function called `random.choices`, which allows you to specify weights *and* the number of items to generate:

    from collections import Counter
    from random import choices

    # Specify the list of values to generate occurrences of;
    # these are the digits we want as leading digits
    population = [1, 2, 3, 4, 5, 6, 7, 8, 9]

    # Specify the weights
    # (these are approximate Benford's Law weights)
    weights = [0.301, 0.176, 0.124, 0.096, 0.079, 0.066, 0.057, 0.054, 0.047]

    # Generate a sample set of first digits with a Benford distribution;
    # k=10**6 generates 1 million values
    first_digits = choices(population, weights, k=10**6)

    # Use Counter from the standard library's collections module to show the result
    Counter(first_digits).most_common()

    [(1, 301193),
     (2, 175999),
     (3, 123747),
     (4, 95958),
     (5, 79342),
     (6, 65449),
     (7, 57246),
     (8, 53951),
     (9, 47115)]

And there you go: a list of one million leading digits displaying a Benford’s Law distribution. Let’s plot it on a chart to validate.

    import numpy as np
    import matplotlib.pyplot as plt

    # Collect the count for each digit, in the same order as population
    # (most_common() sorts by count, not by digit)
    digit_counts = Counter(first_digits)
    count = [digit_counts[d] for d in population]

    # Set the positions to put the digit labels at
    y_pos = np.arange(len(population))

    # Set the size of the whole chart
    plt.figure(figsize=(10, 10))

    # Label each bar with its digit
    plt.xticks(y_pos, population)
    plt.xlabel('Digit')
    plt.ylabel('Leading Digit Count')

    # Create bars and choose color
    plt.bar(y_pos, count, color='pink')

    # Limits for the Y axis
    plt.ylim(0, int(max(count) * 1.1))

    plt.show()
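A chart is easy to eyeball, but a quick numeric check does no harm either. The sketch below reuses the same rounded weights, draws a smaller sample of 100,000 digits, and reports how far each observed share sits from the weight that was requested. The seed value 42 and the names `rng`, `sample` and `max_gap` are arbitrary choices of mine, there only to make the run repeatable.

```python
import random
from collections import Counter

population = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# The same rounded weights used in the example above
weights = [0.301, 0.176, 0.124, 0.096, 0.079, 0.066, 0.057, 0.054, 0.047]

# A seeded generator so the run is repeatable (the seed is arbitrary)
rng = random.Random(42)
sample = rng.choices(population, weights, k=100_000)
counts = Counter(sample)

# Compare each observed share with the weight that was requested
max_gap = 0.0
for digit, weight in zip(population, weights):
    observed = counts[digit] / len(sample)
    max_gap = max(max_gap, abs(observed - weight))
    print(f"{digit}: requested {weight:.3f}, observed {observed:.3f}")

print(f"Largest gap: {max_gap:.4f}")
```

With a sample this size the largest gap should be a small fraction of a percentage point; a bigger sample shrinks it further.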