# Benford’s Law

### What is Benford’s Law?

Most people (if they have ever thought about it) assume that a number (e.g. 1,000, 57, 999 or 23,486,171,840,111,538) is just as likely to start with a 1 as it is to start with a 9.

Benford’s Law is an observation that the frequency of leading digits in many real-life sets of numerical data is not evenly distributed. In fact, small leading digits are far more common than large ones:

About 30% of the time, the leading digit is a 1. About 5% of the time it is a 9.
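Those percentages come from a simple formula: the probability that the leading digit is d is P(d) = log10(1 + 1/d). This short sketch computes all nine probabilities with the standard `math` module; the dictionary name `benford` is just for illustration.

```python
import math

# Benford's Law: P(d) = log10(1 + 1/d) for each leading digit d from 1 to 9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.3f}")
# 1: 0.301, 2: 0.176, 3: 0.125, ... 8: 0.051, 9: 0.046
```

Because log10(1 + 1/d) = log10(d + 1) - log10(d), the nine terms telescope to log10(10) = 1, so the probabilities sum to exactly 1.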

### How can I create a Benford’s Law Distributed Dataset?

I wanted to test this, so I used Python to generate 100,000 random numbers between 1 and 10,000:

```python
import random

# A list to store the generated random numbers
number_set = []

# Generate 100,000 random numbers
for _ in range(100000):
    # Pick numbers between 1 and 10,000 (randint includes both endpoints)
    number_set.append(random.randint(1, 10000))
```

Next, extract the leading digit of each number:

```python
# A list to store the leading digits
first_digit_set = []

def get_leading_digit(number):
    # Convert the number to a string, take the first character,
    # then convert it back to an integer and return it
    return int(str(number)[0])

for d in number_set:
    first_digit_set.append(get_leading_digit(d))
```
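Converting to a string works well. If you would rather stay with integer arithmetic, a hypothetical alternative helper, `leading_digit`, can repeatedly floor-divide by 10 until one digit is left. This is only a sketch and assumes positive integer input.

```python
def leading_digit(number: int) -> int:
    """Return the leading digit of a positive integer using only integer arithmetic."""
    if number <= 0:
        raise ValueError("number must be a positive integer")
    # Drop the last digit until only the leading one remains
    while number >= 10:
        number //= 10
    return number

print(leading_digit(23486171840111538))  # 2
```

Python integers have arbitrary precision, so this works for very large numbers like the one in the introduction.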

Then show the results:

```python
for i in range(1, 10):
    print(f"There are {first_digit_set.count(i)} leading {i}'s")
```

The leading digits are evenly distributed!

One of two things has happened:

1: A genius mathematician defined a law that is wrong (hint: it’s not this one)

OR

2: I have done something wrong

The answer is 2. It turns out that the `random` module in Python’s standard library generates uniformly distributed numbers, and a uniform range like 1 to 10,000 gives every leading digit roughly the same share. Remember, Benford’s Law is an observation that the frequency of leading digits in many real-life sets of numerical data is not evenly distributed.
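To see why a uniform range can’t produce Benford’s pattern, this sketch counts the leading digit of every integer from 1 to 10,000 exactly once, with no randomness involved. Each digit gets almost exactly one ninth of the values; 1 gets a single extra, from 10,000 itself.

```python
from collections import Counter

# Tally the leading digit of every integer from 1 to 10,000, each exactly once
exact_counts = Counter(int(str(n)[0]) for n in range(1, 10001))

for digit in range(1, 10):
    print(f"{digit}: {exact_counts[digit]}")
# 1: 1112, then 1111 for every digit from 2 to 9
```

Random sampling from that range just adds noise around these equal shares, which matches the evenly distributed result above.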

So…

### How can you generate data with a pre-defined distribution (using Python 3)?

How can you generate data with a Benford’s Law distribution?

Well, since Python 3.6 the `random` module has had a function called `random.choices`, which lets you specify weights and the number of items to generate:

```python
from random import choices
from collections import Counter

# The values to generate occurrences of:
# these are the digits we want as leading digits
population = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The relative weights: these are (approximate) Benford's Law weights
weights = [0.301, 0.176, 0.124, 0.096, 0.079, 0.066, 0.057, 0.054, 0.047]

# Generate a sample of leading digits with a Benford distribution;
# k=10**6 generates 1 million values
first_digits = choices(population, weights, k=10**6)

# Use the standard library's Counter to show the result
print(Counter(first_digits).most_common())
```

```
[(1, 301193),
 (2, 175999),
 (3, 123747),
 (4, 95958),
 (5, 79342),
 (6, 65449),
 (7, 57246),
 (8, 53951),
 (9, 47115)]
```

And there you go: a list of one million leading digits displaying a Benford’s Law distribution. Let’s plot it on a chart to validate.

```python
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

# Count each leading digit in digit order (1 to 9), not frequency order
digit_counts = Counter(first_digits)
count = [digit_counts[d] for d in population]

# Positions on the x-axis for each bar
y_pos = np.arange(len(population))

# Set the size of the whole chart
plt.figure(figsize=(10, 10))

# Label each bar with its digit, then label the axes
plt.xticks(y_pos, population)
plt.xlabel('Leading Digit')
plt.ylabel('Leading Digit Count')
plt.title('Leading Digit Distribution')

# Create the bars, choose a colour, and leave 10% headroom on the y-axis
plt.bar(y_pos, count, color='pink')
plt.ylim(0, int(max(count) * 1.1))
plt.show()
```
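A chart is a good visual check. For a numeric one, a sketch like this compares each digit’s observed share with the exact Benford probability log10(1 + 1/d). It draws its own sample from a seeded `random.Random` instance (seed 42 is an arbitrary choice) so the run is repeatable; you could substitute `first_digits` from above for `sample` to check the post’s own data. Note that the hard-coded weights used above are close to, but not exactly, the formula’s values (for example 0.054 for the digit 8 versus about 0.051).

```python
import math
import random
from collections import Counter

# Exact Benford probabilities: P(d) = log10(1 + 1/d)
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
digits = list(expected)

# Draw a reproducible sample of 100,000 leading digits using the exact weights
rng = random.Random(42)
sample = rng.choices(digits, weights=[expected[d] for d in digits], k=100_000)

counts = Counter(sample)
n = len(sample)

# Print the observed vs. expected share for each leading digit
for d in digits:
    print(f"{d}: observed {counts[d] / n:.3f}, expected {expected[d]:.3f}")

# Summarize the agreement as the largest absolute gap over all nine digits
max_gap = max(abs(counts[d] / n - expected[d]) for d in digits)
print(f"largest gap: {max_gap:.4f}")
```

With 100,000 draws, sampling noise alone should keep every gap well under one percentage point.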
Like what you read? Give Alex Freeman a round of applause.
