Visualizing Newcomb-Benford law

Thawfeek Varusai
Jun 10, 2020

Revolutionary discoveries almost always come with a dramatic anecdote. The Newcomb-Benford law is no exception.

The ‘apple fall’ moment of maths

Some experts consider this law as significant for mathematics as Newton’s law of gravity is for physics. The story starts in 1881, when the American astronomer Simon Newcomb noticed a strange phenomenon. The first pages of the logarithm tables, which cover numbers beginning with low digits, showed more wear and tear than the later pages, which cover numbers beginning with high digits. In other words, his fellow scientists were looking up numbers that start with small digits more often. This was odd, because if leading digits were evenly distributed, all the pages of the table should wear out equally. Newcomb derived the frequencies of the first two digits (Table 1). The first digit follows a logarithmic law, with lower digits occurring more frequently than higher digits:

$$P(d_1) = \log_{10}\left(1 + \frac{1}{d_1}\right), \qquad d_1 \in \{1, 2, \dots, 9\}$$
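Plugging digits into the first-digit formula shows how lopsided the distribution is, and a telescoping sum confirms that the nine probabilities add up to exactly 1:

$$P(1) = \log_{10} 2 \approx 0.301, \qquad P(9) = \log_{10}\frac{10}{9} \approx 0.046$$

$$\sum_{d=1}^{9} \log_{10}\left(1 + \frac{1}{d}\right) = \log_{10}\left(\frac{2}{1} \cdot \frac{3}{2} \cdots \frac{10}{9}\right) = \log_{10} 10 = 1$$

So a leading 1 is roughly 6.6 times as common as a leading 9.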

Table 1
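The second-digit frequencies in Table 1 follow from the same law by summing over every possible first digit $k$, since the first two digits together form the number $10k + d_2$. This is the standard generalization, written out here as a worked formula:

$$P(d_2) = \sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + d_2}\right), \qquad d_2 \in \{0, 1, \dots, 9\}$$

For example, $P(d_2 = 0) \approx 0.120$ while $P(d_2 = 9) \approx 0.085$, a much flatter spread than for the first digit.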

Newcomb published this astounding observation in a mere two-page paper. However, the paper did not get the attention it deserved. It took another half-century, and Frank Benford, for the world to appreciate it. A physicist at General Electric, Benford rediscovered the phenomenon Newcomb had described and reproduced the results far more thoroughly. His masterstroke was to show that the law holds for a wide variety of real-world data, not just tables of scientific, deterministic numbers. This was probably a key factor in gaining the attention of the scientific community. Benford rose to fame, and the law is popularly known as Benford’s law, overlooking its original discoverer. I told you this is a dramatic tale! Ironically, an elementary principle of the mathematical sciences came out of the work of an astronomer and a physicist.

Natural pattern

Table 1 shows the Newcomb-Benford frequency distributions for the first two digits. For later digit positions, the distribution evens out and approaches a uniform frequency of 10% per digit. For more details on the maths of the law, there are some great articles online (https://wwwf.imperial.ac.uk/~nadams/classificationgroup/Benfords-Law.pdf and https://auditware.co.uk/wp-content/uploads/Guide_to_Benfords_Law.pdf).
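The evening-out across digit positions can be checked numerically. The sketch below is a minimal illustration, not the author's notebook code: it generalizes the second-digit formula by summing over every possible prefix of earlier digits, and the function names `digit_probability` and `digit_distribution` are hypothetical.

```python
import math


def digit_probability(d: int, position: int) -> float:
    """Newcomb-Benford probability that the digit at `position` (1 = leading) equals d."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if position == 1:
        if not 1 <= d <= 9:
            raise ValueError("the leading digit must be between 1 and 9")
        return math.log10(1 + 1 / d)
    if not 0 <= d <= 9:
        raise ValueError("later digits must be between 0 and 9")
    # Sum over every prefix k formed by the (position - 1) earlier digits,
    # e.g. k = 1..9 for the second digit and k = 10..99 for the third.
    first_prefix = 10 ** (position - 2)
    last_prefix = 10 ** (position - 1) - 1
    return sum(
        math.log10(1 + 1 / (10 * k + d))
        for k in range(first_prefix, last_prefix + 1)
    )


def digit_distribution(position: int) -> dict[int, float]:
    """Probabilities of every possible digit at the given position."""
    digits = range(1, 10) if position == 1 else range(10)
    return {d: digit_probability(d, position) for d in digits}


if __name__ == "__main__":
    for position in (1, 2, 3, 4):
        dist = digit_distribution(position)
        print(f"position {position}: " + "  ".join(f"{d}:{p:.3f}" for d, p in dist.items()))
```

Running it prints one row per position; the gap between the most and least likely digit shrinks from about 0.25 at the first position to a fraction of a percentage point by the third.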

Common distribution types

In this article, let’s focus on how the Newcomb-Benford law works on different types of distributions. I consider three continuous distributions here: normal, exponential and uniform. Normal distributions, with their typical bell-shaped curve, are the most familiar and describe many natural phenomena. Exponential distributions describe the waiting times between independent events that occur at a constant average rate, such as the decay of radioactive atoms or the time between phone calls. In a uniform distribution, all values within a given range are equally likely; the outcomes of a fair die are the familiar discrete example. More details on distributions can be found here (https://www.analyticsvidhya.com/blog/2017/09/6-probability-distributions-data-science/).
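The three distributions are easy to sample in Python. As a minimal sketch using only the standard library (not the author's notebook code), the snippet below draws values from each distribution and tallies their leading digits. The parameters (mean 0 and standard deviation 1, rate 1, range [0, 1)) and the helper names are assumptions, since the article does not state which ones it used. Signs are ignored and zeros skipped, a common convention in first-digit analysis.

```python
import math
import random
from collections import Counter
from typing import Optional


def leading_digit(x: float) -> Optional[int]:
    """First significant digit of |x|, or None for zero and non-finite values."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the first significant digit in front, e.g. "3.14...e+02".
    return int(f"{x:.14e}"[0])


def sample(distribution: str, n: int, rng: random.Random) -> list[float]:
    """Draw n values; the parameters below are assumptions, not the article's."""
    if distribution == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if distribution == "exponential":
        return [rng.expovariate(1.0) for _ in range(n)]
    if distribution == "uniform":
        return [rng.uniform(0.0, 1.0) for _ in range(n)]
    raise ValueError(f"unknown distribution: {distribution}")


def first_digit_frequencies(values) -> dict[int, float]:
    """Share of each leading digit 1-9 among the nonzero finite values."""
    counts = Counter(d for d in map(leading_digit, values) if d is not None)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    for name in ("normal", "exponential", "uniform"):
        freqs = first_digit_frequencies(sample(name, 100_000, rng))
        print(f"{name:12s}" + " ".join(f"{d}:{p:.3f}" for d, p in freqs.items()))
```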

Newcomb-Benford law on different distributions

If you’re like me, then you don’t understand what you can’t see. So, let’s visualize the math. Figure 1 shows the three distributions, each generated by drawing random values from the corresponding function (blue). Their base-10 logarithmic transformations are shown in red. Irrespective of the distribution type, the log-transformed curves look similar.

The figure also shows how the Newcomb-Benford law applies to each distribution, and we observe a striking similarity regardless of the distribution type.

Figure 1
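The log transformation in Figure 1 is closely tied to the law itself. The leading digit of $x$ is $d$ exactly when the fractional part of $\log_{10}|x|$ lies in $[\log_{10} d, \log_{10}(d+1))$, an interval of width $\log_{10}(1 + 1/d)$. So whenever those fractional parts are spread uniformly over $[0, 1)$, the first-digit frequencies match the law. The sketch below (standard library only; helper names and distribution parameters are hypothetical, and this is not the author's code) bins the fractional parts this way and prints observed against expected shares for each distribution.

```python
import math
import random
from bisect import bisect_right

# Newcomb-Benford expected share of each leading digit d = 1..9.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
# Edges log10(1), log10(2), ..., log10(10) split [0, 1) into the nine digit intervals.
EDGES = [math.log10(d) for d in range(1, 11)]


def log_mantissa(x: float) -> float:
    """Fractional part of log10|x|; % 1.0 keeps it non-negative for |x| < 1 as well."""
    return math.log10(abs(x)) % 1.0


def shares_from_log_mantissas(values) -> dict[int, float]:
    """Share of nonzero values whose log-mantissa falls in [log10 d, log10(d + 1))."""
    counts = [0] * 9
    for x in values:
        if x == 0:
            continue
        # bisect_right finds the interval; min() guards a rare float result of exactly 1.0.
        i = min(bisect_right(EDGES, log_mantissa(x)) - 1, 8)
        counts[i] += 1
    total = sum(counts)
    return {d: counts[d - 1] / total for d in range(1, 10)}


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    samples = {  # hypothetical parameters
        "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
        "exponential": [rng.expovariate(1.0) for _ in range(n)],
        "uniform": [rng.uniform(0.0, 1.0) for _ in range(n)],
    }
    observed = {name: shares_from_log_mantissas(vals) for name, vals in samples.items()}
    print("digit  expected  " + "  ".join(f"{name:>11s}" for name in samples))
    for d in range(1, 10):
        row = "  ".join(f"{observed[name][d]:11.3f}" for name in samples)
        print(f"{d:5d}  {EXPECTED[d]:8.3f}  {row}")
```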

Nevertheless, let’s try to quantify the dissimilarity between the expected and observed values for each distribution. I calculate the chi-square (χ²) statistic between the expected and observed digit frequencies, which tells us how far apart the two are. To check for any effect of sample size, I calculate χ² progressively as the sample grows to 10 million random values from each distribution. I find that the normal distribution fits the Newcomb-Benford law best, with very low χ² values. The uniform distribution comes second, and the exponential distribution shows a relatively poor match (Figure 2). This observation supports using the Newcomb-Benford law on everyday data that follow normal distributions.
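For reference, the Pearson chi-square statistic for first-digit data compares the observed count $O_d$ of each leading digit with the count $E_d$ expected under the law for a sample of $N$ values. The article does not spell out its exact setup, so this is the textbook form rather than the author's formula:

$$\chi^2 = \sum_{d=1}^{9} \frac{(O_d - E_d)^2}{E_d}, \qquad E_d = N \log_{10}\left(1 + \frac{1}{d}\right)$$

With nine digit categories there are 8 degrees of freedom, so values above about 15.51 indicate a significant mismatch at the 5% level. Because $O_d$ and $E_d$ both scale with $N$, any fixed mismatch in proportions makes $\chi^2$ grow roughly linearly with sample size, which is worth keeping in mind when comparing values across sample sizes.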

Figure 2
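A progressive chi-square check in the spirit of Figure 2 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the author's notebook: the distribution parameters and helper names are hypothetical, chi-square is computed on first-digit counts, and the sample sizes stop at 100,000 to keep pure Python fast (the article goes up to 10 million). Each size reuses a prefix of one random stream, so the samples are cumulative.

```python
import math
import random
from collections import Counter
from typing import Optional

# Expected first-digit probabilities under the Newcomb-Benford law, for d = 1..9.
EXPECTED = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(x: float) -> Optional[int]:
    """First significant digit of |x|, or None for zero and non-finite values."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    return int(f"{x:.14e}"[0])


def chi_square(values) -> float:
    """Pearson chi-square between observed and expected first-digit counts."""
    counts = Counter(d for d in map(leading_digit, values) if d is not None)
    n = sum(counts.values())
    return sum(
        (counts[d] - n * p) ** 2 / (n * p) for d, p in zip(range(1, 10), EXPECTED)
    )


def progressive_chi_square(draw, sizes, seed=0):
    """Chi-square on growing prefixes of a single random stream, one value per size."""
    rng = random.Random(seed)
    data = [draw(rng) for _ in range(max(sizes))]
    return [(n, chi_square(data[:n])) for n in sizes]


if __name__ == "__main__":
    sizes = [10 ** k for k in range(2, 6)]  # 100 to 100,000 values
    draws = {  # hypothetical parameters
        "normal": lambda r: r.gauss(0.0, 1.0),
        "exponential": lambda r: r.expovariate(1.0),
        "uniform": lambda r: r.uniform(0.0, 1.0),
    }
    for name, draw in draws.items():
        for n, chi2 in progressive_chi_square(draw, sizes):
            print(f"{name:12s} n={n:>7d}  chi2={chi2:10.1f}")
    print("5% critical value with 8 degrees of freedom: about 15.51")
```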

Effect of data fabrication

We next ask a very important question with powerful practical implications: what happens to the Newcomb-Benford law when data are manipulated? I tested this for each distribution with various types of manipulation.

In silico experiments show that normally distributed data are sensitive to manipulation across a wide range of change magnitudes (Figure 3). Values were varied proportionally, with the bars representing increasing deviation from the original values: the first bar corresponds to a 10% random change and the last bar to a 100% random change. For the normal distribution, both small and large changes to the data are detected. In my analysis, the exponential distribution seems to be sensitive to both additions to and subtractions from data values, and the uniform distribution to subtractions.

Figure 3
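The article does not describe its manipulation scheme in detail, so the sketch below assumes one plausible reading of "a 10% random change": each value is scaled up (addition) or down (subtraction) by a random fraction between 0 and the given level, and the chi-square statistic against the law is recomputed. The `manipulate` helper, the normal-distribution parameters and the sample size are hypothetical; this is not the author's notebook code.

```python
import math
import random
from collections import Counter
from typing import Optional

EXPECTED = [math.log10(1 + 1 / d) for d in range(1, 10)]  # P(d) for d = 1..9


def leading_digit(x: float) -> Optional[int]:
    """First significant digit of |x|, or None for zero and non-finite values."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    return int(f"{x:.14e}"[0])


def chi_square(values) -> float:
    """Pearson chi-square between observed and expected first-digit counts."""
    counts = Counter(d for d in map(leading_digit, values) if d is not None)
    n = sum(counts.values())
    return sum(
        (counts[d] - n * p) ** 2 / (n * p) for d, p in zip(range(1, 10), EXPECTED)
    )


def manipulate(values, level, mode, rng):
    """Hypothetical manipulation: scale each value by a random fraction in [0, level].

    mode="add" increases magnitudes, mode="subtract" decreases them.
    """
    if mode not in ("add", "subtract"):
        raise ValueError(f"unknown mode: {mode}")
    sign = 1.0 if mode == "add" else -1.0
    return [x * (1.0 + sign * rng.uniform(0.0, level)) for x in values]


if __name__ == "__main__":
    rng = random.Random(1)
    original = [rng.gauss(0.0, 1.0) for _ in range(50_000)]  # hypothetical parameters
    print(f"original            chi2 = {chi_square(original):10.1f}")
    for mode in ("add", "subtract"):
        for level in [i / 10 for i in range(1, 11)]:  # 10% ... 100%, as in Figure 3
            changed = manipulate(original, level, mode, rng)
            print(f"{mode:8s} up to {level:4.0%}  chi2 = {chi_square(changed):10.1f}")
```

Comparing each row with the original value shows how far a given manipulation pushes the data away from the law; a different scheme, such as replacing values outright, would give different numbers.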

I hope this article gave you a brief graphical tour of various aspects of the Newcomb-Benford law. We explored how normal, exponential and uniform distributions conform to the law and how data manipulation affects that conformity. The Python code used to run these simulations can be found here: https://github.com/vthawfeek/NewcombBenford_Law/blob/master/NewcombBenford_Law.ipynb.

Like this post?

If you enjoyed this read, you might also be interested in similar topics at my website: https://rokpayprsizors.wordpress.com/

Thawfeek Varusai

I’m a life science enthusiast with applied mathematics skills. I have a PhD in Systems Biology and currently work as a data analyst in a bioinformatics company.