[Research] Benford’s Law 1.0

JWH
Sep 8, 2018 · 3 min read

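Benford’s Law is an empirical frequency distribution of the first digit in real-life datasets. The first digit d follows a logarithmic distribution specified by:

P(d) = log10(1 + 1/d), for d = 1, 2, ..., 9

Under this distribution, 1 is the leading digit about 30.1% of the time, while 9 leads only about 4.6% of the time.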

This observation holds across a very wide spectrum of data types, from medical records and sales entries to the number of pages in books.
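To make the benchmark concrete, here is a minimal Python sketch that tabulates the expected share of each leading digit, using the standard statement of Benford’s Law, P(d) = log10(1 + 1/d). The function name benford_probabilities is my own choice for illustration.

```python
import math

def benford_probabilities():
    """Return {digit: expected share} under P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print the benchmark shares as percentages, one digit per line.
for digit, share in benford_probabilities().items():
    print(f"{digit}: {share:.1%}")
```

The nine shares sum to 1 because the logarithms telescope to log10(10).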

Uses

Benford’s Law is often used in an audit setting to detect fraud or entry manipulation. The intuition is that when someone tampers with the numbers, whether intentionally or not, the frequency distribution is likely to deviate from Benford’s Law. For instance, numbers generated uniformly at random over a fixed range (say, 100 to 999) produce a roughly uniform distribution of first digits.
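The random-number intuition can be checked with a short simulation. This is only a sketch under an assumption of mine: the "made-up" entries are modeled as integers drawn uniformly from 100 to 999, which is one simple way numbers could be fabricated, not a model of real fraud. The names first_digit_shares and fake_entries are hypothetical.

```python
import random
from collections import Counter

def first_digit_shares(values):
    """Share of each leading digit 1-9 among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumption: fabricated entries are uniform three-digit integers.
fake_entries = [rng.randint(100, 999) for _ in range(90_000)]
shares = first_digit_shares(fake_entries)

for digit, share in shares.items():
    print(f"{digit}: {share:.3f}")
```

Each digit should land near 1/9 (about 0.111), far from the roughly 0.301 that Benford’s Law assigns to a leading 1.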

The larger the dataset, the stronger the convergence. I pulled three large datasets from completely different sources for a quick investigation of this phenomenon.
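A first-digit tally of this kind, and the effect of sample size, can be sketched as follows. Since the datasets used in this post are not reproduced here, the sketch uses powers of two as a stand-in series; that choice, and the helper names first_digit and max_deviation, are my assumptions for illustration.

```python
import math
from collections import Counter

# Benchmark shares under P(d) = log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading significant digit of a nonzero number."""
    # Scientific notation puts the leading digit first, e.g. 0.0045 -> '4.5...e-03'.
    return int(f"{abs(x):.15e}"[0])

def max_deviation(values):
    """Largest absolute gap between observed and Benford first-digit shares."""
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return max(abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10))

# Stand-in data: the first n powers of two, for growing n.
for n in (10, 100, 1000):
    powers = [2 ** k for k in range(n)]
    print(n, round(max_deviation(powers), 4))
```

To apply the same tally to a real series, pass a list of its values (index levels, applicant counts, player totals) to max_deviation after dropping zeros, which have no leading digit.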

Example I. S&P 500

The first example is a time series of the S&P 500 Index (unadjusted) from 1950/01/03 to 2018/09/07. Although there is a slight deviation from the benchmark (Benford’s Law), the overall pattern is still observed.

fig 1. S&P 500 First Digit Distribution [1950/01/03–2018/09/07]

Example II. Admission Statistics

The second example uses cross-sectional admission statistics obtained from universityofcalifornia.edu/infocenter/admissions-source-school. The frequency distribution is based on the number of UC Berkeley applicants from each high school around the world in 2017.

fig 2. Number of applicants to UC Berkeley by High School (all countries, states, and territories). Source: UC Admissions

Example III. NBA Statistics

The third example is obtained from NBA.com/stats. I randomly selected four performance metrics for all NBA players, aggregated over the 2017–18 season. This example examines multiple metrics over the same period.

fig 3. NBA Statistics (Points, Field Goals Attempted, 3-Pointers Made, Assists). Source: NBA.com

Remark

The above examples contain simple visualizations of the share of first-digit occurrences. The visible deviations alone do not establish whether the fit is statistically sound. However, it is quite obvious that, despite some deviations, the first-digit frequencies follow a monotonically decreasing distribution. Given that many naturally generated datasets have been confirmed to conform to this distribution, there is ample room for further research into interpreting the causes of deviation and investigating whether those causes recur.
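One common way to move from eyeballing a chart to a formal check is a chi-square goodness-of-fit test against the Benford shares. The sketch below is not the analysis behind the figures above; it assumes the first-digit counts are already tallied, and the counts shown are hypothetical. The constant 15.507 is the chi-square critical value for 8 degrees of freedom at the 5% level.

```python
import math

# Chi-square critical value: 8 degrees of freedom, 5% significance level.
CHI2_CRITICAL_DF8_5PCT = 15.507

def benford_chi_square(counts):
    """Chi-square statistic for observed first-digit counts {1..9: count}."""
    n = sum(counts.get(d, 0) for d in range(1, 10))
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford expected count
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Hypothetical tally: 9,000 values whose leading digits are perfectly flat.
flat_counts = {d: 1000 for d in range(1, 10)}
stat = benford_chi_square(flat_counts)
print(f"chi-square = {stat:.1f}, rejects Benford at 5%: {stat > CHI2_CRITICAL_DF8_5PCT}")
```

With very large samples this test tends to flag even tiny deviations, so it is worth reading the statistic alongside the size of the gaps rather than on its own.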

Written by

JWH

Econometrics. Quantitative Analysis. Portfolio Management
