[Research] Benford’s Law 1.0
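Benford’s Law describes the empirical frequency distribution of the first digit in real-life data sets. The probability that the leading digit is d follows a logarithmic distribution:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9$$

Under this distribution a leading 1 appears about 30.1% of the time, while a leading 9 appears only about 4.6% of the time.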
This observation holds across a very wide range of data types, from medical records and sales entries to the number of pages in books.
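As a minimal sketch, the expected share of each leading digit can be computed directly from the standard form of Benford's Law, P(d) = log10(1 + 1/d). The function name `benford_probabilities` is only an illustrative choice, not something from the original analysis.

```python
import math


def benford_probabilities():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print the benchmark shares that the examples are compared against.
for digit, p in benford_probabilities().items():
    print(f"{digit}: {p:.3f}")
```

The nine probabilities sum to 1 because the product of (d + 1) / d for d = 1 to 9 is exactly 10.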
Uses
Benford’s Law is often used in an audit setting to detect fraud or manipulated entries. The intuition is that when someone tampers with the numbers, whether intentionally or not, the frequency distribution of first digits is likely to deviate from Benford’s Law. For instance, a set of numbers drawn uniformly at random will have a roughly uniform distribution of first digits.
The larger the data set, the stronger the convergence. I pulled three different types of large data sets from completely different sources for a quick investigation of this phenomenon.
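The contrast with randomly generated numbers can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the examples in this post: the helpers `first_digit` and `first_digit_shares` are hypothetical names, and the sample is drawn uniformly from 1 to 999,999, a range chosen so that every leading digit is equally likely.

```python
import math
import random
from collections import Counter


def first_digit(x):
    """Leading non-zero digit of a number, or None if it has none (e.g. zero)."""
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None


def first_digit_shares(values):
    """Share of each leading digit 1..9 among the values that have one."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no values with a leading digit")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


# Assumption: uniform integers on [1, 999_999], seeded for reproducibility.
rng = random.Random(0)
uniform_sample = [rng.randint(1, 999_999) for _ in range(100_000)]
shares = first_digit_shares(uniform_sample)

# Compare the observed shares with the Benford benchmark.
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d}: observed {shares[d]:.3f}  Benford {benford:.3f}")
```

Each observed share should sit near 1/9, far from the roughly 30% that Benford's Law predicts for a leading 1. A real audit would feed `first_digit_shares` the recorded amounts instead of a synthetic sample.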
Example I. S&P 500
The first example is a time series of the S&P 500 Index (unadjusted) from 1950/01/03 to 2018/09/07. Although there is a slight deviation from the benchmark (Benford’s Law), the expected pattern is still observed.


Example II. Admission Statistics
The second example is a cross-sectional set of admission statistics obtained from universityofcalifornia.edu/infocenter/admissions-source-school. The frequency distribution is based on the number of UC Berkeley applicants from high schools around the world in 2017.


Example III. NBA Statistics
The third example is obtained from NBA.com/stats. I randomly selected four performance metrics for all NBA players, aggregated over the 2017–18 season. This example examines multiple dimensions/metrics over the same period.


Remark
The above examples contain simple visualizations of the share of first-digit occurrences. The deviation measures alone do not establish that the pattern is statistically sound. However, it is quite clear that, despite some deviations, the frequency of first digits follows a monotonically decreasing distribution. Given that many naturally generated datasets have been confirmed to conform to this distribution, there is ample room for further research into interpreting the causes of deviation and investigating whether those causes recur.
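One way to go beyond eyeballing the charts is a goodness-of-fit test. The sketch below uses Pearson's chi-square statistic against the Benford shares; it is not part of the analysis above, the helper names are hypothetical, and the input is a synthetic series (the first 1,000 powers of two, which are known to follow Benford's Law closely) rather than any of the three datasets. The critical value 15.507 is the upper 5% point of the chi-square distribution with 8 degrees of freedom.

```python
import math
from collections import Counter

# Upper 5% point of the chi-square distribution with 8 degrees of freedom
# (9 digit categories minus 1).
CHI2_CRITICAL_DF8_5PCT = 15.507


def first_digit(x):
    """Leading non-zero digit of a number, or None if it has none (e.g. zero)."""
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None


def benford_chi_square(values):
    """Pearson chi-square statistic of leading digits against Benford's Law."""
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no values with a leading digit")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


# Synthetic illustration only: powers of two, not data from this post.
powers_of_two = [2 ** k for k in range(1000)]
stat = benford_chi_square(powers_of_two)
verdict = "consistent with" if stat < CHI2_CRITICAL_DF8_5PCT else "deviates from"
print(f"chi-square = {stat:.2f}: {verdict} Benford's Law at the 5% level")
```

A statistic below the critical value means the deviation is within what sampling noise would explain at the 5% level. With very large samples the test becomes sensitive to tiny departures, so it is best read alongside the size of the deviations rather than on its own.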