Can You Tell if a Dataset is Fake?
Benford’s Law is a powerful tool!
In the age of big data, the ability to tease out the key elements from millions of pieces of information is essential. This is where the art of data science comes in. It blends math and creativity to boil a large dataset down to its key components. Many areas of math come in handy here, the most obvious being statistics.
Another important task in data science is determining whether the data are real. It is all too easy to create a set of fake data for nefarious purposes. This is especially common in financial and election fraud.
However, data scientists have a trick up their sleeves for this problem. They can rely on an especially strange empirical rule known as Benford’s Law. The law was stated in 1938, when physicist Frank Benford noticed a weird pattern that showed up in a wide variety of datasets. Although the law is named after Benford, it was first discovered by the astronomer Simon Newcomb in 1881.
Nevertheless, I will refer to it as Benford’s Law, since that is its common name.