Finding Fraud With Benford’s Law

Benford’s Law is named for the Depression-era physicist who discovered (or rediscovered) that the leading digits of numbers in large lists are not evenly distributed.

It turns out that far more numbers start with 1 than with any other digit (about 30 percent), far fewer start with 9 (less than 5 percent), and the digits in between follow a logarithmic curve that can be calculated as:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

In this equation, d is the leading digit (1 through 9), and P(d) is the probability that a number begins with that digit.
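To make the curve concrete, here is a minimal Python sketch that evaluates the standard form of Benford’s Law, P(d) = log10(1 + 1/d), for each leading digit. The function name benford_expected is my own label for illustration, not part of any library.

```python
import math


def benford_expected(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1 through 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

Run as written, this shows the share falling from roughly 30 percent for a leading 1 to under 5 percent for a leading 9, and the nine shares add up to 1.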

Why is this equation useful in the world of fraud detection? 

Simple: numbers that are entirely made up (such as in some corporate financial data sets) tend to “lump up” and jump off Benford’s curve, as they did in the Enron fraud, as seen below.

[Figure: leading-digit frequencies in the Enron numbers compared with Benford’s curve]

Benford’s Law does not work for all data sets. It is especially problematic with small data sets, data sets whose values are confined to a narrow high-low range (such as the weight of a person), and data sets dominated by fixed default or binary values. That said, it’s a surprisingly powerful tool for looking at other kinds of large and broad data sets.

Can you make a False Claims Act, SEC, CFTC, or IRS fraud case based solely on deviations from Benford’s Law?

No. In all cases, you would still have to plead your case with particularity, which is to say that you would have to show specific instances of fraud and back them up with event-level data showing the “who, what, where, when and how” of the fraud scheme.

But can Benford’s Law provide a quick screen of a big data set and suggest that something might be amiss?

Oh yes!
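
What might such a quick screen look like? Below is a rough Python sketch, not a forensic tool. It tallies the leading digits in a list of numbers, compares the observed shares with the Benford shares, and sums the gaps into a chi-square statistic. The helper names (leading_digit, benford_screen) and both sample data sets are made up for illustration, and the choice of chi-square as the yardstick is my assumption; other distance measures are possible.

```python
import math
from collections import Counter


def leading_digit(x):
    """Return the first non-zero digit of x, or None if there is none."""
    # repr() of a non-zero number always shows a non-zero digit before any
    # exponent, so the first character in 1-9 is the leading digit.
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None  # zero, infinity, or NaN


def benford_screen(values):
    """Compare observed leading-digit shares with Benford's Law.

    Returns (rows, chi2), where rows holds (digit, observed, expected)
    and chi2 is the chi-square statistic summed over the nine digits.
    """
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no usable values to screen")
    counts = Counter(digits)
    rows, chi2 = [], 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts[d] / n
        chi2 += n * (observed - expected) ** 2 / expected
        rows.append((d, observed, expected))
    return rows, chi2


# Hypothetical data: powers of 2 span many orders of magnitude, while
# the "invented" amounts are all squeezed between 500 and 699.
natural = [2 ** k for k in range(1, 201)]
invented = [500 + k for k in range(200)]

_, natural_chi2 = benford_screen(natural)
_, invented_chi2 = benford_screen(invented)
print(f"natural:  chi2 = {natural_chi2:.1f}")
print(f"invented: chi2 = {invented_chi2:.1f}")
```

The bigger the statistic, the further the data sits from Benford’s curve. A big number is a reason to go dig for the event-level detail, not proof of anything, and the earlier caveats about small or narrowly bounded data sets still apply.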