Analyze Enron’s Accounting Scandal With Natural Language Processing

Detecting fraud from the text of Enron’s earnings call

The Startup


Photo by Joshua Hoehne on Unsplash


Natural Language Processing (NLP) has been gaining traction in recent years, allowing us to analyze unstructured text data in ways that were not possible before. One of the promises of NLP is to detect signs of corporate fraud and shed light on potential violations at an early stage.

About the dataset

I managed to find only two Enron earnings call transcripts online, and only one of
them is readable when converted from PDF to text. You can find the original
document here.

The earnings call transcript used in this article is from Enron’s conference call held on November 14, 2001. Enron filed for bankruptcy on December 2, 2001.

Pre-processing the dataset

As you can see from the original earnings call PDF, the document
is not a native digital file, and stray numbers appear in between the lines of conversation.
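One way to deal with those stray numbers is sketched below. This is a minimal illustration, not the exact cleaning used for this article: the sample text, the function name `strip_stray_numbers`, and the assumption that the numbers show up either as digit-only lines or as short prefixes at the start of a line are all hypothetical.

```python
import re

# Hypothetical sample of how the text might look after PDF-to-text
# conversion. This is NOT an excerpt from the actual transcript.
raw_text = """1
Operator: Good morning and welcome.
2
3
Speaker: Thank you for joining us.
4 We will take questions now.
"""

def strip_stray_numbers(text: str) -> str:
    """Remove digit-only lines and short numeric prefixes at line starts."""
    cleaned = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines that contain nothing but digits.
        if not stripped or stripped.isdigit():
            continue
        # Drop a 1-2 digit prefix followed by whitespace and a letter.
        # Caveat: this would also strip a genuine figure such as
        # "14 million" at the start of a line, so review the output.
        stripped = re.sub(r"^\d{1,2}\s+(?=[A-Za-z])", "", stripped)
        cleaned.append(stripped)
    return "\n".join(cleaned)

cleaned_text = strip_stray_numbers(raw_text)
print(cleaned_text)
```

Because an earnings call is full of meaningful figures, a rule like this should be kept conservative and the cleaned text checked by eye before any analysis.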
