The best way to read Warren Buffett’s annual shareholder letters

Understanding the billionaire’s words and thoughts through Data Science.

fylim
The Startup

Keyword extraction is one of the most popular text mining techniques in Natural Language Processing (NLP). The idea is to automatically capture the most important words and phrases in a text. The technique is very effective when we want to gain insights quickly from a large body of text.

In this article, I will apply keyword extraction techniques to the shareholder letters penned by Warren Buffett between 1977 and 2019. Many keyword extraction techniques are available, but we will focus on three: frequency analysis, RAKE, and POS tagging.

Warren Buffett, aka the Oracle of Omaha, is famous for writing insightful annual letters that are widely anticipated by the shareholders of Berkshire Hathaway and the investment community. In this article, we aim to extract interesting insights from the letters without any heavy reading.

Disclaimer on dataset

I am using this PDF, which compiles all the letters from 1977 to 2019. Only bigrams (two-word phrases) are used for the analysis in this article. The data analysis is done in R…
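To make the bigram idea concrete, here is a minimal sketch of bigram frequency counting, the simplest of the three techniques named above. It is written in Python purely for illustration and is not the R code used for the article's analysis. The function name `top_bigrams`, the tiny stop-word list, and the sample sentence are all assumptions made up for this example; none of them come from the letters or the original workflow.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real analysis would use a much fuller list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "our", "we", "that", "for"}


def top_bigrams(text, n=3):
    """Return the n most frequent two-word phrases that contain no stop words."""
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed inside words).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Pair each token with its successor. As a simplification, pairs are
    # allowed to span sentence boundaries.
    pairs = zip(tokens, tokens[1:])
    counts = Counter(
        f"{first} {second}"
        for first, second in pairs
        if first not in STOP_WORDS and second not in STOP_WORDS
    )
    return counts.most_common(n)


# Made-up sample text, not taken from the letters.
sample = (
    "Book value grew again. Book value is not intrinsic value, "
    "but book value tracks intrinsic value over time."
)
print(top_bigrams(sample, n=2))
# [('book value', 3), ('intrinsic value', 2)]
```

Filtering out pairs that contain a stop word keeps phrases such as "value is" from crowding out more informative ones. Running the same function on the full text of the letters would be the natural next step.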

fylim is passionate about transforming texts into data points #NaturalLanguageProcessing https://www.linkedin.com/in/feng-yueh-lim-bb5910106/