How to Automate the Extraction and Organization of Stock Data: Yahoo Finance API

Kristerduster
Analytics Vidhya
Published in
5 min readSep 21, 2020

This past summer, my grandfather taught me the tips and tricks of investing in the stock market. We did detailed analyses of various companies, comparing profit margins, price to earnings, and other aspects of company fundamentals. After taking hours to dig through Yahoo Finance and compile stock info into spreadsheets, I began to wonder if there was a more efficient way to research companies. I decided to design code that would extract the data I needed so I could simply glance over the data table to decide if a company was profitable. Here’s how I did it:

My code takes a list of stock tickers and outputs a pandas data frame where the columns are stock attributes and the rows are the corresponding ticker data.

Getting Started

Installing Yahoo Finance API

Before we can run the code, however, we have to install the Yahoo Finance API in our console. To do so, type “pip install yahoo_fin” and hit enter.

Importing Modules

Then we need to import the stock_info module from the yahoo_fin library so we can use its methods to extract stock data. Since we are using a pandas data frame to organize the data, we will also import pandas.

Extracting Stock Data

Now, we can start extracting data. To make our code clean, we will first define method functions that will organize data from individual methods into separate data frames. At the end, we will define a single function that will call the method functions, combine the data frames into one big data frame, and calculate new attribute columns based on the existing ones.

Using Methods and Organizing Data Frames

Our first function takes one string parameter: ticker. We will then call get_stats_valuation and assign the stats (a dataframe) to the variable df. To make our data frame easier to understand, we will rename the “Unamed: 0” column to “Attribute” and set inplace=True to make sure we are modifying df, not an alias of df. Then, we will make the “Attribute” column the first column and switch the rows and columns of df. Next, we’ll add a “Ticker” column to show which ticker the data belongs to and make it the first column. To show only the data we need, we will extract only the first row of data (shows most recent date) using index location (iloc) and get rid of unnecessary attributes using .drop(). Finally, we return our organized data frame df.

Additional Data

The next two functions are similar. For the second function, we’ll call the get_stats method and organize the data from the outputted data frame into a separate frame. In the third function, we’ll call the get_cash_flow method and do the same.

Additional Code

Some of the data in our frames are abbreviated numbers such as 1.1 M, and mathematical calculations can’t be performed on strings. We need to define a function that can convert said strings into numbers so we can have attributes such as debt/income or free cash/revenue. To do so, we’ll split the string by the letter, creating a list with the remaining string number and an empty object, which we’ll then join and convert into a float. We’ll multiply it by its corresponding value (1.1M becomes 1,100,000) and return the number.

Compiling Data Frames

Finally, we’ll define a function that combines the data from all three methods and calculates new attributes based on existing columns. We will create an empty data frame called bigDF to be our accumulator variable and will concatenate the individual dataframes for each ticker into bigDF. Next, we’ll create a new column in bigDF called “Spendings on Expenditures” and set it equal to the data under the “Operating Cash Flow (ttm)” column minus the data under the “Levered Free Cash Flow (ttm)” divided by the “Operating Cash Flow (ttm)”, using the convert to numbers function we defined earlier. We’ll do the same for the next three new attributes and return the finished data frame. Now, all we have to do is call the function!

Improvement Opportunities

Of course, my code isn’t perfect and there’s room for improvement or customization.

Yahoo_fin Limitations

Although the yahoo_fin library is expansive, it comes with a few limitations. For certain tickers, some attributes are empty, giving the code an error. For other tickers, there isn’t any data at all! If you wanted to be meticulous about the corner cases, you might check first if any data exists under a specific ticker before extracting anything. Then, write a function that outputs “no data” if there is no data for an attribute that is being used for calculations to prevent errors.

Customization

You may not want the same ticker data as I did. If so, I recommend printing each method to see which attributes are available, then extract the specific data you want using .iloc and .drop(). In addition, you can add or remove calculations depending on what additional attributes you’d like. Otherwise, enjoy the luxury of having all the stock data you need at a click of a button!

--

--