Retrieving historical stock data for analysis can be a chore. Many APIs that provide this information require a membership, an account, or even a fee before you can access the data. Fortunately, Yahoo Finance offers the information free of charge on its site, with the ability to download historical stock data. However, using the site becomes a hassle when you want to retrieve new data on the fly or need data for many different companies, ETFs, index funds, etc. In this post, I will show you how to use Python to create a class that can retrieve data for any ticker Yahoo has data for, on the fly and for any number of tickers. In subsequent posts, I will use this “API wrapper” for projects on stock price/time series simulation and analysis.
I’m not familiar with any official APIs offered by Yahoo Finance, but many people take advantage of the fact that historical data can be downloaded directly from the site. When you download this data you’re essentially “navigating” to a different URL, but rather than leaving the current page, the browser just downloads a file. We can take advantage of this using the Pandas library for Python.
The great thing about Pandas, besides almost everything, is that its CSV reader, pandas.read_csv, accepts anything that resolves to CSV data, including a URL that downloads a CSV file. The following API takes advantage of this.
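To make the idea concrete, here is a minimal sketch of reading CSV data with Pandas. It uses an in-memory string as a stand-in for a downloaded file so it runs without network access; the column names and values are invented for illustration. The same call accepts a local path or a URL string in place of the file-like object.

```python
import io

import pandas as pd

# Invented sample data standing in for a downloaded CSV file.
csv_text = "Date,Close\n2020-01-02,10.0\n2020-01-03,10.5\n"

# read_csv accepts a local path, a URL string, or a file-like object,
# so this call would look the same with a download URL as its argument.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(df)
```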
We define a class to be used in other Python projects for the API and include a few libraries:
As seen above, this API is going to be very simple and straightforward; its only intent is to retrieve historical stock data. The class’ init function takes one optional parameter specifying the interval at which we want the data returned. Valid options are 1d (one day), 1wk (one week), and 1mo (one month); by default we’ll just pull daily data.
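A constructor along these lines might look like the sketch below. The class name YahooFinanceHistory and the VALID_INTERVALS tuple are hypothetical names chosen for illustration, and the interval strings are an assumption about what Yahoo's download link accepts. Rejecting unknown intervals up front is a design choice of this sketch, not something the class strictly needs.

```python
class YahooFinanceHistory:
    """Hypothetical wrapper class; the name is chosen for illustration only."""

    # Assumed interval codes: daily, weekly, monthly.
    VALID_INTERVALS = ("1d", "1wk", "1mo")

    def __init__(self, interval="1d"):
        # Fail early on a typo instead of building a URL Yahoo would reject.
        if interval not in self.VALID_INTERVALS:
            raise ValueError(
                f"interval must be one of {self.VALID_INTERVALS}, got {interval!r}"
            )
        self.interval = interval


daily = YahooFinanceHistory()        # defaults to daily data
weekly = YahooFinanceHistory("1wk")  # weekly data
```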
Now we’ll fill in the functions:
The setup is simple: define the base URL and store the desired interval.
The private method __build_url(…) takes the ticker symbol of the stock we’re interested in retrieving data for, a start date, and an end date (correctly formatted) and builds the URL that can be used to get the stock data.
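As a rough sketch of the setup and URL-building steps: the class name and the exact URL pattern below are assumptions, modeled on the kind of link Yahoo's download button has used, and may differ from what the site currently serves. The ticker is percent-encoded because index symbols can contain characters such as ^.

```python
from urllib.parse import quote


class YahooFinanceHistory:
    """Hypothetical wrapper class; the name and URL pattern are assumptions."""

    def __init__(self, interval="1d"):
        # Assumed download endpoint; the placeholders are filled in later.
        self.base_url = (
            "https://query1.finance.yahoo.com/v7/finance/download/"
            "{ticker}?period1={start}&period2={end}"
            "&interval={interval}&events=history"
        )
        self.interval = interval

    def __build_url(self, ticker, start, end):
        # start and end are Unix timestamps (seconds since the epoch).
        return self.base_url.format(
            ticker=quote(ticker), start=start, end=end, interval=self.interval
        )


api = YahooFinanceHistory("1wk")
# Double-underscore methods are name-mangled, so outside the class the
# method is only reachable under its mangled name.
print(api._YahooFinanceHistory__build_url("MSFT", 1577836800, 1593302400))
```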
The get_ticker_data(…) function is the access point to the API for the developer wanting the stock data. This function takes the ticker symbol and the start and end dates as Python datetime objects. The start and end dates are transformed into the format Yahoo expects (timestamps counting seconds since the Unix epoch). The function returns a Pandas data frame containing all of the historical stock data. As seen here, creating the data frame is as easy as passing the URL for the CSV file to pandas.read_csv.
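The date conversion and the fetch can be sketched as two small functions. This is an illustration rather than the class method itself: the helper name to_unix_seconds, the build_url and read_csv parameters, and the choice to treat naive datetimes as UTC are all assumptions of this sketch. Passing the URL builder and reader in as arguments lets the logic be tried without touching the network.

```python
from datetime import datetime, timezone


def to_unix_seconds(dt):
    """Convert a datetime to whole seconds since the Unix epoch."""
    # Assumption: naive datetimes are interpreted as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def get_ticker_data(ticker, start_date, end_date, build_url, read_csv=None):
    """Fetch historical data for a ticker between two datetimes."""
    if read_csv is None:
        # Imported lazily so the default reader is only needed for real fetches.
        import pandas as pd
        read_csv = pd.read_csv
    url = build_url(ticker, to_unix_seconds(start_date), to_unix_seconds(end_date))
    # read_csv downloads the CSV behind the URL and parses it into a data frame.
    return read_csv(url)


print(to_unix_seconds(datetime(2020, 1, 1, tzinfo=timezone.utc)))  # 1577836800
```

In real use, build_url would be the URL-building method described earlier and read_csv would be left at its default.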
Finally, the above snippet of code is used to test the API. For those unfamiliar, the code under the statement if __name__ == '__main__': in Python only executes when this source file is the main entry point to the program. That is, if I run this file standalone the logic will execute; otherwise it will not (e.g. if the code in this file is imported into another file). For a quick-and-dirty test, I’ve used this if statement to verify that I am able to fetch data for Microsoft stock (MSFT). Running this logic produces the following results:
As seen above, retrieving data from Yahoo Finance is very straightforward in Python. In under 20 lines of code we’ve developed the ability to get daily, weekly, or monthly data for any ticker symbol listed on Yahoo Finance. This data can be used for a plethora of applications, including stock data analysis, training and testing machine learning algorithms, and developing stock trading bots. In future posts, I will use this API to build up portfolios of securities to work with some ideas in modern portfolio theory and stochastic process modeling.
Originally published at https://www.anthonymorast.com on June 28, 2020.