Free Historical Market Data Download in Python

Download free historical market data for stocks, commodity futures, foreign exchange, cryptocurrency, and intraday prices

Letian Wang
Published in The Startup
Sep 5, 2020 (6 min read)


Historical market data is essential for financial analysis and strategy backtesting. Professional data vendors are often not an economically viable option for retail investors or startups. Fortunately, with the development of financial technology (FinTech) and the inclusive finance movement, there are free market data sources available online.

In this post, I’m going to discuss three of them, plus one supplemental source, and demonstrate how to get the data with Python. The data sources and APIs discussed here are:

  1. Yahoo Finance
  2. Pandas DataReader
  3. Quandl
  4. Interactive Brokers (Supplemental)

The discussion is not limited to daily stock market data; it also covers commodity futures, foreign exchange, and intraday prices. It is about historical data download only. For live tick data recording, check out my video here.

1. Yahoo Finance

You can’t get around Yahoo Finance, one of the first practitioners of financial data democratization and equal-opportunity financial inclusion. It was temporarily unavailable in 2017; several fix libraries have been posted since then, and one of them later became yfinance. Google Finance tried a similar service but was not as popular.

Here is one tip about Yahoo Finance: everything you see on their website can potentially be downloaded or streamed in real time, and more likely than not someone has already done so.

I have to spend half of this post on Yahoo Finance, so I’ll break it into four sub-sections. The full code can be found here on GitHub.

1.1 Daily Price

1.1.1 Stock and stock index

Daily price is what you see on the Historical Data tab. Note that you can save the data for later use, but because prices are adjusted backward, the Adj Close values you saved are likely to change in the future.

#!pip install yfinance
#!pip install mplfinance
from datetime import datetime
import yfinance as yf
import mplfinance as mpf
start_date = datetime(2019, 1, 1)
end_date = datetime(2019, 12, 31)
data = yf.download('AAPL', start=start_date, end=end_date)
mpf.plot(data,type='candle',mav=(3,6,9),volume=True,show_nontrading=True)
Candlestick Daily AAPL | Data from Yahoo Finance
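
To see why a saved Adj Close can go stale, here is a minimal sketch of proportional backward adjustment for cash dividends. The prices and dividends are made up, `back_adjust` is a hypothetical helper, and the factor formula (one minus the dividend divided by the previous close) is a common convention that I am assuming here; Yahoo's exact procedure may differ.

```python
def back_adjust(closes, dividends):
    """Return backward-adjusted closes.

    closes    : raw closing prices in date order
    dividends : {ex_date_index: cash amount}; each index must be >= 1
    """
    adjusted = list(closes)
    for ex_idx, amount in dividends.items():
        # Assumed convention: scale every bar before the ex-date
        # by (1 - dividend / close on the day before the ex-date).
        factor = 1.0 - amount / closes[ex_idx - 1]
        for i in range(ex_idx):
            adjusted[i] *= factor
    return adjusted


raw = [100.0, 102.0, 101.0, 103.0]  # made-up raw closes

# Adjusted series when only the first dividend is known
saved_earlier = back_adjust(raw, {2: 1.02})
# Same raw history, recomputed after a second dividend is known
saved_later = back_adjust(raw, {2: 1.02, 3: 1.01})

print(saved_earlier)
print(saved_later)
```

The raw closes are identical in both runs, yet the adjusted values before the second ex-date differ. That is the effect to keep in mind when reusing a previously saved Adj Close column.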

1.1.2 FX and Cryptocurrency

For example, use EURUSD=X for Euro or BTC-USD for Bitcoin.

data = yf.download('EURUSD=X', start=start_date, end=end_date)
data.head()
Date        Open      High      Low       Close     Adj Close  Volume
2019-01-01  1.149425  1.155001  1.146500  1.149306  1.149306   0
2019-01-02  1.146132  1.149700  1.134572  1.146171  1.146171   0
2019-01-03  1.131734  1.140914  1.131734  1.131811  1.131811   0
2019-01-04  1.139095  1.141774  1.134816  1.139108  1.139108   0
2019-01-07  1.141292  1.147447  1.140524  1.141044  1.141044   0

1.2 Fundamentals

Take the Amazon page on Yahoo Finance as an example. Besides Historical Data, there are other tabs such as Summary, Statistics, and Profile, all of which can be downloaded with Python tools such as BeautifulSoup or, more conveniently, yahoo_fin.

AMZN on Yahoo Finance | Snapshot from Yahoo Finance Webpage
import pandas as pd
ticker = yf.Ticker('AMZN')
# keep only the company-profile fields
corp_info = {k: v for k, v in ticker.info.items() if k in ['sector', 'industry', 'fullTimeEmployees', 'city', 'state', 'country', 'exchange', 'shortName', 'longName']}
df_info = pd.DataFrame.from_dict(corp_info, orient='index', columns=['AMZN'])
df_info
AMZN Profile | Data from Yahoo Finance
from yahoo_fin import stock_info
df_balance_sheet = stock_info.get_balance_sheet('AMZN')
df_balance_sheet
AMZN Balance Sheet | Data from Yahoo Finance

1.3 Intraday

Yahoo Finance also offers one-minute historical intraday data, going back up to 10 days.

#!pip install yfinance
#!pip install mplfinance
import yfinance as yf
import mplfinance as mpf
from datetime import datetime
sd = datetime(2020, 8, 24)
ed = datetime(2020, 8, 25)
df = yf.download(tickers='AMZN', start=sd, end=ed, interval="1m")
mpf.plot(df,type='candle',mav=(3,6,9),volume=True)
One-minute bar | Data From Yahoo Finance

Of course, you need to persist the data in a flat file or database before the 10-day window expires, as in the example here. Note that you only need to save the most granular bars, because code similar to the following aggregates them to any other desired frequency.

# aggregate to five-minute bars ('5min' is the same frequency as the older '5T' alias)
df2 = df.resample('5min').agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})
mpf.plot(df2,type='candle',mav=(3,6,9),volume=True)
Aggregated five-minute from one-minute bar | Data from Yahoo Finance
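
The persistence step mentioned above can be sketched as follows. This is only one possible approach: `append_bars` and `make_bars` are hypothetical helpers, the bars are synthetic stand-ins for what `yf.download(..., interval="1m")` returns, and a CSV file in a temporary folder plays the role of the flat file. Overlapping downloads are merged by timestamp so that repeated runs do not duplicate bars.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_bars(start, periods):
    """Build synthetic one-minute OHLCV bars (illustrative values only)."""
    idx = pd.date_range(start, periods=periods, freq='min')
    base = np.arange(periods, dtype=float) + 100.0
    return pd.DataFrame({'Open': base, 'High': base + 1.0, 'Low': base - 1.0,
                         'Close': base + 0.5, 'Volume': np.full(periods, 10)},
                        index=idx)


def append_bars(path, new_bars):
    """Merge new bars into the CSV at `path`, one row per timestamp."""
    if os.path.exists(path):
        old = pd.read_csv(path, index_col=0, parse_dates=True)
        merged = pd.concat([old, new_bars])
    else:
        merged = new_bars
    # On overlap keep the most recently downloaded copy of a bar.
    merged = merged[~merged.index.duplicated(keep='last')].sort_index()
    merged.to_csv(path)
    return merged


full = make_bars('2020-08-24 09:30', 15)
first_download = full.iloc[:10]   # e.g. what an earlier run returned
second_download = full.iloc[5:]   # a later run that overlaps the first

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, 'AMZN_1m.csv')
    append_bars(csv_path, first_download)
    stored = append_bars(csv_path, second_download)

# The stored one-minute bars can then be aggregated to any frequency.
five_min = stored.resample('5min').agg({'Open': 'first', 'High': 'max',
                                        'Low': 'min', 'Close': 'last',
                                        'Volume': 'sum'})
print(len(stored), len(five_min))
```

Running the merge on a schedule shorter than the availability window keeps the local history continuous. A database table keyed on timestamp would serve the same purpose.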

1.4 Live Quotes

It is also possible to scrape live stock quotes from Yahoo Finance using web scraping tools. The package yahoo_fin has done exactly that, so you can just call its functions if you don’t want to write one yourself. The following code gets the real-time stock price every second and then saves it for later use. It is best to run the code during market hours. Usually, people start listening to the real-time stock price at market open and save the data at market close.

# !pip install requests_html
# !pip install yahoo_fin
import pandas as pd
from yahoo_fin import stock_info
from datetime import datetime
import time
# container for the recorded quotes
real_time_quotes = pd.DataFrame(columns=['time', 'price'])
# realtime quotes: poll once per second, ten times
for i in range(10):
    now = datetime.now()
    price = stock_info.get_live_price("SPY")
    print(now, 'SPY:', price)
    real_time_quotes.loc[i] = [now, price]
    time.sleep(1)
print(real_time_quotes)
# save for later use
# real_time_quotes.to_csv('realtime_tick_data.csv', index=False)
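
A polling loop that runs from market open to market close will probably hit an occasional failed request. Below is a minimal sketch of one way to keep recording through such failures. `record_quotes` is a hypothetical helper, not part of yahoo_fin; the price function is passed in as an argument so that the example runs offline with a simulated feed.

```python
import time
from datetime import datetime


def record_quotes(fetch_price, n_samples, interval_seconds=1.0, sleep=time.sleep):
    """Poll `fetch_price` n_samples times and return (timestamp, price) rows.

    A failed fetch is recorded as None instead of stopping the loop.
    """
    rows = []
    for _ in range(n_samples):
        try:
            price = fetch_price()
        except Exception as exc:  # broad on purpose: keep the recorder alive
            print('fetch failed:', exc)
            price = None
        rows.append((datetime.now(), price))
        sleep(interval_seconds)
    return rows


# Simulated feed: the second request fails (values are made up).
feed = iter([300.0, None, 301.0])


def flaky_fetch():
    value = next(feed)
    if value is None:
        raise ConnectionError('simulated timeout')
    return value


quotes = record_quotes(flaky_fetch, n_samples=3, interval_seconds=0,
                       sleep=lambda seconds: None)
print([price for _, price in quotes])
```

With yahoo_fin installed, the same helper could be called as `record_quotes(lambda: stock_info.get_live_price("SPY"), n_samples=10)`.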

2. Pandas DataReader

This is not precisely a data source but rather an API in the PyData stack that supports a collection of data sources. As the name suggests, data is downloaded as a pandas DataFrame. The full documentation can be found here. Below is the list of sources that it currently supports. I’ll just go through the first two in the list.

Supported Data Sources | Pandas-DataReader

2.1 Alpha Vantage

Alpha Vantage has its own API here; you can refer to their documentation. To use it through pandas-datareader,

import pandas as pd
import pandas_datareader as pdr
import mplfinance as mpf
ts = pdr.av.time_series.AVTimeSeriesReader('AMZN', api_key='YOUR_FREE_API_KEY')
df = ts.read()
df.index = pd.to_datetime(df.index, format='%Y-%m-%d')
mpf.plot(df,type='candle',mav=(3,6,9),volume=True,show_nontrading=True)
CandleStick Daily | Data from Alphavantage

2.2 FRED

FRED has plenty of macroeconomic data, for example GDP, unemployment, and inflation. Here, if you are interested in the interest rate market,

from datetime import datetime
import pandas as pd
import pandas_datareader as pdr
start = datetime(2019, 1, 1)
end = datetime(2019, 12, 31)
syms = ['DGS1MO', 'DGS3MO', 'DGS1', 'DGS3', 'DGS10']
df = pd.DataFrame()
# download each tenor and join them column-wise
for sym in syms:
    ts = pdr.fred.FredReader(sym, start=start, end=end)
    df1 = ts.read()
    df = pd.concat([df, df1], axis=1)
df

gives the constant maturity yield curves. The full example can be found here on GitHub.

UST CMT Curve | Data from FRED
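
Once the tenors sit side by side in one DataFrame, curve measures are one line away. The sketch below computes a 10-year minus 3-month term spread. The column names follow the FRED symbols used above, but the yields are illustrative values I typed in, not actual FRED data, and the all-NaN row stands in for a non-trading day on the assumption that such rows can appear in the download.

```python
import numpy as np
import pandas as pd

# Illustrative yields in percent (NOT actual FRED data).
curve = pd.DataFrame(
    {'DGS3MO': [2.40, np.nan, 2.42, 2.41],
     'DGS10': [2.70, np.nan, 2.66, 2.35]},
    index=pd.to_datetime(['2019-01-02', '2019-01-03',
                          '2019-01-04', '2019-01-07']),
)

# Drop days on which no tenor has a value.
clean = curve.dropna(how='all')

# Term spread: long end minus short end.
spread = (clean['DGS10'] - clean['DGS3MO']).rename('10Y_minus_3M')

# Days on which the curve is inverted between these two tenors.
inverted_days = spread[spread < 0]

print(spread)
print(inverted_days)
```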

3. Quandl

Quandl has hundreds of free and paid data sources across equities, fixed income, commodities, exchange rates, etc. Here I’ll just cover commodities, one asset class that hasn’t been covered in the previous two sections.

There are raw monthly contracts and continuous contracts; on Quandl you can download both.

from datetime import datetime
import matplotlib.pyplot as plt
import quandl
start = datetime(2019, 1, 1)
end = datetime(2019, 12, 31)
df = quandl.get('CHRIS/CME_CL1', start_date=start, end_date=end, qopts={'columns': ['Settle']}, authtoken='your_free_api_key')
plt.figure(figsize=(20,10))
plt.plot(df.Settle)
CL1 Curve | Data from Quandl
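
To make the difference between raw and continuous contracts concrete, here is a rough sketch of how a front-month continuous series could be stitched from two raw contracts by switching on a roll date. `stitch_front_month` is a hypothetical helper, the settle prices and roll date are made up, and the roll rule behind Quandl's continuous series may well be different.

```python
import pandas as pd


def stitch_front_month(contracts):
    """Concatenate raw contracts into one unadjusted continuous series.

    contracts: list of (last_day_as_front_month, settle_series),
               ordered from nearest to furthest expiry.
    """
    pieces = []
    prev_last = None
    for last_day, settle in contracts:
        last_day = pd.Timestamp(last_day)
        keep = settle.index <= last_day
        if prev_last is not None:
            # Use this contract only after the previous one rolled off.
            keep &= settle.index > prev_last
        pieces.append(settle[keep])
        prev_last = last_day
    return pd.concat(pieces)


days = pd.to_datetime(['2019-01-17', '2019-01-18', '2019-01-22', '2019-01-23'])
feb = pd.Series([52.0, 53.5, 53.0], index=days[:3])  # made-up settles
mar = pd.Series([52.4, 53.9, 53.3, 52.8], index=days)  # made-up settles

# Roll from the first contract to the second after 2019-01-18.
continuous = stitch_front_month([('2019-01-18', feb), ('2019-01-23', mar)])
print(continuous)
```

No price adjustment is applied at the roll, so the stitched series jumps from one contract's price level to the next. That matters when computing returns across the roll date.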

4. Interactive Brokers

This is a supplemental section; feel free to skip it if you don’t have an Interactive Brokers account.

This method is only conditionally free: it requires a funded Interactive Brokers account. I use it mainly because, for example, Yahoo Finance doesn’t have futures data. The full script is located here on GitHub.

IB offers bars as short as one second, going back up to 180 days. To download one-second bars, log on to IB, execute this script, and then run the code below.

# !python download_historical_data_from_ib.py
import pickle
import mplfinance as mpf
# load the dictionary of bars saved by the download script
with open(r'.\data\tick\20200810.pkl', 'rb') as f:
    futures_hist_prices_dict = pickle.load(f)
df = futures_hist_prices_dict['ESU0 FUT GLOBEX']
mpf.plot(df, type='candle',mav=(3,6,9), volume=True)
One-Second Bar | Data from IB Historical Data API
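
The path in the snippet above uses Windows separators. As a small sketch of a portable alternative, the file location can be built with pathlib. `tick_file` is a hypothetical helper, the folder layout mirrors the one above, and the bars are placeholder tuples rather than the DataFrames the real script stores.

```python
import pickle
import tempfile
from pathlib import Path


def tick_file(root, trade_date):
    """Return <root>/data/tick/<trade_date>.pkl using the OS separator."""
    return Path(root) / 'data' / 'tick' / f'{trade_date}.pkl'


# Placeholder content: (time, open, high, low, close, volume), made up.
bars = {'ESU0 FUT GLOBEX': [('09:30:00', 3350.0, 3350.5, 3349.75, 3350.25, 12)]}

with tempfile.TemporaryDirectory() as root:
    path = tick_file(root, '20200810')
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, 'wb') as f:
        pickle.dump(bars, f)
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

print(path.name, list(loaded))
```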

Conclusion

In this post, we’ve seen three free historical financial data sources, namely Yahoo Finance, Pandas DataReader, and Quandl, across equities, rates, FX, cryptocurrency, and commodities, as well as one conditionally free intraday data source, Interactive Brokers.

For more general needs, we can always resort to more fundamental libraries such as requests, BeautifulSoup, or Selenium to scrape data directly. More on that later.
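
As a small taste of direct scraping, the sketch below pulls the rows out of an HTML table. To keep it runnable without extra installs, it uses the standard library's html.parser instead of BeautifulSoup, and it parses an inline HTML string instead of a fetched page. The `TableParser` class is a hypothetical helper, and the two rows reuse the EURUSD closes shown earlier.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


page = """
<table>
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>2019-01-02</td><td>1.146171</td></tr>
  <tr><td>2019-01-03</td><td>1.131811</td></tr>
</table>
"""

parser = TableParser()
parser.feed(page)

header, body = parser.rows[0], parser.rows[1:]
records = [dict(zip(header, row)) for row in body]
closes = [float(record['Close']) for record in records]
print(records)
```

In a real scraper, the `page` string would come from something like `requests.get(url).text`, and pages that render their tables with JavaScript would need a browser-driving tool such as Selenium.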
