How to create a stock correlation matrix in Python

Rohan Kumar
Published in Analytics Vidhya
2 min read · Jul 15, 2020

In this tutorial I’ll walk you through a simple methodology to correlate various stocks against each other. We’ll grab the prices of the selected stocks using Python, drop them into a clean dataframe, run a correlation, and visualize our results.

1. Import the libraries

import numpy as np 
import pandas as pd
# Used to grab the stock prices, with yahoo
import pandas_datareader as web
from datetime import datetime
# To visualize the results
import matplotlib.pyplot as plt
import seaborn

2. Select the list of tickers and the date range

start = datetime(2017, 1, 1)
symbols_list = ['AAPL', 'F', 'TWTR', 'FB', 'AAL', 'AMZN', 'GOOGL', 'GE', 'TSLA', 'IBM', 'PYPL']

3. Pull stock prices, push into clean dataframe

# list to store one price dataframe per ticker
symbols = []
for ticker in symbols_list:
    # download daily prices from Yahoo, from the start date onward
    r = web.DataReader(ticker, 'yahoo', start)
    # add a symbol column
    r['Symbol'] = ticker
    symbols.append(r)
# concatenate into df
df = pd.concat(symbols)
df = df.reset_index()
df = df[['Date', 'Close', 'Symbol']]
df.head()
# pivot to one row per date and one column per symbol
df_pivot = df.pivot(index='Date', columns='Symbol', values='Close').reset_index()
df_pivot.head()
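
If the pivot step feels abstract, here is a minimal sketch of the same reshaping on a tiny, made-up dataset. The tickers 'AAA' and 'BBB' and their prices are hypothetical values chosen for illustration, not real market data.

```python
import pandas as pd

# Hypothetical prices for two made-up tickers over three days (not real market data)
long_df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03'] * 2),
    'Symbol': ['AAA'] * 3 + ['BBB'] * 3,
    'Close': [10.0, 11.0, 12.0, 50.0, 49.0, 51.0],
})

# One row per date, one column per symbol, closing price as the cell value
wide_df = long_df.pivot(index='Date', columns='Symbol', values='Close')
print(wide_df)
```

The long table has one row per (date, symbol) pair, while the wide table lines the symbols up side by side, which is the shape a pairwise correlation needs.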

4. Now we can run the correlation, using the Pandas ‘corr’ function to compute the Pearson correlation coefficient between each pair of equities

# move Date to the index so that only the price columns are correlated
corr_df = df_pivot.set_index('Date').corr(method='pearson')
# the result is already indexed by symbol (rather than 0-X)
corr_df.head(10)
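
To see what ‘corr’ is doing under the hood, here is a rough sketch that computes the Pearson coefficient by hand and compares it with the Pandas result. The prices are made up, and `pearson` is a hypothetical helper written for this example, not part of any library.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.DataFrame({
    'AAA': [10.0, 11.0, 12.0, 13.0],
    'BBB': [20.0, 22.0, 24.0, 26.0],   # moves exactly in step with AAA
    'CCC': [8.0, 6.0, 4.0, 2.0],       # moves exactly opposite to AAA
})

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

manual = pearson(prices['AAA'], prices['CCC'])
from_pandas = prices.corr(method='pearson').loc['AAA', 'CCC']
print(manual, from_pandas)
```

The two numbers should agree up to floating-point rounding. A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means there is little linear relationship.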

5. Finally, we can plot a heatmap of the correlations (with Seaborn and Matplotlib) to better visualize the results:

plt.figure(figsize=(13, 8))
seaborn.heatmap(corr_df, annot=True, cmap='RdYlGn')
plt.show()
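
A heatmap is easy to scan, but it shows every pair twice. As a possible companion to the plot, this sketch lists each pair once, ranked from most to least correlated. The small matrix below is hypothetical and stands in for `corr_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical correlation matrix for three made-up tickers
corr = pd.DataFrame(
    [[1.0, 0.8, -0.3],
     [0.8, 1.0, 0.1],
     [-0.3, 0.1, 1.0]],
    index=['AAA', 'BBB', 'CCC'],
    columns=['AAA', 'BBB', 'CCC'],
)

# Keep only the cells above the diagonal so each pair appears once
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Flatten to a (row, column) -> value series, drop the masked cells, sort
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)
print(pairs)
```

Swapping `corr` for the real `corr_df` should give the same kind of ranked list for the tickers in this tutorial.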
