Python for stock analysis

Rohan Kumar
Published in Analytics Vidhya
9 min read · Feb 25, 2021

In this project, we’ll analyse data from the stock market.

Again, we’ll use Pandas to extract and analyse the information, visualise it, and look at different ways to analyse the risk of a stock, based on its performance history.

Here are the questions I’ll try to answer:

  • What was the change in a stock’s price over time?
  • What was the daily return average of a stock?
  • What was the moving average of various stocks?
  • Why are the moving averages important?
  • What are technical indicators, and how do we use them?
  • What was the correlation between daily returns of different stocks?
  • What is the return of various stocks?
  • How much value do we put at risk by investing in a particular stock?
  • How can we attempt to predict future stock behaviour?

We’re going to analyse some tech stocks, and it seems like a good idea to look at their performance over the last year.

We're going to analyse stock info for Apple, Google, Microsoft, Amazon, and Intel.

Let’s read the data. I have the CSV files downloaded from Yahoo Finance; each holds more than 20 years of data, which should be enough for this tutorial.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

AAPL = pd.read_csv("AAPL.csv")
MSFT = pd.read_csv("MSFT.csv")
INTC = pd.read_csv("INTC.csv")
GOOG = pd.read_csv("GOOG.csv")
AMZN = pd.read_csv("AMZN.csv")

Now let’s see what we have here.

AAPL.head()
First few rows of Apple stock data

We can also check some stats of the data, such as how many rows there are, the max value and the mean value, by using a simple command that Pandas provides.

# Basic stats for Apple's stock
AAPL.describe()
Basic stats of Apple stock data

We should now see what types of columns we have. Again, Pandas comes to the rescue with another simple command.

# Some basic info about the dataframe
AAPL.info()
Information about the Apple stock data

No missing info in the dataframe above, so we can go about our business.

What’s the change in a stock’s price over time?

We’ll be analysing last year’s data, i.e. 2020. Pandas has a method called ‘truncate’ that keeps only the rows between two index values, which lets us plot the data between specific dates.

Just remember that for this to work, the ‘Date’ column has to be the index of the dataframe, which you can arrange with another simple command.

AAPL.set_index('Date', inplace=True)

# Plotting the stock's adjusted closing price using pandas
AAPL.truncate(before='2020-01-01', after='2021-01-01')['Adj Close'].plot(legend=True, figsize=(12,5))
AAPL Adj Close traded over time
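Here is a minimal sketch of how the index and ‘truncate’ work together. It uses a small made-up dataframe in place of AAPL.csv (the `prices` and `last_year` names are mine, not from the original code), and it parses the dates first, which is an extra step I am assuming is helpful so that the index holds real timestamps rather than strings.

```python
import pandas as pd

# Hypothetical stand-in for AAPL.csv: one row per calendar day around 2020
dates = pd.date_range("2019-12-30", "2021-01-05", freq="D")
prices = pd.DataFrame({
    "Date": dates.strftime("%Y-%m-%d"),   # dates arrive as strings, as in a CSV
    "Adj Close": range(len(dates)),        # placeholder values, not real prices
})

# Parse the dates and make them the (sorted) index before truncating
prices["Date"] = pd.to_datetime(prices["Date"])
prices = prices.set_index("Date").sort_index()

# Keep only the rows from 2020-01-01 through 2021-01-01 (both ends inclusive)
last_year = prices.truncate(before="2020-01-01", after="2021-01-01")
print(last_year.index.min(), last_year.index.max(), len(last_year))
```

Because both bounds are inclusive, the slice also contains the first day of 2021; pass after='2020-12-31' if you want the calendar year only.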

Similarly, we can plot the change in a stock’s traded volume over time. Again, we’ll only use last year’s data, i.e. 2020.

# Plotting the total volume being traded over time
AAPL.truncate(before='2020-01-01', after='2021-01-01')['Volume'].plot(legend=True, figsize=(12,5))
AAPL Volume traded over time

What are technical indicators and how to use them

Technical indicators are exploratory variables usually derived from a stock’s price and volume. They are used to explain a stock’s price movements in the hope of predicting future swings. In other words, they are used to determine whether a stock is “overbought” or “oversold”. Though these indicators are widely used by independent investors and hedge funds alike, many people do not have a quick way of obtaining them and have to resort to calculating each indicator one at a time. This process takes a great deal of time and computational power. Believe me: I’ve spent my fair share of time coding this process in Python.

Calculating technical indicators takes time away from the modeling process and can therefore be a deterrent to building more complex statistical models. With the TA (technical analysis) library, though, we can supplement any stock’s historical price data with more than 40 different technical indicators using just one line of code.

Install the TA (technical analysis) library

pip install --upgrade ta

from ta import add_all_ta_features
from ta.utils import dropna

mom_data = add_all_ta_features(AAPL, open="Open", high="High", low="Low", close="Close", volume="Volume")
mom_data.columns

At this moment, the library has implemented 32 indicators:

Volume

  • Accumulation/Distribution Index (ADI)
  • On-Balance Volume (OBV)
  • On-Balance Volume mean (OBV mean)
  • Chaikin Money Flow (CMF)
  • Force Index (FI)
  • Ease of Movement (EoM, EMV)
  • Volume-price Trend (VPT)
  • Negative Volume Index (NVI)

Volatility

  • Average True Range (ATR)
  • Bollinger Bands (BB)
  • Keltner Channel (KC)
  • Donchian Channel (DC)

Trend

  • Moving Average Convergence Divergence (MACD)
  • Average Directional Movement Index (ADX)
  • Vortex Indicator (VI)
  • Trix (TRIX)
  • Mass Index (MI)
  • Commodity Channel Index (CCI)
  • Detrended Price Oscillator (DPO)
  • KST Oscillator (KST)
  • Ichimoku Kinkō Hyō (Ichimoku)

Momentum

  • Money Flow Index (MFI)
  • Relative Strength Index (RSI)
  • True strength index (TSI)
  • Ultimate Oscillator (UO)
  • Stochastic Oscillator (SR)
  • Williams %R (WR)
  • Awesome Oscillator (AO)

Others

  • Daily Return (DR)
  • Cumulative Return (CR)

These indicators result in 58 features. Developers can set many input parameters, such as window sizes and various constants, and can choose to have the NaN values generated by the methods filled in automatically.
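To make the “overbought” and “oversold” idea concrete, here is a hand-rolled sketch of one indicator from the list, Bollinger Bands, written in plain Pandas. This is not the TA library’s code; the 20-day window, the 2 standard deviations, the use of the population standard deviation and the synthetic `close` series are all assumptions I picked for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (a random walk), standing in for a real 'Close' column
rng = np.random.default_rng(1)
index = pd.bdate_range("2020-01-01", periods=250)
close = pd.Series(100 + rng.normal(0, 1, len(index)).cumsum(), index=index, name="Close")

window, n_std = 20, 2  # assumed parameters, commonly used defaults

# Middle band is a rolling mean; the outer bands sit n_std rolling
# (population) standard deviations above and below it
middle = close.rolling(window).mean()
std = close.rolling(window).std(ddof=0)

bands = pd.DataFrame({
    "Close": close,
    "middle": middle,
    "upper": middle + n_std * std,
    "lower": middle - n_std * std,
})

# A close above the upper band is often read as "overbought",
# a close below the lower band as "oversold"
bands["overbought"] = bands["Close"] > bands["upper"]
bands["oversold"] = bands["Close"] < bands["lower"]

print(bands.dropna().tail())
```

The first 19 rows have no bands because the rolling window is not yet full, which is the kind of NaN the library can fill automatically.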

What was the moving average of various stocks?

Let’s check out the moving average for stocks over a 10, 20 and 50 day period of time. We’ll add that information to the stock’s dataframe.

Moving Average for 10, 20, 50 days
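The code that adds the moving-average columns is not included in the text here, so the following is a sketch of one way to do it with Pandas’ rolling mean. The column names match the ones used in the plot in this section; the `AAPL_demo` dataframe is synthetic and only stands in for the real AAPL data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for AAPL's 'Adj Close' column (not real prices)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=120)
AAPL_demo = pd.DataFrame(
    {"Adj Close": 100 + rng.normal(0, 1, len(dates)).cumsum()},
    index=dates,
)

# One new column per window: the mean of the last `window` adjusted closes
for window in (10, 20, 50):
    column_name = f"MA for {window} days"
    AAPL_demo[column_name] = AAPL_demo["Adj Close"].rolling(window=window).mean()

print(AAPL_demo.tail())
```

Each moving average is NaN until its window is full, so the 50-day line starts 49 rows later than the price itself.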

Let’s plot the same, again using only last year i.e. 2020 data.

AAPL.truncate(before='2020-01-01', after='2021-01-01')[['Adj Close','MA for 10 days','MA for 20 days','MA for 50 days']].plot(subplots=False,figsize=(12,5))
AAPL Moving Average Plot

Moving averages over more days have a smoother plot, as they’re less reliant on daily fluctuations. So even though Apple’s stock dipped near the start of the year, mostly due to the ramifications of COVID-19, it has generally been on an upward trend since early June.

Why are the moving averages important?

Moving averages are used to identify significant support and resistance levels.

Traders watch for crossovers of longer-term moving averages by shorter-term moving averages as possible indicators of trend changes to enter long and short positions.

According to Stan Weinstein, the price must be above the short-term MA before you buy a stock.
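As a rough illustration of the crossovers mentioned above, this sketch flags the days on which a short moving average moves above or below a long one. The 10-day and 50-day windows and the V-shaped synthetic price series are assumptions for the example, not a trading rule from this article.

```python
import numpy as np
import pandas as pd

# Synthetic price: falls for 100 days, then rises for 100 days
dates = pd.bdate_range("2020-01-01", periods=200)
price = pd.Series(
    np.concatenate([np.linspace(100, 80, 100), np.linspace(80, 120, 100)]),
    index=dates,
    name="Adj Close",
)

short_ma = price.rolling(window=10).mean()
long_ma = price.rolling(window=50).mean()

# Compare the two averages only where the long one is defined
valid = long_ma.notna()
above = (short_ma[valid] > long_ma[valid]).astype(int)

# diff() is +1 on the day the short MA moves above the long MA, -1 when it drops below
change = above.diff()
cross_up = change[change == 1].index
cross_down = change[change == -1].index

print("Short MA crossed above long MA on:", list(cross_up.date))
print("Short MA crossed below long MA on:", list(cross_down.date))
```

On this series the short average crosses above the long one once, shortly after the price turns upward, and never crosses back down.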

What was the daily return average of a stock?

# The daily return column can be created by using the percentage change over the adjusted closing price
AAPL['Daily Return'] = AAPL['Adj Close'].pct_change()
Daily Return column for AAPL

Now let’s plot the daily return for the last year, i.e. 2020.

AAPL Daily Return

Now we’ll check how often we got positive or negative returns, using a distplot from the Seaborn library. A little about distplot before we do the plotting.

Seaborn’s distplot shows a histogram with a density curve drawn over it, and it can be configured in many variations. It plots a univariate distribution of observations by combining the matplotlib hist function with the seaborn kdeplot() and rugplot() functions. Note that distplot() has been deprecated since seaborn 0.11 in favour of histplot() and displot(), so recent versions emit a deprecation warning when it is called.

sns.distplot(AAPL['Daily Return'].dropna(), bins=50, color='blue')
AAPL Dist Plot

Positive daily returns seem to be slightly more frequent than negative returns for Apple.
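The same observation can be checked with numbers instead of reading it off the plot. This is a small sketch on a toy price series (the values are made up, and the variable names are mine) that computes the average daily return and the share of up and down days.

```python
import pandas as pd

# Toy adjusted closing prices, not real data
adj_close = pd.Series([100.0, 102.0, 101.0, 103.0, 103.0, 106.0], name="Adj Close")

# Same calculation as the 'Daily Return' column; the first value is NaN and is dropped
daily_return = adj_close.pct_change().dropna()

mean_return = daily_return.mean()              # average daily return
share_positive = (daily_return > 0).mean()     # fraction of days with a gain
share_negative = (daily_return < 0).mean()     # fraction of days with a loss

print(f"mean daily return: {mean_return:.4%}")
print(f"positive days: {share_positive:.0%}, negative days: {share_negative:.0%}")
```

Running the same three lines on AAPL['Daily Return'] would give the daily return average this section asks about.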

What was the correlation between daily returns of different stocks?

I wrote an article on this before, which you can check out: How to create a stock correlation matrix in python

Now let’s combine all the stock data into one dataframe, so we can use that to calculate the correlation.

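# `data` is assumed to be a list of the five stock dataframes, each indexed by 'Date'
# and carrying a 'Symbol' column with its ticker
df = pd.concat(data)
df = df.reset_index()
df = df[['Date', 'Adj Close', 'Symbol']]
df.head()

# One row per date, one column per symbol; 'Date' stays as the index so only prices remain
df_pivot = df.pivot(index='Date', columns='Symbol', values='Adj Close')
df_pivot.head()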
Combined Dataframe of all stocks
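The snippet above relies on a list called `data` whose construction is not shown. Here is one possible way to build it, assuming each stock’s dataframe is indexed by ‘Date’; the `make_frame` helper, the `frames` dictionary and the toy prices are hypothetical and only keep the example self-contained.

```python
import pandas as pd

def make_frame(prices):
    """Build a tiny stand-in for one stock's CSV, indexed by Date (toy values)."""
    dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
    return pd.DataFrame({"Adj Close": prices}, index=pd.Index(dates, name="Date"))

# In the article these would be the AAPL, MSFT, ... dataframes read from CSV
frames = {
    "AAPL": make_frame([10.0, 11.0, 12.0]),
    "MSFT": make_frame([20.0, 21.0, 19.0]),
}

# Tag every row with its ticker so the stocks can be told apart after stacking
data = [frame.assign(Symbol=symbol) for symbol, frame in frames.items()]

# Long format: one row per (date, symbol)
df = pd.concat(data).reset_index()
df = df[["Date", "Adj Close", "Symbol"]]

# Wide format: one row per date, one column per symbol
df_pivot = df.pivot(index="Date", columns="Symbol", values="Adj Close")
print(df_pivot)
```

Keeping ‘Date’ as the index of the wide dataframe means every remaining column is a price series, which is what the correlation and normalisation steps need.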

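Now we can run the correlation, using the Pandas ‘corr’ function to compute the Pearson correlation coefficient between each pair of equities. Note that df_pivot holds adjusted closing prices, so this measures how the price levels move together; to correlate daily returns instead, call pct_change() on df_pivot before corr().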

corr_df = df_pivot.corr(method='pearson')
# reset symbol as index (rather than 0-X)
corr_df.head().reset_index()
# del corr_df.index.name
corr_df.head(10)
Correlation values between the stocks

Now let’s use the Seaborn library to plot a heatmap and visualise the correlations in a better way.

plt.figure(figsize=(13, 8))
sns.heatmap(corr_df, annot=True, cmap="RdYlGn")
plt.show()
Correlation Matrix

As you can see, all the stocks except Intel are strongly correlated with each other.
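The matrix above is computed from df_pivot, which holds adjusted closing prices. Since the question is about daily returns, here is a sketch of the same Pearson correlation run on percentage changes instead. The three tickers and their returns are synthetic: AAA and BBB share a common “market” component by construction, while CCC is independent.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.bdate_range("2020-01-01", periods=250)
n = len(dates)

# Synthetic daily returns: AAA and BBB share a common factor, CCC does not
market = rng.normal(0.0005, 0.01, n)
returns = pd.DataFrame({
    "AAA": market + rng.normal(0, 0.005, n),
    "BBB": market + rng.normal(0, 0.005, n),
    "CCC": rng.normal(0, 0.01, n),
}, index=dates)

# Turn the returns into price paths, so we start from prices as the article does
prices = 100 * (1 + returns).cumprod()

# Correlation of daily returns rather than of price levels
returns_corr = prices.pct_change().dropna().corr(method="pearson")
print(returns_corr.round(2))
```

Price levels that all trend upwards can look highly correlated even when their day-to-day moves are not, so the two matrices can differ noticeably.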

What is the return of various stocks?

Let’s plot all the stocks that we have.

df_pivot.plot(figsize=(10,4))
plt.ylabel('Price')
Price plot for all the stock for the year 2020

Normalising Multiple Stocks

returnfstart = df_pivot.apply(lambda x: x / x.iloc[0])
returnfstart.plot(figsize=(10,4)).axhline(1, lw=1, color='black')
plt.ylabel('Return From Start Price')

In this instance, I divided every closing price by the first closing price in the period.

Return From Start Price
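A tiny worked example of this normalisation, using made-up prices for two hypothetical tickers, shows why it makes stocks with very different price levels comparable.

```python
import pandas as pd

# Toy prices: BBB trades at roughly four times the price of AAA
prices = pd.DataFrame({
    "AAA": [50.0, 55.0, 60.0],
    "BBB": [200.0, 190.0, 210.0],
})

# Divide each column by its own first value, so every series starts at 1.0
return_from_start = prices / prices.iloc[0]
print(return_from_start)
```

AAA ends at 1.2 (up 20% from its start) and BBB at 1.05 (up 5%), even though BBB’s price is higher throughout.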

Daily return percentage Plot

df2 = df_pivot.pct_change()
df2.plot(figsize=(10,4))
plt.axhline(0, color='black', lw=1)
plt.ylabel('Daily Percentage Return')
Daily Percentage Return

Because I have 5 stocks overlapping each other, it is a little hard to make any comparisons here.

How much value do we put at risk by investing in a particular stock?

A basic way to quantify risk is to compare the expected return (which can be the mean of the stock’s daily returns) with the standard deviation of the daily returns.

# Daily returns with the leading NaN row removed (df2 is the daily percentage change computed above)
risk = df2.dropna()
Risk plot
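The comparison described above, expected return against standard deviation, can be put into a small table before plotting it. This sketch uses two hypothetical tickers with made-up daily returns; with the article’s data you would start from the dataframe of daily returns instead.

```python
import pandas as pd

# Toy daily returns for two hypothetical stocks
returns = pd.DataFrame({
    "AAA": [0.01, -0.02, 0.03, 0.00],      # larger swings
    "BBB": [0.002, 0.001, -0.001, 0.002],  # smaller swings
})

# Expected return = mean of daily returns, risk = their standard deviation
summary = pd.DataFrame({
    "expected_return": returns.mean(),
    "risk": returns.std(),
})
print(summary)
```

Plotting the ‘risk’ column against ‘expected_return’ as a scatter plot, one labelled point per stock, gives the kind of risk plot shown here.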

Intel (INTC): High Risk, Low Return

Google (GOOG): Low Risk, Medium Return

Microsoft (MSFT): Medium Risk, Medium Return

Amazon (AMZN): Low Risk, High Return

Apple (AAPL): High Risk, High Return

Value at Risk

We can treat Value at Risk as the amount of money we could expect to lose at a given confidence level. Two common ways to estimate it are the ‘Bootstrap’ method and the ‘Monte Carlo’ method; here we work through the bootstrap method.

Bootstrap Method

Using this method, we calculate the empirical quantiles from a histogram of daily returns. The quantiles help us define our confidence interval.

sns.distplot(AAPL.truncate(before='2020-01-01', after='2021-01-01')['Daily Return'].dropna(),bins=100,color='purple')
Daily Return histogram for AAPL

To recap, our histogram for Apple’s stock looked like the above. And our dataframe looked like this.

# Using Pandas' built-in quantile method
risk['AAPL'].quantile(0.05)
>> -0.04486874869710267

The 0.05 empirical quantile of daily returns is at about -0.0449. This means that with 95% confidence, the worst daily loss will not exceed about 4.49% of the investment.
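To turn that quantile into an amount of money, multiply it by the size of the position. The sketch below does this on synthetic, normally distributed returns; the 1,000,000 investment, the 95% confidence level and the return parameters are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns standing in for one stock's daily return column
rng = np.random.default_rng(7)
daily_returns = pd.Series(rng.normal(0.001, 0.02, 1000))

confidence = 0.95
investment = 1_000_000  # hypothetical position size

# Empirical quantile: the return that only the worst 5% of days fall below
q = daily_returns.quantile(1 - confidence)

# One-day Value at Risk, expressed as a positive amount of money
value_at_risk = -q * investment

print(f"5% quantile of daily returns: {q:.4f}")
print(f"1-day VaR at {confidence:.0%} confidence: {value_at_risk:,.0f}")
```

With the article’s quantile of about -0.0449 for Apple, the same arithmetic would put roughly 44,900 at risk per 1,000,000 invested on the worst 5% of days.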

How can we attempt to predict future stock behaviour?

Here I am providing you with a list of different models that can be used to predict stock prices.

Predicting Stock Prices Using Facebook’s Prophet Model

Time-Series Forecasting: Predicting Microsoft (MSFT) Stock Prices Using ARIMA Model

Time-Series Forecasting: Predicting Apple Stock Price Using An LSTM Model

I recommend reading through these articles; the models they describe are reported to predict prices closely.

Disclaimer: There have been attempts to predict stock prices using time series analysis algorithms, though they still cannot be relied on to place bets in the real market. This is just a tutorial article that does not intend in any way to “direct” people into buying stocks.

The marathon is over and, I must say: I accomplished what I wanted to….

Now it’s your turn to clap and follow me. Thank you for reading!

Give me a FOLLOW if you liked this, for more tech blogs!

“If at first you don’t succeed, then skydiving isn’t for you.” — Mel Helitzer

Till next time!
