Benchmarking the NIFTY 50

Surjya Banerjee
Published in Analytics Vidhya
May 25, 2020

The Nifty 50, owned and managed by NSE Indices Limited, is a diversified 50-stock index whose components cover 13 sectors of the Indian economy.
Computed using the free-float market capitalization methodology, the Index is well-diversified and can be used for a variety of purposes such as benchmarking fund portfolios or launching of index funds, Exchange Traded Funds, and structured products.

In this article, we use Python to analyze the various stock components of the Nifty 50 in terms of their volatility and their performance over the last 5 years by using the index as a benchmark.

The Capital Asset Pricing Model (CAPM) is a versatile tool used by analysts across the industry to calculate the expected return on an investment, given the risk-free rate, the investment's Beta, and the equity risk premium: Expected Return = Risk-Free Rate + Beta*(Equity Risk Premium). The Beta used in the model is a measure of the systematic risk of the investment; the volatility of a stock relative to the market as a whole can be inferred from its Beta.

Beta is determined by finding the slope of the best fit line, or the regression line between the investment and the benchmark.
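As a minimal sketch of that relationship (on synthetic return series, not the article's data): for a one-variable regression, the slope of the best-fit line equals the covariance between the stock and the benchmark divided by the variance of the benchmark.

```python
# For simple one-variable OLS, slope = Cov(stock, market) / Var(market).
# The return series below are synthetic stand-ins for real weekly returns.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
market = rng.normal(0.002, 0.02, 260)            # ~5 years of weekly returns
stock = 0.8 * market + rng.normal(0, 0.01, 260)  # a stock with true Beta ~0.8

beta_cov = np.cov(stock, market, ddof=1)[0, 1] / np.var(market, ddof=1)
beta_ols = stats.linregress(market, stock).slope
print(beta_cov, beta_ols)  # the two estimates agree
```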

With the regression performed, the proportion of the movement attributed to the market as a whole, or the market-specific risk can be determined by finding the r_squared value of the regression.

Further, Jensen’s Alpha, which theoretically is the difference between the actual and expected return of the investment, can be calculated by finding the difference between the intercept of the best-fit line and the term

(Risk-Free Rate)*(1-Beta)

This metric essentially is an indicator of the performance of the investment in contrast to the expected returns from it.
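A quick numeric sketch of the arithmetic (the Beta, intercept, and risk-free rate below are illustrative inputs, not results from this analysis):

```python
# Illustrative numbers only, to show how Jensen's Alpha is computed from
# the regression intercept and the term r_f * (1 - Beta).
beta = 0.75
intercept = 0.002   # intercept of the best-fit line
r_f = 0.06          # assumed risk-free rate

jensens_alpha = intercept - r_f * (1 - beta)
print(round(jensens_alpha, 4))  # 0.002 - 0.06*0.25 = -0.013
```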

Fetching the Data

We begin by importing the relevant libraries and then using Pandas to form a data frame comprising information which we will use to pull the historical data of individual stock performances (from Yahoo Finance).

import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
pull=pd.read_csv('https://raw.githubusercontent.com/SurjyaB/vigilant-robot/master/DB.csv')
pull.head()
First 5 rows of the created Dataframe.

Now, we create two lists - one with the names of all the companies and the other with the links to fetch the data - and zip them together into a list of pairs. We then create an empty dictionary called company_data and run a for loop that fetches the data from each link and stores it in the dictionary, keyed by the respective company name.

companies=list(pull.name)
links=list(pull.link)
pair=list(zip(companies, links))

company_data={}
for x,y in pair:
    company_data[x]=pd.read_csv(y, parse_dates=True, index_col=0)

Pulling the data for Nifty, we get-

company_data['nifty']

Now, we plot the fetched data for Nifty in order to get a visual representation of its movement over the last 5 years.


plt.plot(company_data['nifty'].Close)
plt.xlabel('Year')
plt.ylabel('Nifty 50')
plt.title('Index Movement 2015-2020')

The Analysis Bit

With the data retrieved, the next task is to determine the (weekly) relative movement of the individual stocks.

At this point, it is worth pausing to understand the choice of time interval. With so much data available, why stick to just the last 5 years of weekly data?
It might seem that the further back in history we go, the more data we collect and the more visible the patterns become in the grand picture.
However, in the corporate world, where governance and policies change constantly, it makes little sense to compare a company's performance over too long a horizon. Stepping the frequency up from weekly to daily, on the other hand, would mostly introduce noise into the data. Hence, 5 years of weekly data is a reasonable choice.

We define a function to determine the change in the weekly Closing Price of the stocks in terms of percentage.

def price_change(close):
    return (close-close.shift(1))/close.shift(1)

price_change(company_data['nifty'].Close)
Output- Percentage Change.
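As an aside, pandas ships an equivalent built-in, Series.pct_change(), which computes the same ratio. A self-contained check on a few synthetic closing prices:

```python
# price_change as defined above reproduces pandas' built-in
# Series.pct_change(); the closing prices below are synthetic,
# used purely for illustration.
import pandas as pd

def price_change(close):
    return (close - close.shift(1)) / close.shift(1)

close = pd.Series([100.0, 102.0, 99.0, 105.0])
manual = price_change(close)
builtin = close.pct_change()
print(manual.iloc[1])  # (102 - 100) / 100 = 0.02
```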

Using the same technique as before - an empty dictionary populated by a for loop - we compute the percentage change for all the stocks.

pc={}
for y in list(company_data.keys()):
    pc[y] = price_change(company_data[y].Close)

movement_data=pd.DataFrame(pc)
movement_data.head()

The data frame’s top row returns null values as there were no previous records (in the data frame) to determine the percentage changes from. We proceed by dropping the entire row.

movement_data.dropna(axis='rows',inplace=True)
movement_data.head()
Percentage change of all stocks in the Nifty 50 (top 5 rows).

Now with the data almost tuned to our requirement, we can begin to analyze the stocks by fitting a regression model between the benchmark index returns and the individual stock returns. To begin with, let us start with a single company, say BHARTI AIRTEL.

import statsmodels.api as sm 

X=movement_data.nifty
y=movement_data.bharti_airtel

X1=sm.add_constant(X)
model=sm.OLS(y,X1)
reg=model.fit()
print(reg.summary())
Regression Summary.

We can see that the slope of the best-fit line between Bharti Airtel and the Nifty 50 index is 0.7632 and the intercept is 0.002. The R_squared value for the regression is 0.188.
Plotting the regression line-

import seaborn as sns
sns.regplot(x=movement_data.nifty,y=movement_data.bharti_airtel)
plt.xlabel('NIFTY')
plt.ylabel('BHARTI AIRTEL')
Bharti Airtel vs Nifty 50 Regression Line.

Alternatively, we could have also used Scipy to get the regression results.

stats.linregress(movement_data.nifty,movement_data.bharti_airtel)
LinregressResult(slope=0.7631723142521402, intercept=0.0020140086653414034, rvalue=0.43383082438038933, pvalue=2.106984979373908e-13, stderr=0.09848597430566054)
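Note that linregress reports the correlation coefficient (rvalue) rather than R_squared; squaring it recovers the 0.188 shown in the statsmodels summary.

```python
# linregress returns rvalue (the correlation coefficient); for a simple
# one-variable regression, R_squared is just rvalue squared.
rvalue = 0.43383082438038933  # rvalue from the regression above
r_squared = rvalue ** 2
print(round(r_squared, 3))  # 0.188, matching the regression summary
```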

Now, instead of regressing each company against the benchmark one at a time, we run a for loop to get the slopes, intercepts, R_squared values, and standard errors for all the company-benchmark pairs.

slope=[]
intercept=[]
r_value=[]
r_squared=[]
std_error=[]

for x in list(movement_data.columns):
    s,i,rv,pv,se = stats.linregress(movement_data.nifty,movement_data[x])
    slope.append(s)
    intercept.append(i)
    r_value.append(rv)
    std_error.append(se)

for x in r_value:
    rsq = x ** 2
    r_squared.append(rsq)

metrics=list(zip(list(pull.name_yf),slope,intercept,r_squared,std_error))
df=pd.DataFrame(metrics)
df.head()
Output Data Frame.

Next, we rename the columns and drop the first row as we do not want the regression results between the benchmark and itself.

df.drop(index=0,axis='rows',inplace=True)
df.columns=['Name','Slope(Beta)','Intercept','r_squared','Std. Error']
df.head()

If we recall, the r_squared value of our regression is effectively the proportion of a stock's risk that is explained by the movement of the benchmark - that is, the portion of its risk that is market-specific. Conversely, the (1-r_squared) value is the proportion of risk that is firm-specific to the company.

Finally, we calculate Jensen's Alpha (using 6% as the risk-free rate) for the stocks-

df.columns=['Name', 'Slope(Beta)', 'Intercept', 'Market Specific Risk', 'Std. Error']
df['Firm Specific Risk']=[1-x for x in df['Market Specific Risk']]

r_f=0.06
u=r_f*(1-df['Slope(Beta)'])
v=df.Intercept
JA=v-u
df["Jensen's Alpha"] = JA
df=df[['Name','Slope(Beta)','Intercept','Market Specific Risk','Firm Specific Risk',"Jensen's Alpha",'Std. Error']]
df

To finish off, we use Jensen's Alpha to pass a layman's verdict on each stock's performance over the regression period.

df['Performance']=['Over Performed' if x >= 0 else 'Under Performed' for x in JA]
Top 5 Performers in terms of Actual Performance vs Expectation
Bottom 5 Performers in terms of Actual Performance vs Expectation

Footnote

With advances in the field of finance, several newer pricing models, such as the Arbitrage Pricing Theory and multi-factor models, have been developed of late. However, the CAPM still holds a place of its own due to its relative simplicity.
One thing to note is that the risk-free rate used in the analysis might not be 'risk-free' in the truest sense, as the rate is left unadjusted for the sovereign default spread in order to keep the analysis simple.

Links-
https://www1.nseindia.com/content/indices/ind_nifty50.pdf
https://raw.githubusercontent.com/SurjyaB/vigilant-robot/master/DB.csv
https://in.finance.yahoo.com/
