Trend Prediction of NIFTY-50

akhlesh rai
4 min read · Jun 24, 2019


Purpose: The intention of this machine learning model is to predict the next-day trend of NIFTY-50. The model is trained and tested on 15 years of historical NIFTY-50 records along with other key variables that influence it (Brent crude price, dollar rate, Hang Seng index, Dow index, FTSE index and FII daily activity).

https://github.com/raiak82/LongShortMemorymodelNifty/tree/master

Model creation and testing is a 2-stage process:

Step-1: Nifty Data Mining and Pre-processing steps:

  • NSE (National Stock Exchange) provides the open library NSE.py to pull historical and live Nifty data such as Open, High, Low, Close, Volume and Turnover.

Refer to the NSE documentation on how to use the API.

After the Nifty records are fetched, the TALIB API is used to calculate technical indicator values such as Exponential Moving Averages, MACD, RSI-14 and Bollinger Bands. These are the same indicators most brokerage firms (you name it) use to plot technical charts of the Nifty index.

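To make the indicator step concrete, here is a dependency-free sketch of what an exponential moving average and Bollinger Bands compute. It is not the TA-Lib API: the function names `ema` and `bollinger_bands` are hypothetical, the EMA is seeded with the first price (a library may seed it differently, so early values need not match), and the 20-period, 2-standard-deviation band settings are assumed common defaults rather than values stated in this post.

```python
from statistics import mean, pstdev


def ema(prices, period):
    """Exponential moving average with smoothing factor 2 / (period + 1).

    Assumption: the series is seeded with the first price.
    """
    alpha = 2.0 / (period + 1)
    values = [prices[0]]
    for price in prices[1:]:
        values.append(alpha * price + (1 - alpha) * values[-1])
    return values


def bollinger_bands(prices, period=20, num_std=2.0):
    """Return (upper, middle, lower) lists; entries are None until a full window exists.

    Assumption: the bands use the population standard deviation of the window.
    """
    upper, middle, lower = [], [], []
    for end in range(1, len(prices) + 1):
        if end < period:
            upper.append(None)
            middle.append(None)
            lower.append(None)
            continue
        window = prices[end - period:end]
        mid = mean(window)
        spread = num_std * pstdev(window)
        upper.append(mid + spread)
        middle.append(mid)
        lower.append(mid - spread)
    return upper, middle, lower


# Toy closing prices, not taken from the Nifty data set.
closes = [10.0, 11.0, 12.0, 11.0, 13.0]
print(ema(closes, period=3))
print(bollinger_bands(closes, period=3))
```

In the real pipeline these values come from the indicator library; the sketch only shows the arithmetic behind columns such as EMA-5 or BollingerUpperBand.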

  • Extract historical values of other important variables from Investing.com (https://www.investing.com): Brent crude price, USD-INR rate, the London Stock Exchange's FTSE index, the US Dow index, Hong Kong's Hang Seng index and the India Volatility Index (VIX).
  • Extract historical Foreign Institutional Investors (FII) data, namely the FII index futures and FII index options values, from

https://www.way2wealth.com/derivatives/fiiactivity/

Note: The site (https://www.way2wealth.com/derivatives/fiiactivity/) serves the data in tabular form, 15 records at a time, so fetching 15 years of FII records takes ~8–10 hours. Because pulling the history is a one-time exercise, the records are extracted and saved with an automated Selenium test script (available at my GitHub link).

5) Finally, the data mined from the different sources is merged into a single data frame.
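The merge step can be pictured as an outer join on the date. The sketch below uses plain dictionaries instead of a data frame; the function name `merge_by_date` is hypothetical, and the dates are written in ISO form (assuming 1/3/2005 means 3 January 2005) so that string sorting is chronological. A date missing from one source is left as `None`, which plays the role of a `NaN` entry in the merged frame.

```python
def merge_by_date(sources):
    """Outer-join several {date: value} series into {date: {column: value}}.

    `sources` maps a column name to a {date: value} dictionary.
    A date missing from one source gets None for that column.
    """
    all_dates = sorted({day for series in sources.values() for day in series})
    return {
        day: {name: series.get(day) for name, series in sources.items()}
        for day in all_dates
    }


# Values copied from the first two rows of the sample output in this post.
sources = {
    "Nifty": {"2005-01-03": 2115.00, "2005-01-04": 2103.75},
    "Brent Crude Price": {"2005-01-04": 41.04},  # no quote on 2005-01-03
    "USD-INR price": {"2005-01-03": 43.400, "2005-01-04": 43.495},
}
merged = merge_by_date(sources)
print(merged["2005-01-03"])
```

Keeping unmatched dates rather than dropping them is a design choice of this sketch; how the gaps are handled afterwards is not stated in the post.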

Features

  • Nifty-50
  • Volume
  • Turnover
  • EMA-200
  • EMA-100
  • EMA-50
  • EMA-21
  • EMA-5
  • MACD
  • RSI-14
  • BollingerUpperBand
  • BollingerMiddleBand
  • BollingerLowerBand
  • Brent Crude Price
  • Dow Price
  • FTSE
  • Hang Seng Price
  • USD-INR price
  • Volatility Index VIX
  • Foreign Institutional Investors (FII) Index Future Net
  • Foreign Institutional Investors (FII) Index Options Net
            Nifty    Volume      Turnover      EMA-200      EMA-100  \
Date
1/3/2005  2115.00  70506865  2.375100e+10  1807.768326  1863.544641
1/4/2005  2103.75  72718302  2.416130e+10  1810.713417  1868.301183

               EMA-50     EMA-21        EMA-5       MACD     RSI-14  \
Date
1/3/2005  1951.264565  2031.2347  2083.265125  43.080270  78.071485
1/4/2005  1957.244386  2037.8270  2090.093417  43.440559  73.511082

          ...  BollingerMiddleBand  BollingerLowerBand  \
Date      ...
1/3/2005  ...              2079.25         2041.161051
1/4/2005  ...              2085.73         2044.340649

          Brent Crude Price    Dow Price    FTSE  Hang Seng Price  \
Date
1/3/2005                NaN  10729.42969     NaN         14237.42
1/4/2005              41.04  10630.78027  4847.0         14045.90

          USD-INR price  Volatility Index VIX  FII Index Future Net  \
Date
1/3/2005         43.400                 14.08                -57.08
1/4/2005         43.495                 13.98                -11.04

          FII Index Options
Date
1/3/2005            -102.01
1/4/2005            -337.75

[2 rows x 21 columns]

Note: One important feature for predicting the next-day Nifty trend is the Twitter sentiment of the hashtag #NIFTY.

However, historical tweets for #Nifty covering the last 15 years are hard to obtain, so including this sentiment in the model remains a challenge.

Step-2: Predictive ML Modelling (LSTM: multivariate, multi-lag time-step model)

1) Once the data has been mined from all the sources, the next important step is feature selection, which is based on the correlation of each independent variable with the dependent variable, the next-day Nifty value.

Using multiple feature selection techniques (heatmap, SelectKBest, Logistic Regression and ExtraTreeClassification), the top 15 features most strongly related to the next-day Nifty-50 value are selected.

    Specs                         Score
 0  Nifty(t)                 528.489901
 1  Volume(t)                225.446605
 2  Turnover(t)              219.674478
 3  EMA-200(t)               550.082539
 4  EMA-100(t)               550.181670
 5  EMA-50(t)                556.558619
 6  EMA-21(t)                552.139495
 7  EMA-5(t)                 537.801351
 8  MACD(t)                   99.041005
 9  RSI-14(t)                217.358698
10  BollingerUpperBand(t)    523.966081
11  BollingerMiddleBand(t)   538.227012
12  BollingerLowerBand(t)    542.605087
13  Brent Crude Price(t)     376.710317
14  Dow Price(t)             496.904937
15  FTSE(t)                  231.418879
16  Hang Seng Price(t)       274.480597
17  USD-INR price(t)         676.985024
18  Volatility Index VIX(t)  393.440538
19  FII Index Future Net(t)   21.178951
20  FII Index Options(t)      11.677231
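As a rough illustration of scoring features against the next-day value, the sketch below ranks columns by the absolute Pearson correlation between a feature on day t and the target on day t+1. It is a simplified stand-in and not the SelectKBest scoring behind the table above; the function names `pearson` and `rank_features` and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_features(features, target, k):
    """Rank features by |corr(feature[t], target[t + 1])| and keep the top k."""
    next_day = target[1:]  # target shifted one day ahead
    scores = {
        name: abs(pearson(values[:-1], next_day))
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy data, not taken from the Nifty data set.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "trend_like": [1.0, 2.1, 2.9, 4.2, 5.0, 6.1],
    "noise_like": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
top = rank_features(features, target, k=1)
print(top)
```

Calling `rank_features(features, target, k=15)` on the full feature set would mirror the "top 15" cut, though the scores themselves would differ from those in the table.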

2) After feature reduction, a lag of 5 time steps is applied and the LSTM model is trained with 500 nodes, 150 epochs and a batch size of 10.

Train on 3000 samples, validate on 569 samples
Epoch 1/150
 - 27s - loss: 0.0013 - mean_squared_error: 0.0013 - val_loss: 0.0180 - val_mean_squared_error: 0.0180
Epoch 2/150
 - 24s - loss: 0.0016 - mean_squared_error: 0.0016 - val_loss: 0.0043 - val_mean_squared_error: 0.0043
Epoch 3/150
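The 5-step lag can be sketched as a sliding window: each training sample holds 5 consecutive rows of features, and its label is the Nifty value of the following day. The helper name `make_windows` and the toy rows are hypothetical, and treating the first column as the target is an assumption of this sketch. The resulting samples have the (time steps, features) layout that an LSTM layer takes as input.

```python
def make_windows(rows, n_steps=5, target_col=0):
    """Turn a chronological list of feature rows into (window, next_value) pairs.

    Sample i uses rows[i : i + n_steps] as input and
    rows[i + n_steps][target_col] as the value to predict.
    """
    samples, labels = [], []
    for i in range(len(rows) - n_steps):
        samples.append(rows[i:i + n_steps])
        labels.append(rows[i + n_steps][target_col])
    return samples, labels


# Toy rows of [nifty, volume]; real rows would hold the selected features.
rows = [[float(day), float(day * 10)] for day in range(1, 9)]
X, y = make_windows(rows, n_steps=5)
print(len(X), len(X[0]), len(X[0][0]))  # samples, time steps, features -> 3 5 2
```

With 8 rows and 5 time steps only 3 samples remain, which is why a lagged data set is always slightly shorter than the raw one.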

4) Plot the actual vs. predicted Nifty values
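Since the stated goal is the next-day trend, the predicted values can also be reduced to an up/down call and compared with what actually happened. The sketch below is one hypothetical way to do that, not code from the repository: `direction`, `trend_hit_rate` and the toy series are illustrative, and a prediction counts as a hit when it moves in the same direction as the actual value, both measured against the previous actual value.

```python
def direction(previous, current):
    """Return 1 for a rise, -1 for a fall, 0 for no change."""
    return (current > previous) - (current < previous)


def trend_hit_rate(actual, predicted):
    """Fraction of days on which predicted and actual moves share a direction.

    actual[i] and predicted[i] refer to the same day; both moves are
    measured against actual[i - 1], the previous actual value.
    """
    hits = 0
    for i in range(1, len(actual)):
        actual_move = direction(actual[i - 1], actual[i])
        predicted_move = direction(actual[i - 1], predicted[i])
        if actual_move == predicted_move:
            hits += 1
    return hits / (len(actual) - 1)


# Toy series, not model output.
actual = [100.0, 102.0, 101.0, 103.0, 104.0]
predicted = [100.0, 101.5, 101.5, 102.0, 102.5]
rate = trend_hit_rate(actual, predicted)
print(rate)  # 3 of 4 moves match -> 0.75
```

A plot shows how closely the curves track each other; a hit rate like this one complements it by answering whether the direction was right.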
