Stock Market Data Analysis Part 1

Over the past few weeks I have been surfing the net, trying to learn the so-called next big thing in the IT industry: ‘Data Analysis’. After a zillion searches, I found a wonderful tutorial that explains how to use Python for data analysis (Jose Marcial Portilla’s Udemy course).
After learning how to use some cool Python libraries (NumPy and pandas), I decided to apply them to real-life data. So, a few light years later, here we are analysing stocks from the NSE (the National Stock Exchange, one of India’s main stock exchanges).

Before we begin, let’s first see what NumPy and pandas are.

Python has two libraries, NumPy and pandas, which we use for the analysis. NumPy supports large arrays and provides high-level mathematical functions, while pandas provides the data structures required for data manipulation and analysis. We use Seaborn and Matplotlib for plotting graphs.
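To make that split concrete, here is a tiny sketch of what each library contributes. The four prices are made up purely for illustration and have nothing to do with the NSE data used later.

```python
import numpy as np
import pandas as pd

# NumPy: a typed array with vectorised maths (ASSUMPTION: made-up prices).
closes = np.array([100.0, 102.0, 101.0, 105.0])
daily_change = np.diff(closes)   # differences between consecutive elements
mean_close = closes.mean()

# pandas: labelled, tabular data built on top of NumPy arrays.
frame = pd.DataFrame(
    {"Close": closes},
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
)
# The first row has no previous day to compare with, so it becomes NaN.
frame["Change"] = frame["Close"].diff()

print(mean_close)
print(frame)
```

NumPy does the number crunching, while pandas adds the date index and named columns that make time series like stock prices easy to slice and compare.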

Phew! Enough theory… Let’s get to the interesting stuff.

For those who don’t know, the stock market is the dark, gloomy place where one goes to buy scary stuff. I am kidding! The stock market is a market where one can buy and sell shares of publicly listed companies, along with various other instruments (securities, commodities, derivatives, et al.), and earn money (remember the Harshad Mehta fiasco?!). It’s an interesting field for data analysis, as it has numerous data points that one can study and draw inferences from.

Following a post previously written by Eli Kastelein, let’s choose some companies to work on.

For my case study, I chose four big tech companies listed on the NSE: Tech Mahindra, Wipro, TCS and Persistent. Let’s look at their closing prices over the past year.
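A one-year closing-price chart like this can be sketched roughly as follows. This is only an illustration: the prices are synthetic random walks, and the starting prices, column names and the `plot_closing` helper are my own assumptions rather than the real NSE download.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic random-walk prices stand in for the real NSE data.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2014-06-02", periods=250)   # roughly one trading year
start_prices = {                                     # hypothetical starting levels
    "TechMahindra": 2000.0,
    "Wipro": 550.0,
    "TCS": 2400.0,
    "Persistent": 1100.0,
}

# One column of closing prices per company, indexed by trading date.
closing = pd.DataFrame(
    {
        name: start * np.cumprod(1 + rng.normal(0, 0.015, len(dates)))
        for name, start in start_prices.items()
    },
    index=dates,
)


def plot_closing(prices):
    """Draw one line per company (hypothetical helper; needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the data code runs without it

    ax = prices.plot(figsize=(10, 5), title="Closing price over one year")
    ax.set_ylabel("Closing price (INR)")
    plt.show()


print(closing.describe().loc[["min", "max"]])
```

Calling `plot_closing(closing)` should draw all four companies on one set of axes, which is what makes the differences in price level easy to spot.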

Some initial observations are:

• TCS seems to have the largest price range, and its price is significantly higher than its competitors’ (didn’t know that!).
• Tech Mahindra and Persistent had a sudden drop in price around March ’15 (this happened because they split their shares 2:1).
• It looks like TCS had the most price change, while Wipro had the least (Go Tata Go!).
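The split observation matters for the numbers: a 2:1 split halves the share price overnight without changing the value of a holding, so the raw price range overstates how much the stock really moved. Here is a minimal sketch of adjusting for it, using made-up prices and a hypothetical split date.

```python
import pandas as pd

# ASSUMPTION: made-up prices around a hypothetical 2:1 split on 2015-03-20.
dates = pd.to_datetime(["2015-03-18", "2015-03-19", "2015-03-20", "2015-03-23"])
raw_close = pd.Series([2800.0, 2840.0, 1430.0, 1410.0], index=dates)

split_date = pd.Timestamp("2015-03-20")
split_ratio = 2.0   # 2:1 split: every old share becomes two new ones

# Divide every pre-split price by the ratio so the series is comparable.
adjusted = raw_close.copy()
before_split = adjusted.index < split_date
adjusted.loc[before_split] = adjusted.loc[before_split] / split_ratio

# "Price range" here means highest close minus lowest close.
raw_range = raw_close.max() - raw_close.min()
adjusted_range = adjusted.max() - adjusted.min()

print(adjusted)
print(raw_range, adjusted_range)
```

On these toy numbers the raw range is 1430, while the split-adjusted range is only 30, so the sudden drop is an accounting effect rather than a real fall in value.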

It’s hard to judge correlation from the line chart alone, but we can look at the correlation between each pair of companies using PairGrid from Seaborn. The data we use for the comparison is the daily percentage change in volume traded:

Scary, it is! It isn’t easy to see patterns here. Much of the data (on the scatter plots) sits right around the zero point. This means that, except for occasional outliers, the volume of stock traded does not change much from one day to the next.
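For reference, the daily percentage change in volume and a PairGrid over it might be produced roughly like this. The volumes are synthetic, and the layout (scatter plots above the diagonal, KDE plots below, histograms on it) is my assumption about the figure, not something the data dictates.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic daily volumes stand in for the real NSE figures.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2014-06-02", periods=250)
companies = ["TechMahindra", "Wipro", "TCS", "Persistent"]
volume = pd.DataFrame(
    rng.lognormal(mean=13.0, sigma=0.4, size=(len(dates), len(companies))),
    index=dates,
    columns=companies,
)

# Day-over-day change in percent: (today - yesterday) / yesterday * 100.
# The first day has nothing to compare with, so it is dropped.
volume_change = volume.pct_change().dropna() * 100


def plot_pair_grid(changes):
    """Pairwise plots for every pair of columns (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    grid = sns.PairGrid(changes)
    grid.map_upper(plt.scatter, s=8)   # scatter plots above the diagonal
    grid.map_lower(sns.kdeplot)        # kernel density estimates below it
    grid.map_diag(plt.hist, bins=30)   # one histogram per company
    plt.show()


print(volume_change.describe().loc[["mean", "50%"]])
```

Calling `plot_pair_grid(volume_change)` should give a 4×4 grid with one panel for every pair of companies.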

It’s difficult to even guess which pair has the highest correlation without a basic knowledge of kernel density estimation plots, but it looks like TCS and Wipro are somewhat correlated. This can be confirmed with a correlation plot:

All of the correlations are positive, but the highest is only 0.34, which is fairly weak (1 stands for perfect correlation). As this data is the daily percentage change in volume traded, it does not necessarily reflect closing prices. Let’s make a correlation plot of the closing prices used in the 1-year line chart:

Now things are way clearer. We can see that Persistent’s and Tech Mahindra’s closing prices seem to move together, as do Wipro’s and Tech Mahindra’s. These correlations are much stronger than those for the percentage change in volume.
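Both correlation tables discussed above boil down to a pairwise Pearson correlation, which pandas exposes as `DataFrame.corr()`. The sketch below uses synthetic data, so it will not reproduce the numbers quoted in this post, and it draws the matrix with `sns.heatmap`, which is my own choice of plotting call; the `plot_corr` helper is hypothetical.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic data stands in for the NSE prices and volumes.
rng = np.random.default_rng(2)
dates = pd.bdate_range("2014-06-02", periods=250)
companies = ["TechMahindra", "Wipro", "TCS", "Persistent"]
shape = (len(dates), len(companies))

closing = pd.DataFrame(
    1000 * np.cumprod(1 + rng.normal(0, 0.015, shape), axis=0),
    index=dates,
    columns=companies,
)
volume = pd.DataFrame(
    rng.lognormal(13.0, 0.4, shape), index=dates, columns=companies
)

# Pearson correlation between every pair of columns (1 = perfect correlation).
volume_change_corr = volume.pct_change().dropna().corr()
closing_corr = closing.corr()


def plot_corr(corr, title):
    """Annotated heatmap of a correlation matrix (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_title(title)
    plt.show()


print(volume_change_corr.round(2))
print(closing_corr.round(2))
```

Each result is a symmetric 4×4 table with 1.0 on the diagonal, since every series is perfectly correlated with itself.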

Lastly, let’s apply 7-day, 28-day and 50-day moving averages (MA) to Wipro’s closing price.

We can see that the 7-day MA follows the line graph more closely, while the 50-day MA shows us the overall trend.
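A simple moving average is just the mean of the last N closing prices, recomputed every day. Here is a minimal sketch with pandas, assuming a synthetic price series in place of Wipro’s real data and counting "days" as trading days (rows), not calendar days.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: a synthetic random walk stands in for Wipro's closing price.
rng = np.random.default_rng(3)
dates = pd.bdate_range("2014-06-02", periods=250)
close = pd.Series(
    550 * np.cumprod(1 + rng.normal(0, 0.015, len(dates))),
    index=dates,
    name="Close",
)

wipro = close.to_frame()
for window in (7, 28, 50):
    # Mean of the current day and the previous window-1 trading days.
    # The first window-1 rows are NaN because the window is not full yet.
    wipro[f"MA {window} days"] = close.rolling(window).mean()

print(wipro.tail())
```

With matplotlib installed, `wipro.plot()` should draw the price and all three averages together. The larger the window, the more days each point averages over, which is why the 50-day line is smoother and reacts more slowly than the 7-day one.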

If I had the money, I would use this type of statistical analysis to play the stock market myself, but investment firms use models that are 1000 times more complex than this. (Fun fact: 84% of all stock trades are by high-frequency computers … only 16% are done by human traders.)

I plan to write a follow-up article that covers risk analysis and methods to predict the future price of a stock.

If you enjoyed my post, please hit recommend. Your encouragement would mean the world to me.

May the Force be with you!