Novel Coronavirus: Exploratory Data Analysis and Prediction Using Machine Learning Algorithms

Saish Urumkar
Published in Analytics Vidhya
Feb 1, 2020

A new, deadly infection caused by a coronavirus related to SARS originated in Wuhan, China, and is spreading to other parts of the world.

Background

2019-nCoV is a betacoronavirus, like MERS and SARS, all of which have their origins in bats.

This dataset has daily-level information on the number of confirmed cases, deaths and recoveries from the 2019 novel coronavirus.

The data is available from 22 Jan 2020 onward.

Data

From the WHO: on 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

Daily-level information on affected people can therefore give some interesting insights when it is made available to the broader data science community.

The source of the data is the Johns Hopkins dashboard: https://docs.google.com/spreadsheets/d/1yZv9w9zRKwrGTaR-YzmAqMefw4wMlaXocejdxZaTs6w/htmlview?usp=sharing&sle=true#

Reading the Data

The important columns are Last Update, Confirmed, Deaths and Recovered.
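A minimal sketch of this step, using only Python's standard library rather than whatever the author used (probably pandas, but the post doesn't show the code). The column names `Province/State` and `Country` are assumptions about the file layout, and the function names are hypothetical. It also assumes all rows come from a single daily snapshot: the counts are cumulative, so summing rows from several dates would double-count.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; adjust them to match the actual export.
COUNT_COLUMNS = ("Confirmed", "Deaths", "Recovered")


def read_cases(csv_file):
    """Parse a CSV file object into dicts, converting the count columns to int."""
    rows = []
    for row in csv.DictReader(csv_file):
        for col in COUNT_COLUMNS:
            # Blank cells (e.g. no recoveries reported yet) are treated as 0.
            row[col] = int(float(row[col] or 0))
        rows.append(row)
    return rows


def totals_by_country(rows):
    """Sum confirmed cases, deaths and recoveries per country (single snapshot only)."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_COLUMNS, 0))
    for row in rows:
        for col in COUNT_COLUMNS:
            totals[row["Country"]][col] += row[col]
    return dict(totals)


# Made-up rows for illustration only; these are not real counts.
sample = io.StringIO(
    "Province/State,Country,Last Update,Confirmed,Deaths,Recovered\n"
    "Province A,Country X,1/31/2020 19:00,100,10,5\n"
    "Province B,Country X,1/31/2020 19:00,20,0,1\n"
    ",Country Y,1/31/2020 19:00,3,0,\n"
)
rows = read_cases(sample)
print(totals_by_country(rows))
```

With pandas, the same per-country totals would typically come from a `groupby` on the country column followed by `sum()`.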

Visualizing

All recorded deaths (100%) appear to be in China
Countries with some cases of the virus: China has the most cases
Confirmed cases by province/state in China
Case fatality rate by province/state in China
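The case fatality rate (CFR) is commonly computed as the naive ratio CFR = Deaths / Confirmed × 100%. The post doesn't state the exact formula behind the plot, so this is a sketch assuming that definition. Provinces without confirmed cases get `None` rather than a misleading 0%. The province names and numbers are made up.

```python
def case_fatality_rate(deaths, confirmed):
    """Naive case fatality rate in percent; None when there are no confirmed cases."""
    if confirmed <= 0:
        return None
    return 100.0 * deaths / confirmed


# Hypothetical per-province totals from one snapshot (made-up numbers).
provinces = [
    {"Province/State": "Province A", "Confirmed": 200, "Deaths": 10},
    {"Province/State": "Province B", "Confirmed": 50, "Deaths": 0},
    {"Province/State": "Province C", "Confirmed": 0, "Deaths": 0},
]

rates = {
    p["Province/State"]: case_fatality_rate(p["Deaths"], p["Confirmed"])
    for p in provinces
}
print(rates)  # {'Province A': 5.0, 'Province B': 0.0, 'Province C': None}
```

During an ongoing outbreak this naive ratio is only a rough estimate, because many confirmed cases have not yet reached an outcome.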
Calculating and plotting the Local Outlier Factor (LOF) to check for anomalies in the dataset
LOF Score
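The Local Outlier Factor compares the local density around a point with the density around its k nearest neighbours: a score near 1 means the point fits in, while a score well above 1 marks a likely anomaly. The post doesn't say which features or which k were used (scikit-learn's `LocalOutlierFactor` is a common choice). Below is a brute-force, standard-library sketch of the algorithm; the 2-D points and `k=2` are illustrative assumptions, not the author's data.

```python
import math


def lof_scores(points, k=2):
    """Local Outlier Factor for every point (brute force, O(n^2) distances).

    Assumes more than k points and no duplicate points.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k-distance: distance from each point to its k-th nearest other point.
    k_dist = [sorted(dist[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)]
    # k-neighbourhood: all other points within the k-distance (ties included).
    neighbours = [
        [j for j in range(n) if j != i and dist[i][j] <= k_dist[i]] for i in range(n)
    ]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [
        len(neighbours[i]) / sum(reach_dist(i, j) for j in neighbours[i])
        for i in range(n)
    ]
    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print([round(s, 2) for s in lof_scores(points, k=2)])
```

In this toy set the four clustered points score exactly 1.0, while the isolated point at (10, 10) scores far above 1.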
K-means Clustering
Cluster formation
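K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it, repeating until nothing changes (Lloyd's algorithm). Which features the author clustered, and into how many clusters, isn't stated; the sketch below assumes numeric tuples such as (confirmed, deaths) per province and uses made-up points. Initialising from randomly sampled input points is a simplification; scikit-learn's `KMeans`, for example, uses k-means++ seeding by default.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Made-up 2-D points forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)
```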

Applying Various Models to the Dataset

Linear Regression
Actual and predicted values
Simple Visualization
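The post doesn't show which variables the linear regression used. As an illustration only, assume it fits a count (say, cumulative confirmed cases) against the day number. The sketch below fits y = intercept + slope × x by ordinary least squares and lists actual values next to predicted ones, mirroring the "Actual and predicted values" step. The numbers are made up, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Made-up example: day number vs. a cumulative count.
days = [0, 1, 2, 3, 4]
counts = [2, 6, 7, 12, 13]
intercept, slope = fit_line(days, counts)
predicted = [intercept + slope * d for d in days]
for day, actual, pred in zip(days, counts, predicted):
    print(f"day {day}: actual={actual}, predicted={pred:.1f}")
```

On Python 3.10 and later, `statistics.linear_regression` returns the same slope and intercept.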
Applying a neural network model with the stochastic gradient descent (SGD) optimizer
The model accuracy appears to be 80%
Trying one more model with the “adam” optimizer and the “sigmoid” activation function
Model accuracy is still 80%; changing the optimizer made no difference.
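The post reports the neural network results but doesn't show the model. To make the difference between the two optimizers concrete, here is a hedged, standard-library sketch: a single sigmoid neuron (i.e. logistic regression, far smaller than the author's network) trained with per-sample SGD and with Adam on a made-up 1-D dataset. The dataset, learning rates and epoch count are assumptions chosen for illustration; libraries such as Keras ship ready-made `SGD` and `Adam` optimizers built on the same basic update rules.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(params, x):
    """Output of one sigmoid neuron; the last entry of params is the bias."""
    return sigmoid(sum(w * xi for w, xi in zip(params, x)) + params[-1])


def train(data, optimizer, lr, epochs=200, seed=0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Train the neuron on log-loss with per-sample ("stochastic") updates."""
    rng = random.Random(seed)
    samples = list(data)
    params = [0.0] * (len(samples[0][0]) + 1)
    m = [0.0] * len(params)  # Adam: running mean of gradients
    v = [0.0] * len(params)  # Adam: running mean of squared gradients
    t = 0
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the samples in a random order each epoch
        for x, y in samples:
            err = predict(params, x) - y  # d(log-loss)/dz for a sigmoid output
            grads = [err * xi for xi in x] + [err]
            t += 1
            for i, g in enumerate(grads):
                if optimizer == "sgd":
                    params[i] -= lr * g
                elif optimizer == "adam":
                    m[i] = beta1 * m[i] + (1 - beta1) * g
                    v[i] = beta2 * v[i] + (1 - beta2) * g * g
                    m_hat = m[i] / (1 - beta1 ** t)  # bias correction
                    v_hat = v[i] / (1 - beta2 ** t)
                    params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
                else:
                    raise ValueError(f"unknown optimizer: {optimizer}")
    return params


def accuracy(params, data):
    """Fraction of samples whose thresholded prediction (>= 0.5) matches the label."""
    return sum((predict(params, x) >= 0.5) == (y == 1) for x, y in data) / len(data)


# Made-up 1-D data: negative feature -> class 0, positive feature -> class 1.
data = [((x,), 0) for x in (-0.8, -0.6, -0.4, -0.2)] + [
    ((x,), 1) for x in (0.2, 0.4, 0.6, 0.8)
]
for name, lr in (("sgd", 0.1), ("adam", 0.01)):
    params = train(data, name, lr)
    print(name, accuracy(params, data))
```

SGD moves every weight by the learning rate times its gradient, while Adam rescales each weight's step using running averages of the gradient and of its square. On an easy problem both can end up at the same accuracy, which is in line with the observation above that swapping the optimizer did not change the result.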

Conclusion: With a larger dataset, i.e. real-time data, the spread of the coronavirus could be predicted state by state before an area is affected.

Saish Urumkar

Network Design, Network Security and Machine Learning Engineer passionate about learning new technologies and implementing them in real-life scenarios.