Kaggle Datasets — A Great Place to Start Exploring Data Science

Krishna Kanth
Beginner @ Data Science
2 min read · Jul 21, 2016

Exploring Data Science is all about getting your hands dirty: picking up interesting data and diving into it, armed with your own ideas and languages like R and Python. For that, it really helps to know where to start.

Kaggle is a great place for this purpose.

What is Kaggle?

Kaggle is a global community for people involved or interested in transforming the way data is seen in this world. It’s a competitive platform for data scientists where they can take up challenges and solve real-world problems in some of the most creative and efficient ways.

Kaggle Datasets

Kaggle provides numerous public datasets for anyone interested in performing their own analysis on real-world data by applying models and drawing insights. It offers some really interesting and unique datasets:

2016 US Elections
ISIS Twitter Usage
Climate Change
Game of Thrones
US Baby Names
Airplane Crashes

…just to name a few!

Alongside the renowned Data Science competitions that Kaggle conducts, exploring these datasets is also a great way for a beginner to get comfortable with data analysis.

And then there are Kernels!

Kaggle Kernels

Kernels on Kaggle (previously called Scripts) are reports in which users can present their findings, the models they used, the code they developed for the analysis, and the insights from it in the form of visualizations. I found Kernels to be of great help to anyone who wants to study and understand various analysis models. You can also discuss a Kernel with its author and share your comments and feedback on the analysis.

By looking at these Kernels you will get an idea of where to start your own analysis of a given dataset. This puts you on the right path to explore and learn from data. I particularly suggest that beginners start with data preparation activities using R or Python. It's a very important part of any project: most of the time is spent on the preprocessing needed to make data analysis-ready. I'm no expert at this, but I did start with it myself and found it comfortable.
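To give a feel for what data preparation can look like, here is a minimal sketch in Python using only the standard library. The tiny CSV below is invented for illustration (it is not a real Kaggle dataset), and the `prepare` function and its cleaning rules (trim names, drop duplicate name/year pairs, fill missing counts with the median) are just one assumed approach, not the way to do it.

```python
import csv
import io
from statistics import median

# Hypothetical raw data, made up for this example (not a real Kaggle dataset).
RAW_CSV = """name,year,count
 Emma ,2014,20799
Olivia,2014,
Emma ,2014,20799
noah,2014,19144
Liam,2014,n/a
"""


def prepare(raw_text):
    """Return cleaned rows: tidy names, integer fields, no duplicates."""
    rows = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Normalize whitespace and capitalization in the name.
        name = row["name"].strip().title()
        year = int(row["year"])
        # Treat anything that is not a plain number as missing.
        count_text = row["count"].strip()
        count = int(count_text) if count_text.isdigit() else None
        # Skip repeated (name, year) pairs.
        key = (name, year)
        if key in seen:
            continue
        seen.add(key)
        rows.append({"name": name, "year": year, "count": count})

    # Fill missing counts with the median of the known counts.
    known = [r["count"] for r in rows if r["count"] is not None]
    fill = int(median(known)) if known else 0
    for r in rows:
        if r["count"] is None:
            r["count"] = fill
    return rows


cleaned = prepare(RAW_CSV)
for r in cleaned:
    print(r)
```

With a real dataset you would read the downloaded file instead of the inline string, and many people reach for a library such as pandas for the same steps. Whether to fill or drop missing values depends on the data, so treat the median fill here as a placeholder choice.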

So, try out different things, tweak data, visualize it and see what it says.

Good luck!
