Self Learning Data Science In 31 Days: The Article Version

Your Hitchhiker’s Guide To Mastering Data Science

XQ

Published in The Research Nest, Nov 21, 2019

Preface

With vast amounts of data being generated today, there is an ever-increasing need for professionals who can make valuable sense of it: the data scientists. With a huge volume of learning resources now available online, self-learning is well within reach.

This article aims to help individuals do exactly that by curating the best resources for learning and for implementing practical projects. It is split into 4 major parts, each emphasizing certain fundamental aspects of data science and each expected to take about a week to explore and learn. Throughout, the focus is on practicing data science using Python.

I recommend that you quickly skim through the article to get an overview of the topics, resources, and learning approach, and then bookmark it to refer back to during your month-long journey of mastering data science!

Note From The Editor

As technology frameworks change and get updated over time, some of the code samples may not work as expected; however, the core concepts in the resources shared should remain accurate. Focus on these concepts more than on the code itself.
Please give credit when you share or use this article elsewhere.
For any feedback or errata, you can email us at the.research.nest@gmail.com.

What To Expect

Explanatory Articles
Hands-on tutorials
Practical Insights

Finding Your Data

The first step is all about identifying the domain you want to work in and finding a relevant dataset. After all, data science starts with data collection. Choose a dataset in your domain of interest, download it, and get ready for some action!
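Most public datasets come as CSV files, and pandas is the usual way to load them in Python. The sketch below assumes a small, made-up CSV (inlined with io.StringIO so it runs as-is); with your own dataset you would pass the file path or URL to pd.read_csv instead. It shows the first things worth checking after loading: size, column types, and missing values.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for a downloaded CSV file.
# With a real dataset, pass a file path or URL to pd.read_csv instead.
csv_text = """app,category,rating,installs
Notes,Productivity,4.5,10000
Weather,Weather,3.9,50000
Chess,Game,4.7,
Budget,Finance,,1000
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns) -> (4, 4)
print(df.dtypes)         # inferred type of each column
print(df.head())         # first few rows
print(df.isna().sum())   # number of missing values per column
```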

Below are some links where you can find public datasets across different sectors:

What Can You Do With Your Dataset?

Once your dataset is ready, there are broadly three kinds of applications (though you are not limited to these) that you can build with it:

Prediction
Classification
Recommendation

Apart from these, you can also look for hidden patterns in the data. Take a good look at your dataset and the variables in it, identify what kind of analysis it supports, and finalize the problem to tackle.

Is it a classification, regression, or clustering problem? If your dataset does not clearly fit any of these categories, as a beginner we recommend switching to a more relevant dataset.
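One rough way to answer this question is to look at the column you want to predict, if there is one. The helper below, suggest_task, is a hypothetical rule of thumb (not a library function), and its threshold of 10 distinct values is an arbitrary assumption: no target suggests clustering, text labels or a few distinct values suggest classification, and many distinct numeric values suggest regression.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def suggest_task(target, max_classes=10):
    """Hypothetical rule of thumb for framing a beginner project.

    `target` is the column you want to predict (a pandas Series), or None
    if there is nothing to predict. `max_classes` is an arbitrary cut-off.
    """
    if target is None:
        # No label to predict: look for structure instead, e.g. clustering.
        return "clustering"
    if not is_numeric_dtype(target) or target.nunique() <= max_classes:
        # Text labels or a handful of distinct values suggest discrete classes.
        return "classification"
    # Many distinct numeric values suggest a continuous quantity.
    return "regression"


print(suggest_task(pd.Series(["spam", "ham", "spam"])))       # classification
print(suggest_task(pd.Series([i * 1.5 for i in range(100)])))  # regression
print(suggest_task(None))                                      # clustering
```

Treat the output as a starting point only; an integer count with a handful of distinct values, for instance, may still be better framed as regression.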

Subjects & Pre-requisites

Here is a comprehensive compilation of learning resources you may need on your way to becoming a data scientist.

While you may not need to know all of them in detail to get started, having a general idea of these topics can prove extremely useful.

Links to quickly learn some key concepts

Data Pre-Processing

Before you can start analyzing the dataset, you need to make some modifications to make it easier to work with programmatically. Here are some standard approaches for doing so. Try implementing the techniques that are relevant to your chosen dataset.
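As a minimal sketch of three common pre-processing steps (filling missing values, one-hot encoding categories, and min-max scaling), here is a pandas version on a small hypothetical table. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value and one categorical column.
raw = pd.DataFrame({
    "age": [25, np.nan, 40, 35],
    "income": [30000, 52000, 61000, 45000],
    "city": ["Pune", "Delhi", "Pune", "Chennai"],
})

clean = raw.copy()

# 1. Impute the missing numeric value with the column median (35 here).
clean["age"] = clean["age"].fillna(clean["age"].median())

# 2. One-hot encode the categorical column into 0/1 indicator columns.
clean = pd.get_dummies(clean, columns=["city"], dtype=int)

# 3. Min-max scale the numeric columns to the [0, 1] range.
for col in ["age", "income"]:
    col_min, col_max = clean[col].min(), clean[col].max()
    clean[col] = (clean[col] - col_min) / (col_max - col_min)

print(clean)
```

In a real project, fit these transformations on the training data only and then apply them to the test data; scikit-learn's SimpleImputer, OneHotEncoder, and MinMaxScaler implement the same steps with separate fit and transform calls.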

Tutorials Of Various Pre-Processing Approaches

Performing EDA

Once we have a detailed and clean dataset in hand, we can perform various statistical analyses and visualizations to better understand our data. This stage is known as exploratory data analysis (EDA).

Wikipedia has an entire page dedicated to EDA; you can refer to it for an overview of what it is all about.

Links To Some Useful Resources

There are several Python libraries for performing EDA. Pick one that suits your requirements and proceed.
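With pandas alone you can already cover much of a first EDA pass. The sketch below uses a tiny, hypothetical hours-versus-score table to show summary statistics, category counts, correlations, and group-wise averages.

```python
import pandas as pd

# Hypothetical dataset: hours studied vs. exam score, plus a group label.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 64, 70, 73],
    "group": ["A", "B", "A", "B", "A", "B"],
})

# Count, mean, std, min, quartiles, and max for each numeric column.
print(df.describe())

# How often each category appears.
print(df["group"].value_counts())

# Pairwise Pearson correlation between numeric columns.
corr = df[["hours", "score"]].corr()
print(corr)

# Group-wise aggregates often reveal differences between segments.
group_means = df.groupby("group")["score"].mean()
print(group_means)
```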

Once the data is thoroughly analyzed, we can move on to building predictive models using different techniques and, ultimately, a tangible application of practical significance. Document your insights from the dataset, as they can be used for data-driven decision making and consulting.

To learn more about the statistics behind hypothesis testing, visit these links:
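As a small taste of what those resources cover, here is a sketch of a two-sample test of whether two groups share the same mean, using SciPy's ttest_ind with equal_var=False (Welch's t-test). The measurements are hypothetical, and the 0.05 significance level is a conventional choice rather than a rule.

```python
from scipy import stats

# Hypothetical measurements of one metric under two conditions.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3]

# Null hypothesis H0: both groups have equal means.
# equal_var=False selects Welch's t-test, which does not assume equal variances.
result = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05  # significance level, chosen before looking at the data
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4g}")
if result.pvalue < alpha:
    print("Reject H0: the difference in means is statistically significant.")
else:
    print("Fail to reject H0 at this significance level.")
```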

With a clean dataset in hand, exploratory data analysis should give you a clear picture of what you can do with it. Prepare a problem statement to tackle, such as predicting or grouping a parameter. Spend considerable time defining the right problem, because no matter how efficient your solution is, it is of no use if you are solving the wrong problem.
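Once the problem statement is fixed, it translates directly into code: the parameter you want to predict becomes the target y, the remaining columns become the features X, and part of the data is held out for testing. This sketch assumes a hypothetical housing-style table where the goal is to predict price; the numbers are made up.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical cleaned dataset. Assumed problem statement:
# "predict `price` from the other columns" (a regression task).
df = pd.DataFrame({
    "area": [50, 65, 80, 95, 110, 120, 140, 160, 175, 200],
    "rooms": [1, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    "price": [100, 128, 150, 181, 205, 222, 260, 290, 315, 360],
})

target = "price"
X = df.drop(columns=[target])  # features (inputs)
y = df[target]                 # the parameter we want to predict

# Hold out 20% of the rows as a test set the model never sees while training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```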

Choosing the right model for the situation can be challenging for a beginner.

Based on your understanding, shortlist 2–3 methods and get ready to build your models.

Here are two useful articles exploring basic machine learning algorithms for data science and the scenarios in which they are preferred.

Top 10 Machine Learning Algorithms For Data Science
Choosing The Right Algorithm For Your Dataset (This is a MUST read)
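A practical way to choose among your shortlisted methods is to compare them with cross-validation on the same data. This sketch uses the Iris dataset bundled with scikit-learn, and the three candidate models are an arbitrary example shortlist, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Iris ships with scikit-learn, so nothing needs to be downloaded.
X, y = load_iris(return_X_y=True)

# A hypothetical shortlist of three candidate methods.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# 5-fold cross-validation: each model is trained and scored on 5 different splits.
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")

best = max(results, key=results.get)
print("Best candidate on this data:", best)
```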

Essential Machine Learning

With the dataset prepared and problem statements formulated, the stage is all set to build and train your models using various ML methods.
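A typical training step in scikit-learn looks like the sketch below: split the data, fit a model, and evaluate it on the held-out test set. Wrapping the scaler and the model in a pipeline keeps pre-processing confined to the training split. The Wine dataset (bundled with scikit-learn) and logistic regression are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The scaler is fitted on the training split only, inside the pipeline,
# so statistics from the test set do not leak into training.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```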

Here are some must-read resources for any aspiring data scientist summarizing almost everything you need to know.

Additional Hands-On Tutorials

The following tutorials are for those interested in further exploring the practical applications of machine learning. These are highly recommended reads, especially if you are a beginner.

Applied Machine Learning: Part 1

Prediction Using Linear Regression, LassoCV, ElasticNet, RidgeCV, and xgboost

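For a quick feel for the regression models named in Part 1, here is a sketch that fits four of them with scikit-learn on synthetic data from make_regression and compares test R². The dataset and the hyperparameters (the RidgeCV alpha grid, ElasticNet's alpha and l1_ratio) are assumptions for illustration. xgboost is left out because it is a separate install, though its XGBRegressor follows the same fit/predict pattern.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for a real dataset.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    # The *CV variants choose their regularization strength by internal cross-validation.
    "LassoCV": LassoCV(cv=5, random_state=0),
    "RidgeCV": RidgeCV(alphas=[0.1, 1.0, 10.0]),
    # ElasticNet mixes L1 and L2 penalties; alpha and l1_ratio are fixed here.
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: test R^2 = {r2[name]:.3f}")
```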

Applied Machine Learning: Part 2

Convolutional Neural Networks for Image Recognition

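A full CNN needs a deep learning framework, but the core operation of a convolutional layer is simple to sketch in NumPy: slide a small kernel over the image and sum the element-wise products at each position (technically a cross-correlation, which is what CNN layers compute). The conv2d and relu helpers, the 5x5 image, and the hand-picked edge kernel below are illustrative assumptions; in a real CNN the kernel weights are learned, and there are many channels and layers.

```python
import numpy as np


def conv2d(image, kernel):
    """Hypothetical helper: 'valid' (no padding), stride-1 2D cross-correlation
    for a single input channel and a single output channel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with one image patch element-wise, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    # Non-linearity commonly applied after a convolution.
    return np.maximum(x, 0)


# A 5x5 "image": dark (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A hand-picked vertical-edge detector; a CNN would learn such weights.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = relu(conv2d(image, kernel))
print(feature_map)  # strong responses where the patch spans the edge
```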

Applied Machine Learning: Part 3

Classification Using Naive Bayes, Linear SVM, Logistic Regression, and Random Forest

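The four classifiers from Part 3 share the same scikit-learn interface, so comparing them takes only a loop. This sketch uses the Breast Cancer dataset bundled with scikit-learn as a stand-in for your own data; the hyperparameters and the choice to scale features for the two linear models are assumptions, not settings taken from the tutorial.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

classifiers = {
    "Naive Bayes": GaussianNB(),
    # Linear models are sensitive to feature scale, so they get a scaler first.
    "Linear SVM": make_pipeline(StandardScaler(), LinearSVC(max_iter=5000)),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: test accuracy = {accuracy[name]:.3f}")
```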

Endnotes

This compilation is part of the e-learning social media campaign The December Data Festival, 2018, which ended with a small project tutorial competition. You can find the winning and runner-up entries below:

Data Science Tutorial: Analysis Of The Google Play Store Dataset

Winning submission: December Data Festival 2018


Data Science Tutorial: An Analysis of the ICO’s Crop Data

Runner Up Submission: December Data Festival 2018


I would love to hear your feedback and suggestions for improvement. Do drop us an email at the.research.nest@gmail.com.

I hope you found this useful. To support us and stay updated on similar initiatives, follow The Research Nest’s publication on Medium.

Thanks to Nivetha Balu for helping with the editing of the original booklet.

If you want to download a PDF version of this compilation, visit http://bit.ly/self-learning-datascience
