Prerequisites before starting your Data Science journey

This is part of The ULTIMATE Curriculum in Data Science, which you can refer to for more topics related to Data Science.
The big technology trend is to make systems intelligent and data is the raw material. -Amod Malviya

All right mate, now you have started your journey into the “Adventures of Data Science” and there is no turning back. But you can’t fight in this field using old-school technologies, so I present everything you need as prerequisites before starting your Data Science journey.


Right now, there are three programming languages that are preferred in the Data Science community:

  • Python
  • R
  • JavaScript

Technically you can code in any language you like, but to keep this tutorial simple, I will use Python as my go-to language, and all of my tutorials will be in Python.

Right now, just follow these steps; afterwards I will give detailed information about each of them.

  • Install Python as instructed (Python 3.x is preferred).
  • Install pip.
  • Install Jupyter Notebook.
  • Run the following line at the command prompt:

pip install numpy pandas scipy matplotlib scikit-learn seaborn
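Once the install finishes, a quick sanity check (a minimal sketch; the exact version numbers you see will depend on when you install) is to import each library and print its version:

```python
# Sanity check: import each library installed above and print its version.
import numpy
import pandas
import scipy
import matplotlib
import sklearn   # scikit-learn is imported under the name "sklearn"
import seaborn

for mod in (numpy, pandas, scipy, matplotlib, sklearn, seaborn):
    print(mod.__name__, mod.__version__)
```

If any import fails, rerun the pip command above for that package.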


Python is a popular and powerful interpreted language. Unlike R, Python is a complete language and platform that you can use both for research and for developing production systems. You will use it to write your code.


Jupyter Notebook is an open-source web application that allows us to create and share code and documents.

It provides an environment where you can document your code, run it, look at the outcome, visualize data, and see the results without leaving the environment. This makes it a handy tool for performing end-to-end data science workflows — data cleaning, statistical modeling, building and training machine learning models, visualizing data, and many, many other uses. You will write most of your code in this environment.

To learn how to write a Jupyter notebook, see the official Jupyter documentation.

(Figure: a typical Jupyter notebook)

Tip: if you want to run Python code online in your browser, try out Google Colab. It provides the same Jupyter-notebook-style experience in your browser.


NumPy is the foundational library for scientific computing in Python, and many other libraries use NumPy arrays as their basic inputs and outputs. In short, NumPy introduces objects for multidimensional arrays and matrices, as well as routines that allow developers to perform advanced mathematical and statistical functions on those arrays with as little code as possible.
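As a quick taste of what that means in practice, here is a minimal sketch of NumPy's vectorized arrays:

```python
import numpy as np

# Create a 2-D array and compute statistics without writing explicit loops.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.shape)        # (2, 3)
print(a.mean())       # 3.5
print(a.sum(axis=0))  # column sums: [5 7 9]

# Vectorized arithmetic: every element is doubled in one expression.
print(a * 2)
```

Operations like `mean` and `sum` run in optimized C code under the hood, which is why NumPy is so much faster than plain Python lists for numeric work.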


Pandas adds data structures and tools that are designed for practical data analysis in finance, statistics, social sciences, and engineering. Pandas works well with incomplete, messy, and unlabeled data (i.e., the kind of data you’re likely to encounter in the real world), and provides tools for shaping, merging, reshaping, and slicing data sets. (And if Python and Pandas were not enough of a menagerie, there is also Anaconda, which will require a separate tutorial.)

There are two main data structures in this library:

  • “Series”, which is one-dimensional
  • “DataFrame”, which is two-dimensional

You can combine these two structures to produce a new DataFrame, for example by appending a single row (a Series) to an existing DataFrame.
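A minimal sketch of that row-append operation (the names and values here are made up for illustration; note that `pd.concat` replaces the `DataFrame.append` method, which was removed in pandas 2.x):

```python
import pandas as pd

# A Series is a labeled 1-D array; a DataFrame is a 2-D labeled table.
s = pd.Series([170, 65], index=["height", "weight"], name="alice")
df = pd.DataFrame({"height": [180, 160], "weight": [80, 55]},
                  index=["bob", "carol"])

# Append the Series as a new row: turn it into a one-row DataFrame
# (its name becomes the row label) and concatenate.
df = pd.concat([df, s.to_frame().T])
print(df)
```

The Series' `name` attribute becomes the new row's index label, and its index labels line up with the DataFrame's columns.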

Scikit-learn

scikit-learn builds on NumPy and SciPy by adding a set of algorithms for common machine learning and data mining tasks, including clustering, regression, and classification. As a library, scikit-learn has a lot going for it. Its tools are well-documented and its contributors include many machine learning experts. What’s more, it’s a very curated library, meaning developers won’t have to choose between different versions of the same algorithm.

If you want to see all the algorithms that scikit-learn provides, go to its official website.

(Figure: scikit-learn’s cheat-sheet for choosing a preferred algorithm for your problem)
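To give a feel for the library's uniform fit/predict interface, here is a minimal sketch that trains a classifier on scikit-learn's built-in Iris dataset (the choice of k-nearest neighbors here is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset and split it into train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Fit a k-nearest-neighbors classifier and score it on held-out data.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"accuracy: {accuracy:.2f}")
```

Swapping in a different algorithm (say, a decision tree or logistic regression) changes only the import and the constructor line; `fit`, `predict`, and `score` stay the same.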


SciPy builds on NumPy by adding a collection of algorithms and high-level commands for manipulating and visualizing data. This package includes functions for computing integrals numerically, solving differential equations, optimization, and more.
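A minimal sketch of two of those capabilities, numerical integration and optimization:

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi (the exact answer is 2).
value, abs_err = integrate.quad(np.sin, 0, np.pi)
print(value)

# Find the minimum of (x - 3)^2, starting the search at x = 0.
result = optimize.minimize(lambda x: (x - 3) ** 2, x0=0.0)
print(result.x)
```

Both functions return rich result objects: `quad` also reports an error estimate, and `minimize` reports convergence status alongside the solution.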


matplotlib is the standard Python library for creating 2D plots and graphs. It’s pretty low-level, meaning it requires more commands to generate nice-looking graphs and figures than with some more advanced libraries. However, the flip side of that is flexibility.

With a bit of effort you can make just about any visualization:

  • Line plots
  • Scatter plots
  • Bar charts and Histograms
  • Pie charts
  • Stem plots
  • Contour plots
  • Quiver plots
  • Spectrograms

There are also facilities for creating labels, grids, legends, and many other formatting entities with Matplotlib. Basically, everything is customizable.
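A minimal sketch of that low-level style, producing two of the plot types listed above side by side (the `Agg` backend is used so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, not a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot with a label, legend, and title: each element is a separate call.
ax1.plot(x, np.sin(x), label="sin(x)")
ax1.set_title("Line plot")
ax1.legend()

# Scatter plot of random points.
ax2.scatter(rng.random(30), rng.random(30))
ax2.set_title("Scatter plot")

fig.savefig("plots.png")
```

Every label, legend, and tick is an explicit call, which is exactly the verbosity-for-flexibility trade-off described above.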

(Figure: basic Matplotlib plots)


Seaborn is a popular visualization library that builds on matplotlib’s foundation. The first thing you’ll notice about Seaborn is that its default styles are much more sophisticated than matplotlib’s. Beyond that, Seaborn is a higher-level library, meaning it’s easier to generate certain kinds of plots, including heat maps, time series, and violin plots.
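For example, a heat map of a correlation matrix takes a single Seaborn call (a sketch; the data here is random, generated purely for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render to a file rather than a window
import numpy as np
import pandas as pd
import seaborn as sns

# Build a small DataFrame of random data and plot its correlation matrix.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("ABCD"))

# One call produces an annotated, color-mapped heat map.
ax = sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
ax.figure.savefig("heatmap.png")
```

The equivalent plot in raw Matplotlib would take many more lines of color-mapping and annotation code.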

(Figures: a Seaborn heat map and a Seaborn violin plot)

As of now, these modules are enough to start your Data Science journey. In the upcoming courses I will mention many other modules that you will need as you progress toward becoming the ULTIMATE Data Scientist.

To get the latest updates and tips, or if you have an issue or request, just post in the comments.

Till then….

Happy coding :)

And Don’t forget to clap clap clap…