How to get started with Data Science

Anushka Bajpai
7 min read · Mar 25, 2022

“The way to get started is to quit talking and start DOING!” Walt Disney

Photo by Yousef Espanioly on Unsplash

The first step is, and always will be, to take action. Usually it is just getting stuck in a loop of thoughts that prevents us from getting started.

A couple of years back, stuck in lockdown, I woke up on a Monday morning feeling sort of empty. There was no hustle to rush to the office, no traffic, nothing much to worry about. All I had to do was grab my laptop, log in and start working. I’m pretty sure many will relate to this, as a large part of the world was working from home!

That very day I asked myself — How can I utilize this time?

What are the things I always wanted to do, but couldn’t?

I took a day off and spent my time pondering and listing out everything on my to-do list. Let me share the top three with you all:

Masters in Data Science

Learn martial arts

Teach underprivileged kids

Fast forward two years to March 24th, 2022: I have checked off all three and am nearing my final Data Science Masters thesis. What a fabulous journey it has been. Each day was far more productive and worthwhile than it would have been if I hadn’t taken consistent action on my “To-do” list.

The purpose of sharing this is just to remind you dear readers — It’s never too late. Get started from where you are today :)

Now let’s get to the point. In this blog we will cover the top three things to focus on when starting with Data Science:

PYTHON (or R)

SQL

Statistics

Trust me, once you get these clear, the rest will be a lot more interesting and easier to grasp (remember, you don’t need to master them all at once).

I will share some of the best resources for all three, along with a list of things to focus on for each of them.

Let’s get started . . .

1. PYTHON

The first thing to decide at the very beginning of a career in Data Science is which language to use. The two most popular languages among Machine Learning practitioners are Python and R, although some people use Java, C++, JavaScript, and others as well.

I chose Python because, as an automation engineer, it was the second language (after Java) that I was comfortable with. It turned out to be the right choice for me, and here are some reasons why it may work for you too, whether or not you have a coding background.

  • Community Support : A large share of Machine Learning practitioners use Python, so there is a lot of support and help available online.
  • Powerful Packages : Python is a high-level programming language with a wide range of Machine Learning frameworks available, which makes implementation easier.
  • Ease of Learning : There is a low barrier to entry since Python reads like English and has a friendly syntax to work with.
  • Faster Development and Processing : Python integrates well with other software components, making it a general-purpose language that can be used to build a full end-to-end pipeline: starting with the data, cleaning it, training a model, and taking that straight into production.
  • Compatible with Hadoop : Hadoop, one of the most popular open-source platforms for big data, can be used from Python. The Python package Pydoop provides access to the Hadoop API.

To learn machine learning in Python, you can start with the basics of Python. Without much text, here are the milestones to learn:

  1. Python Fundamentals
  2. Literals
  3. Numbers
  4. Strings
  5. Python Data Structures
  6. Mutable
  7. Immutable
  8. Functions
  9. Python OOPs concept
  10. Standard Modules like os, sys, math etc.
  11. Data Manipulation: NumPy
  12. Data Manipulation: Pandas
  13. Data Visualization: Matplotlib
  14. Advanced concepts
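
The "Mutable" and "Immutable" milestones above tend to trip up beginners, so here is a minimal sketch of the difference. The function and variable names (`add_item`, `shout`, `skills`) are made up for illustration and do not come from any of the courses listed here.

```python
# Immutable types (int, str, tuple) cannot be changed in place;
# mutable types (list, dict, set) can.

def add_item(items, item):
    """Append to the list that was passed in (this mutates the caller's list)."""
    items.append(item)
    return items


def shout(text):
    """Strings are immutable, so upper() returns a brand-new string."""
    return text.upper()


skills = ["python", "sql"]
same_list = add_item(skills, "statistics")

name = "data science"
loud = shout(name)

print(skills)               # ['python', 'sql', 'statistics']
print(same_list is skills)  # True: both names point at the same list
print(name)                 # data science (the original is unchanged)
print(loud)                 # DATA SCIENCE

point = (1, 2)
try:
    point[0] = 10           # tuples do not support item assignment
except TypeError as err:
    print("tuples are immutable:", err)
```

Try changing `skills` to a tuple and see which line fails; small experiments like this are the quickest way to make the concept stick.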

Three of the most important Python libraries for data science are NumPy, Pandas, and Matplotlib. Once you understand the Python basics, these three libraries become much easier to learn and use.
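
To give a feel for what these libraries do, here is a small sketch that assumes NumPy and Pandas are already installed (for example with `pip install numpy pandas`). The data and the column names `team` and `height_m` are invented for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies to the whole array at once, no loop needed.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
heights_m = heights_cm / 100
mean_height = heights_m.mean()

# Pandas: labelled, tabular data built on top of NumPy arrays.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b"],
        "height_m": heights_m,
    }
)

# Group the rows by team and average each group.
team_means = df.groupby("team")["height_m"].mean()

print(mean_height)
print(team_means)
```

With Matplotlib installed as well, `team_means.plot(kind="bar")` would be one way to turn that result into a chart.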

Important Python libraries used in Data Science (Source : Learnbay)

Some of the best resources to start with

Udemy

https://www.udemy.com/course/100-days-of-code/

https://www.udemy.com/course/complete-python-developer-zero-to-mastery/

https://www.udemy.com/course/the-complete-python-course/

https://www.udemy.com/course/python-for-machine-learning-data-science-masterclass/

Other resources

Coursera

Codecademy

DataCamp

Programiz (https://www.programiz.com/python-programming)

Python.org docs (https://docs.python.org/3/tutorial/index.html)

Must-read Books (for Python beginners)

1. Head First Python

Author : Paul Barry

Probably the best one to start with for beginners. It has a visually rich format to engage your mind, rather than a text-heavy approach that puts you to sleep.

2. Python Crash Course

Author : Eric Matthes

I can truly vouch for this book after going through numerous other courses and books. It is the one-stop book you need to take your coding, as well as your Python knowledge, to a whole new level.

3. Learn Python the Hard Way

Author : Zed A. Shaw

An excellent book for beginners. It gives you an in-depth overview of Python 3 and encourages hands-on experiments as you go.

Once you finish any one of the above three books, you will already know more Python than you need for Data Science. Just make sure you practice enough along with the reading.

Note :

Pick one course, material or book first and stick to it. Finish it and only then move to the next; otherwise you may get lost juggling multiple resources and end up learning none of them well.

There are plenty of help forums available online. Make sure you get hands-on and write small pieces of code for every concept you cover.

Beginner Level Ebooks :

  1. Think Python — EBook https://greenteapress.com/wp/think-python-2e/
  2. The Hitchhiker’s Guide to Python https://docs.python-guide.org/intro/learning/
  3. A byte of Python — EBook https://python.swaroopch.com/
  4. Jupyter notebook mac shortcuts https://gist.github.com/kidpixo/f4318f8c8143adee5b40
  5. Problem-solving with algorithms — Interactive Ebook https://runestone.academy/runestone/books/published/pythonds/index.html

Control Structures and Functions in Python

  1. Decision Making: Supportive Interactive Content https://www.w3schools.in/python-tutorial/decision-making/
  2. Loops and iterations: Supportive Interactive Content https://www.w3schools.com/python/python_for_loops.asp
  3. Comprehensions: Explained visually https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/
  4. Functions: A Byte of Python EBook https://python.swaroopch.com/functions.html
  5. Defining functions of your own: Supportive Content http://anh.cs.luc.edu/python/hands-on/3.1/handsonHtml/functions.html
  6. Python 3 idioms test https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html
  7. Map, Filter and Reduce Functions: Video Tutorial https://www.youtube.com/watch?v=hUes6y2b--0
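
Before opening the links above, it may help to see these topics side by side. The sketch below uses only the standard library; the list `numbers` and the other variable names are made up for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Decision making inside a loop: label each number.
labels = []
for n in numbers:
    if n % 2 == 0:
        labels.append("even")
    else:
        labels.append("odd")

# List comprehension: squares of the even numbers, in one line.
even_squares = [n * n for n in numbers if n % 2 == 0]

# The same result written with filter() and map().
even_squares_fn = list(
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers))
)

# reduce() folds a sequence into a single value (here, a sum).
total = reduce(lambda acc, n: acc + n, even_squares, 0)

print(labels)           # ['odd', 'even', 'odd', 'even', 'odd', 'even']
print(even_squares)     # [4, 16, 36]
print(even_squares_fn)  # [4, 16, 36]
print(total)            # 56
```

In everyday Python the comprehension is usually the more readable choice, and the built-in `sum(even_squares)` would replace the `reduce` call; the functional versions are shown here only because they appear in the list of topics.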

OOP in Python

(Basic Level)

Class and Objects Video Tutorial

Methods Video Tutorial

Inheritance Supportive Content

Video Tutorial Playlist
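
As a taste of what these tutorials cover, here is a minimal sketch of a class, a method, and inheritance. The classes `Animal` and `Dog` are hypothetical examples, not taken from the linked videos.

```python
class Animal:
    """A class bundles data (attributes) with behaviour (methods)."""

    def __init__(self, name):
        self.name = name  # instance attribute, set when the object is created

    def speak(self):
        return f"{self.name} makes a sound"


class Dog(Animal):
    """Dog inherits __init__ from Animal and overrides speak()."""

    def speak(self):
        return f"{self.name} says woof"


animals = [Animal("Generic"), Dog("Rex")]

# The same method call behaves differently depending on the object's class.
messages = [animal.speak() for animal in animals]

print(messages)                        # ['Generic makes a sound', 'Rex says woof']
print(isinstance(animals[1], Animal))  # True: a Dog is also an Animal
```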

2. SQL

Basics

  • SQL Statements (DDL/DML)
  • SQL Operators
  • Aggregate Functions
  • String and Date-Time functions
  • Regular Expressions
  • SQL querying (nested)
  • Set theory
  • SQL Joins
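
You can practice most of these basics without installing a database server, because Python ships with the `sqlite3` module. The sketch below is one possible way to try DDL, DML, a join with aggregate functions, and a nested query; the `customers` and `orders` tables and their contents are invented for illustration, and SQLite's dialect differs in places from MySQL, Oracle or SQL Server.

```python
import sqlite3

# An in-memory database disappears when the program ends: ideal for practice.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the tables.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)

# DML: insert some rows.
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben"), (3, "Chen")],
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0)],
)
conn.commit()

# LEFT JOIN plus aggregate functions: order count and total per customer.
totals = cur.execute(
    """
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
    """
).fetchall()

# Nested query: customers who placed an order above the average amount.
big_spenders = cur.execute(
    """
    SELECT name FROM customers
    WHERE id IN (
        SELECT customer_id FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)
    )
    ORDER BY name
    """
).fetchall()

print(totals)
print(big_spenders)
conn.close()
```

The LEFT JOIN keeps Chen in the result even though Chen has no orders; switching it to an inner JOIN drops that row, which is a quick way to see the difference between join types.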

Advanced

  • Functions
  • Stored Procedures
  • CTEs (Common Table Expressions)
  • Packages
  • Decoding ERDs
  • Pivoting data: CASE & PIVOT syntax
  • Hierarchical Queries
  • Cursors: Implicit and Explicit
  • Triggers
  • Dynamic SQL
  • Transactions: COMMIT, ROLLBACK, Error Handling
  • Materialized Views
  • Query Optimization
  • XML Integration
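
A few of the advanced topics can also be tried with Python's built-in `sqlite3` module. The sketch below covers a CTE, pivoting with CASE, and a transaction that rolls back on error. The `sales` table is made up for illustration. Note that SQLite has no stored procedures, packages or PIVOT keyword (PIVOT is vendor-specific syntax), so those topics need a database such as Oracle or SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "Q1", 100.0),
        ("north", "Q2", 150.0),
        ("south", "Q1", 80.0),
        ("south", "Q2", 60.0),
    ],
)
conn.commit()

# CTE (WITH ...) feeding a CASE-based pivot: one row per region,
# one column per quarter.
pivot = conn.execute(
    """
    WITH regional AS (
        SELECT region, quarter, SUM(amount) AS total
        FROM sales
        GROUP BY region, quarter
    )
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN total ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN total ELSE 0 END) AS q2
    FROM regional
    GROUP BY region
    ORDER BY region
    """
).fetchall()

# Transaction: "with conn" commits on success and rolls back on an exception.
try:
    with conn:
        conn.execute("INSERT INTO sales VALUES ('east', 'Q1', 10.0)")
        raise ValueError("simulated failure")  # forces a ROLLBACK
except ValueError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(pivot)      # [('north', 100.0, 150.0), ('south', 80.0, 60.0)]
print(row_count)  # 4: the 'east' row was rolled back
conn.close()
```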

Courses

Master SQL for Data Science

SQL for Data Science

SQL Fundamentals

SQLZOO

Select Star SQL

Books

SQL for Data Scientists: A Beginner’s Guide for Building Datasets for Analysis

SQL for Smarties: Advanced SQL Programming

SQL Cookbook: Query Solutions and Techniques For Database Developers

MySQL Administrator’s Bible

3. Statistics

Tutorials

Statistics and Probability | Khan Academy

Statistics and Probability

Statistics Tutorial — Kaggle

Statistics Tutorial — TutorialsPoint

Online Courses

Learn Statistics with Python – Codecademy

Statistics Fundamentals with Python – DataCamp

Practical Statistics – Udacity

Video Tutorials

Introduction to Statistics — 365 Data science

Statistics and Probability Full Course

Statistics Course for Data Science

Statistics for Data Science — Great Learning

Statistics — Crash Course

Statistics And Probability

Books

Book on Practical Statistics

An Introduction to Statistical Learning

Think Stats

Naked Statistics

Final Thoughts

Today we all have quality content available at minimal cost, unlike before. If we plan our learning journey and get started TODAY, nothing can stop us, dear readers.

Also, please remember: just knowing the theory isn’t enough, and it never will be!

PRACTISE PRACTISE PRACTISE

Prepare your own handwritten notes for future reference.

Break things down to your comfort level. Share your learnings with others.

Photo by Dayne Topkin on Unsplash
