Data Science: Getting Started
Data science is one of the hot topics these days. If you are a tech enthusiast like me, chances are you constantly hear terms like data science, big data, machine learning, and artificial intelligence in your daily life. You may be wondering how to get started with data science. Here is a comprehensive guide to getting started with this amazing field.
Steps you could follow
Step 1: Know What Data Science Is
To get started in data science, the first step is to understand what data science is, why it is worth learning, and why people are talking about it. Ask yourself: would I really love doing this job? Most people skip this step and jump straight into learning programming languages.
Data science is not all about programming. It is about love for the subject and the integration of domain knowledge, math and statistics, creativity, thinking, and, last of all, programming. I suggest that people who are getting started take the course “Data Science 101” by cognitiveclass.ai, which answers most of the questions raised above. Once you have completed the course, take your time to figure out whether you are genuinely interested in the subject.
Step 2: Learning Spreadsheets
Yes, I am serious: learn spreadsheets. As a data scientist you may rarely use this tool, but for a beginner it is important to learn. Believe me, that’s how I started learning data science. A spreadsheet is a great tool for getting a feel for data; it is a “non-programmer’s data science tool”. Microsoft has a great course called Introduction to Data Analysis using Excel. If you are interested in learning advanced topics in Excel, go ahead with the other two courses in the list.
Step 3: Learn Python
Python is a great programming language for a beginner. It is easy to learn, simple, and free (open source), and most importantly, it is used extensively in data science and analytics. There are a lot of MOOCs that teach Python, but the one that impressed me most as a beginner was DataCamp’s “Intro to Python for Data Science” course. But wait, this isn’t all you need to succeed in Python. You have only just got started and you have a lot to learn. Python is a vast subject, and I believe you should take these two courses on Udacity to learn more: “Intro to Computer Science” and “Introduction to Python”.
Once you have completed them, you should stop learning programming!! Noooo!!!!! Just kidding. You can only become good at programming if you practice a lot. I suggest starting with HackerRank’s Python tutorial, where you get to practice a lot of Python programs; it is amazing for a beginner. There are also websites like HackerEarth, Codecademy, GeeksforGeeks, and others that may come in handy while learning Python.
Some other useful resources for learning Python are listed below:
- Python 3.4 Programming Tutorials by thenewboston (YouTube)
- Python 3 tutorial by Sololearn
- Learn Python on learnpython.org
- Learn Python by Codecademy
- edX’s Introduction to Computer Science
- Coursera’s Python for Everybody
- Python Tutorial on Mode Analytics
- Siraj Raval’s Learn Python for Data Science
Books and tutorials that help with learning Python include:
- Python 3 tutorial by Tutorialspoint
- Learn Python the Hard Way
- Automate the Boring Stuff with Python by Al Sweigart
Step 4: Learn SQL
SQL (often pronounced “sequel”), or Structured Query Language, is a language designed to help users read, manage, manipulate, and change data in a relational database. It is very important to learn SQL. Khan Academy has a great course called “Intro to SQL: Querying and managing data”, which I suggest you start with. There are a lot of resources on the web related to SQL, such as the one by W3Schools, which can be useful while you study. I also suggest learning NoSQL, which stands for “Not only SQL” and is used for dealing with non-relational databases. Udacity has an awesome course called “Data Wrangling with MongoDB” that teaches NoSQL really well.
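To make the idea of reading and changing data in a relational database a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The students table, its columns, and the rows are all made up for illustration, and SQLite is only one of many relational databases you could practice on.

```python
import sqlite3


def run_demo():
    """Create a tiny table, read from it, change a row, and return the results."""
    # An in-memory database: nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table and rows, invented for this example.
    cur.execute("CREATE TABLE students (name TEXT, course TEXT, score INTEGER)")
    cur.executemany(
        "INSERT INTO students VALUES (?, ?, ?)",
        [("Asha", "Python", 82), ("Ravi", "SQL", 91), ("Meena", "SQL", 77)],
    )

    # Read: average score per course.
    cur.execute(
        "SELECT course, AVG(score) FROM students GROUP BY course ORDER BY course"
    )
    averages = cur.fetchall()

    # Change: update one student's score, then read it back.
    cur.execute("UPDATE students SET score = 80 WHERE name = 'Meena'")
    cur.execute("SELECT score FROM students WHERE name = 'Meena'")
    updated_score = cur.fetchone()[0]

    conn.close()
    return averages, updated_score


if __name__ == "__main__":
    averages, updated_score = run_demo()
    print("Average score per course:", averages)
    print("Meena's updated score:", updated_score)
```

The SELECT, GROUP BY, and UPDATE statements are the kind of thing the courses mentioned here cover in depth; the Python wrapper is just a convenient way to try them without installing a database server.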
Some other useful resources for learning SQL are listed below:
- Intro to SQL for Data Science by Datacamp (Beginner)
- SQLBolt’s interactive tutorial
- Kaggle SQL scavenger hunt
- Sololearn’s SQL Tutorial
- Mode Analytics SQL training
Books and tutorials that help with learning SQL include:
- SQL Tutorials by Tutorialspoint
- MongoDB tutorials by Tutorialspoint
- SQL: The Complete Reference, 3rd Edition, by James R. Groff, Paul N. Weinberg, and A. J. Oppel
- Fundamentals of Database Systems by Ramez Elmasri and Shamkant B Navathe
Step 5: Learn Math & Statistics
Statistics is the heart of data science. If you are good at all the other subjects but very weak in statistics, chances are you will struggle in your professional journey with data science. Statistics is important because it serves as the foundation for data science and machine learning. The course “Probability and Statistics” on Khan Academy is a great way to get started. You can also learn statistics on Udacity, which I found helpful during my studies. The first course is called “Intro to Statistics”, taught by Sebastian Thrun, and the other is called “Statistics”, which helped me understand the concepts.
Math is also important in data science. You will be expected to know concepts such as linear algebra, calculus, and differential equations during your professional journey. Some useful resources are listed below:
- Linear Algebra refresher course on Udacity
- Calculus One on Coursera
- Calculus by Khan Academy
- Multivariable calculus by Khan Academy
- Differential equations by Khan Academy
- Matrices by Khan Academy
- Linear Algebra by Khan Academy
- MIT’s Linear Algebra OCW
- MIT’s Introduction to Probability and Statistics OCW
Step 6: Learn R programming
R is an awesome tool used by statisticians and data scientists for statistics and data analysis. The best way to get started with R is to use RStudio.
R has an awesome third-party package called “swirl”, which stands for “Statistics with Interactive R Learning”. To learn R with swirl, follow the steps as described here.
Some other resources for learning R are listed below:
- Learn R on Kaggle
- R101 By cognitiveclass.ai
- Data Analysis with R by Udacity
- Data Science specialisation with R on edX
- Data Science specialisation on Coursera
Step 7: Learn Data Analysis
Whoa!! We have just learnt the basics; now it is time to officially start learning data analysis using the tools above. “Intro to Data Analysis” by Udacity is an awesome course that can help you learn data analysis. In this course you will learn how to pose a question, wrangle the data, explore the data, find patterns, build intuition, draw conclusions, and communicate the findings. Data analysis is incomplete without visualisations: communicating your results is important, and visualisations play an important role in it. Here are two courses from Coursera that help you learn visualisation. The first is called “Python Data Visualization”, taught by Rice University, and the other is called “Building Data Visualization Tools”, taught by Johns Hopkins University. Some other useful resources include:
- Data Analysis with Python and Pandas
- Data Visualisation course on Kaggle
- Siraj Raval’s Intro to Data Analysis
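The steps described in this section (pose a question, wrangle the data, explore it, draw a conclusion) can be sketched in a few lines of plain Python. This is only a toy illustration using the standard library: the sales data, the city names, and the helper functions load_and_clean and mean_sales_by_city are all made up for this example, and in practice you would more likely reach for a library such as pandas.

```python
import csv
import io
import statistics

# Question: which city has the higher average monthly sales?

# Hypothetical raw data, invented for this example. Note the missing value.
RAW_CSV = """city,month,sales
Pune,Jan,120
Pune,Feb,
Pune,Mar,150
Delhi,Jan,200
Delhi,Feb,180
Delhi,Mar,220
"""


def load_and_clean(text):
    """Wrangle: parse the CSV text and drop rows whose sales value is missing."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["sales"].strip():
            rows.append(
                {"city": row["city"], "month": row["month"], "sales": int(row["sales"])}
            )
    return rows


def mean_sales_by_city(rows):
    """Explore: group the sales figures by city and take the mean of each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["city"], []).append(row["sales"])
    return {city: statistics.mean(values) for city, values in groups.items()}


rows = load_and_clean(RAW_CSV)
summary = mean_sales_by_city(rows)

# Conclude and communicate: report the city with the highest average.
best_city = max(summary, key=summary.get)
print("Average sales by city:", summary)
print("Highest average sales:", best_city)
```

Dropping the incomplete row is just one possible way to handle missing data; the courses listed here discuss when that is and is not a reasonable choice.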
Step 8: Learn Machine learning
Machine learning is one of the hot topics in data science. Put simply, machine learning means making machines learn things. How cool is that? Instead of writing pages of code to accomplish a task, you let the machine learn it, and moreover the machine gets better over time. To get started with machine learning, you need to know statistics and Python, for which resources are listed above. Some tutorials for learning machine learning include:
- Machine learning with python by sentdex (YouTube)
- Andrew Ng’s Machine learning tutorials on Coursera
- Udacity’s Intro to Machine Learning
- Udacity’s Machine learning course by Georgia Tech
- Machine Learning Crash course by Google
- Learn ML on Kaggle
- Siraj Raval’s Intro to TensorFlow
- Siraj Raval’s Machine Learning for Hackers
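As a tiny taste of what "making machines learn" can look like, here is a minimal sketch of one of the simplest learning methods, a least-squares line fit, written in plain Python. The hours-versus-score data is made up for illustration, and fit_line and predict are hypothetical helper names; real projects would normally use a library such as scikit-learn instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: hours studied and the resulting exam score.
hours = [1, 2, 3, 4, 5]
scores = [50, 60, 70, 80, 90]

# "Learning" here means estimating the slope and intercept from the examples.
model = fit_line(hours, scores)
print("Learned slope and intercept:", model)
print("Predicted score for 6 hours:", predict(model, 6))
```

Nobody wrote the rule "score = 10 × hours + 40" into the program; it was recovered from the examples, which is the core idea the tutorials above build on with far richer models.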
Now that you know the basics of data analytics, it is time to keep upgrading your knowledge and following industry trends. There are many blogs, YouTube channels, and websites dedicated to data analytics, including Data Science Central, KDnuggets, Towards Data Science, freeCodeCamp, Siraj Raval’s YouTube channel, Chris Albon, DataCamp Community, and more. Do you have more in mind? Comment them below.