Getting Started with Data Science
Data Science has become a very lucrative career (often called the "Sexiest Job of the 21st Century"), and many professionals from both tech and non-tech backgrounds are transitioning into Data Science roles. Data Science and Machine Learning have been around for decades, but with the dramatic growth in available data, demand for these roles has spiked in the last few years.
For any beginner trying to transition into a Data Science role, I feel the major hurdle is finding the right path and guidance: how to get started, what to learn first, the right order of lessons, which of the online resources are actually useful, and so on.
The abundance of Data Science material online can overwhelm someone who is just trying to find a starting point.
To solve this, I am going to share a series of blog posts that give an overview of the journey, from beginning with DS/ML to deploying your first models in a production-like environment. This is the first part of the series, and it gives a brief idea of the very first steps towards DS.
I believe that becoming a successful Data Scientist takes a combination of things: you need to understand the overall process, the science, and the mathematics behind DS projects.
Six steps to become a successful Data Scientist.
#1. Mathematics for Machine Learning :
The first thing anyone planning to start with Data Science should focus on is the basic mathematics behind the algorithms commonly used in DS/ML projects. The following sequence makes for a smooth start:
- Linear Algebra, Vector Algebra, Matrices, and Multivariable Calculus (differentiation, maxima and minima, the Jacobian, etc.)
- Statistics concepts such as Inferential Statistics and Hypothesis Testing
I have shared a few resources at the end of this post that you can use to learn these topics.
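To make the Jacobian mentioned above a little more concrete, here is a minimal sketch in plain Python that approximates it with central differences. The function names (`numerical_jacobian`, `example_function`) and the example function itself are my own illustrative choices, not part of any library.

```python
def numerical_jacobian(f, point, eps=1e-6):
    """Approximate the Jacobian of f at `point` using central differences.

    f takes a list of n numbers and returns a list of m numbers.
    The result is an m x n list of lists: entry [i][j] is d f_i / d x_j.
    """
    n = len(point)
    m = len(f(point))
    jacobian = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Nudge only the j-th input up and down by eps.
        plus = list(point)
        minus = list(point)
        plus[j] += eps
        minus[j] -= eps
        f_plus = f(plus)
        f_minus = f(minus)
        for i in range(m):
            jacobian[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jacobian


def example_function(v):
    # Hypothetical example: f(x, y) = (x^2 * y, 5x + y)
    x, y = v
    return [x * x * y, 5 * x + y]


J = numerical_jacobian(example_function, [1.0, 2.0])
# Analytically the Jacobian is [[2xy, x^2], [5, 1]],
# so at (1, 2) it should be close to [[4, 1], [5, 1]].
```

Comparing the numerical result against the hand-derived partial derivatives is a nice way to check your calculus while you are learning.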
#2. Coding Language and IDE:
Along with the mathematical understanding, you need a programming language. You can pick any one of Python, R, Scala, etc. Most DS beginners prefer Python because of its broad library support: NumPy, pandas, scikit-learn (sklearn), statsmodels, and others are frequently used when building ML models.
For Python, you can refer to the following resources to begin with:
- HackerEarth
- Exercism Python Track (Mentor mode is quite cool. Do check it out.)
- Pandas at Kaggle
- NumPy
There are a number of IDEs available, but I personally prefer the following:
- PyCharm: This is helpful for Python coding practice, as it provides lots of plugins that come in handy at times. You should also set up PyCharm with Anaconda, which helps you manage your Python libraries in one place.
- Jupyter: I like this especially for exploratory data analysis and ML model building, as you can run the code cell by cell, view each cell's output separately, and manage your work effectively.
#3. Machine Learning Algorithms :
Once you have a good foundation, you can start learning a few algorithms in the following order:
- Linear Regression
- Logistic Regression
- Naïve Bayes
- Time Series Forecasting
- Tree models
- Unsupervised Clustering
- There are many more algorithms and specialized fields, such as NLP and Deep Learning, which we will discuss in detail in the next post.
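As a taste of the first algorithm on the list, here is a minimal sketch of simple (one-feature) linear regression fitted with the closed-form least-squares formulas, in plain Python. In practice you would likely use a library such as scikit-learn. The function name and the toy data below are made up purely for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 (illustrative only, not real data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 5.0 + intercept  # predict y for a new x = 5
```

Because the toy data lies exactly on a line, the fitted slope and intercept recover the line that generated it. Real data is noisy, which is where the statistics from step #1 come in.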
#4. Data Visualization :
Along with learning to code and build models, it is equally important to be able to visualize data and draw meaningful insights from it. Visualization skills matter because these tools help you convey the data story to business stakeholders at the end. Whatever model you have built is of little use if you cannot present actionable insights to the stakeholders.
To learn visualization, you can start with:
- Matplotlib
- Seaborn
- Tableau
- Power BI
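For a feel of what a first plot looks like, here is a minimal Matplotlib sketch, assuming Matplotlib is installed. The numbers are made-up toy values, and the `Agg` backend line is only there so the script runs without a display. In a Jupyter notebook you can drop that line and the plot appears inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up illustrative values, not real data.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 71, 80]

fig, ax = plt.subplots()
ax.scatter(hours, scores)
ax.set_xlabel("Hours studied")
ax.set_ylabel("Test score")
ax.set_title("Score vs. hours studied (toy data)")

# Render the figure to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

Labelled axes and a clear title are small details, but they are what make a chart readable for stakeholders.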
#5. Hands-On Practice :
Most of us believe that actual learning happens when we put our knowledge to the test and solve real-world problems.
Kaggle is the go-to site for DS enthusiasts. There you can find real-world projects, hackathons, tutorials, and a chance to connect with experienced Data Scientists.
You can start with Kaggle as follows:
- First, you can build some knowledge with the free courses. The benefit I see in these courses is that they give you focused knowledge to start your DS journey and, at the same time, make you comfortable with the Kaggle environment.
- Then, you can enter your first competition: the Titanic ML competition. In this challenge, you use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.
- The best way to learn is to share your work via Notebooks on Kaggle and take part in the discussions, where you will find people of various experience levels helping you out and offering valuable industry insight.
#6. Model Deployment :
If you are planning to move into a Data Science role in any organization, note that candidates with end-to-end knowledge of the full development cycle are preferred. In any DS/ML project pipeline, deployment is a crucial step. There are many free deployment environments available on Google Cloud Platform and Azure. Amazon also provides a dedicated machine learning platform known as Amazon SageMaker.
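At its simplest, deployment means saving what the model learned and loading it again wherever predictions are served. The sketch below shows that idea with Python's built-in `json` module and a hypothetical linear model. The parameter values and the `predict` function are assumptions for illustration only, and the cloud platforms mentioned above each have their own tooling for this step.

```python
import json

# Pretend these parameters came out of training (hypothetical values).
trained_model = {"slope": 2.0, "intercept": 1.0}

# "Training side": serialize the learned parameters to a string.
# In a real project this would typically be written to a file or a model store.
serialized = json.dumps(trained_model)

# "Serving side": load the parameters back and use them to predict.
loaded_model = json.loads(serialized)


def predict(params, x):
    """Apply the loaded linear model to a single input value."""
    return params["slope"] * x + params["intercept"]


result = predict(loaded_model, 3.0)
```

The point of the sketch is the separation: training produces an artifact, and the serving code only needs that artifact, not the training code or data.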
I think the information above is enough for beginners to start their journey. I will share detailed blog posts on each topic, along with further steps, in the future, so please follow me on LinkedIn and Medium. If you like this post or have suggestions, feel free to comment. I would love to interact with you.
Also, if you would like a mentored track to speed up your learning, you can opt for courses like those from upGrad, which offers training from India's most reputed colleges and teachers. Follow the link here.
Upcoming posts:
- Statistics of ML
- Things to keep in mind for Exploratory Data Analysis
- Deep dive on the most widely used ML algorithms
- Getting started with NLP and Deep Learning
- and many more …
Some useful resources for Maths:
- Khan Academy (Probability): https://www.khanacademy.org/math/statistics-probability/probability-library
- Khan Academy (Multivariable Calculus): https://www.khanacademy.org/math/multivariable-calculus
- 3Blue1Brown YouTube channel: Essence of Linear Algebra