How to be a Data Scientist?

It just takes commitment and perseverance to become a data scientist!

Manan B Shah
An Idea (by Ingenious Piece)
10 min read · Oct 11, 2020

--

Introduction

I was confused for a long time and made many mistakes while choosing a direction in the field of data science. I hope that by the end of this blog I shall have helped many aspiring data scientists find a clear pathway for learning and developing in the field.

In the 21st century, advances in computer science, the development of intelligent machines, and the generation of immense amounts of data have led to new fields of study: Data Science and Machine Learning. From simple tasks like sales prediction to ambitious projects like self-driving cars, a great deal is becoming possible with the algorithms and techniques of data science.

As I go along explaining each section in detail, I shall also list the best-rated courses created by top universities around the world (University of Toronto, MIT, etc.).

What exactly is the role of a data scientist?

In brief, a data scientist is a professional who works with large amounts of data and extracts analytical insights and information from it. They communicate their findings to senior leadership, management, and clients. Companies can then make better-informed decisions to drive business growth and profitability, depending on the context of their industry.

Photo by me (Manan B Shah)

As a data scientist, you get thrown a lot of different types of problems. To be competent, you need a strong foundation in math, statistics, and programming. You need to know when to use certain techniques and algorithms depending on the problem and the data. In the end, you often need to present the results and techniques to executives and less-technical audiences (one of the most difficult parts).

What Steps Does It Take to Be a Data Scientist (For Free)?

I am sure many of us came across the Harvard Business Review article back in 2012 that called data scientist the sexiest job of the 21st century. The message is that there will be constant demand for analytic talent in all industries where companies collect and use data for competitive advantage, which makes this one of the most in-demand and trending jobs around the world.

Does a data science job sound fun and interesting to you? Then perfect, it is the right time to start learning and get hands-on with the skills required in the field of data science. Below I shall lay out a pathway that should help you master data science and land your dream job.

I shall describe the pathway in nine stages, followed by a conclusion:-

  1. Fundamentals of Mathematics and Statistics
  2. Tools of Data-Science
  3. Data-Science Fundamentals
  4. Machine Learning
  5. Model Deployment & Software Engineering
  6. Big Data Analytics Tools
  7. Resume & Portfolio Building
  8. Essential Soft Skills
  9. Interview
  10. Conclusion

Let’s look at each step in detail, along with the best courses to follow:-

1. Fundamentals of Mathematics and Statistics

Mathematics Main concepts:

  • Linear Algebra
  • Systems of Linear Equations
  • Calculus
  • Big O
  • Probability

Statistics Main concepts:

  • Exploratory Graphics, Statistics
  • Descriptive Statistics
  • Inferential Statistics
  • Hypothesis Testing

Mathematics is the foundation of all the key data science processes. It includes statistics, linear algebra, differential calculus, discrete math, etc. Basic math related to statistics, calculus, and linear algebra is a good place to start. This matters because a data scientist needs to understand how different algorithms work under the hood.

By the end of this stage, we shall have a strong hands-on grasp of the mathematics and statistics required for data science.
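To make the statistics concepts listed above a little more concrete, here is a minimal sketch of descriptive statistics and the test statistic behind a simple hypothesis test, using only Python's standard library. The sample data, the `describe` helper, and the `welch_t` helper are all made up for illustration; in practice a statistics library would also give you the p-value.

```python
import math
import statistics

# Hypothetical sample data: daily sales (in units) under two store layouts.
layout_a = [12, 15, 14, 10, 13, 16, 11]
layout_b = [18, 17, 20, 15, 19, 21, 16]


def describe(sample):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation
    }


def welch_t(sample_1, sample_2):
    """Welch's t-statistic for the difference between two sample means."""
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    standard_error = math.sqrt(var_1 / len(sample_1) + var_2 / len(sample_2))
    return (mean_1 - mean_2) / standard_error


print(describe(layout_a))
print(describe(layout_b))
print("t-statistic:", round(welch_t(layout_a, layout_b), 2))
```

A t-statistic far from zero suggests the two means differ by more than chance alone would explain, which is the core idea of hypothesis testing.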

Get started with the following resources:-

  1. Mathematics Introduction(Khan Academy)
  2. Statistics(Coursera)

2. Tools of Data-Science

In this stage, we shall talk about the tools needed before getting started with data science. There are three main things we need to understand before working with data science and machine learning models.

The three main tools required are as follows:-

  • Databases(SQL, PostgreSQL)
  • Coding in Python / R
  • Cloud computing tools in any one platform(AWS, GCP, Azure, Oracle)

Data scientists must be familiar with a variety of toolsets to work with data in different environments and platforms. A toolset should contain a combination of SQL, command line, coding, and cloud tools.

Here is a summary of how each tool is used:-

  • For data extraction and manipulation in relational databases, SQL is the fundamental language and is used almost everywhere.
  • For general programming purposes (e.g., functions, loops, iteration), Python is a good choice since many libraries (e.g., visualization, machine learning) are readily available for it.
  • For an additional boost, knowing the command line provides extra benefits, especially for running jobs in cloud environments.
  • Cloud computing tools help you deploy machine learning models, apps, and websites with the amount of storage and the features you require.
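As a small illustration of how SQL and Python work together, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are hypothetical; a real project would connect to an existing database such as PostgreSQL instead.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5), ("carol", 8.0)],
)

# SQL does the extraction and aggregation; Python handles the result.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()

for customer, total in rows:
    print(f"{customer}: {total:.2f}")

conn.close()
```

The `?` placeholders keep the values separate from the SQL text, which is a good habit to build early.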

Get started with the following resources:-

  1. Learn to Program: The Fundamentals(Coursera)
  2. R Programming(Data Camp)
  3. Databases: Introduction to Relational Databases(edX)

3. Data-Science Fundamentals

In this stage, we shall get started with the fundamentals of data science. First, pick your preferred programming language. I personally prefer Python, since many libraries are readily available for it. From here, you can pick up the concepts of data munging/wrangling (e.g., importing data, aggregation, pivoting data, and missing value treatment).
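The wrangling steps named above can be sketched roughly as follows. This is only an illustration on made-up data using the standard library; in day-to-day work a dedicated data analysis library would do each step in a line or two.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data; the empty field is a missing value.
RAW_CSV = """city,month,sales
Pune,Jan,100
Pune,Feb,
Mumbai,Jan,200
Mumbai,Feb,240
"""

# Import: parse the CSV text into a list of dictionaries.
records = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Missing value treatment: replace blanks with the mean of the known values.
known = [float(r["sales"]) for r in records if r["sales"]]
fill_value = mean(known)
for r in records:
    r["sales"] = float(r["sales"]) if r["sales"] else fill_value

# Aggregation: total sales per city.
totals = defaultdict(float)
for r in records:
    totals[r["city"]] += r["sales"]

# Pivoting: one row per city, one column per month.
pivot = defaultdict(dict)
for r in records:
    pivot[r["city"]][r["month"]] = r["sales"]

print(dict(totals))
print(dict(pivot))
```

Filling with the overall mean is just one simple choice; whether it is appropriate depends on the data.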

After this comes the most fun part: learning about your data through data visualization (e.g., bar charts, histograms, pie charts, heat maps, and map visualizations).

Get started with the following resources:-

  1. Data Science A-Z™: Real-Life Data Science(Udemy)
  2. Intro to Data Analysis (Udacity)
  3. Data Visualization(Coursera)

4. Machine Learning

Learn the theory and application of machine learning algorithms, then apply the concepts you learn to real-world data. You can choose between the applied machine learning pathway and the big data ecosystem pathway; note that you can always come back to master the other path later. In my case, I chose to learn applied machine learning first. It covers building a machine learning model end to end, from data exploration to model deployment. Decision tree algorithms are among the most important and powerful tools in the machine learning field.
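To give a feel for how a decision tree learns, here is a minimal sketch of a one-level tree (often called a decision stump) that picks the single best split on one feature using Gini impurity. The data is made up, and the sketch assumes that larger feature values point to class 1; real decision tree libraries handle many features, deeper trees, and both split directions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(values, labels):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold


def predict(threshold, value):
    """Assumption: values above the threshold belong to class 1."""
    return 1 if value > threshold else 0


# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 1, 0, 1, 1, 1, 1]

threshold = fit_stump(hours, passed)
print("learned threshold:", threshold)
print("prediction for 2 hours:", predict(threshold, 2))
print("prediction for 6 hours:", predict(threshold, 6))
```

A full decision tree repeats this split search recursively on each side of the threshold.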

On the other hand, deep learning is a subset of machine learning that one can learn after gaining strong hands-on experience with machine learning, statistics, and the fundamentals of data science.

Get started with the following resources:-

  1. Machine Learning(Coursera)
  2. Intro to Machine Learning(edX)
  3. Creative Applications of Deep Learning(Kadenze)
  4. Deep Learning A-Z™(Udemy)

5. Model Deployment & Software Engineering

While building a machine learning model is the fun part for all of us, it won’t help anyone else much unless it can be deployed into a production environment. Implementing machine learning deployments is a difficult challenge. The process of taking a trained ML model and making its predictions available to users or other systems is known as deployment.

Thinking about model deployment as a software engineer rather than as a data scientist will simplify what it takes to deploy a model.

Deployment is entirely distinct from routine machine learning tasks like feature engineering, model selection, or model evaluation. As such, deployment is not very well understood amongst data scientists and ML engineers who lack backgrounds in software engineering or DevOps.

Every data scientist should therefore become familiar with a few software engineering skills such as DevOps. Luckily, these skills are not very difficult and can be learned through practice. ML practitioners must understand how to deploy their models as simply and efficiently as possible. The first step in determining how to deploy a model is understanding how end users should interact with that model’s predictions.
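One common way for users or other systems to interact with a model is to send it a request and get a prediction back. The sketch below shows that idea in its simplest form: a function that accepts a JSON request body and returns a JSON response. The `MODEL` weights, the feature names, and the `handle_request` function are all hypothetical stand-ins; a real deployment would load a trained model and sit behind a web framework or cloud service.

```python
import json

# Stand-in for a trained model: hypothetical weights of a linear model.
MODEL = {"intercept": 1.0, "weights": {"rooms": 2.0, "area": 0.5}}


def predict(features):
    """Score one input with the stand-in linear model."""
    return MODEL["intercept"] + sum(
        weight * features[name] for name, weight in MODEL["weights"].items()
    )


def handle_request(body):
    """Turn a JSON request body into a JSON response, as a web endpoint might."""
    try:
        features = json.loads(body)
        return json.dumps({"prediction": predict(features)})
    except (json.JSONDecodeError, KeyError, TypeError) as error:
        # Bad input should produce a clear error, not crash the service.
        return json.dumps({"error": f"invalid request: {error!r}"})


print(handle_request('{"rooms": 3, "area": 40}'))
print(handle_request('{"rooms": 3}'))
```

Keeping the prediction logic separate from the request handling makes it easier to test each part, which is where the software engineering mindset pays off.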

Get started with the following resources:-

  1. Software Testing(Udacity)
  2. Version Control with Git(Udacity)

6. Big Data Analytics Tools

Most importantly, studying Big Data is a rewarding and (at times) fun investment of your time. The domain of Big Data, and data analysis in general, is full of puzzles to solve and will greatly enhance your analytical skills and reasoning. The major domains of Big Data involve statistics and problem-solving skills. Even if you don’t intend to make a career in Big Data, these skills are useful and highly practical on a day-to-day basis, and they can also enhance your career in data science. Big data tools such as Hadoop, MapReduce, Apache Hive, and Spark Streaming play a vital role in data science.
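The MapReduce idea mentioned above can be sketched in plain Python with the classic word count example. This only illustrates the map, shuffle, and reduce phases on a made-up list of documents in a single process; it does not use Hadoop or Spark, which run the same phases across many machines.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input: each string stands for one document.
documents = ["big data is big", "data science uses data"]


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: combine each word's values into a single total."""
    return {
        word: reduce(lambda a, b: a + b, counts) for word, counts in groups.items()
    }


mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each document is mapped independently and each word is reduced independently, both phases can be spread over many workers, which is what makes the pattern suitable for very large data sets.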

Get started with the following resources:-

  1. Taming Big Data with Apache Spark and Python

7. Resume & Portfolio Building

While a resume matters, having a portfolio of public evidence of your data science skills can do wonders for your job prospects. Even if you have a reference, the ability to show employers what you can do, instead of just telling them, is important. One of the best ways for a fresher to build a portfolio and resume is to add a variety of projects.

If you don’t have data science work experience, the best option is to talk about a data science project that you have worked on.

Types of Projects to Include in a Portfolio?

The best portfolio projects are less about doing fancy modeling and more about working with interesting data sets. A lot of people do things with financial information or Twitter data; those can work, but the data isn’t inherently that interesting.

The project isn’t done when you post it publicly. Don’t be afraid to keep adding to or editing your projects after they’re published. Projects can be a never-ending learning process.

A data science resume is a place to focus on your technical skills. Your resume is a chance to represent your qualifications and fit for that particular role. Recruiters and hiring managers skim resumes very quickly, and you only have a short time to make an impression. Improving your resume can increase your chance of getting an interview. You have to make sure every single line and every single section of your resume counts.

A good data science resume should follow a few key points:-

  • Length: Keep it simple and one-page max. This gives you the most impact for a quick glance.
  • Objective: Don’t include one. It doesn’t help you distinguish yourself from other applicants.
  • Skills: Don’t give numerical ratings for your skills. If you want to rate yourself, use words like familiar or proficient. Do list the technical skills that the job description mentions. The order in which you list your skills can suggest what you are best at.
  • Projects: Don’t list common projects or homework; they aren’t much help in distinguishing you from other applicants. List projects that are unique, and don’t forget to include the links.
  • Portfolio: Fill out your online presence. The most basic is a LinkedIn profile. It is like an extended resume. GitHub and Kaggle profiles can help show off your work. Fill out each profile and include links to other sites. Fill out descriptions for your GitHub repositories. Include links to your knowledge sharing blog (Medium, Quora).
  • Experience: If you don’t have work experience, what do you do? Focus your resume on independent projects, like capstone projects, independent research, thesis work, or Kaggle competitions. These are substitutes for work experience on your resume.

8. Essential Soft Skills

Market trends, insights from top business leaders, and industry data all indicate that soft skills are of equal importance.

  • Critical Thinking

With critical thinking, data scientists can objectively analyze questions, hypotheses, and results, and understand what resources are critical to solving a problem. They can also look at problems from differing views and perspectives.

  • Communication

Data scientists can explain what data-driven insights mean in business-relevant terms and communicate information in a way that highlights the value of the action. They can also show the research process and assumptions that led to a conclusion.

  • Problem Solving

With problem-solving skills, one can identify opportunities and explain problems and solutions. Data scientists with these skills know how to approach problems by identifying existing assumptions and resources, putting on their detective hat, and identifying the most effective methods for getting the right answers.

  • Business Knowledge

In today’s highly competitive age, companies that want an edge over their competitors should ensure their data scientists understand the business and its special needs and recognize which organizational problems need to be solved. Such data scientists can translate data into results that work for the organization.

9. Interview

The interview process usually takes the form of the three stages below:-

  • Technical questions:- The interviews involve a lot of technical questions. Most data science roles include technical tasks such as analyzing data and statistics. Recruiters ask technical questions to gauge the hands-on technical capabilities of candidates. This involves questions on mathematics, statistics, coding, and machine learning.
  • Project-specific questions:- Projects are a crucial part of a data scientist’s work. Highlighting your projects in an interview is key for a junior data scientist to land a job. It is recommended that data science aspirants focus their efforts on projects based on computer vision. Apart from working on these projects, it is also critical for candidates to know the details of their projects.
  • Soft skills:- Interviewers often try to gain deep insights into the candidate’s technical abilities and project knowledge. They also try to assess the candidate’s soft skills, although data scientists often do not take these seriously. Soft skills are critical for communicating with teammates as well as for sharing data insights with stakeholders.

10. Conclusion

In this data science roadmap, we have seen the key pillars of data science and related resources to get started with it. This is a growing list, as new tools and technologies emerge every day to accommodate different use cases in data science. Please let us know your thoughts in the comment section so we can add more insights to the article.

I hope you found it useful. I plan on sharing more of the projects I am working on, and other thoughts, on Medium! If you have any questions or want me to cover anything specific in the future, let me know through my contact page below. You can also connect with me through my social media profiles, which I mention below.

Thank you for reading!

Contact page

(LinkedIn, Twitter)
