The Data Science Road Map

TechwithMe
5 min read · Oct 12, 2021

--

To become an exceptional data scientist, you should follow the road map laid out below. It is not as linear as it looks; in practice it is more of a circular process. I'll discuss the main stages of data science work, the areas that most often cause confusion, important things to remember, and where data science differs from other disciplines. Here are the stages of the data science road map:

  • Frame the problem.
  • Understand the data.
  • Extract features.
  • Model and analyse the data.
  • Deploy code and present results.

1. Frame the data science problem.

This is the first and most important step in any data science task. Many great data scientists, such as Dean Abbott, founder of Abbott Analytics, have mastered the art of asking the right questions before solving any problem. No amount of technical expertise or statistical rigor can compensate for having addressed a pointless problem. We need to identify the type of client we are dealing with, whether human or machine. If your clients are human, most initiatives begin with a fairly open-ended query: there may be a known problem, but it's unclear what a solution would entail. If your clients are machines, the business challenge is usually obvious, but there can be some doubt about the software constraints (which languages to use, runtime, how accurate predictions must be, and so on). Either way, it's crucial to define exactly what constitutes a solution to the problem before getting down to business.

2. Understand the data.

Understanding the data can be categorized into three sub-groups. Once you have gone through these steps, you will have a full understanding of your data and its useful characteristics and features.

These sub-groups are:

  • Basic Questions
  • Data wrangling
  • Exploratory analysis

Basic Questions.

Here is an overview of the questions to ask when working with your data. Before putting your data through any statistical analysis, you will need a good bundle of questions to ask about it. This helps ensure that you are working on the right problem and with the right dataset. Here is a list of some of the generic questions you can ask (a small sketch of automating a few of these checks follows the list):

  • Is this data representative enough? For example, maybe data was only collected for a subset of users.
  • Are there likely to be gross outliers or extraordinary sources of noise?
  • Are there any fields that are unique identifiers?
  • Is this the entire dataset?
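
A few of these checks can be automated before any deeper analysis. Here is a minimal sketch using pandas; the file name users.csv and its columns are hypothetical placeholders, not part of the original article.

```python
import pandas as pd

df = pd.read_csv("users.csv")  # hypothetical dataset

# Is this the entire dataset? Row and column counts are a first hint.
print(f"Rows: {len(df)}, Columns: {len(df.columns)}")

# Are there any fields that are unique identifiers?
for col in df.columns:
    if df[col].is_unique:
        print(f"{col} is a candidate unique identifier")

# Are there likely to be gross outliers? Summary stats expose extreme values.
print(df.describe())

# How representative is the data? Missing-value rates give a rough signal.
print(df.isna().mean().sort_values(ascending=False))
```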

There are many more generic questions you could ask. Please feel free to check out my article on generic questions to ask about your data.

Data Wrangling.

Data wrangling is the process of transforming raw data into something that can be used for more conventional analytics. As a data scientist, you will not always be handed data in the best shape for your analysis. In practice, this means constructing a software pipeline that extracts data from wherever it is stored, does any necessary cleaning or filtering, and converts it to a standard format.
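
As a rough illustration of such a pipeline, here is a minimal sketch in pandas. The column names (signup_date, age) and the file raw_users.csv are hypothetical assumptions, not from the original article.

```python
import pandas as pd

def wrangle(path: str) -> pd.DataFrame:
    """Extract raw data, clean and filter it, and convert it to a standard format."""
    df = pd.read_csv(path)  # extract from wherever the data is stored

    # Standardize: one consistent column-naming convention.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Clean: drop exact duplicates and unparseable dates.
    df = df.drop_duplicates()
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    df = df.dropna(subset=["signup_date"])

    # Filter: keep only plausible records.
    df = df[df["age"].between(0, 120)]
    return df

clean = wrangle("raw_users.csv")
```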

Exploratory analysis.

After transforming your data into a more suitable format, the next step is exploratory data analysis. This is where a little of your statistics knowledge comes in. Typically, exploratory analysis uses visual methods to present some of the basic or more complex characteristics of your data. Statistical concepts you can employ here include correlation (which you can visualize as a heatmap), mean, mode, quantiles, skewness, and kurtosis.
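
Here is a minimal sketch of that kind of exploratory pass, using pandas for the summary statistics and seaborn for the correlation heatmap. The file clean_users.csv is a hypothetical placeholder.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("clean_users.csv")  # hypothetical cleaned dataset

# Basic summary statistics: mean, quantiles, spread.
print(df.describe())

# Shape of each numeric distribution.
numeric = df.select_dtypes("number")
print(numeric.skew())       # skewness
print(numeric.kurtosis())   # kurtosis

# Correlation between numeric features, visualized as a heatmap.
sns.heatmap(numeric.corr(), annot=True, cmap="coolwarm")
plt.title("Feature correlations")
plt.show()
```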

3. Extract features from the data.

A feature is essentially a number or a category extracted from your data that describes an entity. You could, for example, extract the average word length or the number of characters in a text document. Extracting good features is the most critical part of getting your analysis to work. A particularly good feature will usually connect to a real-world phenomenon. Most of the features we extract will be used to make a prediction. You may, however, also need to extract the thing you are predicting, known as the target variable.
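
Here is a minimal sketch of the text-document example above: turning a raw string into the character count and average word length as numeric features.

```python
def extract_features(document: str) -> dict:
    """Turn a raw text document into simple numeric features."""
    words = document.split()
    return {
        "num_chars": len(document),  # number of characters
        "num_words": len(words),
        "avg_word_length": (
            sum(len(w) for w in words) / len(words) if words else 0.0
        ),
    }

print(extract_features("Extracting good features is the critical step."))
```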

4. Creating a model.

When deciding on the type of model to implement, first work out which kind of machine learning algorithm fits your problem: for example, a regression model that predicts a stock's price the next day, or a clustering algorithm that breaks customers into different segments. Whichever you choose, make sure the model is tuned to the particular problem you are trying to solve.
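
As one illustration, here is a minimal sketch of the customer-segmentation case using scikit-learn's KMeans. The features (annual spend, visits per month) and their values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features: [annual_spend, visits_per_month]
X = np.array([[5200, 12], [300, 1], [4800, 10], [250, 2], [7000, 15], [400, 1]])

# Scale features so no single one dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Break customers into segments.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = model.fit_predict(X_scaled)
print(segments)  # cluster label for each customer
```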

5. Present your results/findings.

If your client is human, you will need to present your results in the form of reports describing your findings. Most of the time, you will need to present these reports as graphs or other visuals, because many of the people you work with will not be as technical as you are.
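
For example, a single chart often lands better than a table of numbers. Here is a minimal sketch with matplotlib; the segment names and spend figures are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical findings: average spend per customer segment.
segments = ["Occasional", "Regular", "Loyal"]
avg_spend = [120, 540, 1800]

plt.bar(segments, avg_spend, color="steelblue")
plt.ylabel("Average annual spend ($)")
plt.title("Customer segments by spend")
plt.tight_layout()
plt.savefig("segments.png")  # drop straight into a report or slide deck
```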

6. Deployment of your code.

Deployment of your code is the last step in the data science road map. At this stage, you need to put your model into a form that the user can understand. How do you achieve this? It is quite simple, since you have already taken care of most of the hard work. All you need to do is design an interface that makes your code usable, for example through a web app or a mobile app. Check out this simple deployment system.
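
For a rough idea of what the web-app route can look like, here is a minimal sketch that wraps a trained model in a small Flask service. Flask is just one option among many, and the file model.pkl and the /predict route are hypothetical placeholders.

```python
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
with open("model.pkl", "rb") as f:  # hypothetical trained model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[5200, 12]]}
    features = request.get_json()["features"]
    prediction = model.predict(features)
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(port=5000)
```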
