How Do Data Scientists Select Machine Learning Algorithms?

Pravin
Published in AlmaBetter
3 min read · Jun 15, 2021

A witty philosopher once said, “There is a reason behind every decision we make.”

Most of us have been in this situation. While learning, we come across many different machine learning algorithms, and sooner or later an inner voice starts asking questions. When should each of these algorithms be used? Is there a reason for choosing one over another? Many of us get stuck on these questions, and I was no exception. I spent some time looking for good answers, and I'm sharing what I found here.

1. Based on Business Perspective

Business needs are one of the main decision-making factors when a data scientist selects the best-fit machine learning model for the problem at hand. The choice also depends on the complexity of the problem. Let's look at two business examples.

Example №1: XYZ is an online real estate firm planning to launch a house price prediction service for its premium customers. It approaches ADC, a data science consulting company, and asks it to build the solution. Here the complexity of the problem is low, so we can choose one of the simpler algorithms, such as Linear Regression or a Decision Tree, and it will likely be the best fit for the problem. (Logistic Regression is also a simple algorithm, but it is a classifier, so it would apply only if the task were reframed as predicting a price band.) If we used a complex algorithm for this problem, the model would be prone to overfitting, so there is little point in doing so. From XYZ's point of view, the firm also needs to know how the model was built and how it works, so ADC needs to build a simpler model that is interpretable by nature.
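To make the interpretability point concrete, here is a minimal sketch of a one-feature linear regression written with only the Python standard library. The listings, the units, and the function name `fit_simple_linear_regression` are made up for illustration; they are not XYZ's or ADC's actual data or method.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y ≈ intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical listings: area in square feet, price in thousands.
areas = [800, 1000, 1200, 1500, 2000]
prices = [130, 160, 190, 235, 310]

intercept, slope = fit_simple_linear_regression(areas, prices)

# The whole model is two numbers that can be explained to a client:
# a base price plus a fixed amount per extra square foot.
print(f"price ≈ {intercept:.2f} + {slope:.3f} * area")
print(f"predicted price for 1800 sq ft: {intercept + slope * 1800:.1f}")
```

The fitted slope reads directly as "price added per extra square foot", which is the kind of explanation a client like XYZ can follow without any machine learning background.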

Example №2: PQRS is a news application that shows a personalised, curated news feed to its users, and it has an in-house data science team to solve its business problems. In this case the complexity of the problem is considerably higher, so the better option is a more complex approach such as Deep Learning with Neural Networks. Such models tend to work well on complex problems of this kind. If we opted for a simpler algorithm here, the model would likely underfit, so there is little point in using one. PQRS is mainly focused on solving the business problem, so the data science team needs a model complex enough to solve it. Interpretability is not strictly required in this case.
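Underfitting can be seen even on a toy problem. The sketch below is only an illustration, not a news recommender: it assumes a made-up curved relationship (y = x²) and compares a straight line with a more flexible nearest-neighbour average, which stands in here for the "complex" model. All names and data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def knn_predict(train_xs, train_ys, x, k=2):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(train_xs, train_ys), key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))

# Toy non-linear data: y = x^2 on a grid from -5 to 5.
xs = [i * 0.5 for i in range(-10, 11)]
ys = [x ** 2 for x in xs]

# Every other point is held out for testing.
train_xs, train_ys = xs[::2], ys[::2]
test_xs, test_ys = xs[1::2], ys[1::2]

intercept, slope = fit_line(train_xs, train_ys)
line_mse = mse(test_ys, [intercept + slope * x for x in test_xs])
knn_mse = mse(test_ys, [knn_predict(train_xs, train_ys, x) for x in test_xs])

print(f"straight line test MSE:     {line_mse:.4f}")
print(f"nearest-neighbour test MSE: {knn_mse:.4f}")
```

The straight line cannot bend to follow the curve, so its held-out error stays high no matter how much data it sees. That is the underfitting described above, and the same logic applies when a simple model is asked to capture something as varied as user reading preferences.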

Epitome: “Complexity is inversely proportional to Interpretability”

2. Based on Data

Data itself is a paramount part of data science. It also influences which machine learning model a data scientist selects as the best fit for the business problem to be solved. The nature of the input and output variables likewise helps determine the best-fit model.

Nature of Input Variable: Based on the input data and how the model learns from it, we can classify machine learning algorithms into three categories.

  1. If the input data is labelled, Supervised Learning algorithms are used.
  2. If the input data is unlabelled and the intention is to find structure in it, Unsupervised Learning algorithms are used.
  3. If the model is intended to learn in an interactive environment by trial and error, using feedback from its own actions, Reinforcement Learning algorithms are used.
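The three rules above can be written down as a small decision function. This is a sketch for illustration only; `Paradigm` and `choose_paradigm` are hypothetical names, not part of any library.

```python
from enum import Enum

class Paradigm(Enum):
    SUPERVISED = "supervised learning"
    UNSUPERVISED = "unsupervised learning"
    REINFORCEMENT = "reinforcement learning"

def choose_paradigm(has_labels: bool, learns_from_feedback: bool = False) -> Paradigm:
    """Map the nature of the input data to a learning paradigm."""
    # An agent that improves by trial and error in an environment.
    if learns_from_feedback:
        return Paradigm.REINFORCEMENT
    # Every example comes with a known answer to learn from.
    if has_labels:
        return Paradigm.SUPERVISED
    # No answers available: look for structure in the data itself.
    return Paradigm.UNSUPERVISED

print(choose_paradigm(has_labels=True).value)
print(choose_paradigm(has_labels=False).value)
print(choose_paradigm(has_labels=False, learns_from_feedback=True).value)
```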

Nature of Output Variable: Based on the output variable, we can classify machine learning algorithms into three further categories.

  1. Numerical/continuous value as output: Regression-based modelling.
  2. Categorical/discrete value as output: Classification-based modelling.
  3. Groups of similar inputs as output, with no predefined labels: Clustering-based modelling.
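As a rough sketch of how this rule might be applied in practice, the helper below inspects a target column and suggests a modelling family. The function name `suggest_model_family` and the `max_classes` threshold are assumptions made for illustration; real projects should also rely on domain knowledge, since a numeric column can still encode categories.

```python
from numbers import Real

def suggest_model_family(target, max_classes=10):
    """Suggest a modelling family from the target values.

    target: a list of output values, or None when there is no output column.
    max_classes: assumed cut-off below which whole-number targets are
                 treated as class labels rather than quantities.
    """
    if target is None:
        return "clustering"          # nothing to predict: group similar inputs
    if len(target) == 0:
        raise ValueError("target column is empty")

    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in target)
    if not numeric:
        return "classification"      # strings, booleans, and other labels

    distinct = set(target)
    whole_numbers = all(float(v).is_integer() for v in distinct)
    if whole_numbers and len(distinct) <= max_classes:
        return "classification"      # e.g. 0/1 flags or a few class codes
    return "regression"              # continuous quantities such as prices

print(suggest_model_family([130.5, 160.0, 190.2]))  # house prices
print(suggest_model_family(["spam", "ham", "spam"]))  # text labels
print(suggest_model_family(None))  # no output column
```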

Epitome: “The face is the index of the mind; similarly, data is the index of valuable insights.” So dive into the data and find the machine learning model that suits it best.
