
Human Component in Machine Learning

With automation in machine learning, humans are still indispensable to make the connection between data, algorithms, and the real world

Benjamin Obi Tayo Ph.D.
Nov 22
Photo by Andy Kelly on Unsplash

I. Introduction

With automation becoming increasingly popular in the field of machine learning, one may wonder if the role of humans in machine learning will become non-essential at some point.

When building a machine learning model, it’s important to remember that the model must produce meaningful and interpretable results in real-life situations. This is where the human experience comes in. A human (qualified data science professional) has to examine the results produced by algorithms and computers to ensure that the results are consistent with real-world situations before recommending a model for deployment. With automation in machine learning, humans are still indispensable to make the connection between data, algorithms, and the real world.

In this article, we will discuss the 3 essential components of a machine learning model, namely: 1) Data Component, 2) Algorithm Component, and 3) Real World Component. Finally, we will examine the various roles played by a human (qualified data science professional) to ensure the 3 components of a machine learning model are interacting with each other in a meaningful and beneficial manner.

II. Review of essential components of a machine learning model

There are 3 essential components of a machine learning model, as shown in the figure below.

The 3 essential components of a machine learning model. Image by Benjamin O. Tayo

1) Data Component

This component consists of everything related to data and includes the following:

i) Sources of Data

This section deals with all sources of data such as

a) design of experiments or surveys for collecting data

b) purchasing data from organizations that mine and store large datasets

c) use of an open dataset

d) simulating raw data to combine it with actual sampled data
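
For illustration, a minimal sketch (assuming NumPy and pandas, with made-up height measurements) of combining simulated records with actual sampled data could look like this:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Actual sampled data (hypothetical): measured heights in cm
sampled = pd.DataFrame({"height_cm": [162.0, 175.5, 180.2, 168.4],
                        "source": "measured"})

# Simulated raw data drawn from a normal distribution whose parameters
# are estimated from the sampled data (an assumption made for this sketch)
simulated = pd.DataFrame({
    "height_cm": rng.normal(sampled["height_cm"].mean(),
                            sampled["height_cm"].std(), size=100),
    "source": "simulated",
})

# Combine simulated and actual data into one dataset for downstream analysis
combined = pd.concat([sampled, simulated], ignore_index=True)
print(combined["source"].value_counts())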

ii) Data Preparation and Transformation

This deals with preprocessing raw data to convert it into a form that is ready for analysis or model building and includes topics such as

a) handling missing data

b) data imputation

c) encoding categorical data

d) identification of predictor features and target features

e) data scaling, for example, feature standardization or normalization

f) feature selection and dimensionality reduction

g) advanced methods for data transformation such as PCA and LDA

Software that can be used for data preparation and transformation includes the following (a minimal preprocessing sketch is shown after the list):

  • pandas package
  • Excel
  • R
  • Python
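
As an illustration, here is a minimal preprocessing sketch in Python (pandas and scikit-learn, with hypothetical column names) covering missing-value imputation, categorical encoding, and feature scaling:

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with a missing value and a categorical feature
df = pd.DataFrame({
    "age": [25.0, 32.0, None, 47.0],
    "city": ["Lagos", "Accra", "Lagos", "Nairobi"],
    "purchased": [0, 1, 1, 0],  # target feature
})

X = df[["age", "city"]].copy()  # predictor features
y = df["purchased"]             # target feature

# Impute the missing numerical value with the column mean
X["age"] = SimpleImputer(strategy="mean").fit_transform(X[["age"]]).ravel()

# Encode the categorical feature as one-hot (dummy) columns
X = pd.get_dummies(X, columns=["city"])

# Standardize the numerical feature (zero mean, unit variance)
X["age"] = StandardScaler().fit_transform(X[["age"]]).ravel()

print(X.head())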

2) Algorithm Component

These are algorithms that are applied to data in order to derive useful and insightful information from it. The algorithms can be categorized as descriptive, predictive, or prescriptive.

i) Algorithms for Descriptive Analytics

These include packages that can be applied to data for visualization purposes, for example, to produce bar plots, line graphs, histograms, scatter plots, pair plots, density plots, Q-Q plots, etc. Some of the most common packages for descriptive analytics include the following (a short plotting sketch follows the list):

a) Matplotlib

b) ggplot2

c) Seaborn
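
As a sketch, a histogram and a density plot of a simulated numerical variable could be produced with Matplotlib and Seaborn as follows:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Simulated data: 1,000 observations from a normal distribution
values = np.random.default_rng(0).normal(loc=50, scale=10, size=1000)

# Descriptive analytics: histogram and density plot side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(values, bins=30)
axes[0].set_title("Histogram")
sns.kdeplot(values, ax=axes[1])
axes[1].set_title("Density plot")
plt.tight_layout()
plt.show()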

ii) Algorithms for Predictive Analytics

These are algorithms that are used for building predictive models. Some of the most common packages for predictive analytics include

  • scikit-learn package
  • caret package
  • TensorFlow

Predictive analytics algorithms can be further classified into the following groups:

a) Supervised Learning (Continuous Variable Prediction)

  • Basic regression
  • Multiple regression analysis
  • Regularized regression
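
For illustration, a minimal regularized (ridge) regression sketch with scikit-learn, using simulated data, could look like this:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Simulated data: the target depends linearly on two features plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Ridge regression: least squares with an L2 penalty on the coefficients
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("R^2 on test data:", model.score(X_test, y_test))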

b) Supervised Learning (Discrete Variable Prediction)

  • Logistic Regression Classifier
  • Support Vector Machine Classifier
  • K-nearest neighbor (KNN) Classifier
  • Naive Bayes
  • Decision Tree Classifier
  • Random Forest Classifier
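
As a sketch, a random forest classifier could be trained on scikit-learn's built-in iris dataset (a stand-in for real project data) as follows:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the iris dataset (discrete target: flower species)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a random forest classifier and report accuracy on held-out data
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))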

c) Unsupervised Learning

  • K-means clustering algorithm
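
For example, a minimal K-means sketch with scikit-learn on simulated, unlabeled data:

import numpy as np
from sklearn.cluster import KMeans

# Simulated data: two well-separated groups of points
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])

# Group the points into 2 clusters without using any labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=2).fit(X)
print("Cluster centers:\n", kmeans.cluster_centers_)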

iii) Algorithms for Prescriptive Analytics

These are algorithms that can be used for prescribing a course of action based on insights derived from data. Some prescriptive analytics methods include the following (a short simulation sketch follows the list):

a) Probabilistic modeling

b) Optimization methods and operations research

c) Monte Carlo simulations
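
For instance, a simple Monte Carlo sketch (with made-up demand and cost assumptions) that estimates expected profit and downside risk under uncertainty:

import numpy as np

rng = np.random.default_rng(3)
n_trials = 100_000

# Hypothetical assumptions: uncertain demand and unit cost, fixed price
demand = rng.normal(loc=1000, scale=200, size=n_trials)    # units sold
unit_cost = rng.uniform(low=4.0, high=6.0, size=n_trials)  # cost per unit
price = 9.0                                                # selling price per unit

# Profit in each simulated scenario
profit = demand * (price - unit_cost)

print("Expected profit:", profit.mean())
print("5th percentile (downside risk):", np.percentile(profit, 5))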

3) Real World Component

Every machine learning model must produce meaningful and interpretable results in real-life situations. A predictive model must be validated against reality in order to be considered meaningful and useful. Human input and experience are therefore always necessary and beneficial for making sense of the results produced by algorithms.

III. The human component of analytics modeling

With automation in machine learning, humans are still indispensable to make the connection between data, algorithms, and the real world. In this section, we discuss the roles played by a human data scientist to connect the 3 essential components of a machine learning model already discussed above.

a) Check quality and reliability of data

Data is key to any data science and machine learning task. Data comes in different flavors, such as numerical data, categorical data, text data, image data, voice data, and video data. The predictive power of a model depends on the quality of the data used in building it. It is therefore extremely important that, before feeding data into a model, the quality and reliability of the data be checked by a qualified professional, because even datasets that appear perfect may contain errors. There are several factors that could diminish the quality of your data (a few basic checks are sketched after the list below):

  • Wrong Data
  • Missing Data
  • Outliers in Data
  • Redundancy in Data
  • Unbalanced Data
  • Lack of Variability in Data
  • Dynamic Data
  • Size of Data
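
As an illustration, a minimal pandas sketch (with a small hypothetical dataset) of a few basic checks for missing values, redundancy, outliers, and class imbalance:

import pandas as pd

# Hypothetical dataset to be checked before modeling
df = pd.DataFrame({
    "income": [42_000, 51_000, 51_000, 1_000_000, None],
    "defaulted": [0, 0, 0, 1, 0],
})

# Missing data: count missing values per column
print(df.isna().sum())

# Redundancy: count duplicated rows
print("Duplicated rows:", df.duplicated().sum())

# Outliers: flag values more than 1.5 * IQR beyond the quartiles
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
print(df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)])

# Unbalanced data: inspect the class distribution of the target
print(df["defaulted"].value_counts(normalize=True))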

For more information, please see the following article: Data is Always Imperfect.

b) Check type and quality of the algorithm to be used

Because there are many different types of machine learning algorithms, a qualified professional has to verify that the algorithm selected is appropriate for the problem and performs well compared with reasonable alternatives, for example, by cross-validating candidate models as sketched below. The professional also has to assess the algorithm's output to determine the level of error and uncertainty in the results.
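
A minimal sketch (scikit-learn, with the iris dataset as a stand-in for real project data) of comparing two candidate algorithms by cross-validation:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Compare two candidate algorithms on the same data with 5-fold cross-validation
for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("k-nearest neighbors", KNeighborsClassifier())]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")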

c) Ensure that ethical standards are implemented

Ethics and privacy considerations are a must in data science and machine learning. A qualified professional is needed to ensure the data and algorithm used in the machine learning model will not intentionally produce bias in results. Ethical standards must be held high in all phases, from data collection to analysis, to model building, testing, and application. Care must be taken to avoid fabricating results for the purpose of misleading or manipulating the customer or the general public.

d) Ensure output is beneficial to the general public

As a case study, a machine learning model could be used to design the active chemical components for manufacturing a vaccine to combat a certain disease. In this case, qualified personnel will be needed to assess the efficacy of the vaccine by performing clinical trials to ensure the vaccine is safe and effective.

IV. Summary

In summary, we’ve discussed several reasons why a human (qualified data science professional) is still indispensable in the age of automation in machine learning. With automation gaining more and more ground in machine learning, humans will still be indispensable to make the connection between data, algorithms, and the real world and to ensure ethical standards are held high in machine learning.
