Human Component in Machine Learning
With automation in machine learning, humans remain indispensable for making the connection between data, algorithms, and the real world
With automation becoming increasingly popular in the field of machine learning, one may wonder whether the role of humans in machine learning will eventually become non-essential.
When building a machine learning model, it’s important to remember that the model must produce meaningful and interpretable results in real-life situations. This is where the human experience comes in. A human (qualified data science professional) has to examine the results produced by algorithms and computers to ensure that the results are consistent with real-world situations before recommending a model for deployment. With automation in machine learning, humans are still indispensable to make the connection between data, algorithms, and the real world.
In this article, we will discuss the 3 essential components of a machine learning model, namely: 1) Data Component, 2) Algorithm Component, and 3) Real World Component. Finally, we will examine the various roles played by a human (qualified data science professional) to ensure the 3 components of a machine learning model are interacting with each other in a meaningful and beneficial manner.
II. Review of essential components of a machine learning model
There are 3 essential components of a machine learning model, as shown in the figure below.
1. Data Component
This component consists of everything about data and includes the following:
i) Sources of Data
This section deals with all sources of data such as
a) design of experiments or surveys for collecting data
b) purchasing data from organizations that mine and store large datasets
c) use of an open dataset
d) simulating raw data to combine it with actual sampled data
ii) Data Preparation and Transformation
This deals with preprocessing raw data to convert it into a form that is ready for analysis or model building and includes topics such as
a) handling missing data
b) data imputation
c) encoding categorical data
d) identification of predictor features and target features
e) data scaling, for example, feature standardization or normalization
f) feature selection and dimensionality reduction
g) advanced methods for data transformation such as PCA and LDA
Software that can be used for data preparation and transformation includes
- Pandas package
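As a minimal sketch of the preparation steps listed above, the snippet below walks a small hypothetical dataset (the column names and values are invented for illustration) through missing-data imputation, one-hot encoding of a categorical feature, and feature standardization, using Pandas together with scikit-learn:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy dataset with a missing value and a categorical column.
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "city": ["NY", "SF", "NY", "LA"],
    "income": [50_000, 80_000, 62_000, 90_000],
})

# Handle missing data via simple mean imputation.
df["age"] = df["age"].fillna(df["age"].mean())

# Encode the categorical feature as one-hot indicator columns.
df = pd.get_dummies(df, columns=["city"])

# Standardize the numeric features (zero mean, unit variance).
scaler = StandardScaler()
df[["age", "income"]] = scaler.fit_transform(df[["age", "income"]])

print(df.shape)  # 4 rows; the original 3 columns become 5 after encoding
```

In a real project the imputation strategy and scaling method would be chosen to suit the data and the downstream model; mean imputation and standardization are used here only as simple defaults.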
2. Algorithm Component
These are algorithms that are applied to data in order to derive useful and insightful information from the data. The algorithms could be categorized as descriptive, predictive, or prescriptive.
i) Algorithms for Descriptive Analytics
These include packages that can be applied to data for visualization purposes, for example, algorithms to produce bar plots, line graphs, histograms, scatter plots, pair plots, density plots, Q-Q plots, etc. Some of the most common packages for descriptive analytics include
- Matplotlib package
- Seaborn package
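Descriptive analytics also covers plain summary statistics. As a small sketch on a hypothetical sample (the group labels and scores are invented), Pandas can compute per-group counts, means, and quartiles directly:

```python
import pandas as pd

# Hypothetical sample of exam scores for two groups.
scores = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "score": [78, 85, 62, 90, 71],
})

# Summary statistics (count, mean, std, quartiles) for each group.
summary = scores.groupby("group")["score"].describe()
print(summary)

# A histogram of the same data could be drawn with, e.g.,
# scores["score"].plot.hist()
```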
ii) Algorithms for Predictive Analytics
These are algorithms that are used for building predictive models. Some of the most common packages for predictive analytics include
- Scikit-learn package
- Caret package
Predictive analytics algorithms can be further classified into the following groups:
a) Supervised Learning (Continuous Variable Prediction)
- Basic regression
- Multiple regression analysis
- Regularized regression
b) Supervised Learning (Discrete Variable Prediction)
- Logistic Regression Classifier
- Support Vector Machine Classifier
- K-nearest neighbor (KNN) Classifier
- Naive Bayes
- Decision Tree Classifier
- Random Forest Classifier
c) Unsupervised Learning
- K-means clustering algorithm
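As a hedged sketch of both groups, the snippet below trains one supervised classifier from the list above (a decision tree) and one unsupervised algorithm (k-means) on scikit-learn's built-in Iris dataset; the specific dataset and hyperparameters are illustrative choices, not prescriptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised learning: fit a decision-tree classifier on a training split
# and measure accuracy on a held-out test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Unsupervised learning: cluster the same features with k-means (k=3),
# using no labels at all.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(f"test accuracy: {accuracy:.2f}, clusters found: {len(set(labels))}")
```

The held-out test split is exactly the kind of reality check discussed later: the model is judged on data it has never seen, not on the data used to fit it.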
iii) Algorithms for Prescriptive Analytics
These are algorithms that can be used for prescribing a course of action based on insights derived from data. Some prescriptive analytics algorithms include
a) Probabilistic modeling
b) Optimization methods and operations research
c) Monte Carlo simulations
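To make the prescriptive idea concrete, here is a minimal Monte Carlo sketch of a hypothetical inventory decision: the demand distribution, prices, and candidate stock levels are all invented assumptions, and the simulation prescribes the action (stock level) with the highest estimated expected profit:

```python
import random

UNIT_COST, UNIT_PRICE = 4.0, 10.0   # assumed unit economics
N_SIMULATIONS = 10_000

def expected_profit(stock: int) -> float:
    """Monte Carlo estimate of expected profit at a given stock level."""
    # Re-seed so every candidate is evaluated on the same demand draws
    # (common random numbers), which makes comparisons fairer.
    random.seed(0)
    total = 0.0
    for _ in range(N_SIMULATIONS):
        demand = random.randint(50, 150)   # assumed demand distribution
        sold = min(stock, demand)
        total += UNIT_PRICE * sold - UNIT_COST * stock
    return total / N_SIMULATIONS

# Prescribe the best action from a set of candidate stock levels.
candidates = range(50, 151, 10)
best = max(candidates, key=expected_profit)
print(best)
```

This is the prescriptive pattern in miniature: simulate uncertain outcomes under each candidate action, then recommend the action with the best expected result.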
3. Real World Component
Every machine learning model must produce meaningful and interpretable results in real-life situations. A predictive model must be validated against reality in order to be considered meaningful and useful. Human input and experience are therefore always necessary and beneficial for making sense out of results produced by algorithms.
III. The human component of analytics modeling
With automation in machine learning, humans are still indispensable to make the connection between data, algorithms, and the real world. In this section, we discuss the roles played by a human data scientist to connect the 3 essential components of a machine learning model already discussed above.
a) Check quality and reliability of data
Data is key to any data science and machine learning task. Data comes in different flavors such as numerical data, categorical data, text data, image data, voice data, and video data. The predictive power of a model depends on the quality of data used in building the model. It is therefore extremely important that before feeding data into a model, the quality and reliability of the data be checked by a qualified professional because even datasets that appear perfect may contain errors. There are several factors that could diminish the quality of your data:
- Wrong Data
- Missing Data
- Outliers in Data
- Redundancy in Data
- Unbalanced Data
- Lack of Variability in Data
- Dynamic Data
- Size of Data
For more information, please see the following article: Data is Always Imperfect.
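Several of the quality checks above can be automated as a first pass before a human reviews the data. The sketch below runs a few of them (missing values, duplicate rows, a simple domain-rule outlier check, and class balance) on a small hypothetical dataset whose columns and values are invented for illustration:

```python
import pandas as pd

# Hypothetical raw dataset exhibiting typical quality problems.
raw = pd.DataFrame({
    "age": [29, 34, None, 34, 420],   # a missing value and an implausible outlier
    "label": ["yes", "no", "yes", "no", "yes"],
})

report = {
    "missing_values": int(raw["age"].isna().sum()),
    "duplicate_rows": int(raw.duplicated().sum()),
    "outliers": int((raw["age"] > 120).sum()),   # simple domain-rule check
    "class_balance": raw["label"].value_counts(normalize=True).round(2).to_dict(),
}
print(report)
```

A report like this does not replace human judgment; it only surfaces candidates (is age 420 a typo? is the class imbalance real or a sampling artifact?) for a qualified professional to investigate.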
b) Check type and quality of the algorithm to be used
Because there are many different types of machine learning algorithms, a qualified professional has to ensure that the selected algorithm is appropriate for the problem at hand and performs well compared to the alternatives. The professional must also assess the algorithm's outputs to quantify the error and uncertainty in its predictions.
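One standard way to compare candidate algorithms and gauge the uncertainty in their performance is cross-validation. As a sketch, the snippet below compares two classifiers on scikit-learn's built-in Iris dataset with 5-fold cross-validation; the dataset, models, and fold count are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate each candidate with 5-fold cross-validation; the spread of
# the fold scores gives a rough sense of the estimate's uncertainty.
results = {}
for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("k-nearest neighbors", KNeighborsClassifier(n_neighbors=5)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean={scores.mean():.3f} +/- {scores.std():.3f}")
```

The numbers alone do not pick the winner: a human still has to weigh accuracy against interpretability, training cost, and how well the validation setup reflects real-world conditions.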
c) Ensure that ethical standards are implemented
Ethics and privacy considerations are a must in data science and machine learning. A qualified professional is needed to ensure the data and algorithm used in the machine learning model will not intentionally produce bias in results. Ethical standards must be held high in all phases, from data collection to analysis, to model building, testing, and application. Care must be taken to avoid fabricating results for the purpose of misleading or manipulating the customer or the general public.
d) Ensure output is beneficial to the general public
As a case study, a machine learning model could be used to design the active chemical components of a vaccine against a certain disease. In this case, qualified personnel would be needed to assess the efficacy of the vaccine through clinical trials to ensure it is safe and effective.
In summary, we’ve discussed several reasons why a human (qualified data science professional) is still indispensable in the age of automation in machine learning. As automation gains more and more ground in machine learning, humans will remain indispensable for making the connection between data, algorithms, and the real world, and for ensuring that ethical standards are upheld.