Machine learning jargon, in plain English

Max Pagels
The Hands-on Advisors
6 min read · Mar 5, 2018

Machine learning, just like any other field, has its fair share of jargon. But unlike many other fields, some of the terminology is downright misleading. Poor terminology choices make ML concepts harder to understand than they should be.

What follows is an incomplete, unordered, tongue-in-cheek list of common ML terms and what we actually mean when we use them. We’ll update the list from time to time.

This list was co-compiled by Fourkind’s Jarno Kartela.

  • What we say: model
  • What we actually mean: one or more functions trying to explain how some system/environment works. Most models are terrible approximations of the real world, but some are less terrible than others. The whole idea of machine learning is to find the least terrible model for a particular problem. Yes, we’re pessimists.
  • What we say: regression
  • What we actually mean: predicting a number; e.g. predicting someone’s income based on their education, country and so on.
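Predicting a number can be as simple as fitting a straight line. A minimal sketch (the education/income numbers below are made up for illustration):

```python
import numpy as np

# Toy data: years of education vs. income (illustrative numbers, not real data).
education = np.array([10, 12, 14, 16, 18], dtype=float)
income = np.array([25_000, 32_000, 40_000, 55_000, 70_000], dtype=float)

# Fit a straight line: income ≈ slope * education + intercept.
slope, intercept = np.polyfit(education, income, deg=1)

# Regression means the output is a number.
predicted_income = slope * 15 + intercept
```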
  • What we say: logistic regression
  • What we actually mean: classification; predicting if something belongs to a particular class (“is this photo a photo of a cat or a dog?”). Logistic regression is one of many learning algorithms for classification.
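The “regression” part of the name comes from the fact that the model outputs a probability, which we then threshold into a class. A sketch with hand-picked weights (a real model would learn these from data; the features here are invented):

```python
import math

def sigmoid(z):
    # Squashes any number into the range (0, 1), i.e. a probability.
    return 1 / (1 + math.exp(-z))

# Hypothetical weights for "is this photo a cat?" from two made-up features.
w_whisker, w_bark, bias = 2.0, -3.0, 0.1

def predict_cat_probability(whisker_score, bark_score):
    z = w_whisker * whisker_score + w_bark * bark_score + bias
    return sigmoid(z)

p = predict_cat_probability(whisker_score=0.9, bark_score=0.1)
label = "cat" if p >= 0.5 else "dog"
```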
  • What we say: feature engineering
  • What we actually mean: massaging data so that it yields the most predictive power; generating variables from data that fit our understanding of the context we’re trying to model but are not found as-is in the raw data we are using.
  • What we say: hypothesis space
  • What we actually mean: a set of possible functions. It’s machine learning’s job to learn which of these possible functions best approximates the relationship between input and output.
  • What we say: hyperparameters
  • What we actually mean: a set of configuration values. Actually, we just wanted a cooler name for configuration. All models have some types of configuration variables, be that the depth of a tree-based classification model or the number of clusters we want to find in clustering methods. We spend a lot of time fiddling around with hyperparameters, because they have a huge impact on training time and model accuracy.
  • What we say: ensemble
  • What we actually mean: a collection of models working together. More often than not, a single model will make some mistakes that can be compensated for by feeding its results to another one. This is more common in machine learning competitions than real life.
  • What we say: boosting
  • What we actually mean: the same as an ensemble, but with a single type of model fitted many times; say, by fitting many classification trees sequentially, each one correcting the errors of the ones before it.
  • What we say: matrix
  • What we actually mean: data consisting of nothing but numbers. Few models can work with non-numeric data — and even those transform non-numeric data to numeric under the hood — so we mainly operate using matrices.
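The usual trick for turning non-numeric data numeric is one-hot encoding: one column per category, a 1 where the category matches. A bare-bones sketch (toy data):

```python
# Turn a non-numeric column into a numeric matrix via one-hot encoding.
colours = ["red", "green", "red", "blue"]
categories = sorted(set(colours))  # ['blue', 'green', 'red']

# One row per observation, one column per category.
matrix = [[1 if value == cat else 0 for cat in categories] for value in colours]
```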
  • What we say: confusion matrix
  • What we actually mean: a summary of classification results. Classification can go wrong in many ways. You can predict someone is female when they are male, and vice-versa. You can also predict these things correctly. Confusion matrices show us how wrong we are in each different case.
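A confusion matrix is nothing more than counting each (actual, predicted) pair. A minimal sketch with invented cat/dog predictions:

```python
# Build a 2x2 confusion matrix by hand for a binary classifier.
actual    = ["cat", "cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "cat", "cat", "dog"]

labels = ["cat", "dog"]
confusion = {(a, p): 0 for a in labels for p in labels}
for a, p in zip(actual, predicted):
    confusion[(a, p)] += 1

# confusion[("cat", "dog")] counts cats the model mistook for dogs.
```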
  • What we say: continuous variable
  • What we actually mean: it’s a number. The height of a person is a continuous variable. The opposite of a continuous variable is a discrete variable (like, say, the output of a binary classifier, 0/1).
  • What we say: imputation
  • What we actually mean: replacing missing values. Learning algorithms don’t like data with missing values. To address this issue, we replace missing values with something fitting for the context. For height, it could be median height by gender.
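Median imputation in a few lines (heights are made up; `None` marks a missing value):

```python
# Replace missing heights with the median of the observed ones.
heights = [172.0, None, 168.0, 181.0, None, 175.0]

observed = sorted(h for h in heights if h is not None)
mid = len(observed) // 2
median = (observed[mid] if len(observed) % 2 == 1
          else (observed[mid - 1] + observed[mid]) / 2)

imputed = [h if h is not None else median for h in heights]
```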
  • What we say: target/response/label
  • What we actually mean: the thing we’re trying to predict. E.g. if we want to predict customer churn, then churn/non-churn are our targets.
  • What we say: training
  • What we actually mean: trying to make a machine learn something; taking a set of data and letting the computer find if there is any relation between the dataset’s features and the given target variable. Almost all machine learning entails some form of training.
  • What we say: cross validation
  • What we actually mean: assessing how well a model generalises to data it’s never seen before. If we train a model for predicting a person’s income, we can numerically validate how well it generalises by comparing it to labelled data it’s not seen during training.
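The “cross” part means we rotate which slice of the data is held out. A sketch of k-fold splitting, indices only, no model, just to show the bookkeeping:

```python
# Split n samples into k train/validation folds, each sample held out once.
def k_fold_splits(n_samples, k):
    indices = list(range(n_samples))
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        stop = start + fold_size if i < k - 1 else n_samples
        validation = indices[start:stop]          # held-out slice
        training = indices[:start] + indices[stop:]  # everything else
        folds.append((training, validation))
    return folds

splits = k_fold_splits(n_samples=10, k=5)
```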
  • What we say: supervised learning
  • What we actually mean: machine learning on fully labelled data. If we have data with the “answers” for the thing we’re trying to predict/classify, it’s easier to numerically validate how much we are messing up and how we should change our parameters to mess up less. That’s why supervised learning is popular. The downside is that fully labelled data is hard to get and/or laborious to make.
  • What we say: unsupervised learning
  • What we actually mean: machine learning on data that isn’t labelled. Targets are what we train our models on, i.e. the things we want to predict. There are, however, lots of problems where we don’t have, or indeed don’t want, a fixed set of correct answers. We may, for example, want to group data (e.g. text) into different clusters (e.g. topics) without fixing the clusters beforehand.
  • What we say: non-parametric algorithm
  • What we actually mean: a learning algorithm where we don’t place restrictions on the number of parameters/weights learned functions can have. A non-parametric model may, in fact, have thousands or millions of parameters.
  • What we say: parametric algorithm
  • What we actually mean: a learning algorithm where we do place restrictions on the number of parameters/weights learned functions can have. Why these aren’t called “fixed-parameter algorithms” or something similar is beyond us.
  • What we say: vectorisation
  • What we actually mean: matrix/vector calculations. Training a machine learning model is a process of iteration. For/while loops are obvious choices for control flow in code, but for some calculations, it turns out you can achieve the same end result by grouping values into matrices/vectors and doing operations (addition, multiplication, and so on) on them. It’s usually much faster, which is the main reason we like vectorising stuff. It also makes our code incomprehensible gibberish, but we don’t really care.
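The same dot product two ways — an explicit loop versus a single vectorised call — just to show they agree:

```python
import numpy as np

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# The loop version: explicit control flow.
loop_result = 0.0
for x, y in zip(a, b):
    loop_result += x * y

# The vectorised version: one library call, no visible loop.
vector_result = float(np.dot(np.array(a), np.array(b)))
```

On toy lists the speed difference is invisible; on arrays with millions of elements, it isn't.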
  • What we say: credit assignment problem
  • What we actually mean: figuring out what should get the credit for some action. In reinforcement learning, where we try to learn the best action to take at any given moment, we may only get feedback on how well we did much later on. A classic example is chess, where we only get to know how we did (win/loss) when the game is over. Assigning credit to individual moves is difficult, which is why we gave it a name.
  • What we say: convolution
  • What we actually mean: summing up element-wise products between matrices. To be exact, this isn’t even a mathematical convolution, but cross-correlation. We just skip some classical convolution operations because we don’t need them, making mathematicians angry in the process.
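A bare-bones sketch: slide a 2×2 kernel over a 3×3 “image”, and at each position take the sum of element-wise products (no kernel flip, hence cross-correlation; the numbers are arbitrary):

```python
image = [
    [1, 2, 0],
    [0, 1, 2],
    [2, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

# Slide the kernel over every 2x2 patch of the image.
out = []
for i in range(len(image) - 1):
    row = []
    for j in range(len(image[0]) - 1):
        total = sum(
            image[i + di][j + dj] * kernel[di][dj]
            for di in range(2) for dj in range(2)
        )
        row.append(total)
    out.append(row)
```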
  • What we say: meta-learning
  • What we actually mean: learning how to learn. Finding the best configuration for a dataset/learning algorithm combination is a time-consuming process of trial and error. Since learning algorithms, erm, learn, why not have them learn how to learn optimally? Saves us the trouble of doing it ourselves.
  • What we say: deep learning
  • What we actually mean: neural networks with more than one hidden layer. More layers can capture more complex relationships between input and output. But deep learning sounds more sci-fi than “more than one hidden layer”, so we went with that instead.
  • What we say: Bayes error
  • What we actually mean: the irreducible error of a model. There’s usually some element of randomness in data generated by a given process, which means any model we train on data generated by that process will inevitably be wrong some of the time.
  • What we say: innate prior
  • What we actually mean: enforcing common sense in learning algorithms. Typically done by designing the learning algorithm directly so that not doing the common sense thing is impossible.
