Machine Learning Reductions & Mother Algorithms, Part I: An Introduction

Max Pagels
Nov 5, 2018 · 4 min read

In the culinary field, there are five so-called “mother sauces”: béchamel, velouté, espagnole, tomato & hollandaise. According to this categorisation, any other sauce you make is, fundamentally, a derivative of one or more of the mother sauces. Take béarnaise, for example: it’s hollandaise with some additional flavouring (tarragon and shallots).

The five mother sauces. Image credit: Food52

In culinary education, learning to make mother sauces is the first step towards becoming a master saucier. The framework helps you tame the complexity of making a new sauce by breaking the process into manageable sub-problems, solving them, and composing the results. The same divide-and-conquer approach is used by humans so often that it’s hard to imagine a domain in which we don’t use it.

In the field of machine learning, we don’t have sauces, but we do have two learning algorithms so fundamental that I’d call them mother algorithms: simple regression and binary classification. As it turns out, lots of non-trivial machine learning problems can be broken down, or reduced, in such a way that they become solvable using these mother algorithms. In this blog series, I’ll walk through some of the reductions I think practitioners should be aware of.

Two variants of ML “mother algorithms”: linear regression & logistic regression (for binary classification). Image credit: Machine Learning Plus

Reductions in machine learning have long been an interest of mine because I believe them to embody the ideal way we should approach problem-solving. When we tackle new machine learning problems, we have two options:

  1. Design new algorithms
  2. Figure out how to reuse existing algorithms

Much of science entails building on past, peer-reviewed research to answer increasingly difficult questions. This allows us to advance our knowledge rapidly in a way that remains scientifically sound. If we can find a way to solve a new ML problem by reusing existing algorithms, we’re not only saving time & making fewer mistakes: we’re adhering to good scientific practice. Reductions also help us understand how different techniques relate to each other, which in turn gives us more options to tackle new problems.

To solve an ML problem using reductions, the general approach is as follows:

  1. Identify the learning problem L (usually either classification or regression)
  2. Find a premade reduction (or set of reductions) R that reduces your original data distribution D to an induced distribution Dᵢ solvable by a mother algorithm A
  3. Learn a predictor Fᵣ using A and data Dᵢ
  4. Roll up/compose the solution to the original problem using Fᵣ

Intuitively, what we want to do is transform our original problem, our dataset, or both in such a way that they are compatible with an existing mother algorithm, and use that as a solution. If the framework seems straightforward, that’s because it, well, is. There’s nothing especially complex going on here.
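To make the recipe concrete, here’s a minimal sketch of one simple instance: reducing importance-weighted binary classification to plain binary classification by rejection sampling in proportion to the weights. The synthetic data and the choice of scikit-learn’s LogisticRegression as the mother algorithm are purely illustrative assumptions; any binary classifier would do.

```python
# A minimal sketch of the four-step recipe above. The running example:
# importance-weighted binary classification -> binary classification,
# via rejection sampling proportional to the weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Step 1: the learning problem L -- binary classification where each example
# carries an importance weight (how costly it is to get that example wrong).
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
w = rng.uniform(0.1, 5.0, size=1000)  # importance weights

# Step 2: the reduction R -- keep each example with probability proportional
# to its weight, inducing a distribution D_i on which ordinary (unweighted)
# binary classification is the right thing to do.
keep = rng.uniform(size=1000) < w / w.max()
X_i, y_i = X[keep], y[keep]

# Step 3: learn a predictor F_r with the mother algorithm A (here, logistic
# regression) on the induced data D_i.
f_r = LogisticRegression().fit(X_i, y_i)

# Step 4: compose the solution to the original problem. For this particular
# reduction the composition is trivial: F_r is used as-is.
predictions = f_r.predict(X)
```

The mother algorithm never sees a weight; all the weight-handling lives in the reduction, which is exactly the point.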

ML reductions are an active and relatively new area of research, with new results being published on a regular basis. One of the most prominent researchers in this space is John Langford, whose publications and website I highly recommend.

As things stand, it’s good for machine learning practitioners to know that it’s already possible to reduce at least the following:

  • Importance-weighted binary classification to binary classification
  • Regression to binary classification
  • Quantile regression to binary classification
  • Multiclass classification to binary classification
  • Cost-sensitive multiclass classification to importance-weighted binary classification
  • Cost-sensitive multiclass classification to regression
  • Ranking to binary classification
  • Contextual bandits (immediate reward reinforcement learning) to regression
  • Contextual bandits to binary classification
  • Reinforcement learning to binary classification

Some ML reductions are well-known; others aren’t immediately apparent. The composable nature of reductions is precisely what makes them powerful: Many problems are solvable using a reduction, but we can solve even more complex problems by stacking reductions one after the other. In this series, we’ll cover some of the most useful reductions, and touch upon other nifty tricks besides (hashing & the one classifier/regressor trick).
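As a small taste of what’s to come in Part II, here’s a hedged sketch of the classic one-vs-all reduction from multiclass to binary classification. The dataset and the use of scikit-learn’s LogisticRegression are illustrative assumptions; what matters is the shape of the reduction: one induced binary problem per class, plus a composition step that picks the most confident class.

```python
# One-vs-all: multiclass classification reduced to binary classification.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Reduction: one induced binary dataset ("is this example class k or not?")
# and one binary learner per class.
binary_learners = {
    k: LogisticRegression(max_iter=1000).fit(X, (y == k).astype(int))
    for k in classes
}

# Composition: predict the class whose binary classifier assigns the highest
# probability of "yes, this is class k".
scores = np.column_stack(
    [binary_learners[k].predict_proba(X)[:, 1] for k in classes]
)
predictions = classes[scores.argmax(axis=1)]
```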

Part II: Multiclass to Binary Classification

A special thanks to John Langford and the “Reduction Approach to Machine Learning” course he held at the University of Helsinki back in 2006 for sparking my interest in this field of ML.
