Machine Learning Reductions & Mother Algorithms, Part I: An Introduction
In the culinary field, there are five so-called “mother sauces”: béchamel, velouté, espagnole, tomato & hollandaise. According to this categorisation, any other sauce you make is, fundamentally, a derivation of one or more of the mother sauces. Take béarnaise, for example: it’s hollandaise with some additional flavouring (tarragon and shallots).
In culinary education, learning to make the mother sauces is the first step towards becoming a master saucier. The framework helps you manage the complexity of making a new sauce by breaking the process into manageable sub-problems, solving them, and composing the results. Humans use the same divide-and-conquer approach so often that it’s hard to imagine a domain in which we don’t.
In the field of machine learning, we don’t have sauces, but we do have two learning algorithms so fundamental that I’d call them mother algorithms: simple regression and binary classification. As it turns out, lots of non-trivial machine learning problems can be broken down, or reduced, so that they become solvable using these mother algorithms. In this blog series, I’ll walk through some of the reductions I think practitioners should be aware of.
Why would you want to use ML reductions?
Reductions in machine learning have long been an interest of mine because I believe them to embody the ideal way we should approach problem-solving. When we tackle new machine learning problems, we have two options:
- Design new algorithms
- Figure out how to reuse existing algorithms
Much of science entails building on past, peer-reviewed research to answer increasingly difficult questions. This allows us to advance our knowledge rapidly in a way that remains scientifically sound. If we can find a way to solve a new ML problem by reusing existing algorithms, we’re not only saving time & making fewer mistakes: we’re adhering to good scientific practice. Reductions also help us understand how different techniques relate to each other, which in turn gives us more options to tackle new problems.
How ML reductions work
To solve an ML problem using reductions, the general approach is as follows:
- Identify the learning problem L (usually either classification or regression)
- Find a premade reduction (or set of reductions) R that reduces your original data distribution D to an induced distribution Dᵢ solvable by a mother algorithm A
- Learn a predictor Fᵣ using A and data Dᵢ
- Roll up/compose the solution to the original problem using Fᵣ
Intuitively, what we want to do is transform our original problem, our dataset, or both in such a way that they are compatible with an existing mother algorithm, and use that as a solution. If the framework seems straightforward, that’s because it, well, is. There’s nothing especially complex going on here.
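To make the four steps concrete, here’s a minimal sketch of one of the simplest reductions: importance-weighted binary classification reduced to plain binary classification, by replicating each example in proportion to its weight. Everything here is my own illustration, not code from a particular library: the toy threshold learner merely stands in for whatever off-the-shelf binary classifier you’d actually use as the mother algorithm.

```python
import random

def reduce_importance_weighted(data, seed=0):
    """Step 2: reduce importance-weighted binary classification to plain
    binary classification by replicating each example roughly `weight`
    times (rounding the fractional part stochastically)."""
    rng = random.Random(seed)
    induced = []
    for x, y, w in data:
        copies = int(w) + (1 if rng.random() < w - int(w) else 0)
        induced.extend([(x, y)] * copies)
    return induced

def mother_algorithm(induced):
    """Step 3: a toy 'mother algorithm' -- a one-dimensional threshold
    learner standing in for any off-the-shelf binary classifier."""
    best_t, best_err = None, float("inf")
    for t in sorted({x for x, _ in induced}):
        err = sum((x >= t) != y for x, y in induced)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: x >= best_t

# Step 1: an importance-weighted problem, given as (feature, label, weight).
data = [(0.1, False, 1.0), (0.4, False, 3.0), (0.6, True, 3.0), (0.9, True, 1.0)]
induced = reduce_importance_weighted(data)  # Step 2
predictor = mother_algorithm(induced)       # Step 3
# Step 4: compose -- for this reduction the learned predictor already
# solves the original problem directly.
print(predictor(0.8))  # → True
```

Note how the heavily weighted examples around the decision boundary dominate the induced dataset, which is exactly how the weights influence the unweighted learner.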
What kind of ML reductions are possible?
ML reductions are an active and relatively new area of research, with new results being published on a regular basis. One of the most prominent researchers in this space is John Langford, whose publications and website I highly recommend.
As things stand, it’s good for machine learning practitioners to know that it’s already possible to reduce at least the following:
- Importance-weighted binary classification to binary classification
- Regression to binary classification
- Quantile regression to binary classification
- Multiclass classification to binary classification
- Cost-sensitive multiclass classification to importance-weighted binary classification
- Cost-sensitive multiclass classification to regression
- Ranking to binary classification
- Contextual bandits (immediate reward reinforcement learning) to regression
- Contextual bandits to binary classification
- Reinforcement learning to binary classification
Some ML reductions are well-known; others aren’t immediately apparent. The composable nature of reductions is precisely what makes them powerful: many problems are solvable using a single reduction, but we can solve even more complex problems by stacking reductions one after the other. In this series, we’ll cover some of the most useful reductions, and touch on other nifty tricks along the way (hashing & the one classifier/regressor trick).
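As a taste of the multiclass-to-binary reduction from the list above, here’s a sketch of the classic one-vs-all scheme: reduce a K-class problem to K binary problems (“is it class c?”), then compose by picking the class whose binary scorer is most confident. Again, this is my own toy illustration under simplifying assumptions; the nearest-mean scorer is a hypothetical stand-in for a real binary mother algorithm.

```python
def train_binary(examples):
    """Toy binary 'mother algorithm': scores a point by how much closer it
    is to the mean of the positives than to the mean of the negatives."""
    pos = [x for x, y in examples if y]
    neg = [x for x, y in examples if not y]
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)
    return lambda x: abs(x - mu_neg) - abs(x - mu_pos)  # higher = more positive

def one_vs_all(data):
    """Reduce K-class classification to K binary problems: for each class c,
    relabel the data as 'is it c?' and train the binary mother algorithm."""
    classes = sorted({y for _, y in data})
    scorers = {c: train_binary([(x, y == c) for x, y in data]) for c in classes}
    # Compose: predict the class whose binary scorer is most confident.
    return lambda x: max(scorers, key=lambda c: scorers[c](x))

data = [(0.0, "a"), (0.2, "a"), (0.5, "b"), (0.6, "b"), (1.0, "c"), (1.1, "c")]
predict = one_vs_all(data)
print(predict(0.55))  # → "b"
```

The composition step is where reduction schemes differ: one-vs-all breaks ties via scores, while other multiclass reductions (which Part II covers) compose the binary predictors differently.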
Part II: Multiclass to Binary Classification
A special thanks to John Langford and the “The Reduction Approach to Machine Learning” course he held at the University of Helsinki back in 2006, for sparking my interest in this field of ML.