7 Stories Behind the World’s Most Popular Machine Learning Algorithms

Sithan Kanna
GAMMA — Part of BCG X
7 min read · Sep 13, 2018

The world of AI/Machine Learning is evolving fast. If you’re like me, keeping up with the latest developments can seem like trying to reach a destination while walking on a treadmill. Sometimes it’s worth it to just step off, pause, and look back at the origins of many of the paradigms and algorithms that got us to where AI is today.

We are very fortunate that many, though sadly not all, of the inventors who shaped AI are alive today. It can be inspiring (if sometimes intimidating) to hear about pivotal moments in the field from the very people who made them so significant. To that end, I’ve included the following seven videos (taken from past interviews and talks) because of what we can learn from these luminaries of our profession.

Together, the videos shine a light on the history of these algorithms, particularly on the specific problems these researchers were trying to solve. Their solutions to those problems eventually led to the invention of the algorithms themselves. This glance at the past provides a deeper understanding of the methods and of each algorithm’s suitability for different applications.

The videos also give us a glimpse into the thought processes behind these inventions. An understanding of these mental processes might, in turn, help us apply similar processes to the problems our field currently faces.

Finally, the videos provide an entertaining history of the development of the algorithms, analogous to the way the “origin stories” in comic books help readers understand the “back story” of popular heroes and heroines.

The Seven Stories

1. Decision Trees

The late Leo Breiman was instrumental in developing several tree-based methods. In this rare video snippet, he talks about the problem of classifying ships from radar signals. Breiman explains how he covered the walls of his office with data from the radars prior to his “aha” moment on Decision Trees. His dedication to penetrating the fog of data is a lesson to us all.

“The one thing I’ve learnt in my consulting, is that you have to get to know your data”

— Leo Breiman

Takeaway #1: Investing time to explore your underlying data source will pay off — even if you don’t see immediate results.

Next time your office looks like this, blame Leo.

2. Bootstrap

Ideas for algorithms can arise when a constraint (often an implicit one) is removed. In the next interview, Bradley Efron explains how he arrived at the idea for the Bootstrap from an earlier re-sampling method called the “Jackknife.”

“I had written one line down on a piece of paper which was ‘What is the jackknife an approximation to?’ and that got me going on it”

— Brad Efron
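For readers less familiar with the two methods, here is a minimal sketch (plain NumPy on toy data, with the sample mean as an illustrative statistic, not anything taken from Efron’s work) contrasting them: the jackknife recomputes a statistic on leave-one-out subsets, while the bootstrap recomputes it on resamples drawn with replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)   # toy data set

def statistic(x):
    return x.mean()   # any estimator of interest

# Jackknife: recompute the statistic on each leave-one-out subset.
jackknife_reps = np.array([
    statistic(np.delete(data, i)) for i in range(len(data))
])

# Bootstrap: recompute the statistic on resamples drawn with replacement.
bootstrap_reps = np.array([
    statistic(rng.choice(data, size=len(data), replace=True))
    for _ in range(2000)
])

n = len(data)
print("jackknife SE:", np.sqrt((n - 1) * jackknife_reps.var()))
print("bootstrap SE:", bootstrap_reps.std(ddof=1))
```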

Takeaway #2: Remind yourself of the bigger objective. Identify any implicit constraints that are hindering your thought process.

3. Random Forest

Making the proper trade-offs between the complexity and the performance of an algorithm can be quite challenging. The next video snippet reminds us all of the importance of considering this trade-off. In it, Adele Cutler, co-inventor of the modern Random Forest, recalls discussions she had with Leo Breiman about the importance of keeping the Random Forest simple to tune.

This story shows how important it can be to create models that others can use easily, and why it is no accident that Random Forests have become one of the de facto standard algorithms in machine learning.
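As an illustration of that simplicity of tuning (a hedged sketch using scikit-learn, not code from Breiman or Cutler), a Random Forest can often perform respectably with essentially two knobs: the number of trees and the number of features considered at each split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# In practice, the two hyper-parameters most users touch are
# n_estimators (number of trees) and max_features (features per split).
model = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)

scores = cross_val_score(model, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```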

Takeaway #3: Always keep the end users of your algorithm in mind. Think about what you could do to make their lives easier.

4. Gradient Boosting

Boosting, now a well-known concept, was not readily accepted by the statistical community. Even Jerome H. Friedman, a pioneer of the gradient-boosting algorithm, admits that he did not immediately appreciate the novelty and usefulness of the concept of Boosting.

“It was a very good idea that I didn’t immediately appreciate… but it was a very good idea”

— Jerome H. Friedman

Friedman later made his own contribution by generalizing the algorithm to other cost functions and problem settings, work that made him a co-inventor of gradient boosting.
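To make that generalization concrete, here is a minimal sketch (my own illustration, not Friedman’s code) of gradient boosting with an arbitrary differentiable loss: each round fits a small regression tree to the negative gradient of the loss with respect to the current predictions, so moving to a new problem setting only means swapping the gradient function.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, loss_grad, n_rounds=100, lr=0.1):
    """Build an additive model by repeatedly fitting small trees to the
    negative gradient of an arbitrary differentiable loss."""
    pred = np.zeros(len(y))                       # start from F_0(x) = 0
    trees = []
    for _ in range(n_rounds):
        residual = -loss_grad(y, pred)            # pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        pred += lr * tree.predict(X)              # gradient step in function space
        trees.append(tree)
    return trees, pred

# Changing the cost function only changes the gradient that the trees chase:
squared_error_grad = lambda y, pred: pred - y             # classic least squares
absolute_error_grad = lambda y, pred: np.sign(pred - y)   # more robust to outliers

# Toy usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
trees, fitted = gradient_boost(X, y, squared_error_grad)
```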

Takeaway #4: Seek to generalize your solution to a wider class of problems. Make it easy for others to use it.

5. Artificial Neural Networks

Generalizing a problem or a solution can bring many benefits. Conversely, you can also generate useful results by finding a special case of a general solution.

In the next video, Bernard Widrow, co-inventor of the adaptive linear neuron (ADALINE), discusses how his team applied a special configuration of the ADALINE to adaptive antenna systems in telecommunications. As a result, they found an application outside of pattern recognition.
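The link between the two applications is the Widrow-Hoff least-mean-squares (LMS) update at the heart of ADALINE: the same rule that trains a linear neuron on labelled patterns can adapt the weights of an antenna array to track a desired signal. Below is a minimal sketch of the update on toy data (my own illustration, not Widrow’s original system).

```python
import numpy as np

def lms_filter(X, d, mu=0.01):
    """Widrow-Hoff / LMS rule: w <- w + mu * error * x.
    X holds the input vectors (training patterns, or samples from the
    elements of an antenna array); d is the desired output at each step."""
    w = np.zeros(X.shape[1])
    for x_n, d_n in zip(X, d):
        error = d_n - w @ x_n        # instantaneous error
        w += mu * error * x_n        # stochastic gradient step on squared error
    return w

# Toy example: recover a 3-tap linear system from noisy observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
true_w = np.array([0.5, -1.0, 2.0])
d = X @ true_w + 0.01 * rng.normal(size=5000)
print(lms_filter(X, d, mu=0.02))     # approaches [0.5, -1.0, 2.0]
```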

Takeaway #5: Just as you can benefit by generalizing from a specific problem, you can also benefit from the opposite: finding a specific case of a general solution. A special case or adjacent application could have an immediate use that leads to a large impact.

6. Support Vector Machines (SVM)

SVM, which was the go-to method for classification problems before the rise of Deep Neural Networks, has its own rich and colorful history. In this video, Vladimir Vapnik discusses how his early years in Moscow helped shape the origins of statistical learning theory, including the ideas behind the SVM. His collaborator, Isabelle Guyon, also discusses the Kernel Trick, which generalized SVMs to non-linear decision boundaries (and hence made them more popular).

“The invention of SVMs happened when Bernhard decided to implement Vladimir’s algorithm in the three months we had left before we moved to Berkeley. After some initial success of the linear algorithm, Vladimir suggested introducing products of features. I proposed to rather use the kernel trick of the ‘potential function’ algorithm. Vladimir initially resisted the idea because the inventors of the ‘potential functions’ algorithm (Aizerman, Braverman, and Rozonoer) were from a competing team of his institute back in the 1960’s in Russia!”

— Isabelle Guyon

The rivalry between SVMs and neural networks has been going on for a long time. Yann Le Cun, a deep-learning pioneer and the inventor of convolutional neural networks, reveals in this video that Larry Jackel and Vladimir Vapnik had, at the time, made a couple of bets on which algorithm would become more popular.

“Convolutional Nets and SVMs were developed within a few years of each other (between 1988 and 1992) in the Adaptive Systems Research Department at Bell Labs in Holmdel, NJ. Larry Jackel was the head of the department whose research staff included Vladimir Vapnik and me, along with Bernhard Boser, Léon Bottou, John Denker, Hans-Peter Graf, Isabelle Guyon, Patrice Simard, and Sara Solla.

In 1995, Vladimir Vapnik and Larry Jackel made two bets (I was the witness, though admittedly not an entirely impartial one).
In the first bet, Larry claimed that by 2000 we will have a theoretical understanding of why big neural nets work well (in the form of a bound similar to what we have for SVMs). He lost.

In the second bet, Vladimir Vapnik claimed that by 2000 no one in their right mind would use neural nets of the type we had in 1995 (he claimed that everyone would be using SVM). Not only Vladimir lost that one in 2000, but recent deployments of neural nets by Google and Microsoft are proving him wrong in 2012”

— Yann Le Cun

Also note how Isabelle Guyon generalized Vapnik’s suggestion of using products of features by incorporating a wider range of non-linear transformations through the Kernel trick. This approach echoes Takeaway #4 from Jerome Friedman’s interview: Generalize your solution to a wider class of problems.
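A small worked example of that generalization (illustrative only, not the original SVM code): explicitly forming all pairwise products of features gives the same inner products as a degree-2 polynomial kernel, but the kernel computes them directly in the original space, without ever materializing the expanded feature vector.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

# Explicit "products of features" map: phi(v) = [v_i * v_j for all i, j].
phi = lambda v: np.outer(v, v).ravel()
explicit = phi(x) @ phi(z)

# Kernel trick: the degree-2 polynomial kernel gives the same inner product
# directly in the original 3-dimensional space.
kernel = (x @ z) ** 2

print(explicit, kernel)   # both equal 20.25
```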

Takeaway #6: The role of collaboration in developing and solving the problem at hand cannot be overstated. Take time to collaborate. You will move more quickly when you do.

7. Deep Learning

We end this historical study with a video on the “darling” of the machine learning community: deep learning. Although Yann Le Cun has given the first part of this same talk in several keynotes, the following segment provides a sound general introduction to Deep Learning, covering much of its history in the process. The most interesting part comes after the 44:06 mark, when Le Cun discusses open problems in AI.

“If you read the media …, you think the [AI] problem is solved. It is not solved at all. You have very useful applications coming out of AI/ML, but it is not solved”

— Yann Le Cun

An ability to “look over the horizon” is a trait shared by many of the researchers featured in this article as they continue to look for the next frontier in learning.

  • Brad Efron is currently working on using bootstrapping to tease out connections between Bayesian and Frequentist statistics.
  • Bernard Widrow is investigating the role of memory in neural networks.
  • Vladimir Vapnik is investigating the idea of improving learning using “expert instruction”.
  • Yann Le Cun continues to bring excitement and innovative work to the study of adversarial training.

Takeaway #7: Always search for the next frontier in your field. Consider the impact of current and approaching issues.

We hope this article has given you a new perspective on the nature of the algorithms listed here. If you know of videos or interviews that will add to this discussion, please post them in the comments section.

Sithan Kanna
Senior Data Scientist at BCG Gamma | PhD in Adaptive Signal Processing