Predictions for the Next Five Years in Machine Learning

Max Pagels
The Hands-on Advisors
5 min read · Jan 3, 2020

It’s the beginning of a new year, and with it, the beginning of a new decade. 2010–2019 will go down as the time when machine learning became commonplace, but amidst claims that machine learning is starting to see diminishing returns, where do we go from here? What are the concepts, techniques and tools that will move the field forward in the next decade? Here are some of my predictions for the next five years.

1. Automated feature engineering will become the new norm

Applied machine learning is an exercise in feature engineering. Even though we have algorithms capable of feature learning that can also approximate any Borel-measurable function, explicit features are still the key to good generalisation. Feature engineering by hand is difficult: it requires time, domain expertise, precision, and a keen eye. Fortunately, frameworks designed specifically for making feature engineering easier have started to pop up (two great options: Featuretools & TSFresh). In one project in late 2019, we started with some pretty powerful embeddings and a good chunk of data, but ended up with a model whose accuracy was no better than a coin toss. We added automated feature engineering to the mix and saw an instant absolute increase in accuracy of around 25%. It’s not hard to envision a future in which data scientists design data schemas, add some domain-specific calculations and let machines do the rest of the feature engineering.
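As a sketch of what this looks like in practice, here’s a minimal example using Featuretools’ deep feature synthesis. The schema, table and column names are invented for illustration, not taken from the project above:

```python
import pandas as pd
import featuretools as ft

# Hypothetical schema: customers and their transactions
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "customer_id": [1, 1, 2, 2],
    "amount": [100.0, 25.0, 40.0, 10.0],
    "timestamp": pd.to_datetime(["2019-01-01", "2019-01-15",
                                 "2019-02-01", "2019-02-20"]),
})
customers = pd.DataFrame({"customer_id": [1, 2]})

es = ft.EntitySet(id="retail")
es = es.entity_from_dataframe(entity_id="transactions", dataframe=transactions,
                              index="transaction_id", time_index="timestamp")
es = es.entity_from_dataframe(entity_id="customers", dataframe=customers,
                              index="customer_id")
es = es.add_relationship(ft.Relationship(es["customers"]["customer_id"],
                                         es["transactions"]["customer_id"]))

# Deep feature synthesis stacks aggregation and transform primitives
# automatically: SUM(transactions.amount), MEAN(transactions.amount),
# time since last transaction, and so on.
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="customers")
print(feature_defs)
```

The human designs the schema and the relationships between tables; the framework enumerates and computes the candidate features.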

2. Contextual decision processes will make “full” reinforcement learning viable for real-world problems

At Fourkind, we do lots of immediate-reward reinforcement learning with contextual bandits. In this setting, a reward is assumed to be directly attributable to the action that preceded it, which is a huge simplification that nevertheless works well when the reward function is carefully chosen. On the other end of the spectrum, we have “full” reinforcement learning, in which all states up to a certain point in time can influence the choice of optimal action. A practical problem with full RL is that, despite us living in a world of “big data”, the amount of data you need to do full RL successfully is very large. So large, in fact, that most companies simply don’t have the scale to make use of it.
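To make the immediate-reward setting concrete, here’s a minimal epsilon-greedy contextual bandit with one linear reward model per action. This is a generic sketch for illustration, not our production setup:

```python
import numpy as np

class EpsilonGreedyBandit:
    """Immediate-reward RL: each observed reward is credited entirely
    to the single action that preceded it."""

    def __init__(self, n_actions, n_features, epsilon=0.1, lr=0.01):
        self.epsilon = epsilon
        self.lr = lr
        # One linear reward model per action
        self.weights = np.zeros((n_actions, n_features))

    def choose(self, context):
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.weights))  # explore
        return int(np.argmax(self.weights @ context))    # exploit

    def update(self, context, action, reward):
        # One SGD step on the squared error between predicted and observed reward
        error = reward - self.weights[action] @ context
        self.weights[action] += self.lr * error * context

# Usage: observe a context, act, observe a reward, learn, repeat
bandit = EpsilonGreedyBandit(n_actions=3, n_features=5)
context = np.random.rand(5)
action = bandit.choose(context)
bandit.update(context, action, reward=1.0)
```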

Contextual decision processes (CDPs) promise to provide a happy medium between contextual bandits and full RL. Whereas immediate-reward RL focuses on extracting immediate value by choosing an action given an observation, and full RL on extracting maximum value over some long time horizon given states, CDPs attempt to optimise over time horizons given observations, theoretically requiring less data than MDP-based full RL in the process. Research in this area is quite recent, and I’m massively oversimplifying things, but the bottom line is this: real-world problems deal with observations, but full RL has its foundations in MDPs, which deal with state. For that reason, I suspect CDPs, or something similar, will prove popular in the future.

3. Attention-like mechanisms will be adapted to all popular learning algorithms

Attention mechanisms have proven to be massively useful, sometimes all but eliminating older techniques (e.g. recurrent neural networks). Without going into too much detail, attention mechanisms seek to give learning algorithms a way to “attend”, or pay attention, to some specific part of an input thought to be of use for making an accurate prediction. It’s not entirely unlike how humans quickly narrow down or “pre-filter” the information we receive before more in-depth consideration. This type of “concept” learning is notoriously difficult for machines, but attention mechanisms in neural networks have shown us that it can be done, with excellent results. I fully expect such attention mechanisms to find their way into other learning algorithms in the future. Despite all the hype, deep learning isn’t useful for everything; it would be a shame if advances in neural networks weren’t adapted to other algorithms, too.
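The canonical formulation is the scaled dot-product attention used in Transformers. A minimal NumPy sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # The softmax weights are the "pre-filter": they decide how much
    # each query position attends to each key position.
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V  # weighted combination of the values

Q = np.random.rand(4, 8)   # 4 query positions, dimensionality 8
K = np.random.rand(6, 8)   # 6 key positions
V = np.random.rand(6, 16)  # one value per key position
out = scaled_dot_product_attention(Q, K, V)  # shape (4, 16)
```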

4. Quality-Diversity algorithms will see widespread use in generative machine learning

Machine learning mainly focuses on taking some input and finding a mapping that yields the most accurate output. Evolutionary algorithms are similar in their search for optimality: they attempt to find the single best-performing solution to any given problem. However, in many cases, finding several good solutions is preferable to finding just one. Take mazes, for example: the end goal is the same (find your way out), but many mazes have several possible, and equally viable, solutions.

Quality-diversity (QD) algorithms aim to produce two things at once: a diverse set of solutions, each of which is of high quality. This has obvious implications in several fields, but I’m most excited about applications in generative machine learning, where breadth of output (for lack of a better term) is often just as important as the quality of any single result. Research into QD algorithms is relatively nascent (see here for a good summary), but very exciting. Exciting enough that I think we’ll hear a lot about it in the near future.
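MAP-Elites is probably the best-known QD algorithm: it keeps the single best (“elite”) solution found so far in each niche of a behaviour descriptor, so the archive ends up diverse and high-quality at the same time. A minimal sketch, assuming a one-dimensional behaviour descriptor scaled to [0, 1) and user-supplied fitness, behavior, random_solution and mutate functions (all hypothetical):

```python
import random

def map_elites(fitness, behavior, random_solution, mutate,
               n_bins=10, iterations=10_000):
    archive = {}  # behaviour bin -> (solution, fitness): one elite per niche
    for _ in range(iterations):
        if archive:
            parent, _ = random.choice(list(archive.values()))
            x = mutate(parent)  # vary an existing elite
        else:
            x = random_solution()
        b = min(int(behavior(x) * n_bins), n_bins - 1)  # discretise descriptor
        f = fitness(x)
        if b not in archive or f > archive[b][1]:
            archive[b] = (x, f)  # new elite for this niche
    return archive  # up to n_bins diverse, high-quality solutions
```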

5. Why?

Causality is the boogeyman of machine learning. It shouldn’t be. We all know correlation doesn’t equal causation, but ask a machine learning practitioner if a model is causally sound and they’ll whimper, hide under their desk and never give you a straight answer. One reason has to do with causality itself; the other has to do with fear of failure. Let’s start with the former.

Causality is funny in that nothing is, nor will ever be, causally “complete”. You cannot fully determine cause and effect, simply because asking the question “why?” will inevitably lead to you asking “why?” in perpetuity. To do any causal estimation at all, we have to make some assumptions about the world, and build some imperfect model on top of them. Make a mistake in your assumptions and your model will be wrong. Make a bad model and your conclusions will be wrong. Draw the wrong conclusions and you might end up making a multi-million dollar mistake.
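A toy example of how a broken assumption bites: if a confounder Z drives both the treatment X and the outcome Y, and your model ignores Z, the estimated effect of X on Y is badly biased. The data below is simulated purely for illustration; the true causal effect is 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)                      # confounder
x = 2.0 * z + rng.normal(size=n)            # treatment, driven by z
y = 1.0 * x + 3.0 * z + rng.normal(size=n)  # outcome: true effect of x is 1.0

# Naive model, ignoring the confounder: badly biased (~2.2)
naive_slope = np.polyfit(x, y, 1)[0]

# Adjusted model, regressing y on both x and z: recovers ~1.0
X = np.column_stack([x, z, np.ones(n)])
adjusted_slope = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(f"naive: {naive_slope:.2f}, adjusted: {adjusted_slope:.2f}")
```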

Causality is a minefield, and because it is such a minefield, destroying the causal validity of a model is like taking a blowtorch to butter. It’s very easy to point out mistakes, so much so that much of the community has spent the past decade claiming moral superiority instead of providing constructive criticism and solutions. In my view, this attitude is an important reason why most machine learning thus far has focused on accurate predictions, not causal considerations. Attempting to incorporate some form of causality into models, or worse yet, talking about it openly, has traditionally been asking for trouble. This despite the fact that many of the underlying mathematical constructs are the same and knowledge is transferable across domains.

Fortunately, attitudes have changed, and I think we are seeing a more permissive discussion around machine learning and causality. As a field, we’re slowly acknowledging that arguing isn’t getting us anywhere, and seeing causality for what it is: something to strive for, incorporate, and discuss, knowing full well that we’ll never do a perfect job of any of it. Causality is becoming less of a taboo and more of a subject open to experimentation. And that’s why I predict it will have a bright future.
