Top 2018 Machine Learning Trends (And Our 2019 Preview!)

integrate.ai
the integrate.ai blog
10 min read · Dec 19, 2018

As 2018 comes to a close, we thought we’d share our thoughts on the most impactful developments in machine learning over the past year and preview what we’re excited about in 2019.

2018 was more a year of incremental shifts than outright transformations, but there were still notable developments: New twists on established neural architectures pushed state-of-the-art results across a variety of domains, natural language processing made some game-changing leaps, and AI ethics became an unavoidable part of our public discourse.

5 Developments from 2018

1. Self-attention changed the way we think about deep learning architectures

Attention is the mechanism neural networks use to determine what parts of the input are most pertinent to solving the task at hand. Simply put, it’s a probability distribution over the input that tells the model to “focus on this and not that.” Attention initially demonstrated its value in natural language processing by making machine translation more effective. If you’re trying to translate a French sentence into English, after all, it’s unlikely a good translation will end up with both sentences perfectly aligned word-for-word.

Attention has proven to be powerful, but the concept of self-attention may be even more promising. In standard attention, a probability distribution is used to figure out, for instance, which words in the original French sentence the model should focus on in order to generate the next word in the English translation. Self-attention applies a similar mechanism, but with all the focus now applied to the input sentence itself. In other words, for each word in the input sentence, the self-attention mechanism determines the most relevant words in this same sentence. The result is a representation of the sentence’s internal structure and dependencies.

An example of self-attention, in which each word attends to all the other words in the same sentence (darker lines indicate stronger attention connections). From the original Transformer network paper.
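To make the mechanism concrete, here's a minimal NumPy sketch of scaled dot-product self-attention, the building block at the core of the Transformer. The toy dimensions and random weights below are purely illustrative, not taken from any of the papers above.

```python
# A minimal sketch of scaled dot-product self-attention, assuming a toy
# "sentence" of 4 tokens with 8-dimensional embeddings (random values).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Each row of X is a token embedding; every token attends to all the others."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # pairwise relevance of every token pair
    weights = softmax(scores, axis=-1)         # a probability distribution per token
    return weights @ V, weights                # re-weighted representations

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # 4 tokens, 8-dim embeddings (toy values)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(attn.round(2))                           # each row sums to 1: that token's focus over the sentence
```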

Very early examples of self-attention can be seen as far back as 2016 (it wasn’t called that back then). But it wasn’t until the end of 2017 that self-attention really started gaining broad visibility thanks to a paper introducing the Transformer network, a neural architecture that showed that self-attention could do the heavy lifting previously accomplished by CNNs and RNNs all on its own.

While the handful of self-attention applications in 2016 and 2017 mainly involved traditional NLP problems (e.g., machine translation and natural language inference), 2018 saw self-attention expand to a much broader range of domains. There was a self-attention generative adversarial network (GAN) for image generation; a self-attention approach for audio event recognition; and even a self-attention paper on human pose estimation. In 2018, self-attention proved its ability to push the state-of-the-art in a variety of domains, and we expect more of the same in 2019.

2. A chatbot misstep revealed some of the deeper issues still facing AI

Unlike the spectacular chatbot blunder of 2016, in which Microsoft’s Tay quickly adopted the most derogatory and hateful aspects of the Internet, this year’s unveiling of Google Duplex seemed tame by comparison. Duplex (technically a virtual assistant) wasn’t racist, cruel, or ill-mannered. In fact, it was very well-mannered, and that’s partly what unnerved people. The selling point of Duplex was that it could imitate human conversation so well (at least in basic tasks like making restaurant reservations) that the actual human being on the other end of the line couldn’t tell they were conversing with an AI.

Even if the negative press may have been a bit overblown, the incident suggests that we’re beginning to move into a different and more complicated stage of AI, one that extends our more traditional conceptions of the uncanny valley. It’s one thing to see a humanoid that misses the mark by having just the wrong amount of human likeness (thereby producing an eerie sensation). It’s quite another thing to have what seemed like a totally normal human interaction only to find out later that you were the only actual human involved.

We’re still pretty far from having a population overrun by replicants, but the kind of behaviors that Turing envisioned are starting to become more plausible. More thought and research are sorely needed to figure out what we really want these interactions to be like and how clearly we want to demarcate the line separating the human from the artificial.

Our last AI in the 6ix featured an Oxford-style debate about gender representation in AI. Watch the debate to learn why the audience decided it was a net positive to represent AI as female.

3. Transfer learning and better word embeddings altered the landscape of NLP

2018 was a big year for natural language processing. Word embeddings (low dimensional vector representations of words) have been the first step in most NLP models for a while now. In 2018 though, these embeddings became a lot more powerful.

First there was ELMo (Embeddings from Language Models). While most previous approaches used shallow neural networks (one hidden layer) to produce embeddings, ELMo employed a deep neural network to create richer representations of words. Specifically, these representations encoded a lot more information about context, syntax, and semantics. Just incorporating them, without making any other substantial changes to existing pipelines, immediately provided an impressive performance boost.

Then came OpenAI GPT, which replaced ELMo’s LSTM architecture with a Transformer network (see above), again demonstrating the power of self-attention. The downside of OpenAI GPT, however, was that it trained its representations with a language model objective, meaning the training process was unidirectional (representations only looked forward to future words in the sentence, not backwards to previous ones). Enter BERT (Bidirectional Encoder Representations from Transformers). Like OpenAI GPT, BERT used a Transformer architecture, but unlike OpenAI’s approach, BERT was bidirectional, leading to better overall results.
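For a sense of what these contextual representations look like in practice, here's a quick sketch of pulling token vectors out of a pre-trained BERT model. We're using the Hugging Face transformers library purely for convenience; it's our tooling choice, not something prescribed by the papers above.

```python
# A sketch of extracting contextual word representations from a pre-trained
# BERT model via the Hugging Face "transformers" library (our choice of tooling).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The bank raised interest rates."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One vector per (sub)word token; unlike static embeddings, the vector for
# "bank" depends on the surrounding sentence.
embeddings = outputs.last_hidden_state        # shape: (1, num_tokens, 768)
print(embeddings.shape)
```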

Comparing the architectures of the three most impactful word embedding approaches of 2018.

Word embeddings are essentially just a form of transfer learning: applying the weights trained from one neural network to a new model or a new application. While ELMo, OpenAI GPT, and BERT all limited themselves primarily to embeddings, ULM-FiT (Universal Language Model Fine-tuning for Text Classification) proved that a single NLP model could also be effective at transfer learning for a broader range of tasks.
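The basic transfer-learning recipe is simple enough to sketch: take an encoder whose weights were trained on one task, bolt a new head onto it for your task, and fine-tune. The PyTorch snippet below is a bare-bones illustration of that pattern; the encoder here is a stand-in, not a real pre-trained language model, and the staged unfreezing that ULM-FiT advocates is only hinted at in the comments.

```python
# A minimal PyTorch sketch of the transfer-learning pattern: reuse a
# pre-trained encoder's weights and initially train only a small task head.
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, pretrained_encoder, hidden_dim, num_classes):
        super().__init__()
        self.encoder = pretrained_encoder                 # weights transferred from pre-training
        self.head = nn.Linear(hidden_dim, num_classes)    # new, randomly initialised task head

    def forward(self, x):
        features = self.encoder(x)                        # contextual representation of the input
        return self.head(features)

# Stage 1: freeze the transferred weights and train only the head.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())   # placeholder for a real pre-trained encoder
for p in encoder.parameters():
    p.requires_grad = False

model = TextClassifier(encoder, hidden_dim=128, num_classes=2)
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
# Later stages can gradually unfreeze encoder layers with lower learning rates.
```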

“Universal” is a term that might be getting thrown around a bit too much these days in NLP. Still, more so than any other ML field, NLP research over this past year demonstrated a whole new frontier of model transferability.

4. AI fairness and transparency came under the spotlight

The Cambridge Analytica scandal was all over the news. GDPR came into effect. Fifty million Facebook users were affected by a security breach. Tesla’s autopilot function led to a fatal accident. Thousands of Google employees signed a petition opposing the company’s involvement in Project Maven (a Pentagon sponsored AI warfare project). Reports revealed that Amazon had been selling its facial recognition technology to police departments. The list goes on. Fairness and transparency stopped being mere concepts in 2018 and became actual (and acute) necessities.

The good news is that 2018 also produced a ton of great research in these domains. On the fairness side, there were papers on minimizing the risk of bias, on causal awareness, on counterfactual approaches for fair text classification, and even a paper presenting a fairness GAN.

As regards transparency and interpretability, a number of papers took a closer look at what it really means for a model to be explainable, how humans might actually go about interpreting the explanations, and how the different roles of the interpreting agent can affect the overall results.

We at integrate.ai released a responsible AI framework in July that provides in-depth guidelines for how consumer enterprises can put ethical AI into practice.

With conferences like FAT* gaining increasing visibility and institutes like AI Now expanding their reach, approaches to fairness and transparency are likely to continue to evolve. Thanks to a host of recent developments, the questions and concerns addressed by this research will only become more essential going forward.

5. Neural networks became increasingly meta

A few years ago, it started becoming apparent that there was one major weak link in the deep learning pipeline: people. Part of what has made deep learning so effective is that the old approach of manually selecting features was no longer necessary. Thanks to deep learning architectures, models could now figure out what aspects of the input were important all on their own. But humans still had to design the architectures in the first place. Or did we?

Neural architecture search is a sub-field of deep learning in which an ML algorithm (and potentially even another neural network) is used to determine the optimal neural network architecture for a specific task and data set. In some sense, one could see the whole history of deep learning as a gradual process of writing humans out of the loop (though, of course, we’re still the ones designing the algorithms that end up selecting the architectures, at least for now).

In 2018, the major development on this front was that Google Cloud released neural architecture search as a bona fide product. Google was essentially giving companies the option to skip hiring machine learning engineers and instead use a system that required limited ML expertise (for a fee). While a handful of other companies were offering similar products at the time, Google’s entry into the space has undoubtedly spurred increased interest. Open-source libraries such as AutoKeras have since entered the fray, further lowering the bar for entry into the neural architecture search domain.
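Stripped down to its essentials, neural architecture search is a loop: sample a candidate architecture, evaluate it, and keep the best one found so far. The toy sketch below illustrates that loop with random search and a placeholder evaluation function; real systems use far more sophisticated controllers (reinforcement learning, evolutionary methods) and actually train each candidate on the target data.

```python
# A toy sketch of the idea behind neural architecture search: sample candidate
# architectures, evaluate each one, keep the best. The search space, budget,
# and scoring function here are illustrative placeholders.
import random

SEARCH_SPACE = {
    "num_layers": [2, 3, 4, 5],
    "units": [64, 128, 256],
    "activation": ["relu", "tanh"],
}

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    # Placeholder: in practice, build the network described by `arch`,
    # train it briefly, and return its validation accuracy.
    return random.random()

best_arch, best_score = None, float("-inf")
for _ in range(20):                      # search budget
    arch = sample_architecture()
    score = evaluate(arch)
    if score > best_score:
        best_arch, best_score = arch, score

print(best_arch, best_score)
```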

A slide from a Jeff Dean (of Google Brain) talk demonstrating the iterative process for finding the optimal neural architecture.

At this stage, the effectiveness of these automated search approaches is still very much open to debate. There are a lot of things machine learning engineers bring to the table that can’t be replicated by algorithms that currently only solve one piece of the puzzle (though there have been attempts to automate the whole ML pipeline).

Since the machine learning field largely follows the laws of natural selection, if neural architecture search does indeed prove more effective at selecting architectures than engineers, it’s likely to win out in the long run.

Two Trends We’ll Follow in 2019

1. Approaches to countering deceptive AI will become more robust

With the increasing popularity of generative adversarial networks (GANs), it’s a lot easier to trick human beings and even other deep learning models. In theory, GANs are only supposed to be deceptive within the context of the model itself. But part of what makes GANs so effective is that they stage a zero-sum game between two competing networks. One network — the generator — tries to “trick” the other network — the discriminator — such that it can’t tell the difference between real examples taken from the input data and the generator’s synthesized output. When the generator wins this game, the result is a model that’s really good at creating fake examples that appear to be real.
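In code, that zero-sum game boils down to two optimizers taking turns. Here's a condensed PyTorch sketch of the training loop, with toy dimensions and random data standing in for a real dataset.

```python
# A condensed PyTorch sketch of the generator-vs-discriminator game: the
# discriminator learns to separate real from generated samples, while the
# generator learns to fool it. All dimensions and data are toy placeholders.
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 32, 64
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(100):
    real = torch.randn(batch, data_dim)               # stand-in for real training data
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator step: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes real.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```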

2018 saw the growth of deepfakes, the manipulation of audio and video recordings so that someone (usually a politician or celebrity) appears to say something they didn’t really say or do something they didn’t really do. As you might expect, GANs lie at the heart of the success of deepfakes. This success has a lot of people and organizations concerned, lawmakers and the United States Department of Defense among them.

Deepfakes may indeed suggest a potential crisis, but it’s worth noting that there are lots of other kinds of deception that are far more subtle. For instance, it’s been shown that making minor changes to images, in some cases even to a single pixel, can cause models to completely misclassify them. In an amusing bit of irony, recent research has demonstrated that GANs can actually be used to protect against these kinds of manipulations, countering attempts to trick classifiers. It turns out, then, that GANs can also be homeopathic.
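To see how little it takes to fool a classifier, consider the fast gradient sign method (FGSM), one of the simplest adversarial attacks. It's a different technique from the single-pixel result cited above, but it illustrates the same fragility: nudge each input value slightly in the direction that increases the loss, and the prediction can flip. Below is a minimal PyTorch sketch with a stand-in model and input.

```python
# A short sketch of how small, targeted input changes can flip a model's
# prediction, using the fast gradient sign method (FGSM). The model and
# image below are placeholders, not a trained system.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # stand-in classifier
image = torch.rand(1, 1, 28, 28, requires_grad=True)          # stand-in input image
label = torch.tensor([3])

loss = nn.functional.cross_entropy(model(image), label)
loss.backward()

epsilon = 0.05                                                 # perturbation budget
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1)
# `adversarial` looks nearly identical to `image`, yet can be classified differently.
print((adversarial - image).abs().max())
```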

It goes without saying that deepfakes and other attempts at deceptive machine learning are likely to grow increasingly sophisticated in the coming years. However, it’s equally probable that approaches to countering these types of manipulations will also grow increasingly robust. We’re entering an age of black hat vs. white hat machine learning. Expect it to be both an exciting and precarious time.

2. Deep reinforcement learning will continue to expand beyond games and robots

Animated humanoids realistically donning a t-shirt and a jacket, respectively.

Over the last few years, deep reinforcement learning (DRL) has proven to be not only one of the more impressive subfields of machine learning, but also one of the more amusing. To name just a few examples, there was AlphaGo, a socially aware robot, an approach that matched human-level performance for multiplayer games, and even research into creating animations that realistically put on clothing.

While many of the other recent applications of DRL may not be quite so entertaining, they’re still managing to expand the reach of the approach in some pretty interesting ways. For instance, 2018 saw research on DRL for news recommendation, real-time advertising, and drug design. And that’s just a small fraction of the possible applications.

We expect the range of applications to grow further in 2019. We’re particularly optimistic about the future of model-based reinforcement learning, in which an agent not only learns a policy but also learns a model of its environment. After all, this variety of RL seems like a particularly convincing approximation of how we tend to navigate our own world.
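For the curious, here's a highly simplified sketch of what "learning a model of the environment" can look like: the agent fits a dynamics model from experience and then plans by imagining rollouts inside that model rather than in the real world. Everything below (the toy dynamics, the random-shooting planner) is a placeholder of our own, not an implementation from any of the papers above.

```python
# A high-level sketch of model-based RL: fit a model of the environment's
# dynamics, then plan by simulating candidate action sequences inside it.
import numpy as np

rng = np.random.default_rng(0)

class ToyDynamicsModel:
    """Stands in for a model fit to observed (state, action, next_state, reward) data."""
    def __init__(self):
        self.A = rng.normal(size=(2, 2)) * 0.1      # pretend these were learned from experience
        self.B = rng.normal(size=(2,)) * 0.1

    def predict(self, state, action):
        next_state = state + self.A @ state + self.B * action
        reward = -np.linalg.norm(next_state)        # toy objective: stay near the origin
        return next_state, reward

def plan(model, state, horizon=5, n_candidates=200):
    """Random-shooting planner: imagine rollouts in the model, pick the best first action."""
    best_return, best_action = -np.inf, 0.0
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=horizon)
        s, total = state, 0.0
        for a in actions:
            s, r = model.predict(s, a)              # imagined step, no real environment needed
            total += r
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action

model = ToyDynamicsModel()
print(plan(model, state=np.array([1.0, -0.5])))
```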
