Microsoft MT-DNN Surpasses Human Baselines on GLUE Benchmark Score

Jun 18, 2019 · 3 min read

Microsoft recently updated the performance of its Multi-Task Deep Neural Network (MT-DNN) ensemble model. The significant performance boost has the model sitting comfortably atop the GLUE benchmark rankings, outperforming the human baselines for the first time in terms of macro-average score (87.6 vs. 85.1).

GLUE (General Language Understanding Evaluation) is a multi-task benchmark and analysis platform for natural language understanding. MT-DNN was improved in early April and climbed to second place on the GLUE leaderboard, trailing only human baselines. Two months later the MT-DNN-ensemble is back with a huge performance improvement on the WNLI task, scoring 20 points higher than most previous methods as shown in the following table.

GLUE leaderboard

WNLI is a reading comprehension task that uses sentences containing a pronoun and a list of possible referents to test a model’s ability to solve pronoun disambiguation problems. It is one of the harder tasks ranked on GLUE. Even educated humans can struggle with particulars of the WNLI test, so the high accuracy of the MT-DNN-ensemble model came as a pleasant surprise. Microsoft researchers didn’t reveal many details on the multi-model integration of MT-DNN, which draws on the paper Multi-Task Deep Neural Networks for Natural Language Understanding.

A fundamental part of natural language understanding is language embedding learning — the process of mapping symbolic natural language text to semantic vector representations. This is what Multi-Task Deep Neural Network models attempt to do — learn universal language embedding.
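To make the idea of language embedding concrete, here is a toy sketch of mapping symbolic text to vector representations. This is purely illustrative and not MT-DNN code: the vocabulary, dimensionality, and random initialization are all hypothetical stand-ins for a learned embedding table.

```python
import random

random.seed(0)

vocab = ["the", "cat", "sat"]
dim = 4  # embedding dimensionality (tiny, for illustration only)

# Hypothetical randomly initialized embedding table; in a real model
# these vectors are learned during pre-training.
embeddings = {tok: [random.gauss(0, 1) for _ in range(dim)] for tok in vocab}

def embed(sentence):
    """Map a whitespace-tokenized sentence to a sequence of vectors."""
    return [embeddings[tok] for tok in sentence.split()]

vectors = embed("the cat sat")  # 3 tokens, each mapped to a 4-dim vector
```

A trained model learns these vectors so that semantically similar words and sentences land near each other in the vector space; the table above only shows the symbolic-to-vector mapping itself.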

MT-DNN-ensemble is a multi-task learning method: all tasks share the same lower-level structure, while each task has its own objective function. The approach also combines multi-task learning with language model pre-training. The MT-DNN model’s architecture is shown below.

Architecture of the MT-DNN model for representation learning

The lower layers are shared across all tasks. The input X is first represented as a sequence of embedding vectors, one per token, in l_1; the transformer-based encoder then generates shared contextual embedding vectors in l_2. Finally, the top layers are task-specific and thus suited to learning the features that best fit each task. Similar to the BERT model, MT-DNN is trained in two phases: pre-training and fine-tuning. Unlike BERT, MT-DNN uses multi-task learning during the fine-tuning phase and has multiple task-specific layers in its model architecture. Experimental results are shown below.
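The shared-encoder / task-specific-head layout described above can be sketched as follows. This is a minimal, purely illustrative model of the control flow, under the assumption of a round-robin fine-tuning loop over mixed task batches; the real MT-DNN uses a transformer encoder and gradient-based training, and the task names are examples from GLUE.

```python
import random

random.seed(0)

class SharedEncoder:
    """Stand-in for the shared lower layers (l_1 embeddings, l_2 transformer)."""
    def encode(self, text):
        # Toy "contextual representation": character count and token count.
        return [len(text), text.count(" ") + 1]

class TaskHead:
    """Stand-in for a task-specific top layer."""
    def __init__(self, name):
        self.name = name
    def predict(self, rep):
        return f"{self.name} output for representation {rep}"

encoder = SharedEncoder()                          # one encoder, shared by all tasks
heads = {t: TaskHead(t) for t in ["MNLI", "SST-2", "WNLI"]}

# Multi-task fine-tuning loop: mini-batches from all tasks are mixed,
# and each batch would update the shared encoder plus its own head.
batches = [("MNLI", "a premise and a hypothesis"),
           ("SST-2", "a movie review"),
           ("WNLI", "a sentence containing a pronoun")]
random.shuffle(batches)
outputs = []
for task, text in batches:
    rep = encoder.encode(text)                     # shared lower layers
    outputs.append(heads[task].predict(rep))       # task-specific upper layer
```

The key design point is that gradients from every task flow through the same encoder, so the shared layers learn representations useful across tasks, while each head specializes.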

GLUE test set results

The MT-DNN model had achieved similar results to the BERT model on WNLI after its April boost, with an accuracy of 65.1 percent. The new jump to 89 percent accuracy has many in the ML community eagerly waiting for Microsoft to reveal the secret of its success.

The associated paper Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding is on arXiv.

Author: Hecate He | Editor: Michael Sarazen

2018 Fortune Global 500 Public Company AI Adaptivity Report is out!
Purchase a Kindle-formatted report on Amazon.
Apply for Insight Partner Program to get a complimentary full PDF report.

Follow us on Twitter @Synced_Global for daily AI news!

We know you don’t want to miss any stories. Subscribe to our popular Synced Global AI Weekly to get weekly AI updates.




We produce professional, authoritative, and thought-provoking content relating to artificial intelligence, machine intelligence, emerging technologies and industrial insights.


