Introducing Metadata Enhanced ULMFiT

Matthew Teschke
Novetta
Mar 19, 2020

Note: Originally posted March 26, 2019

Introduction

Since Universal Language Model Fine-Tuning (ULMFiT)¹ was released in 2018, Novetta’s Machine Learning Center of Excellence has evaluated its performance against many customer use cases. We have found that automated labeling of unstructured text helps analysts be more efficient, freeing them up to focus on higher-value tasks. We have been so impressed with its accuracy that we have deployed ULMFiT against a dataset in a production system, with plans to expand it to dozens of other datasets.

While ULMFiT’s performance has been impressive on all the datasets we’ve tested, we thought it could do more. Our first use case, in coordination with the Novetta Mission Analytics team, was to automate the tagging of quotes from news articles with customer-specific labels. As we began building models with ULMFiT, we hypothesized that the use of metadata could lead to more accurate predictions. For the Novetta Mission Analytics team, metadata refers to additional information about the quote and article, such as publication, country, and source. Experimentally, we found that including article metadata improved the relative performance of our models by 3–9%. Below, we discuss this new capability — Metadata Enhanced ULMFiT (ME-ULMFiT).

Novetta Mission Analytics application — users can view trends of quotes by category over time

Key Breakthrough

When we first conceived ME-ULMFiT in September 2018, we considered a couple of implementation approaches. Our first idea was to combine a structured data model with the text model from fast.ai. Later, when thinking about Jeremy Howard’s “Introduction of Language Modeling”² in the 2018 course, we remembered his example of generating technical abstracts for papers. He used special flags to mark the two sections of each abstract: <cat>, which indicated the category, and <summ>, which introduced the text of the abstract. We realized that we might be able to pass the model information in a similar fashion — by prepending the article metadata to the text of the quote we are trying to label. Special tokens would serve as indicators to let the model know that a string was article metadata and not the text of the quote. This was especially appealing because it meant zero changes to the underlying fast.ai code — the work could be done with a few lines of Python that preprocess the data.

As an example, say we want to classify the following quote into one of two topics, Computer Science or Machine Learning.

Today we’ll see how to read a much larger dataset — one which may not even fit in the RAM on your machine!

Without the context from the original article, it would be difficult to classify this quote. However, if we provide the model with the author (Jeremy Howard) and publication (fast.ai), then the model would be much more confident that this quote should be attributed to the Machine Learning topic. To accomplish that, we would preprocess the quote as follows:
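With the metadata prepended, the input to the model would look something like this (the exact spelling of the special tokens is illustrative):

pub_name fast.ai aut_name Jeremy Howard quote_text Today we’ll see how to read a much larger dataset — one which may not even fit in the RAM on your machine!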

In this case, our special tokens are as follows:

  • pub_name, indicating that the publication will follow;
  • aut_name, indicating the author will follow next; and
  • quote_text, indicating that the quote to be classified will follow.
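The preprocessing itself only takes a few lines of Python. Below is a minimal sketch; the function name is illustrative, and the exact token spellings in our code base may differ:

def add_metadata_tokens(publication, author, quote):
    # Prepend each piece of article metadata, marked by its special token,
    # to the text of the quote we want to classify.
    return f"pub_name {publication} aut_name {author} quote_text {quote}"

# Produces the preprocessed input shown in the example above.
model_input = add_metadata_tokens(
    "fast.ai",
    "Jeremy Howard",
    "Today we'll see how to read a much larger dataset — one which may not even fit in the RAM on your machine!",
)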

Having figured out how to pass the article metadata to the model, it was time to check our intuition and see whether this actually improved model performance.

Evaluation Methodology

We tested ME-ULMFiT on multiple datasets provided by our Novetta Mission Analytics team, summarized in Table 1:

Table 1: Dataset Statistics

For each dataset, we built classifiers for two targets, Primary Message and Submessage, both with and without metadata. Primary Message is a broad category specific to a customer’s mission, while Submessages are fine-grained messages within each Primary Message. Classification performance is evaluated on the latest (by publication date) 3% of articles; for additional details on how we split the data, see our previous post.
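As a rough sketch of that split, assuming the labeled quotes live in a pandas DataFrame with a publication date column (the file and column names here are illustrative; the exact split logic is described in our previous post):

import pandas as pd

df = pd.read_csv("quotes.csv", parse_dates=["publication_date"])

# Sort by publication date and hold out the most recent 3% of rows for evaluation.
df = df.sort_values("publication_date").reset_index(drop=True)
cutoff = int(len(df) * 0.97)
train_df = df.iloc[:cutoff]
valid_df = df.iloc[cutoff:]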

Our baseline is a ULMFiT model trained solely on the text of the quote. The ME-ULMFiT versions were all created by prepending the relevant tag indicators (e.g., pub_name) and metadata values to the quote. The article metadata evaluated included:

  • publication name (e.g. New York Times or BBC)
  • publication affiliation (e.g. Independent or State Media)
  • publication nationality (e.g. Russia or United States)
  • source name (e.g. Unnamed Russian Media or Joe Biden)
  • source nationality (e.g. Russia or United States)

Results

As hypothesized, we observed an improvement in models that included metadata. Relative improvements ranged from 3% to 9%. Results for the Europe and Middle East datasets are shown below in Tables 2–5, separated into Primary Message and Submessage performance.

Table 2 shows performance on the 8 Primary Messages for the Europe data:

Table 2: Comparison of ULMFiT and ME-ULMFiT on Europe Data (Primary Messages)

Table 3 shows performance on the 121 Submessages for the Europe data.

Table 3: Comparison of ULMFiT and ME-ULMFiT on Europe Data (Submessages)

Table 4 shows performance on the 8 Primary Messages for the Middle East data.

Table 4: Comparison of ULMFiT and ME-ULMFiT on Middle East Data (Primary Messages)

Table 5 shows performance on the 75 Submessages for the Middle East data.

Table 5: Comparison of ULMFiT and ME-ULMFiT on Middle East Data (Submessages)

Analysis of Results

Our first attempt at incorporating metadata improved prediction accuracy on both the Europe and Middle East datasets, for both Primary Messages and Submessages. Given the volume of quotes that Novetta Mission Analytics analysts tag annually, a 3–9% improvement in accuracy will represent a significant operational impact.

While further optimizations are likely to improve performance, results to date are impressive considering that implementation only requires the addition of a few lines of Python to our existing code base. This means that our Novetta Mission Analytics product team will be able to deploy the improved models to production in a matter of hours, most of which is compute time for the new models to train. This speaks to the power and flexibility of ULMFiT implemented with fast.ai.

Next Steps

While we have been testing ME-ULMFiT, members of the fast.ai community have developed approaches to combining models of different types into a single model (in line with our original idea for implementing ME-ULMFiT). Active work in this area includes contributions from Radek Osmulski (@radekosmulski), who provided an example for the Kaggle Quickdraw competition that combined multiple models into one (see his MixedInputModel class). Further, in this fast.ai forum discussion, Jose Fernandez Portal details performance improvements based on combining structured data and text data.

We plan to explore the combination of different data types such as structured data, images, and unstructured text into a single model, and we are excited for the opportunity to share our results with the fast.ai community.

[1] Howard, Jeremy, and Sebastian Ruder. “Universal Language Model Fine-Tuning for Text Classification.” 2018. http://arxiv.org/pdf/1801.06146.pdf

[2] The topic I was reminded of was in the 2018 version of the course. If you are interested in watching the video, it is lesson 4 at the 1 hour 24 minute mark: http://course18.fast.ai/lessons/lesson4.html

