20% Accuracy Bump in Text Classification with ME-ULMFiT

Matthew Teschke
Novetta
Mar 31, 2020

By incorporating metadata into the text before training, we see as much as a 20% increase in accuracy

Introduction

In a previous post, we discussed an enhanced method of passing additional data to the ULMFiT¹ algorithm for unstructured text classification, which improved accuracy by up to 9%. In follow-on experiments, we have demonstrated that this approach, in which we prepend metadata to the text to provide more information to the model, increased accuracy by up to 20% in relative terms. The improvement requires just a few lines of Python, so the performance boost comes at essentially no cost.

This blog post presents the findings of this more detailed evaluation, in which we compared how the following affect accuracy:

  1. Each individual piece of metadata
  2. Different combinations of metadata
  3. Different ways of passing the metadata to the model

For the evaluation, we worked with six Novetta Mission Analytics datasets, described below. The classification objective is to assign each quote a label from a set of granular categories called Submessages.

Some highlights from our results include:

  • Source-related metadata tags consistently performed the best, confirming our original hypothesis. Sources (e.g. UN Official) are the metadata most closely associated with the quotes we are tagging, so it makes sense that sources add the most information.
  • Including media name (e.g. New York Times) either decreased accuracy or showed little improvement. We found this surprising as we had assumed that including metadata would only improve accuracy.
  • Adding more metadata tags to the model did not always improve performance.
  • For five of the six datasets, using metadata tags that distinguish between columns in our data performed better than the approach (currently suggested by the fast.ai library) in which each field was demarcated by xxfld.

Dataset Overview

We conducted evaluations on the six datasets listed in Table 1. For each dataset, we split the data chronologically, holding out the most recent data for validation. We used the earliest 80% of the data to train the language model and the next 17% to validate it. We then trained the classifier on that combined 97%, reserving the final 3% for classifier validation.
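As a rough sketch of how such a chronological split might be produced (the date column name and DataFrame layout here are illustrative assumptions, not our actual schema):

import pandas as pd

def time_based_split(df: pd.DataFrame, date_col: str = "publication_date"):
    # Sort oldest to newest so the held-out slices are the most recent data.
    df = df.sort_values(date_col).reset_index(drop=True)
    n = len(df)
    lm_train = df.iloc[:int(n * 0.80)]               # earliest 80%: language model training
    lm_valid = df.iloc[int(n * 0.80):int(n * 0.97)]  # next 17%: language model validation
    clf_train = df.iloc[:int(n * 0.97)]              # combined 97%: classifier training
    clf_valid = df.iloc[int(n * 0.97):]              # final 3%: classifier validation
    return lm_train, lm_valid, clf_train, clf_valid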

Table 1: Characteristics of Datasets Used in ME-ULMFiT Evaluation

Experiment 1: Individual Metadata

Our first experiment evaluated how each individual piece of metadata affected the performance of the model. For each of six types of metadata (Author Name, Media Name, Media Nation, Source Name, Source Nation, Source Type), we created a model that prepended only that piece of information to the quote text. For example, a Source Name of “Steffen Seibert” prepended to a quote results in:

source_name source_name_Steffen_Seibert quote_text that “we have started down the path” towards the NATO goal.
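The prepending itself takes only a few lines of Python. A minimal sketch, assuming a pandas DataFrame with illustrative column names (source_name, quote_text) rather than our actual schema:

import pandas as pd

def prepend_metadata(row: pd.Series, field: str) -> str:
    # Tag the field, collapse its value into one underscore-joined token,
    # then mark where the original quote text begins.
    value = str(row[field]).replace(" ", "_")
    return f"{field} {field}_{value} quote_text {row['quote_text']}"

df = pd.DataFrame({
    "source_name": ["Steffen Seibert"],
    "quote_text": ['that "we have started down the path" towards the NATO goal.'],
})
df["model_input"] = df.apply(prepend_metadata, axis=1, field="source_name")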

The results of Experiment 1 are shown in the following heat map:

Figure 1: Heat map showing relative change in accuracy of ME-ULMFiT with individual types of metadata compared to the baseline ULMFiT model with no metadata

From the heat map, we made the following observations:

  • Source-related metadata tags generally performed the best
  • The Media Name metadata tag performed the worst for four out of six datasets
  • Datasets with fewer Submessage labels saw less improvement from the inclusion of metadata than those with more Submessage labels

Experiment 2: Combinations of Metadata

In the second experiment, we evaluated how different combinations of metadata impacted model accuracy. Our intuition suggested that more metadata would always lead to better performance, but as shown in Figure 2, this was not the case. We will be conducting further investigation into why.

We examined five different combinations of metadata as follows:

  • The top-2 performing metadata tags from Experiment 1
  • The top-3 performing metadata tags from Experiment 1
  • The top-4 performing metadata tags from Experiment 1
  • All metadata tags
  • Fast.ai built-in tags, which utilized all metadata but separated fields with xxfld instead of column-specific tags. In this method, values within the data were not concatenated (e.g., The New York Times was not converted to xxfld_The_New_York_Times)

Figure 2: Heat map showing relative change in accuracy of ME-ULMFiT with combinations of metadata compared to the baseline ULMFiT model with no metadata

While multiple tags improved performance over the baseline more often than not, that was not always the case. The model with all metadata performed best for only one dataset, and it generally underperformed the most accurate individual-metadata model. We believe this has to do with how batches are created within the model, and we will explore further to understand why more data does not always improve accuracy.

Experiment 3: Methods of Passing Metadata

Lastly, for the models that included all metadata, we wanted to compare different methods of passing in the metadata. Our original post describes our method of using a separate tag for each metadata field and concatenating the words within each metadata value into a single token. The alternative is the method introduced by fast.ai, which separates each field with xxfld. The two formats applied to the same example quote are shown below:

ME-ULMFiT method

author_name author_name_Pressse_Agence media_name media_name_Hurriyet media_nation media_nation_Turkey source_name source_name_Steffen_Seibert source_nation source_nation_Germany source_type source_type_Germany_Officials quote_text that “we have started down the path” towards the NATO goal.

Fast.ai method

xxfld Pressse Agence xxfld Hurriyet xxfld Turkey xxfld Steffen Seibert xxfld Germany xxfld Germany Officials quote_text that “we have started down the path” towards the NATO goal.
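A minimal sketch of how the two formats might be produced from the same record (the dictionary below and its field names are illustrative, not our actual schema):

meta_fields = ["author_name", "media_name", "media_nation",
               "source_name", "source_nation", "source_type"]

row = {
    "author_name": "Pressse Agence",
    "media_name": "Hurriyet",
    "media_nation": "Turkey",
    "source_name": "Steffen Seibert",
    "source_nation": "Germany",
    "source_type": "Germany Officials",
    "quote_text": 'that "we have started down the path" towards the NATO goal.',
}

# ME-ULMFiT: each field gets its own tag, and the value is collapsed into a
# single underscore-joined token.
me_ulmfit = " ".join(
    f"{f} {f}_{row[f].replace(' ', '_')}" for f in meta_fields
) + f" quote_text {row['quote_text']}"

# fast.ai style: every field is introduced by the generic xxfld marker and
# its value is left as ordinary, unconcatenated words.
fastai_style = " ".join(
    f"xxfld {row[f]}" for f in meta_fields
) + f" quote_text {row['quote_text']}"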

Results from these models are shown above in Figure 2.

In all but one case, the ME-ULMFiT method of adding metadata to the model outperformed the method suggested by fast.ai. We suspect this is due in part to the uniqueness of the metadata tags and their associated values. Because each metadata value is collapsed into a single, distinct token, the model does not try to apply patterns learned from quote text to it; instead, each metadata value is learned as its own token.
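As a rough illustration (using a plain whitespace split as a stand-in for the tokenizer), the concatenated value survives as one vocabulary item, while the fast.ai-style value splits into ordinary words that share vocabulary with the quote text:

print("source_name source_name_Steffen_Seibert quote_text ...".split())
# ['source_name', 'source_name_Steffen_Seibert', 'quote_text', '...']

print("xxfld Steffen Seibert quote_text ...".split())
# ['xxfld', 'Steffen', 'Seibert', 'quote_text', '...']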

Conclusion

Through our experiments, we demonstrated that including metadata along with the text of a quote led to increases in accuracy in most cases, and to significant increases in accuracy in some cases. This is powerful since this improvement is nearly free when relevant metadata is available to be associated with the text. We are exploring additional improvements to ME-ULMFiT that may lead to further performance gains.

[1] Howard, Jeremy, and Sebastian Ruder. “Universal Language Model Fine-tuning for Text Classification.” 2018. http://arxiv.org/pdf/1801.06146.pdf

