Using gobbli for interactive NLP

Jason Nance
RTI Center for Data Science and AI
8 min read · Mar 17, 2020

About 6 months ago, we released gobbli, an open-source Python library that provides a uniform interface to cutting-edge deep learning models for text. We published an initial blog post describing its capabilities with some examples. Since then, we’ve been using the library for internal projects and brainstorming ways to make it better.

We identified some common tasks in our text analysis workflows related to data exploration, model evaluation, and model interpretation. We also discovered Streamlit, a framework for rapidly building interactive data applications. The combination of gobbli and Streamlit seemed like a natural foundation to build interactive apps for NLP.

This post will highlight some of the new features available in gobbli 0.1.0, including the interactive apps. To follow along, you’ll need to install gobbli with the extra dependencies for its interactive apps. Full instructions are available in the docs, but in brief, you need to install Docker and create a Python 3.7+ environment. Then:

pip install gobbli[interactive]

gobbli now ships with a command line interface which wraps the bundled Streamlit apps. You can see the available apps by running the following in your shell:

gobbli --help

explore

The explore app requires a dataset, which can be provided in one of a few formats (note it must fit in memory):

  • A built-in gobbli dataset (ex. NewsgroupsDataset or IMDBDataset)
  • A text file with one document per line
  • A .csv file with a “text” column and optional “label” column (see the example below)
  • A .tsv file with a “text” column and optional “label” column
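
For example, here’s a minimal sketch of a labeled .csv dataset the app can load, written with Python’s standard csv module; the “text” and “label” column names come from the list above, and the file name is just illustrative:

import csv

# Write a tiny labeled dataset; each row is one document.
rows = [
    ("I loved this movie, the acting was fantastic.", "positive"),
    ("Terrible plot and wooden dialogue.", "negative"),
    ("One of the best films I have seen this year.", "positive"),
]
with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label"])
    writer.writerows(rows)

You could then point the app at it with something like gobbli explore reviews.csv.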

Some functionality won’t appear for datasets without labels. If you don’t have your own dataset handy, the following invocation will work out of the box (note it will take several minutes to build the first time, as gobbli has to download and unpack the dataset):

gobbli explore NewsgroupsDataset

If everything is installed correctly, you should see the explore app open in your browser.

The explore app pointed at the built-in IMDB dataset.

Here are some general things to know about the gobbli interactive apps:

  • Parameters and user input are kept in the sidebar. The main section is reserved for displaying data and output.
  • Since the entire app re-runs with every input widget change, the apps default to taking a small sample of data so you can tweak parameters without locking up your browser on long-running tasks. You can increase the sample size when you have everything set the way you want.
  • All the normal gobbli output goes to the terminal window running Streamlit. Check the terminal to see the status of long-running tasks that involve a model (embedding generation, prediction, etc.).
  • We cache the results of long-running tasks as much as possible, but changing parameters will often require costly tasks to be re-run.

Upon opening the app, you’ll be able to read through example documents from the dataset and check the distributions of labels and document lengths. The more involved tasks of topic modeling and embedding generation require some additional inputs.

Topic modeling

The explore app provides an interface to gensim’s LDA model, which allows you to train a topic model that learns latent topics from a bag-of-words representation of your documents. The approach doesn’t incorporate contextual information like a modern neural network, but it can reveal recurring themes in your dataset. To train a topic model, check the “Enable Topic Model” box in the sidebar and click “Train Topic Model”.
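
The app drives all of this from the sidebar, but for intuition, here’s a rough standalone sketch of the same kind of fit using gensim directly (the documents and parameters are toy values for illustration):

from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# Toy corpus standing in for your documents
docs = [
    "the pitcher threw a fastball in the ninth inning",
    "the senate passed the budget bill after a long debate",
    "the shortstop hit a home run to win the game",
    "lawmakers debated the new tax bill in congress",
]
tokens = [doc.lower().split() for doc in docs]

# Bag-of-words representation
dictionary = Dictionary(tokens)
corpus = [dictionary.doc2bow(t) for t in tokens]

# Fit a 2-topic model and print the top words per topic
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=1)
for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(topic_id, [word for word, _ in words])

# Coherence score for the fitted model
coherence = CoherenceModel(model=lda, texts=tokens, dictionary=dictionary, coherence="c_v")
print(coherence.get_coherence())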

Results from a topic model.

The explore app displays the coherence score and top 20 words for each learned topic. It also displays the correlation between topics, which helps determine how well-fit the model is, and the correlation between topics and labels, which may help interpret some of the topics.

Plotting embeddings

Embeddings represent the hidden state of a neural network. They generally aim to quantify the semantics of a document, meaning documents with similar meanings are close together in the embedding space, so plotting them can provide a useful “map” of your dataset. gobbli makes this easy. To generate and plot embeddings, check the “Enable Embeddings” check box and click the “Generate Embeddings” button.

Results from plotting embeddings.

After some time, you’ll see the embeddings with their dimensionality reduced via UMAP. You can hover over individual points to see the text and label for that document. Points are colored by label.
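
If you’re curious what the reduction step looks like outside the app, it’s roughly the following (random vectors stand in for real document embeddings):

import numpy as np
import umap  # the umap-learn package

# One row per document; 768 dimensions is typical for BERT-style embeddings
embeddings = np.random.default_rng(0).normal(size=(100, 768))

reducer = umap.UMAP(n_components=2, random_state=0)
coords = reducer.fit_transform(embeddings)  # shape (100, 2), ready to scatter-plot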

Untrained embeddings can give a preview of how well a model might differentiate between the classes in your dataset: the more separated your classes are in the embeddings plot, the more likely the model will be able to tell them apart. Using the “Model Class” dropdown and “Model Parameters” JSON input, you can quickly evaluate different model types and parameter combinations on your dataset.

If you have a trained gobbli model, you can also visualize its embeddings (if it supports embeddings). If you trained the model directly, you’ll need the path returned by calling “.data_dir()” on the model:
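
For example (a rough sketch: BERT is just one example model class, and the build/train calls follow gobbli’s usual workflow of constructing a model, building its resources, and training it):

from gobbli.model.bert import BERT

clf = BERT()
clf.build()  # set up the model's resources (Docker image, weights, etc.)
# ... run training on the model as usual ...
print(clf.data_dir())  # this is the path to pass as --model-data-dir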

If you trained the model using a (non-distributed) experiment, you’ll need the path two directories up from the checkpoint:
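
Something like this, using pathlib (the checkpoint path itself is illustrative; use the one reported by your experiment results):

from pathlib import Path

# Checkpoint path reported by your experiment results (illustrative only)
checkpoint = Path("/path/to/experiment/output/checkpoints/checkpoint-500")
model_data_dir = checkpoint.parent.parent  # two directories up from the checkpoint
print(model_data_dir)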

Pass this path to the explore app to use a trained model:

gobbli explore --model-data-dir <MODEL_DATA_DIR> <DATASET>

You should then see the available checkpoints for the model in the “Embedding” section:

Generating embeddings using a trained gobbli model.

You can also apply clustering algorithms (HDBSCAN or K-means) to the embeddings before or after dimensionality reduction and plot the clusters, if you’re interested in seeing how well a clustering algorithm groups your documents in a high-dimensional or low-dimensional space. Check the “Cluster Embeddings” box, set parameters, and click “Generate Embeddings” again to see clusters plotted.
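
If you want to reproduce that step outside the app, clustering the embeddings (before or after reduction) looks roughly like this, using the hdbscan package and scikit-learn; the data here is random for illustration:

import numpy as np
import hdbscan
from sklearn.cluster import KMeans

# These could be raw high-dimensional embeddings or the 2-D UMAP coordinates
coords = np.random.default_rng(1).normal(size=(100, 2))

kmeans_labels = KMeans(n_clusters=5, random_state=0).fit_predict(coords)
hdbscan_labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(coords)  # -1 marks noise points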

evaluate

The evaluate app displays evaluation metrics for a trained gobbli model applied to a given dataset. To use it, you need a dataset in any of the formats described above and the data directory of a trained model, obtained in one of the ways shown in the explore section:

gobbli evaluate <MODEL_DATA_DIR> <DATASET>

This should open the evaluate app in your browser.

The evaluate app.

After loading and generating predictions using the passed model, the app displays the following:

  • metadata (parameters) for the model
  • standard metrics calculated from the model’s performance on the sampled dataset
  • a plot of the predicted probability for every observation in the sample for each class
  • a small set of example predictions, including the model’s most highly predicted classes and the true class for each
  • the top errors (false positives and false negatives) in the sample by predicted probability, allowing you to see which documents are most confusing to your model

These tools allow you to inspect both the overall and fine-grained performance of your model and potentially determine ways to improve its performance on troublesome documents.
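
The “standard metrics” mentioned above are the familiar classification scores. As a point of reference, here’s roughly how you might compute a couple of them yourself with scikit-learn (toy labels shown; the exact set of metrics the app reports may differ):

from sklearn.metrics import accuracy_score, f1_score

# Toy true and predicted labels standing in for a sampled dataset
y_true = ["pos", "neg", "pos", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]

print("accuracy:   ", accuracy_score(y_true, y_pred))
print("weighted F1:", f1_score(y_true, y_pred, average="weighted"))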

explain

Finally, the explain app allows you to generate local explanations for individual documents using the ELI5 package’s implementation of LIME. These explanations can be useful for understanding why a model generates a certain prediction. Just like the evaluate app, the explain app requires a trained gobbli model’s data directory and a dataset:

gobbli explain <MODEL_DATA_DIR> <DATASET>

You’ll see this when the explain app launches in your browser:

The explain app.

The interface allows you to choose a single document and shows its full text and true label. If you check “Generate LIME explanation” and click the “Run” button, the app will train a white-box estimator to approximate your trained model’s behavior for documents similar to the chosen example. After the white-box estimator is trained, you’ll see some output:
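
The white-box estimator comes from ELI5’s TextExplainer. A rough standalone sketch of the same flow follows, with a small scikit-learn pipeline standing in for the black-box model; in practice you would pass your trained model’s probability function instead:

from eli5.lime import TextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-in "black box" -- in practice, a function returning class probabilities
docs = ["the movie was great", "terrible film", "loved every minute", "awful acting"]
labels = ["pos", "neg", "pos", "neg"]
black_box = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(docs, labels)

te = TextExplainer(random_state=42)
te.fit("great acting but an awful plot", black_box.predict_proba)
print(te.metrics_)              # mean KL divergence (want ~0) and score (want ~1)
print(te.explain_prediction())  # per-feature contributions to the prediction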

LIME output, including evaluation metrics and per-label feature contributions

The JSON output shows the evaluation metrics directly from LIME. See the ELI5 tutorial for more details, but the gist is that mean KL divergence should be close to 0, and the score should be close to 1 for a good approximation. If these conditions aren’t met, the white-box classifier likely doesn’t match your original model well, and the explanation shouldn’t be trusted. You can try raising the number of generated samples to get a better-performing white-box classifier.

Below the metrics, the app displays a table for each label in the dataset along with the top features contributing to the prediction for that label. Assuming the white-box classifier accurately matches the predictions of your trained model, the list of features tells you which words informed the model’s prediction.

An inherent limitation of this approach is that the white-box classifier uses a bag-of-words representation of the document, which doesn’t incorporate context the way most neural networks do. You can partially account for this by checking “Use position-dependent vectorizer”, which keeps occurrences of the same word in different positions from being grouped together in the explanation, but you may still be unable to obtain an accurate explanation of a complex neural network model.

Other goodies

gobbli v0.1.0 includes several other improvements to the main library. Here’s what’s new:

  • Completely overhauled the benchmark framework, including a new Markdown output format which should be much easier to browse on GitHub
  • Added embedding benchmarks, which plot the IMDB and 20 Newsgroups datasets using dimensionality-reduced embeddings from most gobbli models
  • Upgraded from pytorch_transformers v1.0.0 to transformers v2.4.1, adding support for a host of new models
  • Implemented support for arbitrary scikit-learn models via SKLearnClassifier and TF-IDF as a baseline embedding approach via TfidfEmbedder
  • Implemented support for spaCy text categorizer models and spacy-transformers models via SpaCyModel
  • Several small improvements to existing models, including new weights based on PubMed data for BERT, autotuning support for fastText, gradient accumulation support for Transformer models, and new weights for USE
  • Added a couple of helpful functions for monitoring gobbli’s disk usage and cleaning up old task input/output

Check out our code on GitHub if you’re interested in any of the above. We hope gobbli makes your text analysis workflows a little more convenient!
