Deep learning with text data doesn’t have to be scary
If you’re doing applied natural language processing in 2019, you’ve probably heard about exciting deep learning models like BERT. Maybe you’ve even checked out their source code in an attempt to apply them to your own problems.
To the rest of us, even with some background in traditional machine learning, it looks absolutely terrifying.
We just want to find out whether the newest model works better than a simpler approach on our dataset, but the docs are gunked up with CoLA and GLUE and a thousand hyperparameters to tweak. We might power through, spending several hours mangling data into arcane schemas and managing an intricate web of config files and checkpoints. Or we might just give up, yearning for the day when BERT makes it into scikit-learn.
Conceptually, using a deep learning model for classification isn’t much different from good ol’ logistic regression. We have some data and some labels; we want to train on some of the data and evaluate performance on the rest. So why does it have to be so scary?
gobbli aims to provide an intuitive interface to cutting-edge deep learning models for text.
gobbli is a Python library which wraps several modern deep learning models in a uniform interface that makes it easy to evaluate feasibility and conduct analyses. It leverages the abstractive powers of Docker to hide nearly all dependency management and functional differences between models from the user.
pip install gobbli

# If you'll be doing data augmentation and/or document windowing
# (read on for details), you may need the following optional
# requirements as well
pip install gobbli[augment,tokenize]
The simplest way to use gobbli is through Experiments. An Experiment is a canned workflow encompassing model training and evaluation. gobbli currently only implements the ClassificationExperiment, which accepts a classification dataset. It performs a train/validation/test split, trains the model using the train/validation sets, and evaluates on the test set.
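To make the split step concrete, here is a minimal sketch of a train/validation/test split in plain Python. It is not gobbli’s implementation: the function name, the default fractions, and the seed are all assumptions chosen for illustration.

```python
import random


def train_valid_test_split(texts, labels, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle paired texts/labels and cut them into three disjoint subsets.

    Hypothetical helper for illustration; the fractions are assumed defaults.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")

    # Shuffle indices rather than the data so texts and labels stay aligned.
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)

    n_test = int(len(indices) * test_frac)
    n_valid = int(len(indices) * valid_frac)
    test_idx = indices[:n_test]
    valid_idx = indices[n_test:n_test + n_valid]
    train_idx = indices[n_test + n_valid:]

    def take(idx):
        return [texts[i] for i in idx], [labels[i] for i in idx]

    return take(train_idx), take(valid_idx), take(test_idx)


texts = [f"document {i}" for i in range(100)]
labels = ["even" if i % 2 == 0 else "odd" for i in range(100)]
train, valid, test = train_valid_test_split(texts, labels)
```

The model only ever sees `train` and `valid` during training; `test` is held back so the final evaluation reflects data the model has never touched.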
All you need to provide is a list of texts and a list of labels (hopefully more interesting than the example below). Here’s an extremely basic experiment using BERT:
The results object contains some pre-baked metrics (including accuracy and weighted precision/recall/F1 score). It has some evaluation tools, including an error report showing examples of top false positives/negatives:
…and a predicted probability plot, which shows the predicted probability of every observation in the test set for each class:
Finally, it also gives you the raw model predictions on the test dataset, in case your evaluation needs are more custom.
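If you do end up working from raw predictions, the weighted metrics are easy to recompute by hand. The sketch below assumes the usual support-weighted definition (each class’s precision, recall, and F1 weighted by how often that class appears in the true labels). It is an illustration in plain Python, not gobbli’s own code, and the function name is hypothetical.

```python
from collections import Counter


def weighted_precision_recall_f1(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over all true classes."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0

    for label, n_true in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)

        # Guard against classes that were never predicted.
        p_label = true_pos / n_pred if n_pred else 0.0
        r_label = true_pos / n_true
        denom = p_label + r_label
        f_label = 2 * p_label * r_label / denom if denom else 0.0

        weight = n_true / total
        precision += weight * p_label
        recall += weight * r_label
        f1 += weight * f_label

    return precision, recall, f1


y_true = ["Good", "Good", "Good", "Bad"]
y_pred = ["Good", "Good", "Bad", "Bad"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = weighted_precision_recall_f1(y_true, y_pred)
```

One detail worth knowing: with this weighting, recall always equals accuracy, so the weighted precision and F1 are the numbers that add new information.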
If your dataset is small, you may benefit from data augmentation. gobbli implements a few augmentation strategies, which generate synthetic documents from your real documents by replacing a proportion of words:
- Word2Vec: replacing similar words based on Word2Vec similarity
- WordNet: replacing words with synonyms, hyponyms, and hypernyms based on the WordNet ontology
- BERTMaskedLM: masking words and replacing them with predictions from BERT’s masked language modeling head
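All three strategies share the same basic shape: pick some words, swap them for plausible alternatives, and keep the original label. Here is a rough sketch of that idea. The replacement table is a hypothetical stand-in for a real source like Word2Vec or WordNet, and the function is not part of gobbli. Note that this sketch applies the proportion to replaceable words only, which is a simplifying assumption.

```python
import random

# Hypothetical lookup table standing in for Word2Vec/WordNet/BERT suggestions.
REPLACEMENTS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "bad": ["poor", "awful"],
}


def augment(text, proportion=0.5, rng=None):
    """Return a synthetic copy of `text` with some known words swapped out."""
    rng = rng or random.Random()
    words = text.split()

    # Only words we have alternatives for are eligible for replacement.
    candidates = [i for i, word in enumerate(words) if word.lower() in REPLACEMENTS]
    n_replace = round(len(candidates) * proportion)

    for i in rng.sample(candidates, n_replace):
        words[i] = rng.choice(REPLACEMENTS[words[i].lower()])
    return " ".join(words)


rng = random.Random(42)
original = "a good movie with a bad ending"
synthetic = [augment(original, proportion=1.0, rng=rng) for _ in range(3)]
```

Each synthetic document keeps the length and label of the original, so the generated texts can simply be appended to the training set.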
Augmenting your dataset using any of these strategies is easy:
Just make sure your evaluation dataset doesn’t include any generated data. You can accomplish this with an Experiment using the
Another inconvenience you may run into with modern deep learning for text is that most models have a fixed max sequence length and truncate longer documents. If your texts are long, your model may not perform well when crucial information is hidden later in the documents.
gobbli provides helpers for “document windowing”, which converts each document into one or more equally-sized windows. The windows can then transparently be used for any task. Task output can be pooled where appropriate, so you can easily obtain (for example) the average prediction on all windows for each document rather than just the first window. See the docs for more info.
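To show what windowing and pooling amount to, here is a small sketch using whitespace tokens. The function names are hypothetical, the per-window probabilities are made-up stand-ins for model output, and gobbli’s actual helpers (which can use a real tokenizer) may behave differently. In this sketch the final window may be shorter than the rest.

```python
def make_windows(text, window_len):
    """Split a document into consecutive windows of at most `window_len` tokens."""
    tokens = text.split()
    windows = [
        " ".join(tokens[start:start + window_len])
        for start in range(0, len(tokens), window_len)
    ]
    # An empty document still yields one (empty) window.
    return windows or [""]


def mean_pool(window_probs):
    """Average per-window class probabilities into one prediction per document."""
    labels = window_probs[0].keys()
    return {
        label: sum(probs[label] for probs in window_probs) / len(window_probs)
        for label in labels
    }


doc = "one two three four five six seven"
windows = make_windows(doc, window_len=3)

# Stand-in predictions, one dict per window (a real model would produce these).
window_probs = [
    {"Good": 0.9, "Bad": 0.1},
    {"Good": 0.6, "Bad": 0.4},
    {"Good": 0.3, "Bad": 0.7},
]
doc_probs = mean_pool(window_probs)
```

Here the first window alone would have said "Good" with 90% confidence, while the pooled prediction is a more cautious 60%, because the later windows pull it down.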
For more granular control over your workflow, you can use tasks. A task represents a single interaction with a model, such as training, generating predictions, or generating embeddings. Experiments are designed to wrap a commonly-used sequence of tasks. Each model implements support for one or more tasks via mixins; for example, BERT implements training, prediction, and embedding generation, while the Universal Sentence Encoder model only implements embedding generation.
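If the mixin idea is unfamiliar, this toy sketch shows the pattern: each capability is a small class, and a model advertises what it can do by inheriting from the relevant ones. All class and function names here are hypothetical and chosen for illustration; they are not gobbli’s actual classes.

```python
class SupportsTraining:
    """Mixin: the model can be trained on labeled data."""

    def train(self, texts, labels):
        raise NotImplementedError


class SupportsPrediction:
    """Mixin: the model can predict labels for new texts."""

    def predict(self, texts):
        raise NotImplementedError


class SupportsEmbedding:
    """Mixin: the model can turn texts into vectors."""

    def embed(self, texts):
        raise NotImplementedError


class ToyClassifier(SupportsTraining, SupportsPrediction, SupportsEmbedding):
    """Stand-in for a model that supports all three tasks."""


class ToyEncoder(SupportsEmbedding):
    """Stand-in for an embedding-only model."""


def supported_tasks(model):
    """List the task names a model supports, based on its mixins."""
    checks = {
        "train": SupportsTraining,
        "predict": SupportsPrediction,
        "embed": SupportsEmbedding,
    }
    return [name for name, mixin in checks.items() if isinstance(model, mixin)]


classifier_tasks = supported_tasks(ToyClassifier())
encoder_tasks = supported_tasks(ToyEncoder())
```

The payoff of this design is that calling code can check for a capability up front instead of discovering halfway through a workflow that a model can’t be trained.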
The first type of task you might use is training:
The output includes some metrics and a trained model checkpoint, which can be applied for prediction and/or embedding generation.
A trained model can then be used for a prediction task:
The output contains predicted probabilities and classes for the input from the trained model.
Another task that might be performed by a trained or untrained model is embedding generation:
Output includes an embedding for each input document. Depending on the pooling method used, it can also include per-token embeddings.
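As a rough illustration of what pooling means here, the sketch below averages a handful of per-token vectors into a single document vector. The numbers are made up and the function name is hypothetical; it only shows the arithmetic, not how gobbli or any particular model does it.

```python
def mean_pool_tokens(token_vectors):
    """Average per-token vectors into one fixed-size document vector."""
    dim = len(token_vectors[0])
    n_tokens = len(token_vectors)
    return [sum(vec[d] for vec in token_vectors) / n_tokens for d in range(dim)]


# Made-up 2-dimensional embeddings for a 3-token document.
token_embeddings = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
doc_embedding = mean_pool_tokens(token_embeddings)
```

With mean pooling, every document ends up with a vector of the same size regardless of length; skipping the pooling step is what leaves you with one vector per token.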
For details on the tasks supported by each model, see the docs.
Since deep learning models take a long time to train, gobbli also provides simple baseline models to facilitate scaffolding your code. See the docs for MajorityClassifier and RandomEmbedder for more info.
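For a sense of how little a baseline like this needs to do, here is a minimal majority-class sketch. The class name and its `fit`/`predict` methods are assumptions for illustration and do not mirror gobbli’s MajorityClassifier interface.

```python
from collections import Counter


class MajorityBaseline:
    """Always predicts the most frequent label seen during fitting."""

    def fit(self, texts, labels):
        # The texts are ignored entirely; only label frequencies matter.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.majority_ for _ in texts]


baseline = MajorityBaseline().fit(["a", "b", "c"], ["Good", "Bad", "Good"])
predictions = baseline.predict(["x", "y"])
```

A model that trains in milliseconds lets you debug the rest of your pipeline quickly, and its score gives you a floor that any deep learning model ought to beat.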
gobbli provides some additional features worth mentioning briefly:
- GPU support for models and experiments
- Generated artifacts are organized under your home directory by default but can be customized per-model and per-task
- Experiments support parameter tuning and report final test performance on the parameter combination with the best validation metrics
- Experiments can be run in parallel on a single node or distributed using Ray
- Tasks and models output structured metadata, which can be used for custom bookkeeping and re-initializing older models
- gobbli itself can be run inside Docker, if you’re brave
See the advanced usage docs for more info.
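The parameter tuning mentioned in the list above boils down to a familiar loop: try each parameter combination, score it on the validation set, and keep the winner. Here is a minimal sketch of that loop. The function names and the toy scoring function are hypothetical, and gobbli’s own configuration format is not shown here.

```python
from itertools import product


def grid(param_grid):
    """Yield every combination of values in a {name: [values]} grid."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[key] for key in keys)):
        yield dict(zip(keys, values))


def select_best(param_grid, validate):
    """Return the combination with the highest validation score, and that score."""
    scored = [(validate(params), params) for params in grid(param_grid)]
    best_score, best_params = max(scored, key=lambda pair: pair[0])
    return best_params, best_score


def fake_validation_score(params):
    """Toy stand-in for 'train a model, then score it on the validation set'."""
    return 1.0 - abs(params["lr"] - 0.01) * 10 - abs(params["epochs"] - 3) * 0.05


param_grid = {"lr": [0.1, 0.01, 0.001], "epochs": [1, 3]}
best_params, best_score = select_best(param_grid, fake_validation_score)
```

Only the winning combination would then be evaluated on the test set, which keeps the test data out of the selection process.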
We built gobbli to assist in experimentation and evaluation of deep learning models for text classification. It’s proven helpful in situations where we’re trying to figure out which model meets our problem’s constraints and then make predictions in batch. We would hesitate to use gobbli in a production setting for real-time predictions. The Docker-based approach comes with a lot of overhead in terms of disk usage and latency, and the uniform interface makes it difficult to expose full customization for each model. If you need an optimized, perfectly-tuned model to operationalize, you should look elsewhere. If you’re trying things out or running an analysis, gobbli can help.
Source code is hosted on GitHub if you want to dive in. We hope gobbli makes deep learning a little less intimidating for you!