Introducing Texar-PyTorch: An ML Library Integrating the Best of TensorFlow into PyTorch

Oct 17, 2019

Crossposted on the Petuum blog.

We are excited to introduce Texar-PyTorch, an open-source general-purpose machine learning toolkit that supports a broad set of applications with a focus on natural language processing (NLP) and text generation tasks.

Building on the already popular TensorFlow version of Texar, Texar-PyTorch integrates many of the best features of TensorFlow into PyTorch. The toolkit is highly customizable, exposing APIs at multiple abstraction levels to suit both novice and experienced users.

In particular, Texar-PyTorch replicates a comprehensive set of useful TensorFlow (TF) modules to significantly enhance PyTorch’s existing functionality, including:

Data: The best practices of tf.data for easy data processing, batching, and iteration, made efficient through buffered shuffling, caching, and lazy loading. We also replicate TFRecord to ingest arbitrarily complex data types and large files.
Modeling: Abundant functions and well-modularized ML models, such as the principled design of sequence models, including text generation decoders, attention mechanisms, and RNNs.
Training: We replicate the high-level APIs of TF Estimator and keras.Model, but with much greater flexibility, for turnkey model training, evaluation, prediction, TensorBoard visualization, and seamless combination with external hyperparameter tuning tools.

What Texar-PyTorch Provides

With the best TF features integrated into the intuitive PyTorch programming model, Texar-PyTorch provides comprehensive support for building ML applications:

State-of-the-Art Model Building Blocks: building an ML model is like assembling Lego bricks. Plug in and swap out modules as you like.
Easy and Efficient Data Processing: rich built-in processors for common types of datasets, and simple but powerful interfaces for arbitrary custom ones. Best practices are integrated, so there is no need to worry about efficiency.
Turnkey and Flexible Model Training with Executors: free yourself from boilerplate code for training and evaluation loops, while staying flexible enough to customize for your specialized needs.

Code Example 1 shows the complete code for building and training a state-of-the-art sequence-to-sequence model with Texar-PyTorch, e.g., for text summarization or machine translation.

**Code Example 1:** Building and training a conditional GPT-2 model (e.g., for text summarization) with Texar-PyTorch.

Why Choose Texar?

Supports both TensorFlow & PyTorch. Sometimes the choice of underlying framework isn’t yours to make, and learning a new higher-level framework can take about as long as writing the parts yourself. With Texar, you can use the same interfaces in both frameworks with minimal changes. The two versions can even share pre-trained model weights that you’ve downloaded.
Provides Natural Language Processing, All in One Kit. Texar has comprehensive coverage of neural models for natural language processing tasks, especially text generation. Figure 1 gives a snapshot of Texar modules. With Texar, you not only have access to a complete range of state-of-the-art pre-trained models, but also find all the utilities you need, from data processing to modeling to training and evaluation. We’ve got you covered.
Friendly to Novices and Experts. Whether you’ve just picked up deep learning or you’re an experienced researcher, you’ll find Texar easy to use. Texar provides state-of-the-art built-in components but remains flexible enough for customization.

**Figure 1:** Texar provides a comprehensive set of modules for data processing, model architectures, loss functions, training, and evaluation, as well as a range of state-of-the-art pre-trained ML/NLP models (e.g., BERT and GPT-2).

Below, we describe the three key parts of Texar-PyTorch in more detail: modeling, data, and training.

Modeling

As shown in Figure 1, Texar-PyTorch offers a full set of ML modules. With its well-designed interfaces, users can freely build arbitrary models by assembling the building blocks.

The following example shows how flexibly the module interfaces meet the needs of different learning algorithms, such as maximum-likelihood learning and adversarial learning. Moreover, Texar provides interfaces at multiple abstraction levels for users with different levels of expertise. For example:

It’s straightforward to invoke a common inference method, e.g., teacher-forcing decoding, by simply setting the decoder argument `decoding_strategy='train_greedy'`.
On the other hand, to perform advanced inference, e.g., Gumbel-softmax decoding for adversarial learning, users can use a GumbelSoftmaxHelper. Expert users can further define new Helpers to implement any custom decoding strategy.
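To make the Gumbel-softmax idea concrete, here is a minimal pure-Python sketch of drawing one relaxed sample from a vector of logits. It illustrates the general technique only and is not how Texar’s GumbelSoftmaxHelper is implemented; the function name `gumbel_softmax_sample` and the example temperature are our own assumptions.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax_sample(logits: List[float], tau: float = 1.0,
                          rng: Optional[random.Random] = None) -> List[float]:
    """Draw one relaxed ("soft one-hot") sample from a categorical distribution.

    Hypothetical helper for illustration: add Gumbel(0, 1) noise to each
    logit, divide by the temperature `tau`, and apply a softmax.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    rng = rng or random.Random()
    perturbed = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse CDF: g = -log(-log(u)), u ~ U(0, 1).
        # random() can return 0.0, so clamp u away from zero before log().
        u = max(rng.random(), 1e-12)
        perturbed.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(perturbed)
    exps = [math.exp(x - peak) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
soft_token = gumbel_softmax_sample([2.0, 0.5, -1.0], tau=0.5, rng=rng)
print([round(p, 3) for p in soft_token])  # non-negative and sums to 1
```

Lowering `tau` pushes the output toward a one-hot vector (the argmax token), while raising it smooths the distribution. In a real model the same arithmetic runs on tensors (PyTorch ships `torch.nn.functional.gumbel_softmax` for this), so gradients from a discriminator can flow back through the soft token into the generator, which plain argmax sampling does not allow.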

**Code Example 2:** Building a pre-trained GPT-2 language model, fine-tuning with maximum-likelihood learning and adversarial learning (using BERT as the discriminator).

To summarize, modeling with Texar-PyTorch features the following key advantages:

Excellent modularization — switching between different learning contexts is enabled by simply plugging in/swapping out a couple of modules.
Multi-level interfaces — high-level intuitive interfaces for novice users and low-level highly-customizable ones for expert users.
Built-in state-of-the-art pre-trained models — BERT, GPT-2, RoBERTa, XLNet and more, for tasks of text encoding, classification, sequence tagging, and generation.

Data

Texar-PyTorch data modules are designed for easy, efficient, and customizable data access for any ML and NLP task. Combining the best practices of TensorFlow’s tf.data, the modules greatly enhance PyTorch’s native DataLoader by:

Decoupling single instance processing and batching — for clearer program logic and easier customization
Buffer-based shuffling, caching, and lazy-loading — for greater efficiency
Extensive dataset iterators — no extra user configuration needed
More intuitive APIs — no expertise needed to get the best practices in your project
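The first two points, decoupled batching and buffer-based shuffling, can be sketched in plain Python as follows. This is a minimal illustration of the technique that tf.data popularized, written under our own assumptions; the function names `buffered_shuffle` and `batched` are hypothetical and are not Texar’s API.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(source: Iterable[T], buffer_size: int,
                     rng: Optional[random.Random] = None) -> Iterator[T]:
    """Yield items from `source` in approximately random order.

    Only `buffer_size` items are held in memory at once, so `source` can be a
    lazily read stream that would never fit in memory as a whole.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = rng or random.Random()
    buffer: List[T] = []
    for item in source:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Source exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of already-processed instances into batches."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # keep the final, smaller batch


rng = random.Random(42)
stream = (i * i for i in range(10))  # stands in for lazily read records
batches = list(batched(buffered_shuffle(stream, buffer_size=4, rng=rng), 3))
print(batches)
```

A larger buffer gives a more uniform shuffle at the cost of memory; once `buffer_size` is at least the dataset size, it becomes a full shuffle. Because shuffling operates per instance and `batched` only sees the resulting stream, either stage can be swapped out without touching the other.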

Texar-PyTorch Built-in Datasets

For common types of datasets, Texar-PyTorch already includes ready-to-use modules, as shown in Figure 2 below.

In particular, RecordData is Texar’s equivalent of TensorFlow’s well-known TFRecord format: it reads files in binary format and thus supports arbitrary data types, from text to images. Cool, isn’t it? What’s more, the usage pattern is very similar to TFRecordData. The example below says it all.

Say you want to train an image captioning model. Each data example typically contains an image, a caption, and other metadata. Below is how you would load such data in Texar-PyTorch.

**Code Example 3**: Loading complex image captioning data with Texar-PyTorch RecordData.
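RecordData’s own file layout isn’t described here, so the following is only a sketch of the general idea behind TFRecord-style storage: each example is serialized to bytes and written with a length prefix, so mixed fields such as raw image bytes, a caption string, and an ID fit in one file and can be read back lazily. The layout and the names `write_record` and `read_records` are our own assumptions. Real TFRecord files store protocol buffers with CRC checksums, which this sketch omits, and pickle should only be used on files you trust.

```python
import io
import pickle
import struct
from typing import Any, BinaryIO, Dict, Iterator

# Hypothetical on-disk layout: an 8-byte little-endian length, then a pickled dict.
_HEADER = struct.Struct("<Q")


def write_record(f: BinaryIO, example: Dict[str, Any]) -> None:
    """Append one example with arbitrary picklable fields (bytes, str, int, ...)."""
    payload = pickle.dumps(example)
    f.write(_HEADER.pack(len(payload)))
    f.write(payload)


def read_records(f: BinaryIO) -> Iterator[Dict[str, Any]]:
    """Lazily yield one example at a time instead of loading the whole file."""
    while True:
        header = f.read(_HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < _HEADER.size:
            raise ValueError("truncated record header")
        (length,) = _HEADER.unpack(header)
        payload = f.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        yield pickle.loads(payload)  # only unpickle files you trust


# An in-memory buffer stands in for a file opened in binary mode.
buf = io.BytesIO()
write_record(buf, {"image": b"\x89PNG fake bytes", "caption": "a dog on a beach", "image_id": 17})
write_record(buf, {"image": b"\xff\xd8 fake bytes", "caption": "two cats on a sofa", "image_id": 18})
buf.seek(0)
examples = list(read_records(buf))
print([(ex["image_id"], ex["caption"]) for ex in examples])
```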

Creating Custom Datasets

Users can customize how data instances are processed and batched, and Texar takes care of caching, lazy processing, and iteration for you. The toy example below illustrates this.

**Code Example 4**: A customized dataset that performs BPE tokenization for input text.
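The division of labor described above, where the user defines per-instance processing and batching while the framework handles laziness and caching, can be sketched in plain Python. This is not Texar’s dataset class: `LazyCachedDataset`, `collate`, and `toy_tokenize` are hypothetical names, and a toy whitespace tokenizer stands in for BPE to keep the example short.

```python
from typing import Callable, Dict, List, Sequence


class LazyCachedDataset:
    """Hypothetical sketch: process raw instances on first access, then cache them.

    `process` works on ONE raw instance; batching is left to `collate` below,
    so the two concerns can be customized independently.
    """

    def __init__(self, raw: Sequence[str], process: Callable[[str], List[int]]) -> None:
        self._raw = raw
        self._process = process
        self._cache: Dict[int, List[int]] = {}
        self.num_processed = 0  # counts how often `process` actually ran

    def __len__(self) -> int:
        return len(self._raw)

    def __getitem__(self, index: int) -> List[int]:
        if index not in self._cache:  # lazy: nothing is processed up front
            self._cache[index] = self._process(self._raw[index])
            self.num_processed += 1
        return self._cache[index]  # cached: later epochs skip re-processing


def collate(batch: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Turn a list of processed instances into one padded batch."""
    max_len = max(len(ids) for ids in batch)
    return [ids + [pad_id] * (max_len - len(ids)) for ids in batch]


# Toy stand-in for BPE: whitespace split, assigning new ids as words appear.
vocab: Dict[str, int] = {"<pad>": 0}


def toy_tokenize(text: str) -> List[int]:
    return [vocab.setdefault(tok, len(vocab)) for tok in text.lower().split()]


data = LazyCachedDataset(["The cat sat", "the dog", "a cat"], toy_tokenize)
batch = collate([data[0], data[1]])
_ = data[0]  # second access is served from the cache
print(batch, data.num_processed)  # [[1, 2, 3], [1, 4, 0]] 2
```

Swapping in a real BPE tokenizer would only change the `process` callable; the caching and padding logic stays the same.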

Executor

Have you ever grown tired of writing the same training-evaluation loop every time you start a new project? Have you wished for a single API that automates the loop, complete with logging, checkpointing, visualization, and hyperparameter tuning? Would you also like that API to be flexible enough for your non-traditional algorithms, e.g., alternating multiple losses in adversarial learning? Texar Executor is here for you.

Executor is the PyTorch equivalent of the widely-used TF Estimator and tf.keras.Model, but is designed to be lightweight and much more customizable.

To demonstrate the power of Executor, we compare a hand-written train-eval loop with Executor:

Let’s say we want the following functions in our project:

Print logs every `logging_steps` iterations to the console, a log file, and TensorBoard.
Perform validation every `validate_steps` iterations, evaluating the model output with the BLEU metric.
If validation results improve, save the current checkpoint. If results fail to improve for `patience` consecutive trials, load the previous checkpoint and scale the learning rate.

The steps above describe a pretty universal training loop. Here’s what a hand-written training loop would look like:

**Code Example 5:** A typical hand-written train-eval loop.
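For reference, here is a deliberately compressed sketch of just the control logic from the list above, written in plain Python with the model-specific parts injected as callables. All names and parameters here are hypothetical, and the sketch leaves out the details that make a real loop long: writing to console, file, and TensorBoard, computing BLEU, and serializing checkpoints.

```python
from typing import Callable, Tuple


def train_eval_loop(
    train_step: Callable[[float], float],  # runs one update at this lr, returns loss
    evaluate: Callable[[], float],         # returns validation BLEU
    save_checkpoint: Callable[[], None],
    load_checkpoint: Callable[[], None],   # restores the last saved checkpoint
    log: Callable[[str], None],            # e.g. console + file + TensorBoard writer
    num_iters: int,
    logging_steps: int,
    validate_steps: int,
    patience: int,
    lr: float,
    lr_scale: float,
) -> Tuple[float, float]:
    """Hypothetical skeleton of the hand-written loop; returns (best BLEU, final lr)."""
    best_score = float("-inf")
    bad_trials = 0
    for it in range(1, num_iters + 1):
        loss = train_step(lr)
        if it % logging_steps == 0:
            log(f"iter {it}: loss={loss:.4f} lr={lr:g}")
        if it % validate_steps == 0:
            score = evaluate()
            log(f"iter {it}: val BLEU={score:.2f}")
            if score > best_score:  # improvement: keep this checkpoint
                best_score, bad_trials = score, 0
                save_checkpoint()
            else:
                bad_trials += 1
                if bad_trials >= patience:  # plateau: roll back and decay lr
                    load_checkpoint()
                    lr *= lr_scale
                    bad_trials = 0
    return best_score, lr


# Usage with stand-in callables (no real model): validate every iteration.
scores = iter([10.0, 12.0, 11.0, 11.5, 13.0])
events = []
best, final_lr = train_eval_loop(
    train_step=lambda lr: 1.0 / lr,
    evaluate=lambda: next(scores),
    save_checkpoint=lambda: events.append("save"),
    load_checkpoint=lambda: events.append("load"),
    log=print,
    num_iters=5, logging_steps=2, validate_steps=1,
    patience=2, lr=1.0, lr_scale=0.5,
)
print(best, final_lr, events)  # 13.0 0.5 ['save', 'save', 'load', 'save']
```

Even in this stripped-down form, every new requirement (validate at epoch boundaries, stop early, track another metric) means editing the body of the loop itself.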

The code is long and tedious, and things get even more troublesome when you need to add or change functionality. Now, what does the code look like if we use Executor?

**Code Example 6:** The same train-eval loop with Executor.

And this is how Executor logs look in the command line:

Here you can see that the validation BLEU is updated in place, based on the predictions made so far. This is thanks to Executor’s streaming metrics, which allow metric values to be computed incrementally. No need to wait until the end to see results on a large validation set!
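Corpus-level BLEU can be streamed because it is computed from accumulated counts (clipped n-gram matches, total n-grams, and hypothesis and reference lengths) rather than from an average of per-sentence scores. The sketch below shows the same pattern with a simpler metric, accuracy. `StreamingAccuracy` is a hypothetical class for illustration, not Texar’s metric API.

```python
from typing import Sequence


class StreamingAccuracy:
    """Hypothetical streaming metric: accumulate counts batch by batch.

    Only the sufficient statistics (correct, total) are stored, so a value can
    be reported after every batch without keeping earlier predictions around.
    """

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def add(self, preds: Sequence[int], labels: Sequence[int]) -> None:
        if len(preds) != len(labels):
            raise ValueError("preds and labels must have the same length")
        self.correct += sum(p == y for p, y in zip(preds, labels))
        self.total += len(labels)

    def value(self) -> float:
        return self.correct / self.total if self.total else 0.0


metric = StreamingAccuracy()
history = []
for preds, labels in [([1, 2, 3], [1, 2, 0]), ([4, 4], [4, 5])]:
    metric.add(preds, labels)
    history.append(round(metric.value(), 3))  # what a live log line would show
print(history)  # [0.667, 0.6]
```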

As we can see, code with Executor is much more structured and readable. It is also much more extensible:

Q: What if we also want to do validation after each epoch?
A: Simply change `validate_every` to:

Q: What if we want to perform early stopping after we’ve scaled the learning rate `early_stop_patience` times?
A: Simply change `action_on_plateau` to:

Q: What if we also want to measure the word-level loss?
A: Simply add a new metric to `valid_metrics`:

Q: What if we want to do hyperparameter tuning and train the model multiple times?
A: Simply create an Executor for each set of hyperparameters that you want to test. Since Executor takes care of everything besides model creation, you don’t need to worry about consuming extra memory or accidentally retaining objects from previous runs. Here’s an example of using Executor with hyperopt.

Q: What if, at the end of each epoch, we want to upload the current checkpoint to the server, send an email containing the training progress, and take the dog out for a walk?
A: Weird, but okay. Simply register a custom action on a condition of your choice, and do whatever you wish:
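The design idea behind registering a custom action on a condition is a callback registry: the loop only announces events, and user code decides what happens. Below is a minimal sketch with hypothetical names (`HookRegistry`, `run`, and the event names `"iteration_end"` and `"epoch_end"`). Texar’s Executor has its own condition and action API, which this does not reproduce.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

State = Dict[str, int]


class HookRegistry:
    """Hypothetical sketch: map named training events to user callbacks."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[[State], None]]] = defaultdict(list)

    def on(self, event: str, action: Callable[[State], None]) -> None:
        """Register `action` to run every time `event` fires."""
        self._hooks[event].append(action)

    def fire(self, event: str, state: State) -> None:
        for action in self._hooks[event]:  # run actions in registration order
            action(state)


def run(num_epochs: int, steps_per_epoch: int, hooks: HookRegistry) -> None:
    """Toy loop: it only announces events; registered hooks decide what happens."""
    state: State = {"epoch": 0, "step": 0}
    for epoch in range(1, num_epochs + 1):
        state["epoch"] = epoch
        for _ in range(steps_per_epoch):
            state["step"] += 1  # a real loop would do one training step here
            hooks.fire("iteration_end", state)
        hooks.fire("epoch_end", state)


hooks = HookRegistry()
done: List[str] = []
hooks.on("epoch_end", lambda s: done.append(f"upload checkpoint (epoch {s['epoch']})"))
hooks.on("epoch_end", lambda s: done.append(f"email progress (step {s['step']})"))
run(num_epochs=2, steps_per_epoch=3, hooks=hooks)
print(done)
```

Because the loop never needs to know what the actions do, adding an upload, an email, or a reminder about the dog is a one-line registration rather than a change to the loop.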

Switching from Texar-TF to Texar-PyTorch

If you are an existing Texar-TF user, switching to Texar-PyTorch requires only minimal effort. Texar-PyTorch has almost the same interfaces as Texar TensorFlow, which makes moving between backends easy.

Although the interfaces are similar, we also follow the coding conventions of each framework, so you won’t feel like you’re learning a new sub-language. To this end, we changed some of the lower-level extensible interfaces to match the native design of each framework more closely. Most of the changes lie in the data and executor modules, but as you’ve already seen, they’re still easy to pick up.

Getting Started

To get started, visit our GitHub repository and follow the installation instructions. Useful resources include:

Documentation: We have detailed documentation for every module and function.
Examples: We strongly encourage you to check out our examples to get a basic idea of how Texar is used in practice. The examples are clearly documented and cover rich use cases.
ASYML Library: Find quick links to all Texar resources in one place.

*Petuum is a corporate sponsor of Texar. Petuum engineers are continuously contributing to the Texar code base and have been pivotal in this release.