Ludwig on PyTorch

Piero Molino · Published in PyTorch · 11 min read · May 24, 2022

How and why we ported Ludwig, the declarative deep learning framework, to PyTorch

Authors: Justin Zhao, Shreya Rajpal, Daniel Treiman, Jim Thompson, Travis Addair, Piero Molino

Overview

Ludwig is an open-source, declarative machine learning framework that makes it easy to define deep learning pipelines with a simple and flexible data-driven configuration system. It is suitable for a wide variety of AI tasks, and is hosted by the Linux Foundation AI & Data.

Ludwig allows users to define their deep learning pipeline by simply providing a configuration file that lists the inputs and outputs and their respective data types. Based on that configuration file, Ludwig assembles and trains a deep learning model and determines how inputs and outputs are preprocessed, encoded, and decoded, and which metrics and loss criteria to use.

Examples of Ludwig’s data-driven declarative configurations for different machine learning tasks through combinations of input and output data types.

Writing a configuration file for Ludwig is easy, and the flexibility of the configuration allows for full control of every aspect of the end-to-end pipeline. This includes exploring state-of-the-art model architectures, running a hyperparameter search, scaling up to datasets larger than available memory and to multi-node clusters, and finally serving the best model in production. All of this is achieved through simple configuration file changes.

To learn more about how Ludwig works, check out the Ludwig Docs for our guide on how to get started, or read our publications on Ludwig, declarative ML, and Ludwig’s SoTA benchmarks.

We’re excited to release Ludwig v0.5, a complete overhaul of Ludwig from the ground up. In addition to new features and several important technical improvements, Ludwig v0.5 migrates our entire backend to PyTorch. This migration is the result of a substantial six-month undertaking involving 230+ commits, changes to 70k+ lines of code, and contributions from 40+ people. PyTorch’s pythonic design and emphasis on developer experience are perfectly aligned with Ludwig’s principles of simplicity, modularity, and extensibility. With Ludwig on PyTorch, we’re thrilled to see what developers, researchers, and data scientists in the growing PyTorch community can bring to Ludwig.

In this post, we highlight how PyTorch users can make use of Ludwig. We’ll share results from benchmarking Ludwig v0.4 vs. Ludwig v0.5 on PyTorch. Finally, we’ll wrap up with what’s next for Ludwig.

Declarative Deep Learning, now in PyTorch

Ludwig v0.5 brings its declarative approach to structuring machine learning pipelines, as well as all of its models, tools, infrastructure, and contributors to the PyTorch ecosystem. Onboarding with Ludwig for new users has never been easier thanks to our revamped getting started guide, user guide, and developer documentation. We think Ludwig will be very useful for research scientists, data scientists, and machine learning engineers working in PyTorch.

For Research Scientists

PyTorch is the most popular library for deep learning research scientists who develop new training algorithms, design and develop new model architectures, and run experiments with them.

However, experimenting with a new architecture often requires a formidable amount of code for scalably loading and preprocessing data, and setting up pipelines for (distributed) training, evaluation, and hyperparameter optimization.

Minimal machine learning boilerplate

Ludwig takes care of the engineering complexity of deep learning out of the box, enabling research scientists to focus on building models at the highest level of abstraction.

Let’s say that you have a great idea for a novel architecture for image classification that changes how images are encoded. You would implement your new model as a PyTorch module.
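As a minimal sketch of what such a module might look like, here is a plain PyTorch image encoder. The class name `MyImageEncoder`, its layer sizes, and the `output_size` parameter are hypothetical and chosen only for illustration; they are not part of Ludwig.

```python
import torch
from torch import nn


class MyImageEncoder(nn.Module):
    """Hypothetical image encoder: two conv layers, global pooling, projection."""

    def __init__(self, in_channels: int = 3, output_size: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse the spatial dimensions to 1x1
        )
        self.projection = nn.Linear(64, output_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: [batch, channels, height, width]
        hidden = self.features(images).flatten(start_dim=1)  # [batch, 64]
        return self.projection(hidden)  # [batch, output_size]


# Encode a random batch of four 32x32 RGB images.
encoder = MyImageEncoder()
encoded = encoder(torch.rand(4, 3, 32, 32))
print(encoded.shape)
```

The module only maps image tensors to fixed-size vectors; nothing here loads data, trains, or computes metrics.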

However, the module alone is incomplete: you’ll also need to figure out how to read images from disk with torchvision, write a loop that trains, checkpoints, and evaluates the model, post-process logits tensors into predictions, and compute metrics over predictions. All these steps increase model development time and introduce potential sources of error.

Instead of implementing all of this from scratch, research scientists can implement new models as PyTorch Modules in Ludwig directly to take advantage of all of the engineering conveniences that Ludwig offers. Since your modeling idea applies specifically to image encoding, only the encoder needs to be implemented.

The new encoder my_encoder can immediately be used in a new Ludwig configuration by just setting encoder: my_encoder. Ludwig will take care of the rest of the pipeline for you.

Comparing with baseline models

When you create a new model, you want to compare it with a baseline. With Ludwig you can create two almost identical configurations, one for the baseline and one for your model, that differ only in the encoding section. For instance you can train a ResNet baseline with the following config and command.

ludwig experiment --config baseline.yaml --dataset my_dataset.csv

Changing just the encoder to my_encoder and its parameters will train a model using your custom encoder.

ludwig experiment --config my_encoder.yaml --dataset my_dataset.csv

This guarantees that the same preprocessing, training, and evaluation are performed in both cases, so you can easily and fairly assess the performance of your new encoder.
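To make the comparison concrete, the two configurations can be sketched as Python dictionaries and checked to differ in exactly one place. The feature names `image_path` and `label` are hypothetical column names, and `diff_paths` is an illustrative helper, not a Ludwig function.

```python
import copy

# Baseline: an image classifier with a ResNet encoder.
# `image_path` and `label` are hypothetical column names in my_dataset.csv.
baseline_config = {
    "input_features": [
        {"name": "image_path", "type": "image", "encoder": "resnet"},
    ],
    "output_features": [
        {"name": "label", "type": "category"},
    ],
}

# Candidate: identical except for the encoder of the input feature.
my_encoder_config = copy.deepcopy(baseline_config)
my_encoder_config["input_features"][0]["encoder"] = "my_encoder"


def diff_paths(a, b, prefix=""):
    """Return the dotted paths at which two nested dict/list structures differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        paths = []
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                paths.append(f"{prefix}{key}")
            else:
                paths.extend(diff_paths(a[key], b[key], f"{prefix}{key}."))
        return paths
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        paths = []
        for i, (x, y) in enumerate(zip(a, b)):
            paths.extend(diff_paths(x, y, f"{prefix}{i}."))
        return paths
    return [] if a == b else [prefix.rstrip(".")]


differences = diff_paths(baseline_config, my_encoder_config)
print(differences)  # ['input_features.0.encoder']
```

Because everything outside the encoder field is shared, any difference in the resulting metrics can be attributed to the encoder itself.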

Hyperparameter optimization with Ray Tune

Ludwig configurations can also include a hyperparameter optimization section that lets you declare the hyperparameters to optimize, their ranges, and the metric to optimize for, using Ray Tune, a Python library for experiment execution and hyperparameter tuning.

The hyperparameter optimization process can be run locally or on a Ray cluster, and any of the search algorithms Ray Tune supports can be chosen, including Bayesian optimization, Hyperband, Nevergrad, and others.
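As an illustration of what declaring a search space means, the sketch below writes a hyperparameter optimization section as a Python dictionary and draws one candidate from it with a plain random sampler. The key names (`parameters`, `space`, `lower`, `upper`, `categories`, `goal`, `metric`, `output_feature`) are assumptions modeled on Ludwig's hyperopt configuration and may differ between versions. `sample_candidate` is a hypothetical stand-in and does not reflect how Ray Tune actually searches.

```python
import math
import random

# Assumed shape of a hyperopt section; key names may differ by Ludwig version.
hyperopt_section = {
    "goal": "minimize",
    "metric": "loss",
    "output_feature": "label",  # hypothetical output feature name
    "parameters": {
        "trainer.learning_rate": {"space": "loguniform", "lower": 1e-4, "upper": 1e-1},
        "trainer.batch_size": {"space": "choice", "categories": [64, 128, 256]},
    },
}


def sample_candidate(parameters, rng):
    """Draw one random value per declared hyperparameter (illustration only)."""
    candidate = {}
    for name, spec in parameters.items():
        if spec["space"] == "loguniform":
            # Sample uniformly in log space, then map back.
            log_value = rng.uniform(math.log(spec["lower"]), math.log(spec["upper"]))
            candidate[name] = math.exp(log_value)
        elif spec["space"] == "choice":
            candidate[name] = rng.choice(spec["categories"])
        else:
            raise ValueError(f"unsupported space: {spec['space']}")
    return candidate


candidate = sample_candidate(hyperopt_section["parameters"], random.Random(0))
print(candidate)
```

A real search algorithm would use the metric from earlier trials to pick the next candidate instead of sampling independently.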

Easy testing on multiple tasks and datasets

Registered models can be subsequently applied across the extensive set of tasks and datasets that Ludwig supports or on new ones. Ludwig includes a full benchmarking toolkit for running experiments with multiple models across multiple datasets with just a simple configuration.

For more information on how to add your dataset or model to Ludwig, check out the Ludwig Docs and the Ludwig Dataset Zoo.

For Data Scientists

Low-code interface for state-of-the-art models, including pre-trained Hugging Face Transformers

Ludwig strives to bring state-of-the-art performance to many ML tasks without requiring users to write hundreds of lines of code.

Ludwig provides robust implementations of common architectures including CNNs, RNNs, Transformers, TabNet, and MLP-Mixer. In addition, with our codebase now in PyTorch, Ludwig can more closely integrate with community projects such as torchtext, torchvision, and torchaudio to support additional architectures and modalities.

Models can be trained from scratch, but Ludwig also natively integrates with pre-trained models, such as the ones available in Hugging Face Transformers. Users can choose from a vast collection of state-of-the-art pre-trained PyTorch models without needing to write any code at all. For example, training a BERT-based sentiment analysis model with Ludwig is as simple as:

ludwig train --dataset sst5 --config_str "{input_features: [{name: sentence, type: text, encoder: bert}], output_features: [{name: label, type: category}]}"

Low-code Interface for AutoML

Ludwig AutoML allows users to obtain trained models by providing just a dataset, the target column, and a time budget.

Ludwig AutoML is still a preview feature, but it has been fully migrated to PyTorch in v0.5. To learn more, check out our blog posts describing its development, evaluation, and use for tabular datasets and text classification.

Highly Configurable Data Preprocessing, Modeling, and Metrics

Any and all aspects of the model architecture, training loop, hyperparameter search, and backend infrastructure can be modified as additional fields in the declarative configuration to customize the pipeline to meet your requirements. Here is an example of a configuration with many additional configuration parameters specified.
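A hedged sketch of such a configuration, written as a Python dictionary, is shown below. All field names and values (`preprocessing`, `combiner`, `trainer`, and their children) are assumptions based on Ludwig's configuration documentation, and the feature names are hypothetical; check the docs for the fields your version supports.

```python
import json

# Illustrative configuration; field names are assumptions and may vary by version.
config = {
    "input_features": [
        {
            "name": "image_path",  # hypothetical column name
            "type": "image",
            "encoder": "resnet",
            "preprocessing": {"height": 224, "width": 224, "num_channels": 3},
        }
    ],
    "output_features": [
        {"name": "label", "type": "category"},  # hypothetical column name
    ],
    "combiner": {"type": "concat", "num_fc_layers": 2},
    "trainer": {
        "epochs": 50,
        "batch_size": 128,
        "learning_rate": 0.001,
        "early_stop": 5,
    },
}

# The configuration is plain data, so it serializes without custom code.
serialized = json.dumps(config, indent=2)
print(serialized)
```

Each top-level section customizes one stage of the pipeline, and any section left out falls back to Ludwig's defaults.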

For details on what can be configured, check out Ludwig Configuration docs. If a model, loss, evaluation metric, preprocessing function or other parts of the pipeline are not already available, the modularity of the underlying architecture allows users to very easily extend Ludwig’s capabilities by implementing simple abstract interfaces, as described in the Developer Guide.

For Machine Learning Engineers

Effortless End-to-End Scale to Multi-Node, Multi-GPU

PyTorch provides great performance for training with one or multiple GPUs. However, there remains a great deal of operational complexity when building an end-to-end system for distributed training. Specifically:

  • Raw input data must be preprocessed and transformed into a format suitable for training. For large datasets, this means setting up a batch processing cluster like Spark or Dask to handle this step.
  • Processed data needs to be efficiently shuffled each epoch to ensure model robustness. Often this work is pushed onto the GPU training workers, resulting in imperfect local shuffling that bottlenecks the training process.
  • GPU training workers need to be provisioned and set up to coordinate with each other using a collective communication library like MPI or NCCL. This means data ingest pipelines, metrics computation, model weight initialization, and backpropagation steps all need to be rewritten to support data parallelism.
  • All of this distributed training infrastructure needs to be provisioned and configured to run as a workflow, commonly using an additional system like Airflow or Kubeflow to act as the orchestration layer.

In Ludwig v0.5, all of this is abstracted away from you as an implementation detail. By running Ludwig on top of Ray, the same Ludwig command line and Python API calls that run on your local laptop can scale across a cluster of machines in the cloud with zero code changes. All you need to do is start a Ray cluster and submit your existing Ludwig command or script to run using the Ray CLI:

ray up cluster.yaml

ray submit cluster.yaml ludwig train --config model.yaml --dataset s3://bucket/dataset.parquet

When running on Ray, Ludwig handles the entire end-to-end orchestration and distributed execution automatically. Dask on Ray is used to scale preprocessing to arbitrarily large datasets. Preprocessed data can optionally be cached in a remote object storage system like Amazon S3 or Google Cloud Storage as a partitioned Parquet dataset. This cached dataset can be reused across multiple training runs that use the same input and output features but different training configurations.

At training time, Ludwig uses a GPU by default whenever one is available. If your Ray cluster contains multiple nodes, Ludwig will automatically scale the training to as many GPUs as are available in the cluster using Horovod on Ray. Ludwig on Ray also makes use of the newly released Ray Datasets API to efficiently overlap the data ingest pipeline (including full per-epoch shuffling) with training by distributing the data shuffling and batching across the non-GPU nodes in the cluster.

Finally, if you are running using an auto-scaling or multi-tenant cluster, you can request the exact number of workers / GPUs to use during distributed training and Ludwig + Ray will automatically scale up to the requested number of resources on your behalf:
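As a sketch of what such a request might look like, the helper below adds a `backend` section to an existing configuration dictionary. The field names (`type`, `trainer`, `num_workers`, `use_gpu`) are assumptions based on the Ludwig backend configuration docs, and `with_ray_backend` is a hypothetical helper, not part of Ludwig.

```python
import copy


def with_ray_backend(config, num_workers, use_gpu=True):
    """Return a copy of `config` that requests a fixed number of Ray workers.

    The `backend` field names are assumptions; consult the backend
    configuration docs for the exact schema of your Ludwig version.
    """
    scaled = copy.deepcopy(config)
    scaled["backend"] = {
        "type": "ray",
        "trainer": {"num_workers": num_workers, "use_gpu": use_gpu},
    }
    return scaled


# Hypothetical model configuration without any backend section.
model_config = {
    "input_features": [{"name": "sentence", "type": "text", "encoder": "bert"}],
    "output_features": [{"name": "label", "type": "category"}],
}

distributed_config = with_ray_backend(model_config, num_workers=4)
print(distributed_config["backend"])
```

The model definition itself is untouched; only the infrastructure request changes.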

While all of Ludwig on Ray can be run without additional configuration by using reasonable defaults, the declarative structure provides full control for users who want to further optimize the ML infrastructure for training. See the Ludwig backend configuration docs for complete details.

Easy Productionisation

Bringing machine learning models to production is usually a lengthy and complicated process. Ludwig provides several options to make deployment straightforward.

With Ludwig Serving, Ludwig makes it easy to serve deep learning models, including on GPUs.

Use ludwig serve to launch a REST API for your trained Ludwig model.

ludwig serve --model_path /path/to/model

curl http://0.0.0.0:8000/predict -X POST -F 'review=the movie was awesome'

For highly efficient deployments, it’s often critical to minimize the overhead caused by the Python runtime. Ludwig supports exporting models to efficient TorchScript bundles.

ludwig export_torchscript --model_path /path/to/model

Comparing Performance of Ludwig v0.4 and v0.5

Switching to use PyTorch as Ludwig’s backend of choice was strongly motivated by the increase in productivity in development, debugging and iteration that the more pythonic PyTorch API affords us and the great ecosystem the PyTorch community has built around it.

At the same time, we wanted to make sure that the experience of Ludwig users would remain delightful. We ran extensive comparisons between Ludwig v0.5 (PyTorch-based) and Ludwig v0.4 on text, image, and tabular datasets, evaluating training speed, inference throughput, and model performance to make sure they did not degrade. Our results reveal roughly the same high GPU utilization (~90%), but significant improvements in distributed training speed and memory usage, with no impact on model accuracy or time to convergence. For brevity, we show only the results obtained on the DBpedia dataset, but the same trends apply across all our tests.

The number of examples per second when training a model on the DBpedia dataset (text classification) on a single machine and using distributed training across multiple workers. While the training speed is similar between v0.4 and v0.5 on a single worker, training speed increases significantly in PyTorch as we add more workers.

Using Ludwig v0.5 on PyTorch, we observed a shorter time per epoch and a shorter total time to model convergence than with v0.4. These differences become more pronounced as we add more workers in a distributed training setting using T4 GPUs.

GPU memory usage with different batch sizes on the DBpedia dataset for a single machine. Memory usage in v0.5 is consistently lower than in v0.4.

We have also verified that porting to PyTorch retains both model accuracy and GPU utilization in both single machine and distributed training settings.

What’s Next?

Ludwig v0.5 is our most exciting release yet! With the major migration of Ludwig to PyTorch complete, we will switch gears and start adding new features requested by the community. To do so, we plan to publish smaller releases more frequently in the future.

We have a ton of cool functionality in the works including AutoML for more tasks, iterative AutoML, self-supervised learning, a hub for Ludwig models, more architectures and pre-trained models, and preprocessing efficiency improvements.

If you are interested in contributing, have questions, comments, or thoughts to share, or just want to be in the know, please consider joining the Ludwig Slack and following us on Twitter!

We’re thrilled to be joining the PyTorch ecosystem and we can’t wait to see how our active community of engineers, researchers, and data scientists will take Ludwig to new heights.

We’re also working on Predibase, a fully managed enterprise platform built on top of Ludwig, Horovod and Ray, to bring cutting-edge declarative machine learning to more organizations. We recently announced it and we are looking forward to your feedback. Sign up for early access here.

Acknowledgements

A lot of work went into Ludwig v0.5, and we want to thank everyone who contributed and helped, in particular the main contributors and community members for this release: Anne Holler, Avanika Narayan, Saikat Kanjilal. Special thanks for their immense support go to Stanford’s Hazy Research group led by Prof. Chris Ré, to Richard Liaw, Clark Zinzow, Hao Zhang, and Micheal Chau from the Ray team, to Matthias Reso, Less Wright, and Helen Suk from the PyTorch team, and to the LF AI & Data staff.
