Announcing Artifacts

Shawn Lewis
Jun 10, 2020 · 6 min read


Our newest tool, W&B Artifacts, is publicly available. With Artifacts, you can store and version your datasets, models, and results. Get started by checking out our docs.

The very first version of Weights & Biases did just one thing: save your trained model weights to an account in the cloud. You’d run “wandb push model.h5” after training, the file would transfer, and that was it.

We quickly realized we could make our tool more useful by keeping track of the process that generated the model, and so Weights & Biases experiment tracking was born. Our tool today is all about keeping track of the model training process: we save output logs, code versions, configuration, hyper-parameters, and metrics.

But tracking processes (or in W&B parlance, “runs”) is only half the story. Almost any process worth tracking takes some data as input (like a dataset of images and their classes) and produces some data as output (like a blob of trained model weights). That data may then in turn be used by yet another process.

To understand how a given model was built, you need to know exactly what data it was trained on. Much more time is spent grooming, improving, and experimenting with data than is spent on tweaking hyper-parameters.

So we’re going back to our roots! W&B Artifacts gives you a first class way to store and track the data that your processes use and produce.

Welcome Artifacts!

Artifacts is available in version 0.9.0 of our wandb Python library. You can use it to store and version your datasets, models, and results. We automatically keep track of all the relationships between artifacts & runs in your projects, so you can dig in and understand exactly how a model was produced.

Tracking datasets, training, models, evaluation, and results using Artifacts. Can you spot two stages of fine-tuning?
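Here’s a minimal sketch of what logging a dataset artifact looks like with the Python library; the project name, artifact name, and file paths are placeholders for your own:

```python
import wandb

# Start a run that produces a dataset artifact.
run = wandb.init(project="artifacts-demo", job_type="dataset-upload")

# Create an artifact; the name and type are up to you.
artifact = wandb.Artifact("my-dataset", type="dataset")
artifact.add_file("data/labels.csv")   # add a single file
artifact.add_dir("data/images")        # or an entire directory

# Logging uploads the files and records the run → artifact relationship.
run.log_artifact(artifact)
```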

Artifacts is designed to store and move large amounts of data. Artifact files are deduplicated, so if you create a new version of an 80GB dataset that differs from a previous version by a single image, we’ll only sync the delta. If you don’t want to store the actual data, you can use Artifacts to store references to files in your own cloud storage buckets.
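If your data already lives in your own bucket, a reference artifact records where the files live rather than uploading them. A rough sketch, where the bucket path is hypothetical:

```python
import wandb

run = wandb.init(project="artifacts-demo", job_type="dataset-upload")

# Track files in place instead of uploading them.
artifact = wandb.Artifact("raw-images", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/raw-images")  # hypothetical bucket path
run.log_artifact(artifact)
```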

Other data versioning tools out there seem to be made by devops engineers for devops engineers. We’ve made Artifacts for ML practitioners first and foremost: with intuitive APIs and no up-front specifications, infrastructure changes, or planning required.

Use cases

You can use Artifacts to store and version all of your datasets. Most larger teams we work with have an in-house dataset registry: a place where ML engineers and researchers register datasets they’ve created for discovery by everyone else. With Artifacts, you get a central dashboard of all your datasets, where you can browse their contents and write descriptive notes. You can easily pull down a dataset for training using our API, and browse lists of all the runs that have used each dataset version.
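Pulling a dataset down for training is the mirror image. A minimal sketch, assuming a dataset artifact named “my-dataset” was logged as above:

```python
import wandb

run = wandb.init(project="artifacts-demo", job_type="train")

# Declare the dataset this run depends on and download it locally.
dataset = run.use_artifact("my-dataset:latest")  # or pin a version, e.g. "my-dataset:v3"
data_dir = dataset.download()

# ...train on the files under data_dir...
```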

Artifacts are great for storing trained models. You can determine exactly how your models were produced, and add metadata to denote that a model has been deployed to production. We keep a complete log of all updates to artifact metadata, so you can see how a model’s usage has evolved over time.
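Logging a model as an artifact looks much the same; the metadata dictionary below is just an illustration of the kind of information you might attach:

```python
import wandb

run = wandb.init(project="artifacts-demo", job_type="train")

# Save trained weights as a model artifact, with whatever metadata is useful to you.
model_artifact = wandb.Artifact(
    "my-model",
    type="model",
    metadata={"architecture": "resnet50", "val_accuracy": 0.93},  # illustrative values
)
model_artifact.add_file("model.h5")
run.log_artifact(model_artifact)
```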

I’m especially excited about the dynamic nature of Artifacts. You don’t need to define a pipeline specification up front: just log and use artifacts from your runs, and we’ll stitch everything together. This means we can track things like multiple stages of fine-tuning.

Tracking evolving pipelines

It’s typical today to either train models from scratch, or fine tune an off-the-shelf model once on a custom dataset. But why start over so often? A child who goes to school doesn’t need to relearn everything they’d learned in prior days. Large models can be very expensive to train, so finding ways to iteratively reuse trained models after making model or dataset modifications seems like a promising future path.

Our friends at OpenAI have pioneered a technique they call “model surgery” wherein they do exactly that. Here’s a quote from their Dota 2 paper: “As a naive comparison, if we had trained from scratch after each of our twenty major surgeries, the project would have taken 40 months instead of 10 (in practice we likely would have made fewer changes)”.

This type of iterative training process is notoriously hard to keep track of. And it’s not usually a linear progression: at each stage you try many different things before proceeding down one path, and you may backtrack to a previous model version if you hit a dead end. The final result in the Dota 2 case is a giant tree of experiments 20 layers deep.

A hypothetical model surgery example

Artifacts makes keeping track of all of this simple. My hope is that this tool helps push the field forward.

Backstory

This tool has been a long time in the making. In May 2019 I wrote a two-page proposal called “Datasets, versioning, and pipeline tracking” and shared it with the team. Annirudh, one of our amazing engineers, built a prototype at our next offsite that August. [Aside: A lot of the things you see in W&B were born on our offsites, like Reports and Sweeps!]

We were all blown away by Anni’s demo. Then we went back to the real world and needed to continue scaling and improving the core experiment tracking experience, so we put the project on hold.

In January I finally got some time to bring the branches back to life and start the real work of designing Artifacts for production. Since then, Anni and I have been hard at work to get this out, with lots of contributions and support from the rest of the team. There have been countless documents written, commits committed, conversations with customers, tests run, and demos given.

This is just the beginning for this new tool. I believe if you want to build great tools, your most important skill is your ability to listen. And I’m proud to say we’ve built an excellent culture of listening at W&B. Everyone on the team is excited to join customer conversations, talk directly to users, and hear feedback good and bad.

Over the next few months we’ll be doing a lot of that for Artifacts. We’d love to hear from you. Our goal is to make versioning all of your data so natural that you don’t need to think about it.

Our newest tool, W&B Artifacts, is publicly available. With Artifacts, you can store and version your datasets, models, and results. Get started by checking out our docs.

Thanks to all of our users who’ve given us feedback, and to the rest of the W&B team for support and ideas. Special thanks to Chris and Carey for big contributions. Thanks to the team at Blue River, Stuart Bowers, André Aquilina, Víctor Rodríguez, and Andriy Drozdyuk for early feedback.
