Why do you need to use W&B to track your ML project?

By Jamila REJEB, Data Scientist at LittleBigCode 🚀

If you are reading this article, you are probably working on a machine learning project and eager to learn about ways to develop and deliver better models. In the lifecycle of a Machine Learning project, creating an ML model is merely the first step; deploying and monitoring your models, data and experiments is where things get complicated. We therefore need a methodical approach and a set of practices and tools to address this challenge.

That is why, at LittleBigCode, we have decided to help you tackle this issue through a series of articles.

In this article, we will dive into a particular tool designed to support and automate key steps in the MLOps life cycle, such as experiment tracking, dataset versioning and model management.

Without further ado, let us discover what Weights & Biases (W&B) has to offer.

What is Weights and Biases?

Weights & Biases (W&B) presents itself as "the developer-first MLOps platform": a platform geared towards developers for building better machine learning models more efficiently.

Weights & Biases is a web-based subscription service. You start by creating a free account, which comes with 100 GB of storage for data and artifacts.

Using this cloud-based service, you can host your experiments in a single central repository; if you have a private infrastructure, Weights & Biases can also be deployed on it.

Two main components make up W&B: the Workspace and a Python API.

a. the Python API component is what you use to integrate your ML code with W&B and get insights from your experiments in the Workspace;

b. the Workspace, on the other hand, contains the dashboard and the navigation bar, where you can access recent Projects and get a visual understanding of your datasets and experiments through:

  • Experiments: Lightweight experiment tracking
  • Reports: collaborative dashboards
  • Artifacts: Dataset and model versioning
  • Tables: Interactive data visualization
  • Sweeps: Hyperparameter optimization

Data visualization with Tables

Every ML project starts with understanding the data in order to build interesting features.

The Tables feature helps sort, filter, group and create charts directly from tabular data.

Image source: wandb website

You can also use this functionality to understand and visualize your machine learning model predictions, as in the example below.

For instance, you can group the table by the guess column to see which examples are being misclassified, as shown below.
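To give an idea of how this looks in code, here is a minimal sketch that logs model predictions to a W&B Table; the column names (such as guess) and the sample rows are purely illustrative assumptions.

import wandb

run = wandb.init(project="wandb_example")

# Build a table with one row per prediction (illustrative data)
columns = ["id", "label", "guess"]
table = wandb.Table(columns=columns)
table.add_data(0, "cat", "cat")
table.add_data(1, "dog", "cat")  # a misclassified example

# Log the table so it can be sorted, filtered and grouped in the Workspace
run.log({"predictions": table})
run.finish()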

Tracking Experiments using the dashboard

Every Machine Learning project starts with an experimental step in which different models, features, and hyperparameters are tested. However, too often we find ourselves lost in all the folders, Excel files, and notebooks that we used to track and compare these experiments' performances.
Using the W&B dashboard, you will be able to compare your experiments using the graphs created from all the metrics that you have logged.

More practically, you will start by adding five lines of code to your existing Python script.

# Flexible integration for any Python script
import wandb

# 1. Start a W&B run
wandb.init(project="wandb_example")

# 2. Save model inputs and hyperparameters
config = wandb.config
config.learning_rate = 0.01

# Model training here
# ...

# 3. Log metrics over time to visualize performance
wandb.log({"loss": loss})

And just like that, you have created a project named "wandb_example".

Model optimization with sweeps

An important step in choosing the best model for a specific task is optimizing the hyperparameters. This search, however, tends to be computationally heavy and time-consuming. In addition, we generally end up, again, with a lot of graphs and saved models with complex names.

W&B has a specific feature for this task. On the left of your project space, you will find an icon that looks like a broom: this is where you create your sweep files. Once your sweep file is configured, the search is run and you can see its results directly on the dashboard. You can even visualize which hyperparameters affect the metrics you care about.

Image source: wandb website
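Sweeps can also be configured and launched from the Python API rather than the UI; the following is a minimal sketch of that route, in which the train function, parameter names and search ranges are illustrative assumptions.

import wandb

# Hypothetical training function that reads hyperparameters from the run config
def train():
    run = wandb.init()
    lr = run.config.learning_rate
    batch_size = run.config.batch_size
    # ... train the model with these hyperparameters and compute a loss ...
    run.log({"loss": 0.0})  # replace with the real loss

# Define the search strategy and the hyperparameter space
sweep_config = {
    "method": "bayes",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 0.0001, "max": 0.1},
        "batch_size": {"values": [16, 32, 64]},
    },
}

# Create the sweep and launch an agent that runs 10 trials
sweep_id = wandb.sweep(sweep_config, project="wandb_example")
wandb.agent(sweep_id, function=train, count=10)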

Versioning with Artifacts

Reproducibility is very important in any project, especially those involving Machine Learning models. You should save your model and your dataset at each training run. By building a dependency graph, you can trace the flow of data through your pipeline, so you know exactly which datasets feed into which models thanks to the graph view. In the example below, you can see that for the same project we have two versions of the dependency graph: in each version we have used a different version of the dataset with a different training script. In this case, the feature helps us better visualize our pipeline.
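As an illustration, here is a minimal sketch of how a dataset can be versioned as an artifact and then consumed in a later run; the artifact name and file path are assumptions made for the example.

import wandb

# Log a dataset as a versioned artifact
run = wandb.init(project="wandb_example", job_type="dataset-upload")
artifact = wandb.Artifact("training-data", type="dataset")
artifact.add_file("data/train.csv")  # hypothetical local path
run.log_artifact(artifact)
run.finish()

# Later, consume a specific version of that dataset in a training run
run = wandb.init(project="wandb_example", job_type="training")
dataset = run.use_artifact("training-data:latest")
data_dir = dataset.download()
run.finish()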


Collaborative analysis using Reports

Another useful feature is Reports. By creating a report, you can easily share updates and outcomes of your machine learning projects with your coworkers, add text to explain how your model works, show graphs, compare model versions, and demonstrate progress towards milestones. Your coworkers will also be able to edit and comment on the report.

Image source: wandb website

Weights & Biases vs MLflow

Since we have already discussed how MLflow helps us with our machine learning projects in a previous article, MLOps: Why data and model experiment tracking is important? How tools like DVC and MLflow can solve this challenge?, we have decided to compare it with what W&B offers.

In order to choose the best tool for your project, you will have to answer a few questions:

  • What is your budget?
    If you’re working on a low budget, MLflow is a better option because it is free (open-source) for experimental tracking.
  • In which language(s) is your project written?
    MLflow is language-agnostic, i.e., it can be used with any machine learning library in Python or R, whereas Weights & Biases only works with Python scripts.
  • Do you need the solution to be hosted immediately and ready to use?
    Weights & Biases offers both hosted and on-premises setup, while MLflow is only available as an open-source solution that requires you to maintain it on your server.
  • How will you deploy your model?
    MLflow offers end-to-end ML lifecycle management, while Weights & Biases only offers features like experiment tracking, model management, and data versioning.
  • How will you comment on your experiments and collaborate with your team to share insights?
    While W&B offers the possibility to easily visualize your datasets and create collaborative reports, MLflow is limited to tracking experiments and comparing models based on the logged metrics.
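To make the comparison of integration effort concrete, here is a minimal MLflow logging sketch roughly equivalent to the five-line W&B snippet shown earlier, assuming MLflow is installed and using its default local tracking; the parameter and metric values are placeholders.

import mlflow

# Start an MLflow run, log a hyperparameter and a metric
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    # Model training here
    # ...
    mlflow.log_metric("loss", 0.0)  # replace with the real loss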

In a nutshell, answering these questions will help you pick the tool that is best suited for your project needs.

Conclusion

After exploring every feature of the W&B platform and testing it on a dummy project, we can now give a short review summarizing the main advantages and drawbacks of this tool.

Let us start with the Pros.

  • Ease of use: creating an account was very straightforward. The platform is neat and very easy to play with. The free 100 GB is a good start to test the product.
  • A central, user-friendly, and interactive dashboard where you can view your experimentations and track their performance.
  • Tracking every part of the model training process, visualizing models, and comparing experiments.
  • Automated hyperparameter tuning with the use of Sweeps, which provides a sample of hyperparameter combinations to help with model performance and understanding.
  • Collaborative reports for teams, where you can add visualizations, organize, explain and share your model performance, model versions, and progress.
  • End-to-end artifact tracking of the machine learning pipeline, from data preparation to model deployment.
  • Easy integration with frameworks like TensorFlow, PyTorch, Keras, Hugging Face, and more.
  • Collaborative work in a team with multiple features for sharing, experimenting, etc.

These are all useful features that Weights & Biases provides, which makes it a good tool for research teams looking to discover, learn and gain insights into machine learning experiments.

However, there are still a couple of cons that we deemed worthy of listing:

  • ML lifecycle management: managing the complete lifecycle of a model, from data sourcing to model deployment, matters during research because it allows teams to correctly monitor and debug issues at any stage of development; W&B does not cover this whole lifecycle.
  • Production use-case: For production-based teams or projects, Weights & Biases is not a good option because of its lack of a production engine that will help you deploy your model and generate new predictions as you receive them.
  • Model inference: an important part of research is testing and carrying out real-time inferences, which is why model deployment is needed right after building and evaluating models; this is not something W&B handles.

In conclusion, Weights & Biases provides multiple useful features, which makes it a good tool for research teams looking to discover, learn and gain insights into machine learning experiments. However, when it comes to delivery, Weights & Biases is not always the best option.

Consult all the articles of LittleBigCode by clicking here: https://medium.com/hub-by-littlebigcode

Follow us on LinkedIn & YouTube + https://LittleBigCode.fr/en
