GitHub for Machine Learning Models and Datasets

WebOccult Technologies

Published in WebOccult Technologies Pvt Ltd

5 min read · May 17, 2022

A platform for Model and Dataset Management and Versioning

Oh god, managing models locally and storing the model performance somewhere else is really messy.

Ever come across such a situation? If so, you’ve come to the right place. Here we will discuss how to store and keep track of your model performance.

When Edison created a bulb after many failures, he said that he had also discovered 10,000 ways that didn’t create bulbs. Similarly, finding a model that gives good accuracy doesn’t mean the job is done. We still need to store the other models that didn’t perform well. Keeping a record of all of them helps avoid repeating mistakes in the future.

For these reasons, model versioning becomes very important: it lets you remember which model performed best for your use case.

Let’s discuss an amazing tool that stores models and their useful parameters with versioning. If you haven’t guessed it yet, it’s WANDB (Weights & Biases).

What is WANDB?

A multi-tool platform for model and dataset versioning that lets you keep track of the parameters you used to get there.

At every iteration, changing the hyperparameters changes how the model performs on the test set. Some predictions land closer to expectations and some further away. To see which model gave the best results, it is important to store the metadata of all the models. After comparing the hyperparameters and test results, the best model is selected for further use. So let’s get a brief idea about wandb artifacts.

WANDB Artifacts: Data & Model Versioning tool

An artifact here means a dataset or a model. Models and datasets are stored on AWS, and the wandb server stores their location on the AWS server. Using the parameters and hyperparameters as raw data, wandb processes them to create an analytical description of the experiments performed. For more detail, see the Artifacts Walkthrough in the wandb documentation.

Why is WANDB a celebrity tool for developers?

The answer to that is very simple. It is because of its functionalities, framework integration, and environment support.

So let’s go through the key features that wandb offers.

Tracking model parameters
wandb stores all the model parameters and generates analytical charts from them.
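
As a rough illustration of parameter tracking, the sketch below records a set of hyperparameters once and then logs a loss value per epoch. The project name, the hyperparameter values, and the `simulated_loss` helper are assumptions made up for this example, and `mode="offline"` is used so that no account is needed to try it.

```python
import math


def simulated_loss(epoch: int) -> float:
    """Stand-in for a real training step: the loss decays as epochs pass."""
    return math.exp(-0.5 * epoch)


def track_run(project: str = "wandb-demo", epochs: int = 5, mode: str = "offline") -> None:
    """Record hyperparameters once and log one loss value per epoch."""
    # Imported here so simulated_loss stays usable without wandb installed.
    import wandb

    # Hypothetical hyperparameters; wandb stores them as the run's config.
    config = {"learning_rate": 0.01, "batch_size": 32, "epochs": epochs}

    # mode="offline" writes the run to a local folder instead of the server.
    run = wandb.init(project=project, config=config, mode=mode)
    for epoch in range(epochs):
        # Each call adds one point to the run's history.
        run.log({"epoch": epoch, "loss": simulated_loss(epoch)})
    run.finish()
```

Calling `track_run()` after `pip install wandb` should create a local offline run; switching to `mode="online"` (after `wandb login`) would send the same data to the wandb dashboard.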

Keeping track of hyperparameters
It runs hyperparameter tuning with wandb sweeps.
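
A sweep is driven by a configuration that names the search method, the metric to optimize, and the parameter ranges. The sketch below is one possible setup, not a recommended one: the parameter ranges, the `val_loss` metric name, the project name, and the `toy_validation_loss` function are all assumptions for illustration. Sweeps are coordinated by the wandb server, so `launch_sweep` needs a logged-in account.

```python
import math


def build_sweep_config() -> dict:
    """Random search over two hypothetical hyperparameters."""
    return {
        "method": "random",
        "metric": {"name": "val_loss", "goal": "minimize"},
        "parameters": {
            "learning_rate": {"min": 0.0001, "max": 0.1},
            "batch_size": {"values": [16, 32, 64]},
        },
    }


def toy_validation_loss(learning_rate: float, batch_size: int) -> float:
    """Placeholder objective, smallest near learning_rate = 0.01."""
    return abs(math.log10(learning_rate) + 2.0) + 1.0 / batch_size


def train() -> None:
    """One trial: read the sampled values and log the resulting metric."""
    import wandb  # imported lazily so the helpers above work without wandb

    with wandb.init() as run:
        lr = run.config["learning_rate"]
        batch_size = run.config["batch_size"]
        run.log({"val_loss": toy_validation_loss(lr, batch_size)})


def launch_sweep(project: str = "wandb-demo", trials: int = 5) -> None:
    """Register the sweep, then run a fixed number of trials locally."""
    import wandb

    sweep_id = wandb.sweep(build_sweep_config(), project=project)
    wandb.agent(sweep_id, function=train, count=trials)
```

Running `wandb.agent` with the same sweep ID on several machines is how trials can be spread out in parallel.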

Create artifacts collaboratively
Currently, two artifact creation modes are available: Single, for a single user, and Collaborative, for multiple users or a team. Patch mode will be added soon, so you can edit an artifact and automatically create a new version.

Collaborative Reports

Keep track of training runs

System Usage

Getting alerts
You can get alerts for your custom triggers and for crashes in the system.

Replicate historic results
Wandb stores the parameters and results of training, so if you need a model with the same accuracy, you can refer to the stored hyperparameters and replicate the result even years later, whether you remember them or not.

Store experimental results
Wandb stores all results whenever you upload data or a model. It is the most useful feature of wandb.

Compare artifacts
Wandb allows you to compare models down to the level of their architecture.

3D Visualization

Run parameter sweeps
Parameter sweeps allow for parallel hyperparameter tuning.

Compare the best accuracy
After gathering all the models and their accuracy, wandb lets you find and compare the models with the best accuracy.

Get multiple metrics in one chart
You can compare accuracy metrics of models in different chart formats such as bar charts and line plots.

Stepwise and Incremental logging
After each parameter change and each small change made to the model, logs are added incrementally.

Summary metrics
To track important values that summarize model performance, summary metrics are added, and wandb automatically updates them based on the last values logged.

Customize Summary
The summary of the data or models can be customized based on the factors you care about. Usually, a summary is defined using mean, mode, min, and max, but you can also summarize the behavior of the model by adding a minimum-loss metric and a maximum-accuracy metric.
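
The incremental logging and summary customization described above can be sketched as follows. This is a minimal example under assumptions: the metric names `loss` and `accuracy`, the project name, and the `expected_summary` helper are invented here, and the helper only mirrors in plain Python what the `summary="min"` and `summary="max"` settings are meant to capture.

```python
def expected_summary(history: list) -> dict:
    """Plain-Python view of the last, minimum and maximum values in a history."""
    losses = [step["loss"] for step in history]
    accuracies = [step["accuracy"] for step in history]
    return {
        "last_loss": losses[-1],
        "min_loss": min(losses),
        "last_accuracy": accuracies[-1],
        "max_accuracy": max(accuracies),
    }


def log_with_custom_summary(history: list, project: str = "wandb-demo",
                            mode: str = "offline") -> None:
    """Log each step incrementally and keep best values in the summary."""
    import wandb  # imported lazily so expected_summary works without wandb

    run = wandb.init(project=project, mode=mode)
    # Without these two lines the summary keeps only the last logged value.
    run.define_metric("loss", summary="min")
    run.define_metric("accuracy", summary="max")
    for step_metrics in history:
        # One call per step; the run history grows incrementally.
        run.log(step_metrics)
    run.finish()
```

For instance, passing a three-step history in which the loss dips and then rises again would leave the minimum loss, not the final loss, as the tracked summary value.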

The core features of model versioning are:

→ Upload model
→ Version model
→ Give alias name
→ Compare model
→ Download model
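
The upload, version, alias, and download steps listed above could look roughly like this with wandb Artifacts. The project name, artifact name, alias, and the placeholder model file are all hypothetical, and both wandb functions need a logged-in account because artifacts are versioned on the server.

```python
import tempfile
from pathlib import Path


def save_dummy_model(directory=None) -> Path:
    """Write a placeholder file standing in for real model weights."""
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    model_path = target_dir / "model.txt"
    model_path.write_text("placeholder weights")
    return model_path


def artifact_ref(name: str, alias: str) -> str:
    """Artifacts are addressed as name:alias, or name:v0, name:v1 and so on."""
    return f"{name}:{alias}"


def upload_model(model_path: Path, accuracy: float, project: str = "wandb-demo",
                 name: str = "demo-model", alias: str = "best") -> None:
    """Upload a model file; each upload with changed content becomes a new version."""
    import wandb  # imported lazily so the helpers above work without wandb

    with wandb.init(project=project, job_type="train") as run:
        artifact = wandb.Artifact(name=name, type="model",
                                  metadata={"accuracy": accuracy})
        artifact.add_file(str(model_path))
        # The alias is a human-readable label attached to this version.
        run.log_artifact(artifact, aliases=[alias])


def download_model(project: str = "wandb-demo", name: str = "demo-model",
                   alias: str = "best") -> str:
    """Fetch the version behind an alias and return the local folder path."""
    import wandb

    with wandb.init(project=project, job_type="inference") as run:
        artifact = run.use_artifact(artifact_ref(name, alias))
        return artifact.download()
```

Comparing versions is then done in the wandb web UI, where the metadata attached at upload time (here, the accuracy) appears next to each version.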

You can also read about all these features in depth in the wandb model versioning documentation. There is also a detailed explanation of managing a model ecosystem and of visualizing and sharing your workflows.

A Major Concern solved by wandb

wandb provides data security: every artifact is stored on AWS, and wandb stores links to the data that are accessible only to users who are in the project.

Like anything and everything, a lack of management can lead to the loss of important things. WANDB is a great manager for handling tons of data and several models of the same use case with differences among them. Around 70k ML practitioners have adopted wandb, so are you ready to say goodbye to your worries about data and model versioning and management?

Credits:
Wholehearted thanks to our team, and especially Shailja, for contributing to this article.