GitHub for Machine Learning Models and Datasets

WebOccult Technologies
WebOccult Technologies Pvt Ltd
May 17, 2022

A platform for Model and Dataset Management and Versioning

Oh god, managing models locally and storing the model performance somewhere else is really messy.

Ever come across such a situation? If so, you’ve come to the right place: here we discuss how to store your models and keep track of their performance.

When Edison created a working light bulb after hundreds of failures, he said he had also discovered 10,000 ways that didn’t work. Similarly, finding the model with good accuracy doesn’t mean the job is done: we still need to store the models that didn’t perform well. Keeping all of them helps us avoid repeating mistakes in the future.

This is why model versioning matters: it records how every model performed, so you can always tell which one is best for your use case.

Let’s discuss an amazing tool that stores models and their useful parameters, with versioning. If you haven’t guessed it yet,
it’s WANDB: Weights & Biases.

What is WANDB?

A multi-tool platform for model and dataset versioning that also keeps track of the parameters you used to get there.

Every time the hyperparameters change, the model’s behavior on the test set changes too: some predictions move closer to the expected values and some move further away. To find out which model gave the best results, we need to store the metadata of every model. After comparing the hyperparameters and test results, the best model is selected for further use. So let’s get a brief idea of wandb Artifacts.
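In wandb, a run’s hyperparameters are usually passed to `wandb.init(config=...)` and its test results are recorded with `wandb.log(...)`. The minimal sketch below leaves wandb out and uses plain Python to show only the comparison step described above. It assumes each run is summarized by its config and a dict of test metrics. `RunRecord`, `best_run`, and the numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Metadata for one training run: the hyperparameters used and the test results."""
    name: str
    config: dict = field(default_factory=dict)   # hyperparameters
    results: dict = field(default_factory=dict)  # test-set metrics


def best_run(runs, metric="test_accuracy", higher_is_better=True):
    """Return the run with the best value for `metric`, skipping runs that never recorded it."""
    scored = [run for run in runs if metric in run.results]
    if not scored:
        raise ValueError(f"no run recorded {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda run: run.results[metric])


# Hypothetical runs from three hyperparameter settings (the numbers are made up).
runs = [
    RunRecord("run-1", {"lr": 1e-2, "batch_size": 32}, {"test_accuracy": 0.81, "test_loss": 0.52}),
    RunRecord("run-2", {"lr": 1e-3, "batch_size": 32}, {"test_accuracy": 0.88, "test_loss": 0.37}),
    RunRecord("run-3", {"lr": 1e-4, "batch_size": 64}, {"test_accuracy": 0.84, "test_loss": 0.41}),
]

winner = best_run(runs)
print(winner.name, winner.config)  # run-2 {'lr': 0.001, 'batch_size': 32}
```

Passing `higher_is_better=False` lets the same helper rank runs by a metric to minimize, such as `test_loss`.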

WANDB Artifacts: Data & Model Versioning tool

Here, an artifact means a dataset or a model. Models and datasets are stored on AWS, and the wandb server keeps track of where each one is stored. Using the parameters and hyperparameters as raw data, wandb processes them into an analytical description of the experiments performed. See the Artifacts Walkthrough in the wandb documentation for more.
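The paragraph above says the bytes live in cloud storage while wandb tracks where they are. Below is a minimal plain-Python sketch of that idea. It assumes a content checksum decides whether a new version is needed, in the spirit of wandb artifacts, where logging unchanged content does not create a new version. The `ArtifactIndex` class and the `s3://example-bucket/...` URIs are hypothetical, and this is not wandb’s internal implementation.

```python
import hashlib


class ArtifactIndex:
    """Toy index that records where each artifact version is stored, not its bytes."""

    def __init__(self):
        # artifact name -> list of {"version", "uri", "sha256"} entries, oldest first
        self._versions = {}

    def log(self, name: str, content: bytes, uri: str) -> str:
        """Register content stored at `uri`; return the (possibly existing) version label."""
        digest = hashlib.sha256(content).hexdigest()
        versions = self._versions.setdefault(name, [])
        for entry in versions:
            if entry["sha256"] == digest:
                return entry["version"]  # identical content: reuse the existing version
        version = f"v{len(versions)}"
        versions.append({"version": version, "uri": uri, "sha256": digest})
        return version

    def location(self, name: str, version: str) -> str:
        """Look up where a given version's bytes live."""
        for entry in self._versions.get(name, []):
            if entry["version"] == version:
                return entry["uri"]
        raise KeyError(f"{name}:{version}")


index = ArtifactIndex()
first = index.log("mnist-dataset", b"images, batch 1", "s3://example-bucket/mnist/0001")
again = index.log("mnist-dataset", b"images, batch 1", "s3://example-bucket/mnist/0001")
second = index.log("mnist-dataset", b"images, batch 1 + 2", "s3://example-bucket/mnist/0002")
print(first, again, second)  # v0 v0 v1
print(index.location("mnist-dataset", "v1"))  # s3://example-bucket/mnist/0002
```

Keeping only a URI and a checksum in the index keeps the index small, while the heavy files stay in object storage.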

Why is WANDB such a popular tool among developers?

The answer is simple: its functionality, its framework integrations, and its environment support.

https://docs.wandb.ai/

So let’s go through the key features wandb offers.

  • Tracking model parameters
    wandb stores all the model parameters and generates analytical charts from them
  • Keeping track of hyperparameters
    Hyperparameter tuning is done with wandb sweeps
  • Create artifacts collaboratively (see Collaborative Reports)
    Two artifact creation modes are currently available:
    Single, for a single user, and Collaborative, for multiple users or a team
    A Patch mode, which will let you edit an artifact and automatically create a new version, will be added soon
  • Keep track of training runs
  • System Usage
  • Getting alerts
    You can get alerts for your custom triggers and for crashes in the system
  • Replicate historic results
    wandb stores the parameters and results of each training run, so if you need a model with the same accuracy, you can look up its hyperparameters and replicate the result even years later, whether you remember them or not (exact replication also needs the same code, data, and random seed)
  • Store experimental results
    wandb stores all results whenever you upload data or a model. It is the most useful feature of wandb
  • Compare artifacts
    wandb lets you compare models right down to their architecture
  • 3D Visualization
  • Run parameter sweeps
    Parameter sweeps let you run hyperparameter tuning in parallel
  • Compare the best accuracy
    After gathering all the models and their accuracy, wandb lets you find and compare the models with the best accuracy
  • Get multiple metrics in one chart
    You can compare the accuracy metrics of models in different chart formats, such as bar charts and line plots
  • Stepwise and Incremental logging
    Every parameter change and every small change to the model is logged incrementally, step by step
  • Summary metrics
    Summary metrics track the key values that summarize model performance; by default, wandb automatically sets each one to the last value logged
  • Customize Summary
    You can customize the summary of your data or models based on the factors you care about. A summary is usually defined using the mean, mode, min, and max, but you can also summarize the model’s behavior by adding a minimum-loss metric and a maximum-accuracy metric
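The alerts bullet above mentions custom triggers. A trigger is simply a condition you evaluate in your own training loop. Below is a minimal sketch, assuming two hypothetical triggers (a NaN loss and accuracy below a floor). `check_triggers` and its threshold are illustrative names, not wandb API. In a real wandb run, each returned pair could be forwarded with `run.alert(title=title, text=text)`.

```python
import math


def check_triggers(step, metrics, min_accuracy=0.5):
    """Return a list of (title, text) alerts for this step; empty when nothing fired.

    The two triggers below (NaN loss, accuracy under a floor) are hypothetical examples.
    """
    alerts = []
    loss = metrics.get("loss")
    if loss is not None and math.isnan(loss):
        alerts.append(("Loss is NaN", f"Training diverged at step {step}."))
    accuracy = metrics.get("accuracy")
    if accuracy is not None and accuracy < min_accuracy:
        alerts.append(("Low accuracy", f"accuracy={accuracy:.2f} < {min_accuracy} at step {step}."))
    return alerts


history = [{"loss": 0.7, "accuracy": 0.62}, {"loss": float("nan"), "accuracy": 0.41}]
for step, metrics in enumerate(history):
    for title, text in check_triggers(step, metrics):
        print(title, "-", text)  # in a wandb run: run.alert(title=title, text=text)
```

Keeping the trigger logic in a plain function makes it easy to unit-test before wiring it to a notification channel.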
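The replication bullet above relies on the stored parameters being complete. Replaying a stored config reproduces a result only if every source of randomness is pinned by it. Below is a minimal sketch, assuming the random seed is stored alongside the hyperparameters (for example, in the run’s `config`). `train` is a hypothetical toy gradient-descent loop, not real model training.

```python
import random


def train(config):
    """Hypothetical training loop: the outcome depends only on the config, including its seed."""
    rng = random.Random(config["seed"])  # private generator seeded from the stored config
    weight = 0.0
    for _ in range(config["epochs"]):
        noisy_gradient = (weight - 1.0) + rng.gauss(0.0, 0.1)
        weight -= config["learning_rate"] * noisy_gradient
    return round(weight, 6)


# Config as it would have been stored with the original run.
stored_config = {"seed": 42, "learning_rate": 0.1, "epochs": 20}

original = train(stored_config)
replayed = train(dict(stored_config))  # "years later", from the stored config alone
print(original == replayed)  # True
```

Using a dedicated `random.Random(seed)` instead of the global generator keeps other code from silently changing the random sequence between runs.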
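The artifact-comparison bullet above says models can be compared down to their architecture. As a toy illustration of what such a comparison involves, here is a layer-by-layer diff of two architecture descriptions. It assumes each model is described as an ordered list of `(layer_type, params)` pairs. The layer names and `diff_architectures` are hypothetical, and wandb’s own comparison views work from whatever you log.

```python
from itertools import zip_longest


def diff_architectures(a, b):
    """Compare two architectures given as ordered lists of (layer_type, params) pairs.

    Returns one human-readable line per layer position that differs.
    """
    diffs = []
    for i, (left, right) in enumerate(zip_longest(a, b)):
        if left == right:
            continue
        if left is None:
            diffs.append(f"layer {i}: only in B -> {right}")
        elif right is None:
            diffs.append(f"layer {i}: only in A -> {left}")
        else:
            diffs.append(f"layer {i}: {left} != {right}")
    return diffs


model_a = [("Conv2d", {"out_channels": 32}), ("ReLU", {}), ("Linear", {"out_features": 10})]
model_b = [("Conv2d", {"out_channels": 64}), ("ReLU", {}), ("Linear", {"out_features": 10}), ("Softmax", {})]

for line in diff_architectures(model_a, model_b):
    print(line)
```

`zip_longest` pads the shorter model with `None`, so layers that exist in only one model are reported rather than silently dropped.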
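The sweep bullets above depend on a sweep configuration. The dictionary below follows the format wandb documents for `wandb.sweep()`, with the keys `method`, `metric`, and `parameters`. In a real sweep, `wandb.agent` would run the trials, possibly in parallel on several machines. So that the sketch runs without an account, `grid_search` is a hypothetical sequential stand-in and `fake_val_loss` replaces real training. The parameter names in the config must match the objective’s keyword arguments.

```python
import itertools

# Sweep configuration in the dictionary format that wandb.sweep() accepts.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [1e-4, 1e-3, 1e-2]},
        "batch_size": {"values": [16, 32, 64]},
    },
}


def fake_val_loss(learning_rate, batch_size):
    """Hypothetical stand-in for training plus validation; lowest at lr=1e-3, batch_size=32."""
    return abs(learning_rate - 1e-3) * 100 + abs(batch_size - 32) / 32


def grid_search(config, objective):
    """Try every parameter combination sequentially; return (best_params, best_score)."""
    names = list(config["parameters"])
    grids = [config["parameters"][name]["values"] for name in names]
    trials = []
    for combo in itertools.product(*grids):
        params = dict(zip(names, combo))
        trials.append((objective(**params), params))
    choose = min if config["metric"]["goal"] == "minimize" else max
    best_score, best_params = choose(trials, key=lambda trial: trial[0])
    return best_params, best_score


best_params, best_score = grid_search(sweep_config, fake_val_loss)
print(best_params, best_score)  # {'learning_rate': 0.001, 'batch_size': 32} 0.0
```

Because every trial is independent, the loop body is exactly the part a sweep agent can spread across machines.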
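The logging and summary bullets above can be illustrated with a small plain-Python stand-in. Each `log()` call becomes one incremental step. A metric’s summary defaults to its last logged value, and it can be switched to `min`, `max`, or `mean` to get, for example, minimum loss and maximum accuracy. `RunLog` and `summarize_as` are hypothetical names. In wandb itself you would call `wandb.log` at each step and, for a non-default summary, `wandb.define_metric` with its `summary` argument.

```python
class RunLog:
    """Toy run log: each log() call is one step, and the summary is derived from the history.

    By default a metric's summary is its last logged value; summarize_as() switches it to
    "min", "max", or "mean" instead.
    """

    AGGREGATES = {
        "last": lambda values: values[-1],
        "min": min,
        "max": max,
        "mean": lambda values: sum(values) / len(values),
    }

    def __init__(self):
        self.history = []        # one dict per step, in logging order
        self._summary_kind = {}  # metric name -> key of AGGREGATES

    def log(self, metrics):
        """Append one step; the step index is assigned incrementally."""
        self.history.append({"_step": len(self.history), **metrics})

    def summarize_as(self, name, kind):
        if kind not in self.AGGREGATES:
            raise ValueError(f"unknown summary kind {kind!r}")
        self._summary_kind[name] = kind

    @property
    def summary(self):
        names = sorted({key for row in self.history for key in row if key != "_step"})
        result = {}
        for name in names:
            values = [row[name] for row in self.history if name in row]
            result[name] = self.AGGREGATES[self._summary_kind.get(name, "last")](values)
        return result


run_log = RunLog()
for loss, accuracy in [(0.9, 0.60), (0.5, 0.78), (0.6, 0.75)]:
    run_log.log({"loss": loss, "accuracy": accuracy})

print(run_log.summary)  # {'accuracy': 0.75, 'loss': 0.6}
run_log.summarize_as("loss", "min")      # keep the minimum loss
run_log.summarize_as("accuracy", "max")  # keep the maximum accuracy
print(run_log.summary)  # {'accuracy': 0.78, 'loss': 0.5}
```

Deriving the summary from the full history, rather than storing it separately, means changing the aggregation later never loses information.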

The core features of model versioning are:

→ Upload model
→ Version model
→ Give alias name
→ Compare model
→ Download model
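The five core operations above can be illustrated with a toy in-memory registry. In wandb itself they map onto artifacts. For example, `run.log_artifact(artifact, aliases=[...])` uploads and tags a version, and `run.use_artifact("name:alias")` followed by `artifact.download()` fetches one. The `ModelRegistry` class below is a hypothetical stand-in that keeps everything in memory so it runs without an account.

```python
class ModelRegistry:
    """Toy in-memory model registry covering upload, version, alias, compare, and download."""

    def __init__(self):
        self._versions = []  # index i holds version "v{i}": {"blob": bytes, "metrics": dict}
        self._aliases = {}   # alias name -> version index

    def upload(self, blob: bytes, metrics: dict) -> str:
        """Store a new model version and return its label."""
        self._versions.append({"blob": blob, "metrics": dict(metrics)})
        index = len(self._versions) - 1
        self._aliases["latest"] = index  # "latest" always follows the newest upload
        return f"v{index}"

    def alias(self, version: str, name: str) -> None:
        """Point an alias such as "best" at an existing version (moving it if it exists)."""
        self._aliases[name] = self._index(version)

    def compare(self, a: str, b: str, metric: str) -> float:
        """Difference in `metric` between two versions or aliases (b minus a)."""
        return self._entry(b)["metrics"][metric] - self._entry(a)["metrics"][metric]

    def download(self, ref: str) -> bytes:
        """Return the stored bytes for a version label or alias."""
        return self._entry(ref)["blob"]

    def _index(self, ref: str) -> int:
        if ref in self._aliases:
            return self._aliases[ref]
        if ref.startswith("v") and ref[1:].isdigit() and int(ref[1:]) < len(self._versions):
            return int(ref[1:])
        raise KeyError(ref)

    def _entry(self, ref: str) -> dict:
        return self._versions[self._index(ref)]


registry = ModelRegistry()
registry.upload(b"weights-a", {"accuracy": 0.81})  # v0
registry.upload(b"weights-b", {"accuracy": 0.88})  # v1
registry.alias("v1", "best")
print(round(registry.compare("v0", "best", "accuracy"), 2))  # 0.07
print(registry.download("best"))  # b'weights-b'
```

Resolving aliases and version labels through one lookup lets callers ask for "best" without knowing which version number it currently points to.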

All of these features are covered in depth in the wandb model versioning documentation, which also explains in detail how to manage a model ecosystem and how to visualize and share your workflows.

A Major Concern solved by wandb

wandb also takes care of data security: every artifact is stored on AWS, and wandb keeps links to the data that are accessible only to users who are members of the project.

As with anything else, a lack of management can lead to losing important things. WANDB is a great manager for handling large amounts of data and many models for the same use case that differ from one another. Around 70k ML practitioners have adopted wandb, so are you ready to say goodbye to your worries about data and model versioning and management?

Credits:
A wholehearted thanks to our team, and especially to Shailja, for contributing to this article.

WebOccult is an acclaimed technology company with a team of passionate and forward-thinking IT professionals.