MLOps at Edge Analytics | Model Tracking

Part Four of Five

Connor Davis
Edge Analytics
7 min read · May 3, 2023

Image created with DALL-E 2.

As machine learning models become more widely deployed, ML practitioners have shown increasing interest in MLOps. In our introductory blog, we give a brief background on how we think about MLOps at Edge Analytics.

Here we look at the fourth pillar of our MLOps pipeline: model tracking. Tracking generally means logging relevant information to a database or other storage for future use. As we train and test ML models, integrating model tracking software lets us compare and contrast our model runs and extract high-value insights quickly and with confidence. This post is part four of a five-part series.

You can find the other blogs in the series by following the links below:

ML modeling experiments are most useful if we keep a record of what we tried and how it turned out. Tracking model configurations and results per experiment trial helps us:

  1. Efficiently identify parameters of the best performing models
  2. Reproduce these models at a later date

The second point is particularly important: it gives us confidence that any models we develop will behave as expected when deployed “in the wild.”

Two questions help guide our model tracking:

  1. What components do we want to track?
  2. What data formats and services should we use to track them?

There are many considerations to take into account when answering these questions. For example, certain components (e.g. metrics) help identify best performing models. Other components (e.g. hyperparameters) are necessary for reproducibility. We should understand the storage requirements for components we plan to track, and we should have a sense of what dashboarding tools would make our lives easier. We will examine these questions and others more closely in the context of our example MLOps pipeline.

Components we track

There are three types of components we might care to track: parameters, metrics, and artifacts. Parameters are any inputs that impact how a modeling experiment is set up and executed. Metrics are the results by which we gauge performance of the modeling run. An artifact is a catch-all term for any file we want to log with the experiment, which may include model weights, figures, or text summaries.

The set of components tracked for an experiment should fully specify every aspect of the experiment you may want to assess in the future; you want to ensure there are no information gaps in the data you log. Still, some care should be taken to avoid exploding data storage costs. For example, tracking a dataset ID tag as a parameter may be a better solution than logging an entire training dataset as an artifact.

The components worth tracking differ across the stages of an MLOps pipeline. The table below briefly discusses what we might track at each major stage. Note that we discuss potential tracked data in terms of images, as our example dataset is the Blood Images Dataset from Kaggle.

Discussion of elements to track in an MLOps pipeline.

Organization

When we run thousands of trials, organizing everything can get messy. Third-party MLOps software platforms that help with this problem have matured in recent years. Integrating them with custom project code requires only a few lines, and their dashboards make it easy to sort through modeling trials. The best part is that all of the complexity of logging and database management is taken care of under the hood. Weights and Biases (W&B), MLflow, and neptune.ai are three well-known services that share many model tracking features. While all three platforms are excellent, we explore W&B functionality for this example pipeline because setup is easy, the documentation is extensive, and it offers a broad suite of visualization features.

Here we give a short introduction to how we implemented W&B into our example pipeline.

Weights and Biases setup

Getting started with W&B is simple. Once you sign up for an account on their website (it’s free for individuals!), you will be guided through the steps to download the W&B Python SDK.

A basic W&B implementation in Python looks like this:

Simple integration of W&B logging in model training.
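
As a minimal sketch of that kind of integration (with a Keras-style model and hypothetical build_model() and load_data() helpers standing in for project-specific code), the pattern might look something like this:

    import wandb

    # Illustrative configuration for a single modeling run.
    config = {
        "learning_rate": 1e-3,
        "batch_size": 32,
        "epochs": 10,
    }

    # Start a W&B run; everything logged below is grouped under this
    # project and run name on the W&B dashboard.
    wandb_run = wandb.init(
        project="blood-image-classification",  # illustrative project name
        config=config,
        name="baseline-cnn",                   # illustrative run name
    )

    # build_model() and load_data() stand in for project-specific code.
    model = build_model(config)
    x_train, y_train, x_val, y_val = load_data()
    model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        batch_size=config["batch_size"],
        epochs=config["epochs"],
    )

    # Log final evaluation metrics for this run, then close it out.
    val_loss, val_accuracy = model.evaluate(x_val, y_val)
    wandb_run.log({"val_loss": val_loss, "val_accuracy": val_accuracy})
    wandb_run.finish()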

Regarding the wandb.init() call above:

  • The project argument is the name for the overarching project. Multiple modeling runs with different configurations can be saved to a single project.
  • The config argument is a dictionary of model configuration values. The keys and values in this dictionary will be used to group and compare the various model runs within the project.
  • The name argument is the name of a single model run. If left empty, a random name is generated.

Additionally, any objects and files you log within the wandb_run will be saved to the given run name, under the given project, and will be viewable on the W&B dashboard.

Logging

Within a W&B modeling run, you can log any arbitrary object or file as an artifact. Keep in mind, however, that the default behavior copies the object to W&B cloud storage (GCS in the United States). This convenience works well for some use cases, but here we prefer to handle cloud storage on our own to avoid tight coupling with third-party tools. Instead, we log reference artifacts that point to URIs in our own S3 buckets. In doing so, only the URI reference is saved as an artifact to W&B. This gives us better control over our stored data and increases security when tracking sensitive files.
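
As a rough sketch of that pattern (the bucket path, artifact name, and project name below are illustrative), logging a reference artifact might look something like this:

    import wandb

    wandb_run = wandb.init(project="blood-image-classification", name="baseline-cnn")

    # Create an artifact that references data in our own S3 bucket instead of
    # copying the files into W&B cloud storage; only the URI is stored by W&B.
    # (W&B resolves s3:// references with your own AWS credentials via boto3.)
    dataset_artifact = wandb.Artifact(name="training-data", type="dataset")
    dataset_artifact.add_reference("s3://our-example-bucket/datasets/blood-images-v1/")

    wandb_run.log_artifact(dataset_artifact)
    wandb_run.finish()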

Viewing the dashboard

W&B provides a nice dashboard for keeping track of logged artifacts and viewing modeling results. To navigate to the dashboard, go to wandb.ai/home and select your project. The first tab will show some training results for each modeling run within the project. For the dashboard below, we used the WandbMetricsLogger TensorFlow callback to track the recall, precision, and accuracy at each epoch.

Training and validation metrics are tracked for each run in a W&B project.
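
As a rough illustration, attaching that callback is a small addition to model.fit(); this sketch reuses the hypothetical model and data objects from the earlier example, and the loss and optimizer choices are illustrative:

    import tensorflow as tf
    from wandb.integration.keras import WandbMetricsLogger

    # The recall, precision, and accuracy metrics to be logged must be part of
    # the compiled model (model, x_train, etc. are the placeholders from above).
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy",
                 tf.keras.metrics.Precision(name="precision"),
                 tf.keras.metrics.Recall(name="recall")],
    )

    # WandbMetricsLogger sends training and validation metrics to the active
    # W&B run (wandb.init must already have been called) after every epoch.
    model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=config["epochs"],
        callbacks=[WandbMetricsLogger(log_freq="epoch")],
    )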

A second tab on the dashboard organizes these results into a table. This format allows you to easily sort and filter modeling runs based on hyperparameters, metrics, or other logged items.

W&B organizes metrics and characteristics of modeling runs in a table format.

These are just two of the many nice features in the W&B dashboard. It also provides great visualizations for organizing hyperparameter searches, lets you keep track of artifact data and metadata, generates reports from your results, and more. We recommend spending some time getting familiar with all the dashboard has to offer!

Hyperparameter search

One powerful feature of W&B is its ability to track and visualize hyperparameter searches. You can use the W&B online dashboard to set up a hyperparameter search, or you can integrate W&B into your own search code with their SDK. To keep our example pipeline flexible, we integrate W&B logging into our existing hyperparameter search code.

During model development, we used the KerasTuner Python package to perform hyperparameter searches. The high-level setup of a KerasTuner search looks like this:

Basic steps for running a KerasTuner hyperparameter search.
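
As a minimal sketch of that setup (the CNN architecture, input shape, and class count below are purely illustrative), the pieces fit together something like this:

    import keras_tuner as kt
    import tensorflow as tf

    def build_model(hp):
        # Build and compile a small CNN whose width and learning rate are
        # sampled by KerasTuner for each trial (architecture is illustrative).
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(64, 64, 3)),
            tf.keras.layers.Conv2D(
                hp.Int("num_filters", 16, 64, step=16), 3, activation="relu"),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(4, activation="softmax"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(
                hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")),
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    num_search_trials = 20

    tuner = kt.RandomSearch(
        hypermodel=build_model,
        objective="val_accuracy",
        max_trials=num_search_trials,
    )

    # Each of the num_search_trials trials builds and trains one model
    # via the tuner's run_trial() method.
    tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=10)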

When calling tuner.search(), a new model is built and trained for each of the num_search_trials trials via a run_trial() method. We can write a custom tuner class that inherits from RandomSearch and overrides run_trial(). The custom method contains the logic to set up a W&B run.

Integrating W&B logging into KerasTuner hyperparameter search.
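
As a rough sketch of that pattern (project and run names are illustrative, and it reuses the build_model() and data placeholders from above), the custom tuner might look something like this:

    import keras_tuner as kt
    import wandb
    from wandb.integration.keras import WandbMetricsLogger

    class WandBTuner(kt.RandomSearch):
        def run_trial(self, trial, *args, **kwargs):
            # Start a W&B run for this trial, logging the sampled
            # hyperparameters as the run configuration.
            wandb_run = wandb.init(
                project="blood-image-classification",  # illustrative name
                config=trial.hyperparameters.values,
                name=f"trial-{trial.trial_id}",
            )
            # Attach the W&B callback so per-epoch metrics are logged as well.
            kwargs["callbacks"] = list(kwargs.get("callbacks", [])) + [
                WandbMetricsLogger(log_freq="epoch")
            ]
            try:
                # Defer the actual model building and training to KerasTuner.
                return super().run_trial(trial, *args, **kwargs)
            finally:
                wandb_run.finish()

    # Used exactly like the plain RandomSearch above, but every trial is logged.
    tuner = WandBTuner(
        hypermodel=build_model,
        objective="val_accuracy",
        max_trials=num_search_trials,
    )
    tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=10)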

Now when we instantiate a WandBTuner and call its search() method, our custom run_trial() method will be used for the hyperparameter search. Then, W&B can access performance metrics and hyperparameters to create beautiful parallel coordinates charts like the one below.

W&B generates graphics like this parallel coordinates chart to visualize hyperparameter search results.

This chart shows just one of the many ways W&B helps you track and visualize your model development. Features like these help you sort through many modeling trials and make W&B a great tool for gaining deep insights into your ML models.

Once we have found the best model and made sure we can recreate its results, it’s time to deploy that model for use in the real world. We’ll discuss this process in the final blog in our series.

Machine learning at Edge Analytics

Edge Analytics helps companies build MLOps solutions for their specific use cases. More broadly, we specialize in data science, machine learning, and algorithm development both on the edge and in the cloud. We provide end-to-end support throughout a product’s lifecycle, from quick exploratory prototypes to production-level AI/ML algorithms. We partner with our clients, who range from Fortune 500 companies to innovative startups, to turn their ideas into reality. Have a hard problem in mind? Get in touch at info@edgeanalytics.io.
