Characteristics of ML CI/CD Pipelines

Published in Aruva.io Tech, Apr 3, 2021

In this blog post, we will go through the components involved in building and maintaining CI/CD pipelines for machine learning.

A typical CI/CD pipeline is composed of:

[a] Build phase / pipeline, which produces the ML artifact

1. Build the artifact
2. Persist the artifact
3. Sanity check / Smoke testing
4. Generate explainability report
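
The four build steps above can be sketched as a small script. This is a minimal illustration under stated assumptions, not a reference implementation: the toy "mean predictor" model, the file names `model.pkl` and `model.sha256`, and every function name are hypothetical, and the explainability step is only a stand-in for whatever reporting tool a real pipeline would use.

```python
import hashlib
import json
import math
import pickle
import tempfile
from pathlib import Path


def build_artifact(training_rows):
    """Step 1: 'train' a toy model (hypothetical: it only learns the target mean)."""
    targets = [target for _, target in training_rows]
    return {"kind": "mean_predictor", "mean": sum(targets) / len(targets)}


def persist_artifact(model, out_dir):
    """Step 2: write the model to disk and store a content hash next to it."""
    payload = pickle.dumps(model)
    path = Path(out_dir) / "model.pkl"
    path.write_bytes(payload)
    digest = hashlib.sha256(payload).hexdigest()
    (Path(out_dir) / "model.sha256").write_text(digest)
    return path, digest


def smoke_test(path):
    """Step 3: reload the persisted artifact and check the prediction is a finite number."""
    model = pickle.loads(Path(path).read_bytes())
    return isinstance(model["mean"], float) and math.isfinite(model["mean"])


def explainability_report(model, training_rows):
    """Step 4: placeholder report; a real pipeline would call a dedicated tool here."""
    return json.dumps(
        {
            "model_kind": model["kind"],
            "n_training_rows": len(training_rows),
            "learned_mean": model["mean"],
        },
        indent=2,
    )


def run_build_phase(training_rows, out_dir):
    """Run the four steps in order and stop if the smoke test fails."""
    model = build_artifact(training_rows)
    path, digest = persist_artifact(model, out_dir)
    if not smoke_test(path):
        raise RuntimeError("smoke test failed; do not promote this artifact")
    return {
        "path": str(path),
        "sha256": digest,
        "report": explainability_report(model, training_rows),
    }


# Usage: rows are (features, target) pairs; the build writes to a temporary directory.
rows = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
with tempfile.TemporaryDirectory() as tmp:
    result = run_build_phase(rows, tmp)
    print(result["report"])
```

Failing the build when the smoke test fails keeps a broken artifact from ever reaching the test environment.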

[b] Deploy to test environment

1. Manual validation of the artifact
2. Execution of performance tests (computational, validation, etc.)

[c] Deploy to Production environment

1. Canary or blue-green deployment
2. Full deployment
3. Release deployment

ML artifact

An ML artifact comprises the following:

Model code and pre-processing logic
Hyperparameters and configurations
Trained runnable model
Environment specification (libraries, versions, environment variables, etc.)
Documentation
Code and data for validation
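
One way to keep these components together is a manifest that travels with the artifact. The sketch below is only an assumption about how such a manifest might look: the class name `MLArtifactManifest`, its field names, and all example values (commit reference, paths, library versions) are hypothetical.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MLArtifactManifest:
    """Hypothetical manifest with one field per component listed above."""

    model_code_ref: str          # model code and pre-processing logic (e.g. a commit reference)
    hyperparameters: dict        # hyperparameters and configurations
    model_path: str              # trained runnable model
    environment: dict            # libraries, versions, environment variables
    documentation_path: str      # documentation
    validation_assets: list = field(default_factory=list)  # code and data for validation

    def missing_components(self):
        """Return the names of empty components so a pipeline can refuse to ship."""
        return [name for name, value in asdict(self).items() if not value]


# Usage with made-up values; validation assets are deliberately left out.
manifest = MLArtifactManifest(
    model_code_ref="git:abc123",
    hyperparameters={"learning_rate": 0.1},
    model_path="artifacts/model.pkl",
    environment={"python": "3.9", "libraries": {"numpy": "1.20.1"}},
    documentation_path="docs/model_card.md",
)
print(manifest.missing_components())  # ['validation_assets']
```

A build step could call `missing_components()` and fail when the list is not empty, so an incomplete artifact never gets deployed.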

Considerations for Test Deployment

When deploying a model to a test environment, make sure the test cases are complete. They should not only cover the validation scenarios but also make it possible to identify the source of a failure.
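
To illustrate test cases that point at the source of a failure, the sketch below tags every check with the component it exercises. It is a minimal example under assumptions: the runner `run_validation_suite`, the toy `preprocess` and `predict` functions, and the component names are all hypothetical.

```python
def run_validation_suite(checks):
    """Run every check and report each failure with the component it exercises."""
    failures = []
    for component, name, check in checks:
        try:
            passed = bool(check())
            detail = "check returned False"
        except Exception as exc:  # a crash is also a failure, and it names its cause
            passed = False
            detail = f"{type(exc).__name__}: {exc}"
        if not passed:
            failures.append({"component": component, "test": name, "detail": detail})
    return failures


def preprocess(raw):
    """Toy pre-processing step: cast the 'x' field to float."""
    return {"x": float(raw["x"])}


def predict(features):
    """Toy model: doubles its input."""
    return 2.0 * features["x"]


checks = [
    ("preprocessing", "casts strings to float", lambda: preprocess({"x": "3"}) == {"x": 3.0}),
    # This check fails on purpose: the toy preprocess raises KeyError on a missing field.
    ("preprocessing", "handles missing field", lambda: preprocess({}) is None),
    ("model", "doubles the input", lambda: predict({"x": 3.0}) == 6.0),
]

for failure in run_validation_suite(checks):
    print(failure["component"], "|", failure["test"], "|", failure["detail"])
```

Because each result carries a component label, a failed run shows whether the pre-processing logic or the model itself needs attention.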

Additionally, two modes of deployment are relevant:
i. Batch scoring mode, where entire data sets are processed, for example as daily batches
ii. Real-time scoring mode, where the data set is a limited, curated subset that covers all required validation scenarios
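
The two modes can be exercised with the same scoring function. The sketch below assumes a toy linear model; the names `score_one`, `score_batch`, `failed_scenarios`, and the curated cases are hypothetical and only illustrate the idea.

```python
def score_one(model, features):
    """Real-time mode: score a single request."""
    return model["weight"] * features["x"] + model["bias"]


def score_batch(model, rows):
    """Batch mode: score an entire data set, for example one daily batch."""
    return [score_one(model, row) for row in rows]


def failed_scenarios(model, cases, tolerance=1e-9):
    """Score a curated subset one request at a time and list scenarios that miss."""
    return [
        case["scenario"]
        for case in cases
        if abs(score_one(model, case["features"]) - case["expected"]) > tolerance
    ]


# Hypothetical model and curated cases, one case per validation scenario.
model = {"weight": 2.0, "bias": 1.0}
CURATED_CASES = [
    {"scenario": "typical input", "features": {"x": 2.0}, "expected": 5.0},
    {"scenario": "zero input", "features": {"x": 0.0}, "expected": 1.0},
    {"scenario": "negative input", "features": {"x": -1.0}, "expected": -1.0},
]

print(score_batch(model, [{"x": 2.0}, {"x": 0.0}]))  # [5.0, 1.0]
print(failed_scenarios(model, CURATED_CASES))        # []
```

Sharing `score_one` between both paths means batch and real-time results cannot drift apart.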

Considerations for Production Deployments and Release

When deploying to production, we need to consider a few basic permutations:

Single model, Single version deployed on Single server
Single model, Single version deployed on Multiple servers
Single model, Multiple versions deployed on Single server
Single model, Multiple versions deployed on Multiple servers
Multiple models, Multiple versions deployed on Multiple servers
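
To see which permutation a given setup falls into, a deployment inventory can be reduced to the three dimensions above. The sketch below assumes the inventory is a list of (model, version, server) tuples; the function name `describe_topology` and the example names are hypothetical.

```python
from collections import defaultdict


def describe_topology(deployments):
    """Classify (model, version, server) tuples along the three dimensions above."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    versions_by_model = defaultdict(set)
    servers = set()
    for model_name, version, server in deployments:
        versions_by_model[model_name].add(version)
        servers.add(server)
    many_models = len(versions_by_model) > 1
    # "Multiple versions" means at least one model runs more than one version.
    many_versions = any(len(versions) > 1 for versions in versions_by_model.values())
    many_servers = len(servers) > 1
    return (
        f"{'Multiple models' if many_models else 'Single model'}, "
        f"{'Multiple versions' if many_versions else 'Single version'} deployed on "
        f"{'Multiple servers' if many_servers else 'Single server'}"
    )


# Usage with a made-up inventory: one model, two versions, two servers.
inventory = [("churn", "v1", "server-a"), ("churn", "v2", "server-b")]
print(describe_topology(inventory))
# Single model, Multiple versions deployed on Multiple servers
```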

Additionally, the deployment methodology could be canary or blue-green, depending on the nature of the model and of the application that encapsulates it.
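
The difference between the two methodologies can be sketched in a few lines. This is a simplified illustration, not how any particular platform implements it: the hash-based traffic split, the function `canary_route`, and the class `BlueGreenSwitch` are assumptions made for the example.

```python
import hashlib


def canary_route(request_id, canary_percent):
    """Canary: send a stable slice of requests to the new version.

    Hashing the request id keeps the decision repeatable for the same id.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"


class BlueGreenSwitch:
    """Blue-green: two full environments, and all traffic goes to the live one."""

    def __init__(self):
        self.live = "blue"

    def idle(self):
        """The environment that receives the new version before cut-over."""
        return "green" if self.live == "blue" else "blue"

    def cut_over(self):
        """Flip all traffic to the idle environment in one step."""
        self.live = self.idle()
        return self.live


# Canary: roughly 10% of request ids land on the new version.
routes = [canary_route(f"request-{i}", 10) for i in range(1000)]
print(routes.count("canary"), "of", len(routes), "requests routed to the canary")

# Blue-green: deploy to the idle environment, then switch everything at once.
switch = BlueGreenSwitch()
print(switch.cut_over())  # green
```

Canary limits how many requests see a bad model, while blue-green makes rollback a single switch back to the previous environment.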

Consider the above factors in your enterprise MLOps practice and your machine learning CI/CD pipelines.