Characteristics of ML CI/CD Pipelines
In this blog post, we walk through the components involved in building and maintaining CI/CD pipelines for Machine Learning.
A typical CI/CD pipeline is composed of:
[a] Build phase / pipeline, which results in the creation of an ML artifact
1. Build the artifact
2. Persist the artifact
3. Sanity check / Smoke testing
4. Generate explainability report
[b] Deploy to test environment
1. Manual validation of the artifact
2. Execution of performance tests (computational, validation, etc.)
[c] Deploy to Production environment
1. Canary or blue-green deployment
2. Full deployment
3. Release deployment
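As a rough illustration of the build phase listed above, here is a minimal sketch of a pipeline that runs named stages in order and stops at the first failure. The `Pipeline` class, the step functions and the in-memory `store` are all hypothetical stand-ins, not part of any particular CI/CD tool; the "model" is just the mean of a few labels so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Pipeline:
    """Runs named stages in order; an exception in a stage aborts the rest."""
    stages: List[Tuple[str, Callable[[Dict], None]]] = field(default_factory=list)

    def stage(self, name: str, step: Callable[[Dict], None]) -> "Pipeline":
        self.stages.append((name, step))
        return self

    def run(self, context: Dict) -> List[str]:
        completed = []
        for name, step in self.stages:
            step(context)  # a raised exception stops the remaining stages
            completed.append(name)
        return completed


# Hypothetical build-phase steps (assumptions for illustration only).
def build_artifact(ctx: Dict) -> None:
    # Placeholder "training": the model is the mean of the labels.
    ctx["model"] = sum(ctx["labels"]) / len(ctx["labels"])


def persist_artifact(ctx: Dict) -> None:
    # Placeholder artifact store: a plain dict instead of a real registry.
    ctx["store"]["model-v1"] = ctx["model"]


def smoke_test(ctx: Dict) -> None:
    if ctx["store"].get("model-v1") is None:
        raise RuntimeError("smoke test failed: persisted artifact is missing")


def explainability_report(ctx: Dict) -> None:
    ctx["report"] = {"model": "mean-baseline", "n_labels": len(ctx["labels"])}


build_phase = (
    Pipeline()
    .stage("build", build_artifact)
    .stage("persist", persist_artifact)
    .stage("smoke_test", smoke_test)
    .stage("explainability_report", explainability_report)
)

context = {"labels": [1.0, 2.0, 3.0], "store": {}}
completed = build_phase.run(context)
print(completed)
```

The test and production phases could be chained the same way, with each phase only starting once the previous one has completed.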
ML artifact
An ML artifact comprises the following:
- Model code and pre-processing logic
- Hyperparameters and configurations
- Trained runnable model
- Environment specification (libraries, versions, environment variables, etc.)
- Documentation
- Code and data for validation
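One way to keep these pieces together is a small manifest that records where each one lives. The sketch below is only an assumption about how such a manifest might look; the `ArtifactManifest` class, its field names and the example paths are hypothetical. Hashing the manifest gives a stable identifier that changes whenever any component changes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class ArtifactManifest:
    model_code_ref: str               # e.g. commit of model and pre-processing code
    hyperparameters: Dict[str, float]
    model_path: str                   # location of the trained, runnable model
    environment: Dict[str, str]       # library name -> pinned version
    documentation_path: str
    validation_ref: str               # code and data used for validation

    def fingerprint(self) -> str:
        """Stable SHA-256 hash of the manifest, usable as an artifact id."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Example values are made up for illustration.
manifest = ArtifactManifest(
    model_code_ref="abc123",
    hyperparameters={"learning_rate": 0.1, "max_depth": 6},
    model_path="artifacts/model.bin",
    environment={"numpy": "1.26.4"},
    documentation_path="docs/model_card.md",
    validation_ref="validation/v1",
)
print(manifest.fingerprint())
```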
Considerations for Test Deployment
When deploying a model to a test environment, make sure the test cases are complete. They should not only cover the required validation scenarios but also make it possible to identify the source of a failure.
Additionally, two modes of deployment are relevant:
i. Batch scoring mode, where entire data sets are processed, such as daily batches
ii. Real-time scoring mode, where the data set is a limited, curated subset with coverage across all required validation scenarios
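The two modes above, together with the goal of pinpointing where a failure comes from, might be sketched as follows. Everything here is an assumption for illustration: `preprocess` and `predict` stand in for a real pipeline, and the feature names, score range and expected values are made up.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def preprocess(record: Record) -> List[float]:
    # Hypothetical pre-processing: pick two features in a fixed order.
    return [record["x1"], record["x2"]]


def predict(features: Sequence[float]) -> float:
    # Hypothetical model: a fixed linear function.
    return 0.5 * features[0] + 2.0 * features[1]


def validate_batch(records: Sequence[Record], low: float, high: float) -> List[str]:
    """Batch mode: score every record and report which stage failed."""
    failures = []
    for i, record in enumerate(records):
        try:
            features = preprocess(record)
        except KeyError as exc:
            failures.append(f"record {i}: preprocess: missing field {exc}")
            continue
        value = predict(features)
        if not low <= value <= high:
            failures.append(f"record {i}: predict: score {value} out of range")
    return failures


def validate_realtime(
    cases: Dict[str, Tuple[Record, float]], tolerance: float = 1e-9
) -> List[str]:
    """Real-time mode: one request per curated scenario, checked against an expected score."""
    failures = []
    for name, (record, expected) in cases.items():
        actual = predict(preprocess(record))
        if abs(actual - expected) > tolerance:
            failures.append(f"scenario {name}: expected {expected}, got {actual}")
    return failures


batch = [{"x1": 1.0, "x2": 1.0}, {"x1": 1.0}, {"x1": 100.0, "x2": 0.0}]
batch_failures = validate_batch(batch, low=0.0, high=10.0)

scenarios = {
    "typical": ({"x1": 2.0, "x2": 1.0}, 3.0),
    "all_zero": ({"x1": 0.0, "x2": 0.0}, 0.0),
}
realtime_failures = validate_realtime(scenarios)
print(batch_failures, realtime_failures)
```

Each failure message names the record or scenario and the stage, so a broken input can be told apart from a misbehaving model.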
Considerations for Production Deployments and Release
When deploying to production, we need to consider a few basic permutations:
- Single model, Single version deployed on Single server
- Single model, Single version deployed on Multiple servers
- Single model, Multiple versions deployed on Single server
- Single model, Multiple versions deployed on Multiple servers
- Multiple models, Multiple versions deployed on Multiple servers
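To make the permutations above concrete, here is a small sketch that takes a mapping of (model, version) to the servers hosting it and reports which permutation it corresponds to. The `Deployment` type, the `describe` function and the model and server names are hypothetical.

```python
from typing import Dict, Set, Tuple

# (model name, version) -> set of servers hosting that version.
Deployment = Dict[Tuple[str, str], Set[str]]


def describe(deployment: Deployment) -> str:
    """Classify a deployment by number of models, versions and servers."""
    if not deployment:
        raise ValueError("deployment is empty")
    versions_by_model: Dict[str, Set[str]] = {}
    for model, version in deployment:
        versions_by_model.setdefault(model, set()).add(version)
    servers = set().union(*deployment.values())

    models = "Single model" if len(versions_by_model) == 1 else "Multiple models"
    multi_version = any(len(v) > 1 for v in versions_by_model.values())
    versions = "Multiple versions" if multi_version else "Single version"
    hosts = "Single server" if len(servers) == 1 else "Multiple servers"
    return f"{models}, {versions} deployed on {hosts}"


example = {
    ("churn", "v1"): {"srv-1"},
    ("churn", "v2"): {"srv-1", "srv-2"},
}
print(describe(example))
# prints: Single model, Multiple versions deployed on Multiple servers
```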
Additionally, the deployment methodology can be canary or blue-green, depending on the nature of the model and of the application encapsulating it.
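The difference between the two methodologies can be sketched in a few lines. In a canary rollout, a small share of requests goes to the new version while the rest stay on the current one; in a blue-green rollout, all traffic goes to one environment and the release is a single switch. The function and class names below are hypothetical, and hashing the request id is just one assumed way of splitting traffic deterministically.

```python
import hashlib


def canary_route(request_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of requests to the canary, deterministically."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


class BlueGreenRouter:
    """All traffic goes to the live environment; switch() flips it in one step."""

    def __init__(self) -> None:
        self.live = "blue"

    def route(self, request_id: str) -> str:
        return self.live

    def switch(self) -> None:
        self.live = "green" if self.live == "blue" else "blue"


# Roughly 10% of requests should land on the canary.
targets = [canary_route(f"req-{i}", canary_percent=10) for i in range(10_000)]
canary_share = targets.count("canary") / len(targets)

router = BlueGreenRouter()
before = router.route("req-1")
router.switch()
after = router.route("req-1")
print(canary_share, before, after)
```

Because the canary decision depends only on the request id, the same request always reaches the same version, which makes comparisons between the two versions easier to reason about.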
Consider the above factors in your enterprise MLOps practice and in your machine learning CI/CD pipelines.