Patterns for Reproducing Machine Learning Features
How to make machine learning workflows reproducible through reproducible features
Reliability in typical software demands deterministic output. That is hard to guarantee for Machine Learning (ML) applications, which live in a realm of randomness and uncertainty. However, a large part of an ML application can be made reproducible, and reproducibility is a precondition for reliability. Most practitioners know to fix the random seed consistently everywhere. Beyond that, though, other components should also be held fixed across the training, serving, and evaluation stages of the application.
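As a starting point, fixing the seed can be wrapped in a small helper so every stage of the application seeds its generators the same way. This is a minimal sketch (the helper name `set_seed` is my own, not from the article) covering Python's and NumPy's generators:

```python
import random

import numpy as np


def set_seed(seed: int = 42) -> None:
    """Seed Python's and NumPy's random generators for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)


# Re-seeding before each run makes the random draws repeat exactly.
set_seed(42)
first = np.random.rand(3)
set_seed(42)
second = np.random.rand(3)
# first and second contain identical values
```

Frameworks such as TensorFlow or PyTorch keep their own generators, so the same helper would need to seed those too when they are in use.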
In this article, we explore design patterns for reproducibility in machine learning. These patterns are inspired by the book Machine Learning Design Patterns, which I strongly recommend to ML practitioners.
The design patterns are:
- Transform: how to keep features consistent across different stages, such as training, serving, and evaluation, in simple machine learning applications
- Repeatable Splitting: how to split data among training, validation, and test sets to ensure leakage-free training, irrespective of data volume
- Feature Store: how to ingest features reproducibly and reuse them across different machine learning jobs
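To preview the flavor of these patterns, Repeatable Splitting typically works by hashing an immutable key column rather than drawing random numbers, so every rerun assigns each record to the same split. A minimal sketch of that idea (the function name and percentages are illustrative, not from the article):

```python
import hashlib


def assign_split(key: str, train_pct: int = 80, valid_pct: int = 10) -> str:
    """Deterministically assign a record to a split by hashing its key.

    Hashing an immutable column (e.g. a user ID) instead of sampling
    randomly means the assignment is identical on every run, with no
    seed bookkeeping required.
    """
    bucket = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"


# The same key always lands in the same split, run after run.
split = assign_split("user_123")
```

Because the split depends only on the key's hash, new data can arrive and existing records still keep their original assignment.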