TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Patterns for Reproducing Machine Learning Features

How to reproduce flows in machine learning applications through reproducible features

Misbah Uddin
Published in TDS Archive
5 min read · Mar 13, 2022

Reliability in a typical software application demands deterministic output. That is difficult to guarantee for Machine Learning (ML) applications, which live in the realm of randomness and uncertainty. However, a large part of such an application can be made reproducible, and reproducibility is a precondition for reliability. It is well known that a random seed should be set and used consistently everywhere. Beyond the seed, other components should also be fixed across the training, serving, and evaluation stages of the application.
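
As a minimal sketch of the seed point above, the snippet below uses only Python's standard `random` module. The seed value `SEED` and the helper `sample_noise` are hypothetical names chosen for illustration. Numerical and ML libraries keep their own random state, so in a real application each of them would need to be seeded separately.

```python
import random

SEED = 42  # hypothetical value; any fixed integer works


def sample_noise(seed: int, n: int = 3) -> list[float]:
    """Draw n pseudo-random numbers from a generator seeded with `seed`."""
    # A local generator avoids depending on (or mutating) global random state.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Two runs with the same seed produce the same sequence.
first_run = sample_noise(SEED)
second_run = sample_noise(SEED)
print(first_run == second_run)
```

Passing the seed explicitly, instead of relying on a global call made somewhere at startup, keeps each component reproducible on its own.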

In this article, we explore design patterns for reproducibility in machine learning. The inspiration for these patterns came from the book Machine Learning Design Patterns, which I strongly recommend to ML practitioners.

The design patterns are:

  • Transform: how features can be kept consistent across the training, serving, and evaluation stages of simple machine learning applications
  • Repeatable Splitting: how data is split into training, validation, and test datasets so that training stays leakage-free irrespective of the data volume
  • Feature Store: how features can be ingested in a reproducible manner and reused across different machine learning jobs
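
To make the Repeatable Splitting idea concrete, here is a minimal sketch that assigns each record to a split by hashing a key column. This is one common way to get a deterministic split and is not necessarily the exact approach used later in the article. The function `assign_split`, the `customer-…` keys, and the 80/10/10 proportions are assumptions for illustration, and MD5 is used only as a stable hash, not for security.

```python
import hashlib


def assign_split(key: str, train_pct: int = 80, valid_pct: int = 10) -> str:
    """Map a record key to 'train', 'valid', or 'test' deterministically."""
    # hashlib gives the same digest in every process, unlike the built-in
    # hash(), which is salted per interpreter run for strings.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"


# Hypothetical record keys, for example a customer id column.
keys = [f"customer-{i}" for i in range(1000)]
splits = {key: assign_split(key) for key in keys}
```

Because the assignment depends only on the key, a record lands in the same split no matter how many rows are added later or in which order they arrive, which is what keeps training data from leaking into the test set between runs.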

Written by Misbah Uddin

Group Product Manager: AI, Analytics and Data @H&M. Opening little boxes, one at a time
