
River: Online Machine Learning in Python

A Fast and Cheap Approach to Update an ML Model in Production

Khuyen Tran
Towards Data Science
9 min read · Dec 6, 2022


Problem with Batch Learning

It is common for data practitioners to learn from data with batch learning, which means training an ML model on the whole available dataset at once. An ML pipeline with batch learning typically includes:

  • Splitting the data into train and test sets
  • Fitting a model to the train set
  • Computing the performance of the model on the test set
  • Pushing the model to production
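The four steps above can be sketched in plain Python. This is a minimal illustration, not the author's code: the synthetic data, the 75/25 split, the hand-written least-squares fit `fit_linear`, and the use of `pickle` to stand in for "pushing the model to production" are all assumptions made to keep the example dependency-free.

```python
import pickle
import random


def _take(seq, idx):
    """Return the elements of seq at the given indices."""
    return [seq[i] for i in idx]


def train_test_split(xs, ys, test_ratio=0.25, seed=0):
    """Shuffle paired data and split it into train and test sets."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(xs) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return _take(xs, train_idx), _take(xs, test_idx), _take(ys, train_idx), _take(ys, test_idx)


def fit_linear(xs, ys):
    """Fit y = a*x + b by ordinary least squares on the whole training set at once."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return {"a": a, "b": mean_y - a * mean_x}


def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and targets."""
    return sum(abs(model["a"] * x + model["b"] - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: y = 2x + 1 plus small uniform noise
rng = random.Random(42)
xs = [float(i) for i in range(100)]
ys = [2 * x + 1 + rng.uniform(-0.5, 0.5) for x in xs]

# 1. Split the data, 2. fit on the train set, 3. evaluate on the test set
x_train, x_test, y_train, y_test = train_test_split(xs, ys)
model = fit_linear(x_train, y_train)
mae = mean_absolute_error(model, x_test, y_test)

# 4. "Push to production": here we only serialize the fitted parameters
artifact = pickle.dumps(model)
print(f"a={model['a']:.2f}, b={model['b']:.2f}, test MAE={mae:.3f}")
```

Note that `fit_linear` needs every training sample in memory at once; that is the defining trait of batch learning and the source of the cost discussed next.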

However, in production, the pipeline doesn’t end there. To keep the model robust as the input data changes, data practitioners also need to periodically retrain it on the existing data combined with the newly collected data.

As the data grows, training the model takes more time and resources.
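To make the growing cost concrete, here is a minimal, hypothetical simulation in plain Python. It assumes 1,000 new labeled samples arrive each day for 10 days and compares two strategies: retraining a batch least-squares model on all accumulated data every day, versus keeping the same model up to date from running sums that only touch the new samples. `RunningLinearRegression` is an illustrative stand-in for an incremental learner, not River's API; it works here because simple linear regression can be computed exactly from a handful of running totals.

```python
import random


def fit_batch(xs, ys):
    """Batch learning: fit y = a*x + b by least squares on ALL data seen so far."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var
    return a, mean_y - a * mean_x


class RunningLinearRegression:
    """Same least-squares model, kept current through running sums.

    Hypothetical helper for illustration only (not River's API):
    each update touches only the new samples.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def params(self):
        a = (self.n * self.sxy - self.sx * self.sy) / (self.n * self.sxx - self.sx ** 2)
        return a, (self.sy - a * self.sx) / self.n


rng = random.Random(0)
all_x, all_y = [], []
online = RunningLinearRegression()
batch_cost = online_cost = 0  # samples processed by each strategy

for _ in range(10):
    # Hypothetical: 1,000 new labeled samples arrive each day
    new_x = [rng.uniform(0, 10) for _ in range(1000)]
    new_y = [3 * x - 2 + rng.gauss(0, 1) for x in new_x]
    all_x += new_x
    all_y += new_y

    batch_model = fit_batch(all_x, all_y)  # retrain on existing + new data
    batch_cost += len(all_x)

    online.update(new_x, new_y)  # touch only the new data
    online_cost += len(new_x)

print(f"batch retraining processed {batch_cost} samples")  # 55000
print(f"incremental updates processed {online_cost} samples")  # 10000
```

Both strategies end with the same coefficients (up to floating-point rounding), but the batch total grows quadratically with the number of retraining rounds while the incremental total grows linearly.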

Demo of batch learning (by author)

Thus, batch learning is not ideal when:

  • An application requires frequent model updates.
