River: Online Machine Learning in Python
A Fast and Cheap Approach to Update an ML Model in Production
Problem with Batch Learning
Data practitioners commonly train ML models with batch learning, in which a model is fit on the entire training dataset at once. An ML pipeline with batch learning typically includes:
- Splitting the data into train and test sets
- Fitting a model to the train set
- Computing the performance of the model on the test set
- Pushing the model to production
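The four steps above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: it assumes scikit-learn (the article does not name a library for this step), uses a synthetic dataset from `make_classification`, and stands in for "pushing to production" by simply serializing the fitted model with `pickle`.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration (an assumption, not the article's dataset)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 1. Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Fit a model to the train set (the whole train set is used in one call)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 3. Compute the performance of the model on the test set
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")

# 4. "Push to production": here we only serialize the model;
#    a real deployment step would upload this artifact somewhere
serialized_model = pickle.dumps(model)
```

Note that `fit` receives the full training set in a single call. Incorporating new data later means calling `fit` again on everything.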
However, in production, the pipeline doesn’t end once the model is deployed. To keep the model robust as the input data changes, data practitioners also need to periodically retrain it on the existing data combined with the new data.
Because each retraining run processes the full accumulated dataset, training takes more time and resources as the data grows.
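To make the growth concrete, here is a rough back-of-the-envelope sketch. The function name and all the numbers are hypothetical, and it assumes training cost scales roughly with the number of rows read, which will not hold exactly for every model.

```python
def rows_read_per_retrain(
    initial_rows: int, new_rows_per_period: int, periods: int
) -> list[int]:
    """Return how many rows a full batch retrain reads at each period."""
    sizes = []
    total = initial_rows
    for _ in range(periods):
        total += new_rows_per_period  # new data is appended to the existing data
        sizes.append(total)           # a batch retrain reads everything again
    return sizes


# Hypothetical numbers: 100,000 rows to start, 10,000 new rows per period
sizes = rows_read_per_retrain(
    initial_rows=100_000, new_rows_per_period=10_000, periods=5
)
print(sizes)       # [110000, 120000, 130000, 140000, 150000]
print(sum(sizes))  # 650000
```

Under these assumptions, five retraining runs read 650,000 rows in total to absorb only 50,000 new ones, and each run is more expensive than the last.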
Thus, batch learning is not ideal when:
- An application requires frequent model updates.