TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Opinion

River: the Best Python Library for Online Machine Learning

The “sklearn” for machine learning on streaming data

6 min read · Jul 26, 2021

Image by UnboxScience at Pixabay

Conventional machine learning algorithms, such as linear regression and XGBoost, operate in “batch” mode. That is, they fit a model to a full dataset in one go. Updating that model with new data requires fitting a brand new model from scratch on both the old and the new data.

In many applications, this can be difficult or impossible! It requires all data to fit into memory, which isn’t always possible. The model itself can be slow to re-train. Retrieving older data for the model can be a big challenge, particularly in applications where data is continuously generated. Storing historical data requires data storage infrastructure with the capability of returning the full history of data quickly.

Alternatively, models can be trained “online” or in “streaming” mode. In this case, data is treated as a stream or sequence of items that are passed to a model one by one.
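To make the idea concrete, here is a minimal sketch of online training in plain Python. It does not use River; it is a hand-rolled linear regression updated by stochastic gradient descent, and the class name, the `learn_one` / `predict_one` method names, the learning rate, and the synthetic `stream` generator are all assumptions made for illustration. The point is only that each sample is seen once, used to update the model, and then discarded.

```python
import random


class OnlineLinearRegression:
    """Illustrative linear regression updated one sample at a time via SGD."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_one(self, x):
        # Prediction for a single sample given as a list of floats.
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        # One gradient step on the squared error of this single sample.
        error = self.predict_one(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def stream(n_samples, seed=0):
    # Hypothetical data stream: y = 3*x0 - 2*x1 + 1, with no noise.
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 3.0 * x[0] - 2.0 * x[1] + 1.0


model = OnlineLinearRegression(n_features=2)
for x, y in stream(5000):
    model.learn_one(x, y)  # the sample is not stored after this call

print(model.weights, model.bias)
```

Memory use stays constant regardless of how many samples pass through, since the model keeps only its weights and bias rather than the history of the stream.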

Incremental learning, continual learning, and stream learning are often preferred over the term “online learning” because searches for “online learning” largely point to educational websites.

Published in TDS Archive

Written by Alexandra Amidon

Data scientist working in the financial services industry
