Stuff that Every Machine Learning Engineer should know…

Hrithik Rai Saxena
Mar 22, 2023


I’ve seen people mix up terms like model retraining, online learning, and continual learning, and I know how big a pain that confusion becomes when you are developing tuning strategies post-deployment.

Nowadays, the definition of a robust machine learning model has changed. You can’t expect to train a model once and then leave it in production. It might work for a while, but eventually the quality of its predictions will decay.

The kind of machine learning where you train your model on a batch of historic data to optimize a cost function and then make predictions on future unseen data is called offline learning. It simply means that the model is trained only on the data it was given and can only make reliable predictions on similar data. This is the de-facto standard for how a model is trained and deployed in businesses, but we can’t rely on this kind of machine learning in a scenario where data flows in continuously and our model is fighting like hell to adapt to new trends.

There are a lot of approaches to tackle this issue, like active learning, incremental learning, continual or lifelong learning, and online learning. Many people treat these terms as synonyms, and there is always some confusion lurking around them. Let’s try to understand each concept.

Online Learning:

Suppose you have a model running in production that has to deal with a continuous stream of data. It has to update its estimates as each new data point arrives, rather than waiting to collect a complete batch of data. This is where we apply online learning.

Here the machine learning model ingests real-time data one observation at a time, which makes it very efficient in terms of both time and memory. Today we have tons of data everywhere, and making sense of it requires new techniques and solutions like this.

Online learning is best in scenarios:

  1. Where our data samples become available sequentially over time, and
  2. Where the probability distribution of those samples is expected to change over time.

For example, a personalized shopping experience, where the model constantly learns from real-time user behavior, is crucial for every customer-centric business model.
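To make this concrete, here is a minimal sketch of online learning using scikit-learn’s partial_fit API. The simulated stream and toy labeling rule are my own illustration, not part of the original example:

```python
# A minimal sketch of online learning with scikit-learn's partial_fit API.
# The stream is simulated with random data; the labeling rule is purely illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()            # linear model trained by stochastic gradient descent
classes = np.array([0, 1])         # partial_fit needs every class declared up front

for t in range(1000):              # stand-in for an endless real-time stream
    x = rng.normal(size=(1, 4))    # one incoming observation with 4 features
    y = np.array([int(x.sum() > 0)])  # toy label; in production this comes from feedback
    model.partial_fit(x, y, classes=classes)  # update weights on this single sample
```

Each call to partial_fit adjusts the weights using only the latest observation, so the model keeps pace with the stream without ever holding the full dataset in memory.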

But this does not in any way mean that online learning, in general, is better than offline learning.

Let’s compare both of these learning techniques.

Model Training:

During offline learning, the model searches for the global minimum of the cost function while training on the data. The model is trained iteratively, adjusting its weights and parameters, until it is robust enough for deployment.

During online learning, the weights are adjusted based only on the current example being presented. This makes the model highly adaptable, since it continuously sees a new wave of data and adjusts to it.
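As a rough illustration (my own sketch, with an illustrative learning rate and squared loss, not taken from the article), a single online weight update for a linear model looks like this:

```python
# A single online weight update for a linear model with squared loss.
import numpy as np

def sgd_step(w, b, x, y, lr=0.01):
    y_hat = w @ x + b         # prediction on the one incoming example
    grad = y_hat - y          # derivative of 0.5 * (y_hat - y)**2 w.r.t. y_hat
    w = w - lr * grad * x     # the weights move using only this example
    b = b - lr * grad
    return w, b

# each arriving observation nudges the parameters immediately
w, b = np.zeros(3), 0.0
w, b = sgd_step(w, b, x=np.array([1.0, 2.0, 0.5]), y=1.0)
```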

Computation time:

Now, remember that offline-trained models are not designed to deal with data that goes rogue at some point; they only expect data similar to what they have seen before. Offline training is much faster, since the dataset is used once to adjust the weights and parameters, and once the model is trained we do not need to look at the training data again. However, when we have to handle big data streams, repeatedly retraining offline becomes quite time-consuming.

In online learning, the model is continuously tuned as the stream arrives, but be prepared for higher costs and more resources (clusters) to train the model constantly.

Use in Production:

Well, given that an online-trained model learns on the arrival of each new data point, these models are kind of hard to manage in production. Any change in the pattern of the data (concept drift) will affect the overall performance and predictions of the model.

So, online training is not at all the same as leaving our model on autopilot.

You still need to monitor the training, because pattern drift in the incoming data is inevitable.

Offline learning, however, keeps the model constant after the deployment stage (assuming the same types and patterns of data), so it is easier to maintain the whole network or cluster with minimal supervision and control.

All in all, different scenarios require different approaches. Offline learning models are often much more straightforward to deploy and manage, but less adaptable to changes in the data. Online learning models are more complex in the sense that they require more effort and time, since a new stream of data is continually being pushed in, and all of that data must be preprocessed, which takes more time and money.

A Few Important Points about Online Learning:

  1. In online learning, we make a single pass over the data; these algorithms are typically much faster than offline learners, most of which are multi-pass.
  2. Whether learning is online or offline, we do not revisit training data once it has been used. The advantage in online learning is that the data arrives in sequential order and is processed immediately, so we do not need to store it, which results in a smaller memory footprint.

The catch:

As said earlier, online learning is difficult to maintain in a production environment, because data points change continuously and there is a high chance that the pattern and distribution of incoming data will stop matching what the model has learned. A major network latency issue, a server going down, or any other such failure can result in the complete failure of the project.

It can also be difficult to check automatically whether our learner is behaving correctly, and hard to diagnose when the algorithm misbehaves.

Now comes the big question.

Does online learning retain the old concepts or knowledge, when it comes across new data?

Sadly not. A model that keeps updating on the newest data tends to gradually overwrite what it learned before (a problem known as catastrophic forgetting).

Learning new information while retaining and building on the old is very human-like behavior. This is where continual, or lifelong, learning comes to our rescue.

Continual Learning:

This concept is borrowed from the human way of learning. We learn effectively from just a few examples, in a dynamic and open environment, and in a self-supervised manner, because our learning is very much knowledge-driven: knowledge learned in the past helps us learn new things with little data or effort and adapt to new or unseen situations. This self-supervised (or self-aware) learning also enables us to learn on the job, by interacting with others and with the real-world environment, with no external supervision. Continual learning aims to achieve all of these capabilities.

Continual or Lifelong Machine Learning learns continuously, accumulates the knowledge learned in the past, and uses it to help future learning and problem-solving. In the process, the learner becomes more and more knowledgeable and better and better at learning.

The current dominant ML paradigm, however, learns in isolation: given a training dataset, it runs an ML algorithm on that dataset alone to produce a model, making no attempt to retain the learned knowledge and use it in subsequent learning. Although this isolated paradigm, primarily based on data-driven optimization, has been very successful, it requires a large number of training examples and is only suitable for well-defined, narrow tasks in closed environments.

And yes, this is a step towards general artificial intelligence.

But enough of this. What we really need to know is how a continual learning model can perform better than an online learning model in production.

Let’s look at an example of machine learning with closed-loop continual learning.

Such a pipeline in a production environment looks much like any other machine learning pipeline. Let’s break it down:

⦁ First, we must have the data, along with some sort of validation. This could include tests or internal benchmarks, such as checks on data quality, as well as any pre-processing you run.

⦁ Next in the pipeline is AutoML. AutoML is a very important part of a continual learning pipeline and plays the role of the training step in a typical machine learning pipeline.

⦁ After training, you’ll do some model validation to test the candidate models and make sure all of them are working well. Here you can also pick the best one and deploy it to the production environment.

Thus far, the pipeline looks like a classic machine learning pipeline. In order to apply continual learning, we add monitoring and connect the loop back to the data.
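To sketch what "connecting the loop back to the data" can look like in code (my own minimal illustration; the drift simulation, window size, and accuracy threshold are all made up for this example):

```python
# Monitor a live accuracy window and retrain when quality decays.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(t):
    # toy production stream whose concept drifts after step 1500
    x = rng.normal(size=2)
    y = int(x[0] > 0) if t < 1500 else int(x[1] > 0)
    return x, y

# initial offline training on pre-drift data
X0, y0 = zip(*(sample(0) for _ in range(300)))
model = LogisticRegression().fit(np.array(X0), np.array(y0))

window, hits, buffer = 200, 0, []
for t in range(3000):
    x, y = sample(t)                                  # labeled feedback arrives
    hits += int(model.predict(x.reshape(1, -1))[0] == y)
    buffer.append((x, y))
    if len(buffer) == window:
        if hits / window < 0.8:                       # accuracy decayed: close the loop
            X, Y = map(np.array, zip(*buffer))
            model = LogisticRegression().fit(X, Y)    # retrain on fresh data
        hits, buffer = 0, []
```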

With AutoML, it is especially crucial to track not only the models in production but the entire process. If you put your machine learning on autopilot, you have to make sure that everything is tracked and managed.

You need to track everything: the kind of algorithm you are using, the hyperparameters, the kind of compute being used, memory consumption, metrics, accuracy, and so on. Let’s say you are running 60 different experiments today; the data being tracked might be needed next week when you try to redeploy and retrain the model.
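As an example of what such tracking can look like (MLflow is my choice of illustration here; the article doesn’t prescribe a specific tool, and the logged values are placeholders):

```python
# Experiment tracking with MLflow; the logged values are placeholders.
import mlflow

with mlflow.start_run(run_name="experiment-42"):
    mlflow.log_param("algorithm", "SGDClassifier")  # which algorithm was used
    mlflow.log_param("learning_rate", 0.01)         # hyperparameters
    mlflow.log_metric("accuracy", 0.91)             # evaluation metrics
    mlflow.log_metric("memory_mb", 412.0)           # resource consumption
```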

The challenge with Continual Learning

One major challenge with continual learning is how to deploy new models to the same environment without negatively affecting the users' experience, while maintaining high accuracy.

Machine learning systems typically require various data-wrangling steps before data can be ingested into a model. These steps include cleaning the data and engineering features from it. Unfortunately, data wrangling is a very time-consuming and typically manual process; it often takes upwards of 90% of a developer’s time. This manual nature becomes an issue when we consider a system that is meant to continuously adapt to new data. In such a situation, there is no room for a time-intensive manual wrangling process!

Other Kinds of Learning:

Active Learning — the model itself selects the most informative unlabeled samples and queries a human (or another oracle) for their labels, so it can achieve high performance with far fewer labeled examples.
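A minimal sketch of pool-based active learning with uncertainty sampling (the toy pool, the oracle, and the query budget are my own illustration):

```python
# Pool-based active learning: query the labels the model is least sure about.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 4))
y_pool = (X_pool[:, 0] > 0).astype(int)       # oracle labels, hidden until queried

# seed the labeled set with a few examples of each class
labeled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])
model = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])

for _ in range(20):                            # each round, query one new label
    proba = model.predict_proba(X_pool)[:, 1]
    margin = np.abs(proba - 0.5)               # closest to 0.5 = least confident
    margin[labeled] = np.inf                   # never re-query labeled points
    query = int(np.argmin(margin))             # most informative sample
    labeled.append(query)                      # "ask the oracle" for its label
    model = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
```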

Incremental Learning — technically, this is similar to continual learning: input data is continuously used to extend the existing model’s knowledge, i.e., to further train the model. It is a dynamic technique of supervised or unsupervised learning that can be applied when training data becomes available gradually over time, or when its size exceeds system memory limits.

Model Retraining vs. the Different Kinds of Learning (Online, Continual, etc.)

Model retraining is easiest to understand as changing nothing in your code at all, only the input data used to train the existing model.

During model retraining, we do not make any changes to the code of our machine-learning model. The hyperparameter settings, the way it ingests data and makes predictions, and its evaluation metrics all remain the same; the only thing we change is the input data. This is because we only want to capture the change in the data distribution. This way we retain the model in production while improving its accuracy.
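A minimal sketch of the idea (the model choice, hyperparameters, and data loader are all illustrative): the training code and settings stay frozen, and only the data window changes.

```python
# Retraining: build_model() is frozen, only the data window changes.
# load_window() is a stand-in for pulling a fresh window of production data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_model():
    # hyperparameters are fixed; retraining never touches this code
    return LogisticRegression(C=1.0, max_iter=200)

def load_window(seed):
    # stand-in for loading a time window of production data
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(500, 4))
    y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
    return X, y

model = build_model().fit(*load_window(1))   # original deployment
model = build_model().fit(*load_window(2))   # retrain later: same code, newer data
```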

Now remember: if we want to implement a new technique like online or continual learning, we need to make changes to our code, which will output a completely different model that must be tested again before deployment.

OK, so now we have an idea of the various kinds of learning. No one technique is better than another; it’s just a matter of what production requires. I hope this was helpful, and see you soon.

Until then,

Happy Learning😊
