Why Real-Time Machine Learning will be the Buzzword of 2023

Jan Baeriswyl · Published in Geek Culture · 6 min read · Feb 2, 2023

Real-time Machine Learning… Expect to hear this buzzword a lot in 2023 and beyond because it has the potential to unlock a huge amount of value for different types of businesses. But how can you be sure the system you’re building will achieve your objectives? Are you setting it up in the right way to maximize click-through and engagement? To help answer those questions, we’ve brought together all the essentials you need to know in one easy-to-digest post! While we’ll use a social network in our examples, what we’re going to outline applies to all products — from FinTech to FitnessTech!

The time-based anatomy of Machine Learning systems — featuring cats and dogs

You already know, of course, what Machine Learning is: an area of Artificial Intelligence (AI) and computer science that combines data and algorithms to imitate how people learn, so that the resulting model becomes more and more accurate over time. You'll probably also be aware of real-time Machine Learning, where the model is updated in real-time based on live data, to continuously improve the results. Sounds great, doesn't it? And it is. But to achieve the results you want, you need to set it up the right way.

Success depends on getting the sequence of events right: you start by collecting the data, train your model on that data, then use the model to make predictions as the user interacts with your app.

For example, let’s say that the long-term data shows that users who like a post about cats are highly likely to like a post about dogs. As the user goes into the app and likes a post about cats, the Machine Learning model will recommend serving them a post about dogs.

Let’s pause briefly and look at what happens when only a batch Machine Learning model is used instead. Continuing with our cats and dogs example, the batch system will calculate all the posts the user should see upfront, based on their previous interactions. So, if they haven’t liked a cat post yet, they won’t see a post about dogs. The real-time system, by contrast, uses the same batch-trained model but can also take the real-time data into account (i.e., that the user has just, in the same session, liked a cat post) and then show them a dog post to maximise engagement!
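The difference can be boiled down to a toy sketch. Everything here is hypothetical (there is no real recommendation API behind it); the point is only that the real-time system gets to see the session's likes, while the batch system does not:

```python
# Toy sketch of the cats-and-dogs example. Batch recommendation sees only
# historical likes; real-time recommendation also sees the current session.

def recommend(historical_likes, session_likes=()):
    """Recommend 'dogs' once the user has liked a 'cats' post."""
    likes = set(historical_likes) | set(session_likes)
    return ["dogs"] if "cats" in likes else ["general"]

# Batch system: the like from this session is invisible until the next
# pipeline run, so the user gets generic content.
batch_result = recommend(historical_likes=[])

# Real-time system: the same like counts immediately, within the session.
realtime_result = recommend(historical_likes=[], session_likes=["cats"])
```

Same model logic, same user; the only difference is which inputs reach it in time.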

If the system cannot respond in real-time, you are “leaving engagement on the table”, as new information about the user isn’t being used during the same session. Compare a LinkedIn feed (pre-created when you log in) to a TikTok feed that calculates the best next video for the user, based on their interaction with the current one. Without real-time Machine Learning, the potential results — and levels of engagement — are much poorer. And your cat- and dog-loving users are less likely to stick around!

Powering up engagement

Let’s get into more detail. In brief, real-time Machine Learning covers:

  1. Input data: long-term (30+ days of user history) and short-term (session)
  2. Model training: uses just long-term data (or sometimes all data)
  3. Inference: turns long and/or short-term data into a prediction
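The three stages above can be sketched end-to-end in a few lines. Every name here is illustrative, and the "model" is deliberately trivial (just topic popularity counts), but the division of labour matches the list: long-term data trains the model, while inference combines long- and short-term data into a prediction:

```python
from collections import Counter

def build_features(long_term_history, session_events):
    """1. Input data: combine 30+ days of history with the current session."""
    return {"long": set(long_term_history), "session": set(session_events)}

def train_model(all_user_histories):
    """2. Model training: here, just topic popularity over long-term data."""
    return Counter(topic for history in all_user_histories for topic in history)

def infer(model, features):
    """3. Inference: rank topics the user hasn't engaged with yet."""
    seen = features["long"] | features["session"]
    return sorted((t for t in model if t not in seen),
                  key=lambda t: model[t], reverse=True)
```

A real system would swap the popularity counter for a trained model, but the data flow between the three stages stays the same.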

So, for a social platform, these predictions can power a personalized feed with follow recommendations, “content you missed” emails — and much more.

For your own product, think of all the touch-points you have with your users or customers — and the value of higher click-through rates!

Inference models

Next, we look at how to transform all that lovely data into valuable predictions. The inference step, which (as outlined in point 3 above) turns the data into a prediction, usually involves two types of models:

Retrieval model

This model starts off the inference by generating candidates quickly, for example via a nearest-neighbour search over vector embeddings. You can use Redis, Pinecone or another vector database for this.
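Vector databases like Redis or Pinecone each have their own client APIs, so instead of guessing at those, here is a dependency-free sketch of the underlying idea: score every item's embedding against a query vector and keep the closest ones. The 2-D embeddings are made-up toy values:

```python
from math import sqrt

# Toy content embeddings: cat posts cluster together, finance sits apart.
embeddings = {
    "cat_post_1": (0.9, 0.1),
    "cat_post_2": (0.8, 0.2),
    "finance_post": (0.1, 0.9),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query, k=2):
    """Return the k items whose embeddings are closest to the query."""
    return sorted(embeddings,
                  key=lambda name: cosine(query, embeddings[name]),
                  reverse=True)[:k]
```

A production vector database does the same thing with approximate nearest-neighbour indexes so it stays fast across millions of items.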

Ranking model

This model completes the inference by re-ordering the candidates to maximize an objective — e.g., a model that predicts the read-through rate for articles.
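Structurally, the ranking step is just a sort by a predicted score. In production that score comes from a trained model; in this sketch a stand-in function plays that role, and `historical_rtr` is an invented feature name:

```python
def predicted_read_through(article):
    """Stand-in for a trained model's predicted read-through rate.
    'historical_rtr' is a hypothetical feature, not a real field."""
    return article["historical_rtr"]

def rank(candidates):
    """Re-order retrieved candidates to maximize the objective."""
    return sorted(candidates, key=predicted_read_through, reverse=True)
```

Retrieval narrows millions of items down to a few hundred candidates cheaply; ranking can then afford a heavier, more accurate model on that small set.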

Bonus tip: Select the right retrieval and ranking strategies by tracking their results for each user, switching to the best performers automatically. The formal tech term for this is a “Multi-Armed Bandit”!
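The simplest Multi-Armed Bandit variant is epsilon-greedy: mostly exploit the strategy with the best observed reward, but occasionally explore the others. A minimal sketch, with made-up strategy names as the "arms":

```python
import random

class EpsilonGreedyBandit:
    """Pick among strategies, mostly exploiting the best performer."""

    def __init__(self, arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in arms}
        self.rewards = {arm: 0.0 for arm in arms}

    def mean_reward(self, arm):
        return self.rewards[arm] / self.counts[arm] if self.counts[arm] else 0.0

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))      # explore
        return max(self.counts, key=self.mean_reward)    # exploit

    def update(self, arm, reward):
        """Record the observed result (e.g. click-through) for an arm."""
        self.counts[arm] += 1
        self.rewards[arm] += reward
```

Each time a user's feed is served, `select()` picks a retrieval/ranking strategy and `update()` records how well it did, so the best performers win out automatically.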

Getting to the next level with real-time Machine Learning systems

The next step is to understand the different levels of how real-time your Machine Learning system is and how this can impact your results. These are:

1. Batch inference

At this level, you can run a daily pipeline that produces “follow” recommendations for all your users. Doing that all at once is efficient and can be easier to evaluate. However, if many of those users don’t actually show up, you are performing a lot of expensive and useless computation.
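As a sketch, batch inference is a single pass over the whole user base, run on a schedule. All names below are illustrative:

```python
# Hypothetical daily batch job: precompute "follow" recommendations for
# every user, active or not, and store the results for later lookup.

def batch_recommend_follows(all_users, recommend_fn):
    """Run inference for the entire user base up front."""
    return {user: recommend_fn(user) for user in all_users}

# Every user gets a precomputed result, including those who never log in
# today; that unused computation is the cost of this level.
daily_recs = batch_recommend_follows(
    ["alice", "bob"],
    lambda user: [f"friend_of_{user}"],  # stand-in for a real model
)
```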

2. Real-time inference

Instead of producing follow recommendations for *all* your users in a batch pipeline, you just use the batch pipeline to create a summary of your data. Then, when the user visits your app, you combine this summary with the long-term behaviour of that specific user to calculate the “follow” recommendation right in that moment. Suddenly, the latency and availability of your inference server matters!
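The shift can be sketched as two functions, one per side of the split. The batch step reduces raw interactions to a compact summary; the real-time step runs only for the user who actually showed up. Names and data shapes are hypothetical:

```python
def nightly_summary(interaction_log):
    """Batch step: reduce raw (user, topic) interactions to per-topic counts."""
    summary = {}
    for user, topic in interaction_log:
        summary[topic] = summary.get(topic, 0) + 1
    return summary

def recommend_on_visit(summary, user_history, k=1):
    """Real-time step: at request time, rank topics this user hasn't
    engaged with yet, using the precomputed summary."""
    unseen = [t for t in summary if t not in user_history]
    return sorted(unseen, key=lambda t: summary[t], reverse=True)[:k]
```

The expensive full-corpus work happens once a night; the per-user work happens only on demand, which is why inference latency now sits on the critical path.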

3. Vector-based retrieval with short-term history

You still have a batch pipeline, but you use it just to represent all your content with vector embeddings. A vector embedding is a projection of your content into a mathematical space that reveals its meaning and structure, by putting content about similar topics close together and unrelated content far apart.
Let’s say your user logs in and interacts with content. You look up your pre-computed vector embeddings for the content the user liked in this session and combine them into a user preference vector.

Then, you retrieve content similar to the user preference vector, rank it with a static model and voilà — a real-time personalized feed is born!
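The "combine them into a user preference vector" step is often just an element-wise average of the session's liked embeddings (weighted schemes are also common). A small sketch with toy 2-D values:

```python
# Embeddings of content the user liked this session (made-up toy values;
# real embeddings would come from a trained model and have many dimensions).
liked_this_session = {
    "cat_video": (0.9, 0.1),
    "dog_video": (0.8, 0.3),
}

def preference_vector(liked_embeddings):
    """Average the liked-content embeddings element-wise."""
    vectors = list(liked_embeddings.values())
    dims = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dims))

user_preference = preference_vector(liked_this_session)
```

This vector then becomes the query for the retrieval model, pulling in content near what the user has just shown interest in.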

4. Vector-based retrieval, with complex ranking

To improve ranking, you collect more data from the current session (e.g. time of day, session duration) and combine it with current data about content, such as what is trending right now. You can use Quix or Decodable to perform these aggregations on your real-time data stream.

Then you feed all of this data into your ranking model, along with the candidates retrieved as before… and see the engagement take off!
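At this level, the ranking function takes session context and trending signals as extra inputs. In the sketch below, the feature names and boost weights are invented for illustration; a real system would learn them from data rather than hard-code them:

```python
def score(candidate, session, trending):
    """Score one candidate using base quality plus contextual boosts."""
    s = candidate["base_score"]
    if candidate["topic"] in session["liked_topics"]:
        s += 0.5  # hypothetical boost for session affinity
    if candidate["topic"] in trending:
        s += 0.2  # hypothetical boost for currently trending content
    return s

def rank_with_context(candidates, session, trending):
    """Re-order candidates using session and trending context."""
    return sorted(candidates,
                  key=lambda c: score(c, session, trending),
                  reverse=True)
```

The stream aggregations (from tools like Quix or Decodable) are what keep the `session` and `trending` inputs fresh enough for this to pay off.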

Many startups already use real-time ML of this kind. For example, Superlinked achieves real-time Machine Learning without continuously retraining every part of the system, as long as fresh data keeps flowing into the inference model.

The takeaway

Real-time Machine Learning will be big this year — and is set to get even bigger. That’s because it’s a game-changer for engagement and relevance. But to reap the rewards, you need to make sure you’re taking the right steps and following the most effective approaches. If you do this, you can expect to see engagement flourish.
