Data to Deployment: Crafting a Robust Machine Learning Pipeline

A journey through the inner workings of production-grade machine learning pipelines

Raj Pulapakura
7 min read · Dec 30, 2023

Ever wondered how machine learning systems work in production?

This is the first article in a series I’m publishing on production-grade ML systems. This series, which I’m dubbing “Production ML 101”, will take you on a journey through all the core aspects of building robust real-world ML systems.

In this article, we will thoroughly discuss the 3 stages of the ML Pipeline:

  1. Data Ingest
  2. Model Build
  3. Model Serve

Data Ingest

The Data Ingest stage involves collecting, cleaning and analysing the dataset which will be used to train the ML model.

Data Extraction

Data Extraction means obtaining data from a raw data source. You can think of this as “loading the data into memory”.

The wonder of the modern era is not only the sheer amount of data available to us, but also the multitude of data modalities.

Some of the most common data modalities (there are lots more!)

The extraction process for each of these data modalities will vary significantly, but the core idea is to pool the data we need into a centralized location such as a database, so we can easily analyse, clean and process it.
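
As a minimal sketch of that idea, the snippet below pulls a few raw CSV exports into a single local table. The file names, table name, and SQLite destination are hypothetical stand-ins for whatever sources and data warehouse your project actually uses.

```python
import sqlite3

import pandas as pd

# Hypothetical example: pull raw CSV exports into one central table.
# In a real pipeline the sources might be APIs, object storage, or streams,
# and the destination a proper data warehouse rather than a local database.
raw_files = ["sales_2023_q1.csv", "sales_2023_q2.csv"]  # assumed file names

frames = [pd.read_csv(path) for path in raw_files]      # "load into memory"
combined = pd.concat(frames, ignore_index=True)

with sqlite3.connect("warehouse.db") as conn:
    # One centralized table that later stages can query.
    combined.to_sql("raw_sales", conn, if_exists="replace", index=False)
```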

Data Analysis

Understanding the data is a primary objective of the ML Pipeline. If Data Extraction means connecting to someone on LinkedIn, Data Analysis means having that first conversation. Careful and intentional data analysis not only gives us insight into the nature of the dataset, but it may also inform our decisions regarding the design of the ML model we build later.

Data analysis can include (a short exploratory sketch follows this list):

  • ✅ Plotting graphs and distributions
  • ✅ Discovering correlations
  • ✅ Calculating simple statistics
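
As a rough illustration, an exploratory pass over a tabular dataset with pandas might look like the sketch below. The file name and the "price" column are hypothetical.

```python
import pandas as pd

# Hypothetical tabular dataset produced by the Data Extraction step.
df = pd.read_csv("sales_clean.csv")

# Simple statistics: count, mean, std, quartiles for numeric columns.
print(df.describe())

# Correlations between numeric features (e.g. price vs. units sold).
print(df.corr(numeric_only=True))

# Distribution of a single feature.
df["price"].hist(bins=30)
```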

Data Preparation

After thorough analysis, the data needs to be further processed and cleaned before it can be fed into the model.

The specific data preparation requirements depend on the modality of the data being processed and the objective of the ML system. Here are typical data prep tasks for common data modalities (a tabular example is sketched after the list):

  • ✅ Text data: Removing punctuation, tokenization.
  • ✅ Visual data: Rescale, resize, flip, add noise.
  • ✅ Audio data: Converting raw audio clips to spectrograms.
  • ✅ Tabular data: Filling in missing values, removing outliers, scaling, one-hot encoding, engineering new features.
  • ✅ Time series data: Converting time series into sequential windows.
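
Taking the tabular case as an example, a minimal preparation sketch with pandas and scikit-learn might look like this. The dataset, column names, and outlier rule are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("sales_clean.csv")  # hypothetical tabular dataset

# Fill missing numeric values with the column median.
df["price"] = df["price"].fillna(df["price"].median())

# Drop extreme outliers (keep rows within 3 standard deviations of the mean).
mean, std = df["price"].mean(), df["price"].std()
df = df[(df["price"] - mean).abs() <= 3 * std]

# One-hot encode a categorical column.
df = pd.get_dummies(df, columns=["country"])

# Scale numeric features to zero mean and unit variance.
numeric_cols = ["price", "units_sold"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```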

During this stage, we may also want to apply transformations that are specific to the problem we are solving. For example, we would add random noise to a dataset of images if we wanted to train an image denoiser.

A common issue with labelled data is imbalanced classes. This is when one class is highly under-represented compared to another class, causing the model to focus less on the under-represented class. Solutions to this problem include oversampling (the minority class), undersampling (the majority class), or a mixture of both.
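
A minimal sketch of naive random oversampling with pandas, assuming a hypothetical dataframe with a binary "label" column, might look like this:

```python
import pandas as pd

# Hypothetical labelled dataset with an imbalanced binary "label" column.
df = pd.read_csv("transactions.csv")

majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Naive random oversampling: sample the minority class with replacement
# until it matches the majority class size, then shuffle the result.
minority_upsampled = minority.sample(n=len(majority), replace=True, random_state=42)
balanced = pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=42)

print(balanced["label"].value_counts())
```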

Our data is ready. Now, we need to design and iterate on the ML model that will be learning from this data. The Model Build stage is heavily dependent on the Data Ingest stage, because poor data processing will lead to poor model performance. Therefore, having a robust Data Ingest pipeline is fundamental to the success of an ML model.

Model Build

The Model Build stage involves iterating through different model architectures, hyperparameters and configurations to achieve a desired performance, often while trying to optimize a specific metric.

Choosing an ML model

Different problems call for different architectures, and the architecture you use is highly dependent on the modality of the data which is used for training.

Popular model architectures spanning various use cases

For most business applications, the general course of action is to find a pretrained model and fine-tune it with your dataset. You rarely need to design a brand new architecture from scratch (unless you’re solving a really niche problem).
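
As a rough sketch of that workflow, here is what fine-tuning a pretrained image model could look like with Keras, assuming a hypothetical 5-class image classification task. The backbone, layer choices, and hyperparameters are illustrative, not prescriptive.

```python
import tensorflow as tf

# Pretrained backbone with its classification head removed.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the pretrained weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # new task-specific head
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds / val_ds are hypothetical datasets
```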

Iterate, iterate, iterate

Training an ML model, especially for a non-trivial problem, is a highly iterative process, and you may be working on an idea for days on end, only to find out that it doesn’t improve the model’s performance. That’s okay! Just keep tuning the model and battle-testing new ideas, and eventually you’ll find an optimal solution.

There exist some strategies for systematically improving model performance, including how to specifically target bias and variance. I’m planning on covering these in a future article, so follow me to stay updated :)

Some common tasks to do while iterating on a model include:

  • ✅ Testing the model on out-of-sample data to assess its ability to generalize. Can we trust the model’s predictions? Is it overfitting?
  • ✅ Assessing the model using test metrics and the cost function.
  • ✅ Fine-tuning the model (e.g. hyperparameter tuning) based on these results to improve evaluation performance.

Along with hyperparameter tuning, trying out different model architectures can lead to significant performance boosts. It may also be helpful to tweak the hardware configuration (e.g. use distributed training) to speed up model training.
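
As one deliberately simple example of systematic iteration, a grid search over a couple of hyperparameters with scikit-learn might look like the sketch below. The model, parameter grid, and synthetic dataset are stand-ins for your own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the prepared training data from the Data Ingest stage.
X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 10, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,            # each configuration is scored on held-out folds
    scoring="f1",
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```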

Model iteration flowchart

Validate the model for business needs

In production environments, models often need to meet certain business-defined requirements.

For example, a common test is to look at performance by slice. This test ensures that model performance is not biased towards a particular feature segmentation, such as a particular country.
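
A sliced evaluation can be as simple as grouping an evaluation set by the feature of interest and computing the metric per group. The tiny dataframe below is purely illustrative.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical evaluation set with true labels, model predictions,
# and a "country" column used to slice performance.
eval_df = pd.DataFrame({
    "country": ["US", "US", "IN", "IN", "BR", "BR"],
    "label":   [1, 0, 1, 1, 0, 1],
    "pred":    [1, 0, 0, 1, 0, 0],
})

# Compute accuracy per slice and flag any slice below a business threshold.
for country, group in eval_df.groupby("country"):
    acc = accuracy_score(group["label"], group["pred"])
    print(country, round(acc, 2))
```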

Model Serve

The Model Serve stage is concerned with deploying a trained model and making it accessible to whoever needs to use it.

Model Registry

A model registry is a centralized tracking system which stores lineage, versioning, and related metadata for published machine learning models.

A model registry contains information such as (a toy record is sketched after this list):

  • ✅ Who trained and published the model?
  • ✅ Which datasets were used for training?
  • ✅ What evaluation metrics were used?
  • ✅ When was the model deployed to production?
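
The exact schema depends on the registry you use (MLflow, Vertex AI, SageMaker, etc.), but conceptually an entry is just a structured metadata record. Here is a toy, framework-free sketch with made-up values:

```python
import json
from datetime import datetime, timezone

# Illustrative only: a registry entry as a plain metadata record.
# Real registries store the same kind of lineage and versioning information.
registry_entry = {
    "model_name": "churn-classifier",            # hypothetical model name
    "version": 3,
    "trained_by": "raj@example.com",
    "training_dataset": "customers_2023_q4.csv",
    "eval_metrics": {"f1": 0.87, "auc": 0.93},   # placeholder numbers
    "deployed_at": datetime.now(timezone.utc).isoformat(),
}

# Append the record to a simple newline-delimited JSON "registry".
with open("model_registry.json", "a") as f:
    f.write(json.dumps(registry_entry) + "\n")
```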

Prediction Service

To make the trained model accessible, a prediction service needs to be deployed.

The type of prediction service deployed is highly dependent on the problem you are trying to solve. Generally, there are 3 scenarios:

Typical model deployment solutions
  • Edge/mobile prediction: Store the trained model in the app/website bundle and access it directly from the application code. Applicable for projects that require low latency and when the model size is relatively small.
  • Dynamic serving: Store the model on a server or in the cloud, and expose an API or build a microservice that serves as the face of the model (a minimal sketch follows this list). An example use case is chatbots.
  • Static serving: Make all possible predictions in a batch, write them to a lookup table, and serve them as a cache. An example use case is recommendation systems, where recommendations can be precomputed for each user and stored for later retrieval.
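
For the dynamic serving case, a minimal prediction API could be sketched with FastAPI as below. The model file, feature names, and endpoint are hypothetical, and any web framework would do.

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Trained model produced by the Model Build stage (hypothetical artifact).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class Features(BaseModel):
    price: float
    units_sold: float

@app.post("/predict")
def predict(features: Features):
    # Assumes a scikit-learn-style model with a predict() method.
    pred = model.predict([[features.price, features.units_sold]])
    return {"prediction": pred.tolist()}

# Run with: uvicorn serve:app --port 8000  (assuming this file is serve.py)
```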

There’s a lot more we can talk about regarding static serving and dynamic serving. This will be the focus of the next article in the Production ML 101 series, so stay tuned :)

Monitoring

We’ve deployed our model. Everything’s done, right?

Well, it turns out that the environment in which the model operates is subject to constant change. If the data used to train the model becomes outdated and no longer represents the live incoming data, the model risks becoming stale, resulting in poor performance on current data.

Take, for instance, a recommendation model for an ecommerce website, which is subject to changing customer preferences. Without adapting to these changing preferences, the model may persist in generating recommendations aligned with outdated customer tastes, rendering them irrelevant and unhelpful.

Even after deployment, models need to keep up with their environment.

The first step in addressing model staleness is detecting it. Regularly assessing the performance of the model on live data through a robust monitoring system allows for the detection of any degradation in its effectiveness. The outputs of this monitoring process can feed back into the Data Extraction component, serving as a trigger to re-execute the pipeline or gather new data.
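
A very simple staleness check might compare accuracy on recently labelled live data against the accuracy recorded at deployment time, and raise a flag when the drop exceeds a tolerance. The numbers below are placeholders.

```python
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.90   # accuracy recorded at deployment time (assumed)
ALERT_THRESHOLD = 0.05     # tolerated drop before we react

def check_for_staleness(live_labels, live_predictions):
    """Return True if live performance has degraded beyond the threshold."""
    live_accuracy = accuracy_score(live_labels, live_predictions)
    if BASELINE_ACCURACY - live_accuracy > ALERT_THRESHOLD:
        # In a real system this would trigger the pipeline:
        # gather fresh data, retrain, and redeploy.
        print(f"Model stale: live accuracy {live_accuracy:.2f}")
        return True
    return False

# Toy usage with made-up labels and predictions.
check_for_staleness([1, 0, 1, 1, 0], [1, 1, 0, 1, 1])
```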

Model staleness has a solution: Dynamic Training, which I will discuss in the next article in the Production ML 101 series, so stay tuned :)

Final thoughts

In this article I shared some of the fundamental components of any production-grade ML pipeline; however, the specific processes involved vary for each application. The ML Pipeline, like any system, should constantly adapt to different data modalities, business requirements and changes in the environment.

As my sign of thanks for reading this article, I’ve prepared an illustration that shows the entire ML Pipeline. Feel free to download it (and any other diagrams from this article).

Stay tuned for the next article in the Production ML 101 series, where I’ll share insights on important design decisions regarding the ML Pipeline, including training design decisions and serving design decisions.

Follow me on Medium and LinkedIn for more high-quality articles about tech and AI.

Have an absolutely joyful day 💖.
