Machine Learning Platform!

Kunal Saini · Published in The Startup · 8 min read · Sep 11, 2020

This story will answer most of the questions that come to mind when you think about how to build a platform that can help you deploy your AI models to production, and about what to look for in such a platform.

Architecture
Components

Before we go into the details of the specific components of the above architecture and talk about the engineering side, let’s learn a bit about the training process.

Model Training Process

So, the model training process is mostly offline. It is mainly performed by data scientists and statisticians. These people jot down the requirements, collect data by running various SQL queries over the databases, and then move on to data cleaning and feature selection. After the data is prepared, they train a model suitable for the particular use case or business requirement, then validate and test it. And finally, you have the model to be deployed in production. The model is handed to the developers in one of various formats depending on the libraries/language used to train it, say a pickle file, a PMML file, or some other format.
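To make that hand-off concrete, here is a minimal sketch of the offline flow, assuming scikit-learn; the synthetic dataset and the random-forest choice are placeholders for whatever the data scientists actually use:

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for the cleaned, feature-selected data pulled via SQL.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train, then validate/test before shipping.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# The artifact handed over to the engineering team, here a pickle file.
with open("model_v1.pkl", "wb") as f:
    pickle.dump(model, f)
```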

Now, as engineers, it is our responsibility to come up with a platform on which we can deploy these trained models and, in most cases, run them in real time; collect the inputs that go into the model and the outputs that come out of it; and, in some cases, use that feedback to retrain the model.

Now, as promised, let's dive deep into the engineering side, given that we have the trained model with us.

Brief Overview

The above is a brief overview of what the whole platform looks like. In almost every machine learning system, we will see these four components:

  1. Client
  2. Pre-Processor Service
  3. Compute/Deployment Service
  4. Post-Processor Service

Before we go in depth into each of the components and the role they play, there is another important process to mention: the Feedback Loop, or Online Model Retraining. This is a part of the platform that you may or may not have; it depends on your use case.

Over time, the feedback loop / online training process helps to fine-tune the model so that it performs well even in those scenarios where it failed to generate correct results in the past. By adding this component, we make our model evolve.

Q. Why do we have this, if we have a model with, say, 99% accuracy?

A. The accuracy of the model is measured over past data that we already have, and our model learns patterns from that data. But once it goes live in production, the data on which we need to make predictions/inferences can vary a lot from the data on which the model was trained, tested, and validated. So we adopt this feedback loop or online training mechanism.

Q. How does it work?

A. Suppose we deploy version-1 of our model and store all the inputs provided to it along with the results it produces. We collect all this data and, at the end of the month (or some other time frame), hand it to our analysts for manual analysis. Suppose they find that for certain kinds of input the model is not giving correct output, and they come up with more training data that will help correct it. They upload this data, our service consumes it, and the model is retrained with it. The retraining runs in the background while version-1 is still serving. Once the retraining is done, we replace version-1 with version-2, which then goes live.
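A hedged sketch of what such a retraining job might look like, assuming pickled scikit-learn-style models and hypothetical file paths; the key point is that the new version is written to a separate file, so version-1 keeps serving until the swap:

```python
import pickle

def retrain(current_model_path, correction_data, new_model_path):
    """Retrain in the background while the current version keeps serving.

    correction_data: (X_new, y_new) uploaded by the analysts after the
    manual review described above (shapes and format are assumptions).
    """
    with open(current_model_path, "rb") as f:
        model = pickle.load(f)

    # In practice you would usually combine the original training data
    # with these corrections rather than fit on the corrections alone.
    X_new, y_new = correction_data
    model.fit(X_new, y_new)

    # Write version-2 to its own file; the serving pointer is switched
    # only after this completes, so version-1 never stops serving.
    with open(new_model_path, "wb") as f:
        pickle.dump(model, f)
    return new_model_path
```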

Finally, let’s dig into all the four components mentioned above

Part-1: Client

The client can be any service that calls the platform with the input parameters used to generate the input features that are fed to the model to get an output. Apart from the input, the client also provides its own information and the model/model ID it wants us to run. And apart from the parameters the client supplies in real time, it can also ask us to pull some inputs from other data sources on our end. Say the model the client is using requires user history; fetching the user history in real time would be too time-consuming for the client, so it asks us to store the user's history on our side. The same can be done with any other data that can't be served in real time but is possible to store, say data that doesn't change often (static data) and that the model needs to compute its results. All the work of converting the client's input parameters into the input features fed to the model is the responsibility of the Pre-Processor Service. A sample request is sketched below.
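As an illustration, a client request to the platform might look something like the following; the field names here are hypothetical, not a fixed contract:

```python
# Hypothetical shape of a client request to the platform. The platform
# resolves model_id, enriches the raw params (e.g. with stored user
# history), and only then builds the actual model features.
request = {
    "client_id": "search-service",   # who is calling us
    "model_id": "ranking-model-v1",  # which deployed model to run
    "params": {                      # raw inputs, not yet features
        "user_id": "u-123",
        "query": "red shoes",
    },
}
```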

Part-2: Pre-Processor Service

The pre-processor service is connected to a cache and to databases, and it can subscribe to multiple Kafka topics. Kafka will have multiple producers providing us data in real time, and we can also insert analytics data into the stream as the model requires. The client can also upload data to a database, or say to S3, which can later be loaded into a cache such as Redis; this data will be mostly static, for instance the user history mentioned in Part-1. And in the end, all the logic required to convert the raw input and data into the input the model expects resides in this service alone, for instance if we need to normalize or scale some parameters.
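A minimal sketch of such a pre-processing step, assuming the redis-py client and made-up key names; the scaling constants are placeholders for values that would be fixed during training:

```python
import json

import redis  # assumes the redis-py package

r = redis.Redis(host="localhost", port=6379)

def build_features(params):
    """Turn raw client params into the feature vector the model expects."""
    # Static/slow-changing data, e.g. user history, pre-loaded into Redis.
    raw = r.get(f"user_history:{params['user_id']}")
    history = json.loads(raw) if raw else []

    # Example transformations: capping and scaling raw values into the
    # ranges the model was trained on (constants are placeholders).
    purchases = min(len(history), 100) / 100.0
    query_length = min(len(params["query"]), 50) / 50.0
    return [purchases, query_length]
```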

Note: We will discuss this in more detail in an upcoming blog on Data Lake Design. Stay tuned!

Part-3: Compute Service

The whole brain of the platform resides in this component, and it is the most crucial one. It handles:

  • loading the models into memory on initiation,
  • selecting a model based on the request,
  • running the model to get the results,
  • storing the results along with the input parameters,
  • sending this data for offline analysis when manual effort is required, or performing the analysis online,
  • getting the data for retraining the model after analysis,
  • running the feedback loop to fine-tune the model, and
  • replacing the previous model with the fine-tuned one and bringing it back into service.
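A minimal sketch of the serving side of this component, assuming pickled scikit-learn-style models; the logging hook that feeds the analysis and retraining steps is stubbed out:

```python
import pickle

class ComputeService:
    """Sketch of the compute service's core loop; storage and the
    analysis/retraining hooks are stand-ins, not a full design."""

    def __init__(self, model_paths):
        # Load every model into memory once, at service start-up.
        self.models = {}
        for model_id, path in model_paths.items():
            with open(path, "rb") as f:
                self.models[model_id] = pickle.load(f)

    def predict(self, model_id, features):
        # Select the model based on the request, run it, log the pair.
        model = self.models[model_id]
        result = model.predict([features])[0]
        self.log(model_id, features, result)  # feeds the feedback loop
        return result

    def log(self, model_id, features, result):
        # Placeholder: in a real system this would go to a store/stream
        # for the offline or online analysis described above.
        print(model_id, features, result)
```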

If our model was trained using Python, we need support for Python and all the libraries required to run the model. If we have a deep learning or otherwise complex model, we might need more powerful servers, say with GPUs. It all depends on what kind of models you want to deploy, what support they need, how much latency you can tolerate, and what throughput you want. Since this part is so important, do keep in mind what kinds of models you will have in the future, how many requests you will be serving, what delay you can accept, and whether the models will all be Python-based or you will need support for other languages and libraries too, so that your system scales well in the long run.

An example of the offline analysis process was given above while explaining the feedback loop.

Now for a sample of online training: say we run a test in which the model decides whether or not to recommend a product X to a particular customer. Let's take three customers, say P, Q, and R.

The model decided to recommend product X to P and Q, but not to R.

And in the end, what happened was:

  • Product X was recommended to P, and he purchased it.
  • Product X was recommended to Q, but he didn't purchase it.
  • Product X was not recommended to R, but he ended up purchasing it by searching for it himself.

We were able to see all this by checking the records in our booking database, and everything was done without any human intervention. With this analysis we fine-tuned the model so that next time it does recommend product X (and products similar to X) to R and does not recommend such products to Q. And this is how our model ends up correcting itself from its past mistakes.
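A hedged sketch of how that automated comparison could generate corrective labels, with made-up record shapes; it simply joins what the model decided against what the customer actually did:

```python
def corrective_labels(recommendations, purchases):
    """recommendations: {customer: bool} - did the model recommend X?
    purchases: set of customers who bought X (e.g. from the booking DB).
    Returns (customer, correct_label) pairs where the model was wrong."""
    corrections = []
    for customer, recommended in recommendations.items():
        bought = customer in purchases
        if recommended != bought:
            corrections.append((customer, bought))
    return corrections

# The P/Q/R example above: the model recommended to P and Q, not to R.
recs = {"P": True, "Q": True, "R": False}
bought = {"P", "R"}
print(corrective_labels(recs, bought))  # [('Q', False), ('R', True)]
```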

Note: This is the heart of the system. Stay tuned, we will cover it in more detail and also give you some hands-on time with the implementation.

Part-4: Post-Processor Service

This service maps the model's output into the form the client expects. Say the model outputs a probability: above 0.5 the client wants true, and below 0.5 the client wants false. Or say our model gives a recommendation score for some products, and we need to combine this score with other parameters and finally return the best k results to the client. All this logic is taken care of by this service.
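Both of those mappings are only a few lines in practice; here is a sketch, with the 0.5 threshold taken from the example above and a simple multiplicative weight standing in (as an assumption) for "other parameters":

```python
def to_boolean(probability, threshold=0.5):
    # Map the model's probability to the true/false the client expects.
    return probability >= threshold

def top_k(scored_products, other_weights, k=5):
    """Combine model scores with other parameters (here a simple
    multiplicative weight) and return the best k product IDs."""
    combined = {
        product: score * other_weights.get(product, 1.0)
        for product, score in scored_products.items()
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]
```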

Note: We have divided the whole system into four microservices to keep things loosely coupled, so that each service is responsible for its own task and needs no information about the others. But it all depends on how complex your system is. If your use cases don't require separate pre-processor and post-processor services, you can club everything into one. If you want a separate service for the analysis and feedback-loop part, you can do that too; it all depends on your platform's requirements.

Hope you enjoyed the article and got an idea, or an overview, of how we go about designing a whole machine learning system. I know this is brief, just to give you a helicopter view of the system. There are a lot of insights we need to dig into when it comes to building and operating these systems. I will try to write more on it in my upcoming articles, as this is just the start.

Please do provide your valuable feedback on this, as both humans and machines require feedback to improve over time and evolve.

Thanks… see you soon.
