Building Operational ML Applications

Raj Srujan Jalem
Fasal Engineering
Dec 20, 2021

Explaining Machine Learning in production, key considerations, and challenges

What is Operational ML?

Operational ML, ML Operations, or MLOps as it is now popularly known, is a set of best practices that helps businesses, and the data scientists and researchers within them, build Machine Learning applications and run them successfully at scale.

MLOps meme by Ariel Biller

Machine Learning and AI have become such an important part of our lives that we cannot imagine a world without them. The lifecycle of ML projects, from research to production, is getting shorter day by day due to high demand and a fast-changing world. The problem, however, lies in the gap between data scientists, who develop the ML models, and the operations teams, who have the expertise in day-to-day application deployments and in building continuous integration and deployment (CI/CD) pipelines.

Different components in MLOps

Deployments and CI/CD pipelines are quite different when it comes to ML applications. I am going to walk through some of the key components involved in building Operational ML applications at scale. Since we will be focusing on productization, we will not cover the lifecycle of Machine Learning experimentation.

Key components to be considered in MLOps

Data Management: As we all would agree, data is the most important component in any ML Application. Some of the core steps involved are:

  • Training Data: Collecting data from different sources, verifying it before any processing, and setting up the right data structures for the data scientists are important steps that help them run their experiments smoothly. This also involves processing the data from these different sources and unifying it into a single data source in the required formats with the help of data pipelines.
  • Inference Data: Every ML application requires some data for running inference with the trained models. Wherever possible, this data should be pre-computed, cached, and reused for inference (a minimal caching sketch follows this list). For example, at Fasal we have models running for microclimate predictions that require IoT sensor data and local forecast data, so these are always processed ahead of time and cached for the applications as required.
  • Metadata: A lot of metadata is generated from the training and experimentation of ML models, such as accuracy metrics, the feature set, etc., which is very helpful for analyzing training runs and evaluating the quality of the models and of the data used. This metadata should be properly maintained and versioned, otherwise comparing a run with previous training runs becomes difficult.
  • Prediction Logs: All predictions from the inference servers should be logged along with runtime metadata such as computation time and hardware usage. Proper dashboards and metric evaluations should be in place to help data scientists do the necessary research and make tweaks in the model development and data processing steps.
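
As a concrete illustration of the inference-data point above, here is a minimal sketch of pre-computing features and caching them in Redis. The key names, feature values, and TTL are hypothetical, not Fasal's actual schema.

```python
import json
from typing import Optional

import redis

# Connect to a local Redis instance (hypothetical host/port).
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cache_inference_features(device_id: str, features: dict, ttl_seconds: int = 3600) -> None:
    """Store pre-computed features so the inference service can read them directly."""
    cache.set(f"features:{device_id}", json.dumps(features), ex=ttl_seconds)

def load_inference_features(device_id: str) -> Optional[dict]:
    """Fetch cached features; returns None if they expired or were never computed."""
    raw = cache.get(f"features:{device_id}")
    return json.loads(raw) if raw else None

# Example: cache sensor-derived features ahead of time, read them back at inference time.
cache_inference_features("device-42", {"soil_moisture": 0.31, "temp_c": 27.5})
print(load_inference_features("device-42"))
```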

Model Management: The second important thing is managing models. Different steps involved are:

  • Logging model files: All developed models should be properly stored in cloud or centralized storage that is accessible to the other teams that need them, both for sanity checks and for use in applications.
  • Model versioning: Along with logging, models should be properly versioned, which helps in maintaining clean training processes and in communicating with other teams.
  • Model Format: Data scientists use many libraries to develop their models, such as PyTorch, TensorFlow, OpenCV, etc. These libraries have different system requirements and dependencies that need to be installed and maintained, and the models are tightly coupled to them, which is quite difficult for operations teams to keep track of. Converting models to a unified format built specifically for inference, such as ONNX, reduces application sizes, since only a single dependency is required, and can also increase inference speeds by 6–7 times (a small export example follows this list).
Reference: OpenDataScience MLOps Tutorial
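
To make the model-format point concrete, below is a minimal sketch of exporting a PyTorch model to ONNX and running it with onnxruntime. The toy architecture, file name, and input shape are made up for illustration.

```python
import numpy as np
import torch
import onnxruntime as ort

# A toy model standing in for a real trained network.
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
model.eval()

# Export to ONNX: the inference server then only needs onnxruntime, not PyTorch.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["features"], output_names=["prediction"])

# Load and run the exported model with onnxruntime.
session = ort.InferenceSession("model.onnx")
outputs = session.run(["prediction"], {"features": np.random.randn(1, 4).astype(np.float32)})
print(outputs[0])
```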

ML Application: Then comes the main component, building APIs and serving ML Models. This involves:

  • Model Size: The size of a model matters a lot when it is deployed for inference, as it affects app builds, server speed, memory consumption, etc. It should be carefully considered and optimized if required. Most importantly, models shouldn’t be pushed along with the code; they should be maintained separately, as mentioned in Model Management, and pulled only during the deployment process.
  • Model Endpoints: If we have multiple models, each model can be deployed individually as an API, and their endpoints can then be used for inference from the front-facing API (a minimal serving sketch follows this list). This helps reduce server build times and improves the inference performance of the API.
  • Continuous Integration and Deployment: Combining all the points above into a CI/CD pipeline reduces the effort data scientists and operations teams spend on deployments. CI/CD pipelines for ML applications would include getting the right configurations, testing with inference data, containerizing ML models with the right versions and model dependencies, configuring data caching, exposing endpoints, etc.
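
As a sketch of the per-model endpoint idea above, here is a minimal FastAPI service that loads an ONNX model at startup and exposes a single prediction route. The route name, payload schema, and model path are illustrative assumptions rather than a prescribed setup.

```python
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup; it is pulled from storage during deployment,
# not bundled with the application code.
session = ort.InferenceSession("model.onnx")

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(request: PredictRequest) -> dict:
    # Assumes the ONNX model exposes an input named "features" and one output.
    batch = np.asarray([request.features], dtype=np.float32)
    (prediction,) = session.run(None, {"features": batch})
    return {"prediction": prediction.tolist()}

# Run locally with: uvicorn app:app --port 8000
```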

ML Operations At Fasal

MLOps at Fasal

At Fasal we work with different types of data, mostly streaming from multiple sources and our IoT devices, so managing data for the AI team is crucial to help them continue their research and build products.
Here is a brief overview of how we manage our ML Operations at scale.

  • We build and maintain our data pipelines using Airflow (a minimal DAG sketch follows this list). We store the training data in a SQL database as an offline feature store, which is used by the teams for training models and then analyzing them. We then cache the data required for inference using Redis, an open-source data structure store used as a database, cache, and message broker. We also use Redis as our online feature store.
  • All models and experiments are tracked and versioned using MLflow, which we also use as a Model Registry (a minimal tracking sketch follows this list). For more details on how we use MLflow, please refer to one of our previous articles on it.
  • We use ONNX as a unified model format, irrespective of the libraries used to train the models.
  • We log all developed models in MLflow, package and containerize our APIs using Docker, and use Elastic Container Service (ECS) to deploy and expose the services.
  • Once the services are deployed, we collect all the prediction data and develop dashboards to evaluate the prediction accuracies and overall model health.
  • Our training pipelines are automated, triggered either on a schedule or on the basis of data/model drift. These pipelines are deployed using Kubeflow, which helps in scaling and in customizing the system and hardware requirements.
  • All builds and deployment processes for the different production and testing environments are managed by our CI/CD pipelines, which are triggered conditionally, directly from our code repository.
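
For illustration, here is a minimal sketch of an Airflow DAG in the spirit of the data pipelines described above. The task functions, schedule, and names are hypothetical placeholders, not our production code.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_sensor_data(**context):
    """Pull raw IoT sensor readings from the source systems (placeholder)."""
    ...

def build_features(**context):
    """Clean and unify the raw data, then write features to the offline store (placeholder)."""
    ...

def refresh_online_cache(**context):
    """Push the latest pre-computed features into Redis for inference (placeholder)."""
    ...

with DAG(
    dag_id="feature_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_sensor_data", python_callable=extract_sensor_data)
    features = PythonOperator(task_id="build_features", python_callable=build_features)
    cache = PythonOperator(task_id="refresh_online_cache", python_callable=refresh_online_cache)

    extract >> features >> cache
```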
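And here is a minimal sketch of the kind of MLflow tracking and registry usage described above. The experiment name, model, parameters, and metric values are made up for illustration.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

mlflow.set_experiment("microclimate-prediction")  # hypothetical experiment name

X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)

with mlflow.start_run():
    model.fit(X, y)
    # Log the metadata that later lets us compare runs and audit model quality.
    mlflow.log_param("n_estimators", 50)
    mlflow.log_metric("train_r2", model.score(X, y))
    # Log the model and register a new version
    # (assumes the tracking server has a registry-capable backend).
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="microclimate-regressor")
```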

Conclusion

MLOps or Operational ML is a concept rather than a particular toolkit. It is still maturing, and a lot of open-source communities are contributing to making ML in production easier for data scientists and researchers. The end-to-end choice of tech stack and operations depends entirely on the types of problems and solutions a company is working on, and there is no fixed set of tools to adopt. Major cloud-service providers like Alibaba Cloud, GCP, AWS, and Oracle are among several that offer accessible end-to-end services. It’s still early days for off-the-shelf MLOps solutions and software.

We at Fasal focus on building efficient and scalable systems that are capable of analyzing the growing volume and variety of data as well as supporting the increasing requirements for AI in solving complex problems in agriculture.
