COMPREDICT Platform Architecture — Part 1: AI Core & ML Framework

Ousama Esbel
COMPREDICT
Feb 5, 2020

We at COMPREDICT provide many services to our partners in the automotive industry. These services range from in-car software, through managing a data lake, to developing and serving AI models in the cloud. This series of posts discusses the architecture and technology stack that allow us to accept streamed data, train models, and serve them inside a car or in our cloud.

Overview

Our software-based approach, the COMPREDICT Platform, predicts component failure, personalizes maintenance plans, enables lightweight design and optimizes the development process for automotive components.

COMPREDICT Platform

The figure above shows our complete tech stack and the basic flow of our software, the COMPREDICT Platform.

In the in-car software, we provide embedded load and lifetime monitoring models that compute usage profiles and assess the lifetime of components using virtual sensors based on sensor fusion and neural networks. In a nutshell, our embedded algorithms process and enhance the standard data available in every series-production car (e.g. CAN signals) in order to get access to durability-related information about the components. Once processed, we gather the data generated by the in-car software either in an aggregated way, by sending it on a regular discrete schedule (the preferred way for now), or in real time, by streaming time series. All steps of the in-car software are purely software-based; no additional hardware is required.

In the data lake, we store all the data we have (generated by our own vehicles, project-related data, and data streamed or sent to us by our partners). We then apply ETL jobs to transform this data into useful information and use it for inference and training, or simply store it in our data warehouse.

The intelligence unit of our platform is AI Core, our algorithm hub, where we develop and serve all our models and expose them through a universal API. AI Core administrates our algorithms and models, executes training processes when new data is available, and manages tasks, queues and priorities.

Finally, to relieve the customer from the hassle of connecting to AI Core, we provide a user interface that enables data visualization and stores the algorithms’ results on our web server, along with several additional features tailored specifically to automotive engineers, fleet managers or drivers. We call it COMPREDICT Analytics.

Each block will be discussed in depth in a separate post. In this post, I will introduce the Machine Learning Platform we use to develop our models and how it interacts with our ML-as-a-Service platform, AI Core, whose architecture and workflow we will walk through in depth.

AI Core

Our main assets are the state-of-the-art, patented and patent-pending models that tackle different engineering domains in the automotive industry (software sensing, design optimization, monitoring and predictive maintenance, simulation tools, testing programs). These models are developed using various tools (TensorFlow, C++, Matlab, PyTorch, etc.). AI Core gathers all our developed algorithms in a single centralized hub and securely exposes these models to our internal team and external customers.

Additionally, AI Core is capable of running algorithms both synchronously and asynchronously by dispatching long-running jobs to a background queue system.

Technology Stack

AI Core uses the following technologies:

  • Docker for local and production environments.
  • Python Django as the web framework and Django REST Framework as the API engine.
  • Celery for asynchronous tasks, using RabbitMQ as the queue system (see the sketch after this list).
  • Flower for queue monitoring.
  • Postgres database.
  • Dask to offer distributed training for heavy machine-learning algorithms.
  • Boost.Python and Matlab Runtime to run C++ and Matlab models, respectively, in Python.
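
To make the asynchronous part of this stack concrete, here is a minimal sketch of a Celery application backed by RabbitMQ. The module name, task name and connection details are illustrative assumptions, not our production code.

```python
# tasks.py -- minimal Celery app backed by RabbitMQ (module, names and URLs are illustrative).
from celery import Celery

app = Celery(
    "ai_core",
    broker="amqp://guest:guest@rabbitmq:5672//",  # RabbitMQ as the queue system
    backend="rpc://",                             # lets clients fetch results of queued jobs
)

@app.task(bind=True)
def run_inference(self, algorithm_id, payload):
    """Placeholder for a long-running model execution."""
    # ... load the model identified by algorithm_id and run it on the payload ...
    return {"algorithm": algorithm_id, "status": "done"}

# Calling run_inference(...) directly executes it synchronously in-process, while
# run_inference.delay("some-algorithm", {"signal": [0.1, 0.2]}) dispatches it to the queue.
```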

Applications

Django hosts two applications:

  • Algorithms: Deals with serving and hosting Machine Learning & Analytics Models.
  • Users: Manages organizations, users, activities and permissions.

Each application exposes its own API, and for each API we provide SDKs in different languages for fast integration.

The Algorithms application offers an abstract base class, Model, that provides convenient methods for serving models. These methods are overridden by subclasses specialized for a specific model type (a minimal sketch follows the list below). There are five different types we support:

  • C++ Model.
  • Fit/Unfit ML and NN models.
  • Matlab Model.
  • Hybrid Model (An ensemble of the above models).
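
As an illustration only, such a base class could look roughly like the sketch below; the class and method names are assumptions, not our actual interface.

```python
# Illustrative sketch of an abstract serving interface; names are assumptions, not our actual code.
from abc import ABC, abstractmethod

class Model(ABC):
    """Uniform serving interface shared by all supported model types."""

    @abstractmethod
    def load(self, artifact_path: str) -> None:
        """Load the underlying artifact (compiled C++ library, pickled estimator, Matlab bundle, ...)."""

    @abstractmethod
    def predict(self, features):
        """Run inference on the given features and return the predictions."""

class MatlabModel(Model):
    """Subclass specialized for models packaged for the Matlab Runtime."""

    def load(self, artifact_path: str) -> None:
        ...  # e.g. initialize the Matlab Runtime and load the packaged model

    def predict(self, features):
        ...  # e.g. call into the Matlab Runtime and collect the outputs
```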

For each served model, you can specify the evaluations you would like to apply to the predicted values. This can be done directly in the API request by including the parameters of the evaluation method. For instance, most of our virtual sensor models for mechanical components are evaluated using a Rainflow-counting method, which is implemented as standard in our platform. Each served model is also versioned, so a client can request earlier versions as desired.

API Quick response.

The API’s response depends solely on the type and the task of the model; it might require only a short amount of time to process. In this case, the response is sent directly to the client, as shown in the figure above.

On the other hand, the model may require plenty of time to process; in that case, the request is dispatched to the queue and a job id is given to the requester, who can query the results later. Alternatively, the requester can specify a callback endpoint to which AI Core will send the results once they are processed.

API Request dispatched to the queue and response is queried later or POSTed to callback URL.
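
To make this flow concrete, here is a hedged sketch of what such an interaction could look like from the client side; the endpoints, parameters and response fields are assumptions for illustration only, not the documented API.

```python
# Hypothetical client-side flow; endpoints, parameters and field names are assumptions, not the real API.
import time
import requests

BASE = "https://aicore.example.com/api"          # placeholder host
HEADERS = {"Authorization": "Bearer <token>"}    # requests are authenticated with a Bearer token

features = {"wheel_speed": [12.1, 12.3, 12.2]}   # dummy input signal

# Request a prediction and ask for a Rainflow-counting evaluation of the predicted values.
response = requests.post(
    f"{BASE}/algorithms/<algorithm-id>/predict",
    headers=HEADERS,
    json={"features": features, "evaluate": {"method": "rainflow", "bins": 64}},
)
body = response.json()

if "predictions" in body:
    # Short-running model: the result is returned synchronously.
    results = body["predictions"]
else:
    # Long-running model: the request was dispatched to the queue; poll with the job id
    # (or register a callback URL instead and let AI Core POST the results to it).
    job_id = body["job_id"]
    while True:
        job = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS).json()
        if job["status"] == "finished":
            results = job["predictions"]
            break
        time.sleep(10)
```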

Each model has different inputs, outputs and evaluations. Upon listing the algorithms or getting a specific one, you can identify the following (an illustrative sketch follows the list):

  • The features that are mandatory for the algorithm and the expected output.
  • The file formats allowed (JSON, CSV, Parquet, tfrecords).
  • Whether it will be processed directly or dispatched to the queue.
  • List of the evaluations that the user can use to evaluate the predictions.
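
As an illustration only, the description of a single algorithm might look roughly like the following; the identifiers and field names are assumptions about the general shape, not the real API response.

```python
# Hypothetical shape of an algorithm description (identifiers and field names are assumptions).
algorithm_info = {
    "id": "suspension-load-estimator",
    "version": "2.1.0",
    "features": ["wheel_speed", "steering_angle", "lateral_acceleration"],
    "output": ["damper_force"],
    "file_formats": ["json", "csv", "parquet", "tfrecords"],
    "dispatched_to_queue": True,   # long-running: results queried later or sent to a callback
    "evaluations": ["rainflow"],
}
```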

Privacy and Security

At COMPREDICT, we value the privacy of our users. Therefore, AI Core doesn’t store any data sent by the user. However, when a request is escalated to the queue, the results are stored locally for a maximum of three days, which gives the user the option to query them at a later point. This might be an issue for certain users with strict requirements against sharing their input/output with any third-party organization. To overcome this, we allow users to configure the API to encrypt the results prior to storing them. The results can then only be decrypted with a key held by the user. Likewise, we allow users to send us encrypted input data, so that the data transit is secure in both directions.

The encryption mechanism used is the widely implemented RSA algorithm. RSA is an asymmetric cryptographic algorithm that requires two different keys: a public key that anyone can hold and use to encrypt the data, and a private key that decrypts the data and should always be kept private. In addition, we use the PKCS#1 OAEP padding scheme, which is semantically secure under chosen-plaintext attack.
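
The snippet below sketches this scheme with the PyCryptodome library as one possible implementation; the library choice and key size are assumptions, not necessarily what runs in production. Note that RSA-OAEP can only encrypt payloads smaller than the key size minus the padding overhead, so large results would typically be wrapped with a symmetric session key.

```python
# Sketch of RSA encryption with OAEP padding using PyCryptodome (library and key size are assumptions).
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_OAEP

# The customer generates the key pair and shares only the public key with AI Core.
key = RSA.generate(2048)
public_key = key.publickey()

# AI Core side: encrypt the (small) result payload with the customer's public key before storing it.
cipher = PKCS1_OAEP.new(public_key)
ciphertext = cipher.encrypt(b"rainflow: 1.2e6 cycles")   # must fit within key size minus OAEP overhead

# Customer side: only the holder of the private key can decrypt the stored results.
plaintext = PKCS1_OAEP.new(key).decrypt(ciphertext)
```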

However, once the data is received, we need to decrypt it in order to run training or inference tasks, and after the task is done, the data is re-encrypted before transfer. To avoid these decryption and re-encryption steps, we are conducting research, in collaboration with TU Darmstadt, on end-to-end private machine learning, for example using partial homomorphic encryption for training and inference. However, this comes at the cost of reduced model performance.

On another note, requests are authenticated using Bearer authentication. Furthermore, not all algorithms are exposed to every user; each user is configured to access a limited list of algorithms. In addition, we use throttling to control the number of requests per user. Based on their subscription option, we can increase the number of requests they can send per minute and per day.
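
In Django REST Framework, per-user throttling of this kind can be configured along the following lines; the scope names and rates here are placeholders, not our production values.

```python
# settings.py -- illustrative DRF throttling configuration (scope names and rates are placeholders).
REST_FRAMEWORK = {
    "DEFAULT_THROTTLE_CLASSES": [
        "rest_framework.throttling.UserRateThrottle",
    ],
    "DEFAULT_THROTTLE_RATES": {
        # The allowed rate is raised per subscription tier; combining per-minute and
        # per-day limits would use additional scoped throttle classes.
        "user": "60/min",
    },
}
```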

Deployment

The COMPREDICT Platform in general, and AI Core specifically, follow the design principles of a microservices architecture. These services are managed and orchestrated by Kubernetes. The reason behind this choice is that it allows us to distribute the application load and ensure stability with replicable and scalable services interacting with each other. For instance, this helps us separate the loads of different organizations by creating a dedicated queue for each one. Hence, for each newly created organization in the dashboard (managed by the Users app), Django triggers RabbitMQ to create a new queue, and workers are spawned to serve this queue specifically. This helps us track how many resources each organization consumes. Consequently, we can adapt the subscription service if the customer requires more load and wants us to scale up the workers of this queue.

Each queue is also prioritized: a request can be processed according to a given priority, based on the requester and the requested algorithm. As a result, a user with more permissions can have their requests processed faster than a normal user on the same instance.
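
A hedged sketch of how such per-organization routing and prioritization could be expressed with Celery is shown below; the queue naming convention and priority values are assumptions for illustration.

```python
# Illustrative dispatch to an organization-specific, priority-enabled queue (names are assumptions).
from kombu import Queue
from tasks import app, run_inference   # the Celery app sketched earlier

def queue_for(organization_id: str) -> str:
    return f"org-{organization_id}"

# Declare the organization's queue with RabbitMQ priority support (0 = lowest, 10 = highest).
app.conf.task_queues = (
    Queue(queue_for("acme"), queue_arguments={"x-max-priority": 10}),
)

# Dispatch a request to that queue; a privileged user gets a higher priority value.
run_inference.apply_async(
    args=["some-algorithm", {"signal": [0.1, 0.2]}],
    queue=queue_for("acme"),
    priority=9,
)
```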

Depending on the requested training or inference tasks, our customers may send us huge files for processing (i.e. gigabytes of data). Moving these files back and forth between the services is costly. This is mitigated by sharing a scalable, on-demand storage space between the containers. Once the data is uploaded, the file is stored temporarily in the shared space until a worker finishes processing it.

Currently, AI Core is deployed on AWS using the following services: EKS, RDS and EFS. However, we recently became partners with Open Telekom Cloud and Microsoft Azure, and we are assessing their capabilities to decide which one is best for hosting the solution or parts of it.

Machine Learning Platform

The motivation behind building a standard tool for developing ML and NN models was the lack of reliable, uniform and reproducible pipelines across the data science team.

Additionally, as we have worked with different OEMs on various tasks over the past years, we have been able to identify the models and architectures that work best for specific cases in the automotive industry. Therefore, we have centralized and built different libraries to:

  • Optimize the best models for the task and ease transfer learning for similar tasks,
  • Analyze and process signals and time series,
  • Analyze the fatigue of materials.

This platform is connected to our Feature Store to help our team query the necessary data for the specified task.

Technology Stack

The platform is a combination of containerized open-source frameworks that are further extended to meet our needs:

  • MLFlow for tracking experiments and storing model artefacts.
  • Postgres to store the experiments’ details and results.
  • ML-workspace for managing and standardizing the libraries and tools that a data scientist would need.
  • ML-hub for spawning and managing the data science team’s environments.

ML-workspace is extended to spawn our own image, which includes our in-house libraries.
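
As an illustration of the tracking side, a training run in this setup could log its parameters, metrics and artifacts roughly as follows; the tracking URI, experiment name and logged values are placeholders.

```python
# Illustrative MLflow tracking of a training run (URI, names and values are placeholders).
import mlflow

mlflow.set_tracking_uri("http://mlflow.internal:5000")   # MLflow server backed by Postgres
mlflow.set_experiment("damper-force-virtual-sensor")

with mlflow.start_run():
    mlflow.log_param("window_size", 256)
    mlflow.log_param("learning_rate", 1e-3)
    # ... train the model ...
    mlflow.log_metric("val_rmse", 0.042)
    mlflow.log_artifact("preprocessing_stats.json")       # stored alongside the model artefacts
```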

Application

Besides managing the environment and libraries, we have standardized the project structure. For instance, we extended ML-workspace so that, whenever a new project is created, a project skeleton is pulled from our Git repository in order to quickly kick-start the core tasks.

A project is controlled solely by a JSON configuration file, where the data scientist can set the paths to the features and targets and tweak the feature selection, preprocessing, model parameters, cross-validation, etc. Beyond that, the user has full control of the project and can further modify it to meet specific needs.
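
For illustration, a configuration of this kind might look roughly as follows; the keys and values are assumptions about the general shape, not our actual schema.

```python
# Hypothetical project configuration (keys and values are assumptions about the general shape).
import json

config = {
    "data": {"features_path": "data/features.parquet", "targets_path": "data/targets.parquet"},
    "feature_selection": {"method": "mutual_information", "top_k": 30},
    "preprocessing": {"scaler": "standard", "resample_hz": 100},
    "model": {"type": "gradient_boosting", "params": {"n_estimators": 500}},
    "cross_validation": {"folds": 5, "strategy": "grouped_by_vehicle"},
}

with open("project_config.json", "w") as f:
    json.dump(config, f, indent=2)
```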

Once the model is trained, evaluated and compared with other experimented models, it can be pushed, along with its preprocessing statistics, to AI Core through the Algorithms API and is then ready to serve customers. Alternatively, it can be exported and deployed elsewhere, for example in a vehicle ECU.
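
A hedged sketch of such a hand-over is shown below; the endpoint, field names and file names are assumptions for illustration, not the documented Algorithms API.

```python
# Hypothetical hand-over of a trained model to AI Core (endpoint, fields and files are assumptions).
import requests

with open("model.h5", "rb") as model_file, open("preprocessing_stats.json", "rb") as stats_file:
    response = requests.post(
        "https://aicore.example.com/api/algorithms/",
        headers={"Authorization": "Bearer <token>"},
        data={"name": "damper-force-virtual-sensor", "version": "2.1.0"},
        files={"model": model_file, "preprocessing": stats_file},
    )
response.raise_for_status()
```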

Deployment

The main server is currently deployed on-premises and is equipped with various high-performance Nvidia GPUs. In addition, especially for cross-research, we use AWS EC2 instances equipped with GPUs, or AWS SageMaker.

Conclusion

In this post, I gave a brief overview of the COMPREDICT Platform and then explained two aspects in detail: AI Core and the ML Framework. The following figure shows the current version:

COMPREDICT Platform: AI Core and ML Framework unpacked

In the next post, we will discuss the in-car software along with our COMPREDICT Analytics platform.
