Machine Learning Models Monitoring Architectures — Part 0

The importance of tracking Machine Learning (ML) model performance in a production environment is a huge, separate discussion, so we will skip it and go straight to a 2–3 minute walk through the creative process of architecting monitoring systems for our ML models.

Even though an ML model in production looks like a regular service, model monitoring in our case differs from the classical notion of monitoring (request counts, latency and so on, as shown in DevOps monitoring tools). We’ll treat model monitoring as tracking the requests sent to models and the responses they return, and applying various anomaly and concept drift detection methods to them.
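To make "drift detection" a bit more concrete, here is a minimal sketch of one possible check: the Population Stability Index (PSI) between a reference sample of model outputs and a recent one. The article does not prescribe any particular method, so treat this purely as an illustration. The function name `psi`, the bin count and the assumption that scores lie in [0, 1] are all choices made for this example.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of a score.

    Assumption for this sketch: every score lies in the range [0, 1].
    """

    def histogram(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int(v * bins), bins - 1)  # put a score of 1.0 into the last bin
            counts[idx] += 1
        total = len(values)
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / total, 1e-6) for c in counts]

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

Identical samples give a PSI of zero, and the value grows as the recent distribution moves away from the reference. Which value should trigger an alert depends on the model and is left open here.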

In this article we will look at three system architectures for monitoring machine learning models. Let’s call them the ETL architecture (Extract-Transform-Load), the Stream architecture and the Sidecar architecture.

ETL/Batch Architecture

Exhibit 1 — ETL/Batch architecture

A classic. All requests and responses are logged to slow, cheap storage. Later, on a schedule, an ETL job runs to calculate metrics for a particular period. The results of the job are stored in faster storage for later use by alerting services or dashboards.

Advantages of this architecture are:

  • Full logging of requests/responses, which can be used later for additional data discovery.
  • Relative simplicity and transparency of the scheme.

Shortcomings are:

  • Full logging of requests/responses increases storage demands.
  • There is a delay between the moment an anomaly happens and the moment it is observed.
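The scheduled job described above can be sketched in a few lines. This is only an illustration under stated assumptions: the log is assumed to be JSON lines with a Unix timestamp field `ts` and a numeric `prediction` field, and the function name `batch_metrics` and the hourly period are hypothetical choices for this example.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean
from typing import Dict, Iterable, List


def batch_metrics(log_lines: Iterable[str]) -> Dict[str, dict]:
    """Group logged predictions by hour and compute simple per-period metrics."""
    by_hour: Dict[str, List[float]] = defaultdict(list)
    for line in log_lines:
        record = json.loads(line)  # extract: one logged request/response per line
        ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
        by_hour[ts.strftime("%Y-%m-%dT%H:00Z")].append(record["prediction"])
    # transform: one metrics row per hour, ordered by period
    return {
        hour: {"count": len(preds), "mean_prediction": mean(preds)}
        for hour, preds in sorted(by_hour.items())
    }
```

The returned dictionary is what the "load" step would write to the faster storage. A scheduler would call the function once per period on the newly logged lines.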

Stream Architecture

Exhibit 2 — Stream Architecture

As the name suggests, this architecture is built around a stream. Requests and responses are mirrored into the stream. Downstream, the data passes through metrics calculation services, and the resulting metrics are written back into the stream. Fast storage or an event service can be used as the sink for the stream.

Advantages of Stream Architecture:

  • real-time monitoring
  • reduced costs on storage
  • timely alerts

Shortcomings (debatable, depending on the case):

  • limited history
  • calculation scope limited by memory size
  • real-time detection algorithms are of limited applicability
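The trade-off between real-time alerts and limited history shows up even in a toy version of a metrics calculation service. The sketch below keeps only a fixed-size window of recent values in memory and flags a value that sits far from the window mean. The function name `stream_alerts`, the window size and the z-score rule are assumptions made for illustration, not something the architecture prescribes.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def stream_alerts(
    events: Iterable[float], window: int = 50, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for values far from the mean of the recent window."""
    recent: Deque[float] = deque(maxlen=window)  # bounded memory: old values fall out
    for i, value in enumerate(events):
        if len(recent) == window:  # only score once the window is full
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        recent.append(value)
```

Because the generator consumes events one at a time, an alert is produced as soon as the offending value arrives. Anything older than the window is gone, which is exactly the "limited history" shortcoming listed above.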

Sidecar Architecture

Exhibit 3 — Sidecar architecture

This architecture is based on the sidecar design pattern. In this case the model is deployed in a container next to the containers running the metrics calculation services and a traffic router. All of these services (including the model) are encapsulated into a single entity (a group). Request/response traffic goes through the router, where it is mirrored and sent to the metrics calculation services. Later those metrics are sent to central storage for alerting or for display on a dashboard.

Advantages of Sidecar Architecture are:

  • Lower network bandwidth requirements, as metrics calculation services are physically located near the deployed models. In addition, we can send metrics aggregated by time interval to storage, which reduces the amount of traffic over the external network.
  • Easy horizontal scalability by service groups.

Shortcomings are:

  • Grouping services increases the size of deployments.
  • Window operations are complicated.

In the end, all three architectures have their own pros and cons, but nothing restricts us to a single one. We can mix, match and tune these patterns to solve our ML monitoring tasks.

This post will be followed by articles discussing the implementation of each architecture using available open-source libraries and frameworks. Stay tuned.