Continuous monitoring for data projects

What does monitoring mean for data scientists?

Operational metrics and monitoring are the parts of software engineering that any software project needs in order to succeed. In the data world, however, monitoring AI/ML applications, models, and even datasets raises additional concerns.

  • Agents: Adding special general-purpose code to the application environment designed to automatically capture standard metrics.
  • Spying: Using network taps to monitor calls or data flow between systems.
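
As a rough illustration of the agent approach, general-purpose instrumentation can be wrapped around application code so that standard metrics are captured without touching the business logic. The sketch below is a minimal in-process version using only the Python standard library. The names METRICS, monitored, and predict are hypothetical, and a real agent would ship the numbers to a monitoring backend rather than hold them in a dictionary.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process metrics store (assumption for illustration);
# a real agent would forward these values to a monitoring backend.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def monitored(func):
    """Record call count, error count, and cumulative latency for func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise  # re-raise so callers still see the failure
        finally:
            # Runs on both success and failure.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


@monitored
def predict(features):
    # Placeholder "model": a hypothetical scoring rule, not a real one.
    if not features:
        raise ValueError("empty feature vector")
    return sum(features) / len(features)
```

Any function decorated with monitored is instrumented the same way, which is what makes the code general purpose: the function being measured does not need to know that it is being measured.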


Data integrity: Business and operational data is dynamic, and its composition changes constantly. This can degrade the performance of ML models, especially when data pipelines are automated, because inconsistencies and irregularities in the data can go unnoticed in deployed ML applications. Performance degradation in applications or pipelines should be tracked, traced, and fixed regularly to keep their precision and recall at an acceptable level.
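
One way to keep such irregularities from going unnoticed is to validate every incoming batch against explicit expectations before it reaches the model. The following is a minimal sketch under stated assumptions: records arrive as dictionaries of numeric fields, and FieldRule, check_batch, and the thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldRule:
    """Hypothetical expectation for one numeric field (assumption)."""
    minimum: float
    maximum: float
    max_missing_rate: float


def check_batch(records, rules):
    """Return a list of human-readable violations for a batch of dict records."""
    total = len(records)
    if total == 0:
        return ["batch is empty"]

    violations = []
    for name, rule in rules.items():
        values = [record.get(name) for record in records]
        missing = sum(value is None for value in values)
        out_of_range = sum(
            value is not None and not (rule.minimum <= value <= rule.maximum)
            for value in values
        )
        missing_rate = missing / total
        if missing_rate > rule.max_missing_rate:
            violations.append(
                f"{name}: missing rate {missing_rate:.2f} "
                f"exceeds {rule.max_missing_rate:.2f}"
            )
        if out_of_range:
            violations.append(
                f"{name}: {out_of_range} value(s) outside "
                f"[{rule.minimum}, {rule.maximum}]"
            )
    return violations
```

A pipeline could call check_batch on each new batch and raise an alert, or stop the run, whenever the returned list is not empty.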


Concept drift: Concept drift arises when the meaning attached to the data, and therefore the model's interpretation of it, changes over time even though the data itself may not have changed. A data point that the model used to assign to class A is now assigned to class B, because our understanding of the properties of A and B has changed since. This is pure concept drift. To trace concept drift, the application should continually compare historical logs and aggregated metrics with current logs and current aggregated metrics and look for variance in the predictions.
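
The comparison of historical and current predictions can be sketched as follows. This is only one possible signal, not a definitive drift detector: it measures how far the distribution of predicted labels in a current window has moved from a historical window, using total variation distance. The function names and the 0.2 alert threshold are assumptions for illustration, and a shift in predictions can also come from changes in the input data, so an alert is a prompt to investigate rather than proof of concept drift.

```python
from collections import Counter

ALERT_THRESHOLD = 0.2  # assumed value; tune per application


def class_shares(predictions):
    """Return the fraction of predictions assigned to each label."""
    if not predictions:
        raise ValueError("need at least one prediction")
    total = len(predictions)
    return {label: count / total for label, count in Counter(predictions).items()}


def prediction_shift(historical, current):
    """Total variation distance between two predicted-label distributions.

    0.0 means the label shares are identical, 1.0 means no overlap at all.
    """
    past = class_shares(historical)
    now = class_shares(current)
    labels = set(past) | set(now)
    return 0.5 * sum(abs(past.get(l, 0.0) - now.get(l, 0.0)) for l in labels)


def drift_alert(historical, current, threshold=ALERT_THRESHOLD):
    """Flag the current window when its label shares moved past the threshold."""
    return prediction_shift(historical, current) > threshold
```

For example, if 80% of historical predictions were class A and only 50% of current predictions are, the distance is about 0.3, which exceeds the assumed threshold and would trigger an alert.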


It should be evident from the points above that monitoring needs to be continuous for all ML applications, models, and datasets. In my opinion, a team should build monitoring up front rather than in haste when things go wrong, because haste makes waste. The best approach is to start at the very beginning, with the proof of concept itself, and keep extending the monitoring as the project matures.

