A framework for evaluating ML automation

End-to-end ML platforms are the new kids on the block for many enterprises. They are early in maturity but can already deliver impactful results through ML automation. So how do we measure a platform's effectiveness in driving ML automation?

I have found the framework of levels used to describe self-driving autonomy simple to understand and communicate. Let us try a similar framework for ML automation. Note that developers can define these levels as they see fit; the following is an example only.


Level framework:

Level 0: No automation. Developers are heavily involved in babysitting the model lifecycle and subsequent retraining runs.

Level 1: Automated training/retraining supported. ML developers can configure runs for their pipelines triggered by specific events: a dataset change, or a schedule (monthly, weekly, daily, etc.). This is the equivalent of continuous integration.
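A minimal sketch of what a Level 1 trigger might look like. The names (`RetrainTrigger`, `should_retrain`) are hypothetical, not a real platform API; the point is that developers declare triggers and the platform decides when to run.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RetrainTrigger:
    """Hypothetical Level 1 trigger config: retrain on a schedule
    and/or whenever the upstream dataset changes."""
    schedule: Optional[str] = None   # e.g. "weekly"
    on_dataset_change: bool = False

def should_retrain(trigger: RetrainTrigger,
                   dataset_changed: bool,
                   schedule_due: bool) -> bool:
    """Return True if any configured trigger fires."""
    if trigger.on_dataset_change and dataset_changed:
        return True
    if trigger.schedule is not None and schedule_due:
        return True
    return False

# Example: a weekly retrain that also reacts to new data.
trigger = RetrainTrigger(schedule="weekly", on_dataset_change=True)
print(should_retrain(trigger, dataset_changed=True, schedule_due=False))  # True
```

In practice this is what schedulers like cron or workflow engines give you; the platform's job is to make the configuration declarative instead of hand-rolled.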

Level 2: Automated model deployment. ML developers can trigger automated pipelines to evaluate, experiment, and deploy a trained model to serve production traffic. This is equivalent to continuous deployment.
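At Level 2, the key piece is an evaluation gate: a candidate model is promoted only if it beats the production model. A sketch, with hypothetical names and a made-up metric threshold:

```python
def promote_if_better(candidate_score: float,
                      production_score: float,
                      min_improvement: float = 0.0) -> str:
    """Hypothetical Level 2 deployment gate: promote the candidate
    model only if it beats production by at least min_improvement."""
    if candidate_score > production_score + min_improvement:
        return "deploy"   # roll the candidate out to serve traffic
    return "reject"       # keep the current production model

print(promote_if_better(0.91, 0.88, min_improvement=0.01))  # deploy
print(promote_if_better(0.88, 0.88))                        # reject
```

Real platforms layer experimentation (shadow traffic, canaries, A/B tests) on top of this gate before the model serves full production traffic.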

Level 3: Developers can add automation to the post-deployment lifecycle — monitoring, traffic sampling, labeling, and dataset generation.
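One Level 3 building block is traffic sampling: capture a small fraction of production requests so they can be labeled and fed back into the next dataset. A hypothetical sketch:

```python
import random

def sample_for_labeling(requests, rate=0.01, seed=42):
    """Hypothetical Level 3 step: sample a fraction of production
    traffic for human labeling, to grow the next training dataset."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    return [r for r in requests if rng.random() < rate]

sampled = sample_for_labeling(list(range(10_000)), rate=0.01)
print(len(sampled))  # roughly 1% of requests
```

Downstream, the labeled samples feed dataset generation, which in turn can fire the Level 1 retraining triggers, closing the loop.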

Level 4: At level 4, we have a generic ML automation system as a part of the platform. Developers can write custom automation for parts or the whole of the ML pipeline. There is a steady increase in ML pipeline CI/CD. Note that a generic ML automation system can cover all levels (0–4).

Now, coming back to the original question:

How do we measure the effectiveness of the platform?

  • Maturity of ML pipelines: One can start with the distribution of ML pipelines grouped by level of automation. Usually, levels 0 and 1 will hold the bulk of the pipelines at the beginning. The ML platform team can set goals to shift the bulk of these pipelines to levels 1, 2, and 3.
  • Model freshness: How old are your production models? ML Automation should reduce human effort considerably and improve the time to release models. This can be measured quite well.
  • Track human-induced ML production errors. This is a lagging indicator that should go down over time.
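The first two metrics are cheap to compute from a pipeline registry. A sketch with made-up pipeline records (the field names are illustrative, not a real schema):

```python
from collections import Counter

# Hypothetical pipeline registry: automation level plus the age of
# each pipeline's current production model.
pipelines = [
    {"name": "ranker", "level": 0, "model_age_days": 120},
    {"name": "spam",   "level": 1, "model_age_days": 30},
    {"name": "ads",    "level": 2, "model_age_days": 7},
]

# Maturity: distribution of pipelines by automation level.
level_distribution = Counter(p["level"] for p in pipelines)

# Freshness: average age of production models, in days.
avg_age = sum(p["model_age_days"] for p in pipelines) / len(pipelines)

print(dict(level_distribution))  # {0: 1, 1: 1, 2: 1}
print(round(avg_age, 1))         # 52.3
```

Tracked over time, the distribution should shift toward higher levels and the average model age should fall.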

Conclusion

If you are building ML automation, start with an evaluation framework with clear metrics.
