FinML — Testing ML Models

Hendrik
Tide Engineering Team
3 min read · Apr 13, 2021

In this episode of FinML, we talked with Laszlo Sragner, founder of Hypergolic, and Neal Nathia, Associate Director for Machine Learning at Monzo, about testing strategies for Machine Learning.

Key takeaways from the talk:

  • The tests that are useful in classical software engineering, such as unit, regression, and integration tests, are also necessary for ML applications, but other parts of ML systems need dedicated testing on top of them.
  • One aspect that needs to be tested explicitly is the data that goes into the system. Checking sensible ranges, missing values, and stability over time is necessary in order to train a successful ML model. These tests can be automated and can be re-used for monitoring purposes.
  • Another aspect is the statistical performance of the model. This part aims to ensure that the model performs well on previously unseen data.
  • Typical performance metrics, such as accuracy or F1 score, treat all parts of the input data as equally important and hence ‘hide’ issues that only occur on a subset of the data. For example, a model may perform well overall but poorly for certain geographies, or certain prompts made to a chatbot may be important from a business perspective yet be diluted in metrics computed over all the data. Methods to resolve this include measuring performance separately on relevant subsets and building a command-line interface that allows a wide variety of people to test the system and report issues.
  • When testing and monitoring a machine learning system it’s important to consider the whole system, that is including data pipelines and integrations. Problems often look like they might be related to the model, but are actually related to the surrounding system.
  • There seemed to be disagreement on the question of business rules that enhance an ML system. Should they be implemented together with the ML model or as a separate system directly committed to production?
  • Bias, fairness, and other ethical concerns should primarily be addressed before a project begins rather than as a test after the project has been completed.
  • The degree to which machine learning solutions should be tested depends largely on the business impact, with riskier applications needing more testing than less risky ones.
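The data checks described above (sensible ranges, missing values) can be sketched as simple reusable functions. This is a minimal illustration, not code from the talk; the column names, bounds, and the 1% missing-value threshold are all hypothetical:

```python
def check_ranges(rows, bounds):
    """Return column names with any value outside the expected [lo, hi] range."""
    failures = []
    for col, (lo, hi) in bounds.items():
        values = [row[col] for row in rows if row[col] is not None]
        if any(v < lo or v > hi for v in values):
            failures.append(col)
    return failures

def check_missing(rows, max_missing_frac=0.01):
    """Return column names whose fraction of missing values exceeds the threshold."""
    failures = []
    for col in rows[0]:
        frac = sum(row[col] is None for row in rows) / len(rows)
        if frac > max_missing_frac:
            failures.append(col)
    return failures

# Hypothetical transaction records for illustration
rows = [
    {"amount": 12.5, "age": 25},
    {"amount": 300.0, "age": 41},
    {"amount": 7.2, "age": 33},
    {"amount": None, "age": 29},
]

print(check_ranges(rows, {"amount": (0, 10_000), "age": (18, 120)}))  # []
print(check_missing(rows))  # ['amount'] — 25% missing exceeds the 1% threshold
```

Because the same checks run both in the training pipeline and against live data, they can be reused for monitoring, as noted in the talk.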
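The point about aggregate metrics hiding subset failures can be made concrete with a per-group accuracy breakdown. This is a toy sketch with made-up labels and a hypothetical geography tag, not data from the discussion:

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy_by_group(y_true, y_pred, groups):
    """Compute accuracy separately for each subset (e.g. geography)."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: accuracy(t, p) for g, (t, p) in buckets.items()}

# Hypothetical labels and predictions, tagged by geography
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0]
geo    = ["UK", "UK", "UK", "DE", "DE", "DE"]

print(accuracy(y_true, y_pred))               # 0.5
print(accuracy_by_group(y_true, y_pred, geo)) # {'UK': 1.0, 'DE': 0.0}
```

The overall accuracy of 0.5 looks mediocre but unremarkable; the per-group view reveals a complete failure on one geography, which is exactly the kind of issue a single aggregate number hides.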

Further reading

https://martinfowler.com/articles/cd4ml.html

About FinML

FinML is a meetup group dedicated to applications of Machine Learning in finance, discussing the economic and statistical concepts behind running Machine Learning in the real world. We strongly believe in discourse, which is why our sessions consist of a 30-minute presentation followed by a 30-minute open discussion. Sign up here to be invited to all of our meetups, and contact me if you are interested in speaking at an event!
