How To Continue Trusting In Machine Learning Models Post COVID-19
Torgyn Erland, Senior Data Scientist; Ioan Stanculescu, Principal Data Scientist; Carlo Giovine, Associate Partner; Maren Eckhoff, Principal Data Scientist; Didier Vila, Global Head of Data Science, QuantumBlack
As countries around the world carefully begin to relax restrictions, many hope that the intense disruption facing organisations will begin to diminish. Yet for data science teams working on advanced analytics and machine learning (ML) solutions, this ‘next normal’ is rife with challenges.
The last few months have brought enormous disruption to how people, organisations and technology behave, and this is likely to continue. Demand patterns are changing as consumers around the world reduce spending, public spaces have become less busy as road traffic and footfall plummet, and businesses have cut budgets and consolidated their operations. How can ML stay resilient and effective when model assumptions are based on behaviours that are no longer prevalent?
This article offers an initial perspective on actions data practitioners can take to mitigate the impact of COVID-19 on their ML solutions. We first consider measures that increase operational control over models in production, and then broaden the solution scope by discussing novel data sources and methodological approaches.
Strengthen Model Governance
Disruption to typical behaviour directly affects ML models: production models face an influx of data at odds with the historical information analysed during model development. This can degrade model performance and produce unstable or irrelevant outputs. We argue for doubling down on management solutions for models in production.
Live model management systems enable early detection of dataset shift and model accuracy degradation. Coupling these with a fail-safe design (for example, falling back to pre-agreed rule-based logic) and quick escalation routes will help minimise negative impact on the business. In our experience, model tracking systems are designed to be highly configurable, but pre-crisis settings may require adjustment, for instance to re-calibrate thresholds for automated alarms.
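As an illustration of shift detection combined with a fail-safe, the sketch below computes a population stability index (PSI) for one numeric feature and routes predictions to a rule-based fallback when the index exceeds a threshold. The function names, the binning scheme and the 0.25 threshold (a commonly quoted rule of thumb) are assumptions made for this example, not settings taken from any particular monitoring system, and the threshold is exactly the kind of pre-crisis setting that may need re-calibrating.

```python
import math


def population_stability_index(reference, live, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # eps keeps empty bins from producing log(0)
        return [max(c / len(values), eps) for c in counts]

    ref_shares, live_shares = bin_shares(reference), bin_shares(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_shares, live_shares))


def predict_with_failsafe(model_predict, rule_based_predict, reference, live,
                          features, psi_threshold=0.25):
    """Use the ML model unless the monitored feature has drifted too far."""
    psi = population_stability_index(reference, live)
    if psi > psi_threshold:
        return rule_based_predict(features), "fallback"
    return model_predict(features), "model"


# Illustrative synthetic data: the live sample is the reference shifted upwards.
reference = [i / 100 for i in range(1000)]
live = [v + 5 for v in reference]
prediction, route = predict_with_failsafe(
    model_predict=lambda f: 1.0,        # stand-in for the production model
    rule_based_predict=lambda f: 0.0,   # stand-in for the pre-agreed rule
    reference=reference, live=live, features={},
)
print(route)
```

In practice the same check would run per feature and per segment, with the routing decision logged so that it can feed the escalation route.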
Anyone leading a team of data practitioners should recognise that technical expertise alone will sometimes not be enough to properly diagnose unstable model performance, and should source tools and techniques to enable their team to work effectively with business and domain experts. Domain expertise is particularly valuable and should be incorporated in the model governance, alongside frequent user feedback.
It is helpful to visualise model performance metrics that are meaningful and accessible to both data science and business teams. Shared dashboards help jointly investigate novel data patterns and identify the drivers behind accuracy degradation.
Model performance investigations can be further accelerated by leveraging Explainable AI (XAI) techniques. We recommend employing quick and efficient solutions such as partial dependence plots, Shapley values and local surrogate methods to validate the covariate-response relationships prior to initiating any model adjustments.
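To make the idea of validating covariate-response relationships concrete, here is a minimal sketch of a partial dependence curve computed by hand. The `predict` function, the feature names and the data rows are hypothetical stand-ins; a real investigation would pass in the production model and recent data.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value for every row."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(r) for r in modified))
    return curve


# Hypothetical toy model and data, for illustration only.
def predict(row):
    return 2.0 * row["price"] + row["footfall"]


rows = [{"price": 1, "footfall": 10}, {"price": 3, "footfall": 20}]
curve = partial_dependence(predict, rows, feature="price", grid=[0, 1, 2])
print(curve)  # [15.0, 17.0, 19.0]
```

Computing the curve once on pre-crisis rows and once on recent rows gives a quick check of whether the model is being asked to extrapolate into regions of the feature space it rarely saw during development.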
Integrate New Data Sources
Organisations can leverage new types of information to better understand changes in data patterns and help future-proof ML models. The new data can come from sources both internal and external to the organisation. In the context of COVID-19, relevant external data to consider include dates of policy changes, epidemiological model forecasts and future scenarios, movement patterns, and information from previous crises. With internal data sources, we recommend increasing the frequency and granularity of data collection. For example, shifting from weekly to daily ingestion can help shorten learning cycles and better understand changes in data segments.
Combined with model performance tracking, fresh data sources can also support timely model correction. We suggest, for instance:
● Triggering model re-calibrations in line with leading indicators for socio-economic activities (e.g. local and regional GDP forecasts, consumer spending by category, jobless claims).
● Revisiting feature inclusion to reflect instances where causal pathways that were relatively weak in recent years have now gained strength (e.g. unemployment data vs automotive retail).
● Redefining the outcome variable via suitable proxies when the concept shift is difficult to ascertain (for example, when the outcome variable will not be known for many months).
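The first suggestion above, triggering re-calibration from a leading indicator, can be sketched as a simple z-score rule. Everything in this example is an assumption made for illustration: the function name, the threshold of 3 standard deviations, and the indicator values, which are synthetic rather than real economic data.

```python
from statistics import mean, stdev


def should_recalibrate(baseline, recent, z_threshold=3.0):
    """Flag a re-calibration when the recent indicator level leaves its baseline range."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        # A constant baseline gives no scale, so any change counts as a shift.
        return mean(recent) != mu
    z = abs(mean(recent) - mu) / sigma
    return z > z_threshold


# Synthetic weekly indicator values (not real data).
baseline = [100, 102, 98, 101, 99]
print(should_recalibrate(baseline, recent=[100, 101]))  # stable period
print(should_recalibrate(baseline, recent=[400, 650]))  # abrupt jump
```

A rule this simple is easy to explain to business stakeholders, which matters when the trigger leads to a model being refreshed or temporarily replaced.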
Introduce New Modelling Approaches
As new data regimes emerge, previously well-established methods may struggle even after being refreshed. Alternative modelling approaches can act as a fail-safe mechanism, as an interim solution, or even offer an opportunity to harness new trends and patterns emerging from the COVID-19 crisis.
The spectrum of approaches ranges from simple to sophisticated, but ideally they should be explainable by design, support integration with domain expertise and provide a realistic quantification of uncertainty. With quick and pragmatic iterations in mind, data science teams should not underestimate the merits of a well-specified linear or tree-based model.
Bayesian modelling is particularly well-suited to the COVID-19 regime due to its intrinsic ability to quantify uncertainty, include prior knowledge and embed structural relationships between variables. Hierarchical linear models, Gaussian processes and other probabilistic models are strong contenders for modelling the comparatively small data sets collected during the pandemic. Relatedly, Bayesian networks with a causal interpretation support causally-grounded analysis of interventions and counterfactuals, inferring what happens to target Y when an intervention is performed on feature X.
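A minimal sketch of how prior knowledge and a small post-crisis sample combine is the conjugate normal update below. It assumes a normally distributed quantity with known observation variance, which is a simplification chosen for this example; the function name and all numbers are illustrative.

```python
def normal_posterior(prior_mean, prior_var, observations, obs_var):
    """Posterior mean and variance for a normal mean with known observation variance."""
    n = len(observations)
    post_precision = 1.0 / prior_var + n / obs_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / obs_var)
    return post_mean, post_var


# Prior belief from the pre-crisis period, updated with three new observations.
post_mean, post_var = normal_posterior(
    prior_mean=100.0, prior_var=25.0,
    observations=[70.0, 74.0, 72.0], obs_var=25.0,
)
print(post_mean, post_var)  # approximately 79.0 and 6.25
```

With only three observations the posterior moves most of the way towards the new level while still reporting its own variance. Widening `prior_var` is one way to encode reduced trust in pre-crisis behaviour.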
Other models which work well under high volatility are those designed to learn from interacting with an environment. For instance, active learning strategies can be employed when labelled data needed to capture the COVID-19 regime is insufficient. Data practitioners should also consider multi-armed bandit and reinforcement learning models for applications where the main concern is connecting action and reward. More broadly, developing “test and learn” and experimental design techniques may accelerate discovering highly performant solutions.
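As one example of a method that learns by connecting action and reward, the sketch below runs Thompson sampling on a Bernoulli multi-armed bandit. The reward rates are simulated and hypothetical, and the uniform Beta(1, 1) prior on each arm is an assumption of this example.

```python
import random


def thompson_sampling(true_rates, n_rounds=2000, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_rounds):
        # Draw a plausible reward rate for each arm from its Beta posterior.
        samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = samples.index(max(samples))
        # Simulated environment: reward arrives with the arm's true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


pulls = thompson_sampling(true_rates=[0.2, 0.5, 0.8])
print(pulls)
```

Because each decision is drawn from the current posterior, the policy keeps exploring while evidence is thin and concentrates on the better arm as rewards accumulate, which suits settings where past behaviour is a poor guide.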
In the wake of COVID-19, successfully navigating the disruption in the short term paves the way for more robust ML models in the long term. This period of intense change presents data science team leaders with a unique opportunity to establish model governance at scale, enhance collaboration between data and business teams, and lay a foundation for resilient ML systems in the future.