Model migration tooling
To ensure model compatibility after major ML framework API changes, you need to retrain and redeploy models, while making sure the retrained model and its behavior haven't changed significantly from the production one. This is a difficult process when done at scale (~100–1000s of models at a time).
Three tests (or tools) that are really useful for validating model migrations:
📌 [Offline] Prediction comparison: The comparison itself is straightforward: compute a set of metrics (data distribution metrics, quality metrics for your task) using predictions from model A (production) and model B (newly migrated) on the same dataset (which can be a dummy dataset). Investigate further if any metric shows a significant difference.
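A minimal sketch of the prediction-comparison step, assuming models A and B produce numeric prediction arrays. The function name, metric choices, and `rel_tol` threshold are illustrative assumptions, not a prescribed API:

```python
import numpy as np

def compare_predictions(preds_a, preds_b, rel_tol=0.01):
    """Compare predictions from model A (production) and model B (migrated)
    on the same dataset; flag differences beyond a relative tolerance.

    Metric set and threshold are placeholders; use the distribution and
    quality metrics appropriate for your task.
    """
    preds_a = np.asarray(preds_a, dtype=float)
    preds_b = np.asarray(preds_b, dtype=float)
    metrics = {
        "mean_shift": abs(preds_a.mean() - preds_b.mean()),
        "std_shift": abs(preds_a.std() - preds_b.std()),
        "max_abs_diff": float(np.max(np.abs(preds_a - preds_b))),
    }
    # normalize by the production model's mean to get a relative difference
    scale = max(abs(preds_a.mean()), 1e-12)
    metrics["significant"] = metrics["max_abs_diff"] / scale > rel_tol
    return metrics
```

Any row with `significant == True` is a candidate for deeper investigation before the migrated model moves forward.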
📌 [Offline] Perf measurement: Compare the old and new models on model size (number of parameters, storage required to load the model), latency on target hardware, and FLOPs. The perf metrics of the old and new models should be in a similar ballpark, or differ only as expected from the migration.
📌 [Online] Model canary: Before setting up an A/B experiment on the new model or deploying it (both generally expensive), run a canary on the model with shadow traffic. Look out for the following: can the model be loaded correctly on the model serving infrastructure, are there exceptions while executing the model on shadow traffic, and are logging & monitoring working as expected?
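The canary checks above can be sketched in simplified form. This is a hypothetical harness, not real serving infrastructure: `load_model`, the request format, and the `max_error_rate` threshold are all assumptions for illustration.

```python
import logging

def run_canary(load_model, shadow_requests, max_error_rate=0.01):
    """Exercise a migrated model against shadow traffic.

    Checks the three canary concerns: the model loads, requests execute
    without exceptions, and each request produces a log line.
    """
    model = load_model()  # raises loudly if the model cannot be loaded
    errors = 0
    for request in shadow_requests:
        try:
            model(request)
            logging.info("canary request ok")
        except Exception:
            errors += 1
            logging.exception("canary request failed")
    error_rate = errors / max(len(shadow_requests), 1)
    return error_rate <= max_error_rate
```

Because the traffic is shadowed, failures here cost nothing user-facing, which is exactly what makes the canary cheap compared to an A/B experiment.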
If any of these checks fail for the first set of models in the migration, you have identified bugs or unexpected behavior caused by your migration. Investigate! #failfast