Population Stability Index (PSI)

Aditya Agarwal
model monitoring
Published in
Nov 10, 2020

A model is often trained offline and then deployed to production for inference. In that scenario, you need to know when to retrain it. One indication that a model needs retraining is a change in the distribution of its input attributes, also known as “Data Drift”.

Population Stability Index (PSI) compares the distribution of predicted probabilities in the scoring data with the distribution of predicted probabilities in the training data. The idea is to check how different the current scored data is from the training data.

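Formula to calculate PSI

PSI = Σ (Scoring % − Training %) × ln(Scoring % / Training %), summed over all groups, where "Training %" and "Scoring %" are the shares of training and scoring records that fall in each group.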

Steps to calculate PSI

  1. Sort the training data by score in descending order (scores are derived from the predicted probabilities; the probabilities can also be used directly)
  2. Split the sorted data into 10 or 20 groups (10 groups gives deciles)
  3. Calculate the % of training records in each group
  4. Using the same score ranges as in the training data, calculate the % of scoring records in each group
  5. Calculate the difference: Step 4 − Step 3
  6. Take the natural log of (Step 4 / Step 3)
  7. Multiply Step 5 by Step 6 for each group, then sum the results across all groups to get the PSI
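
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `population_stability_index`, the parameters `n_groups` and `eps`, and the choice to floor empty groups at `eps` (so the logarithm stays defined) are all assumptions made for this example. It also sorts in ascending order, which produces the same groups as a descending sort.

```python
import math
from bisect import bisect_right


def population_stability_index(train_scores, scoring_scores, n_groups=10, eps=1e-6):
    """Sketch of the PSI steps; assumes both inputs are non-empty.

    `eps` is an assumed floor for empty groups so that log() is defined.
    """
    ordered = sorted(train_scores)
    n = len(ordered)
    # Steps 1-2: inner group boundaries at equal-count cut points
    # of the sorted training scores (deciles when n_groups=10).
    cuts = [ordered[(n * k) // n_groups] for k in range(1, n_groups)]

    def group_shares(scores):
        # Count records per group using the training-derived boundaries.
        counts = [0] * n_groups
        for s in scores:
            counts[bisect_right(cuts, s)] += 1
        return [max(c / len(scores), eps) for c in counts]

    train_pct = group_shares(train_scores)      # Step 3
    scoring_pct = group_shares(scoring_scores)  # Step 4

    # Steps 5-7: (difference) * ln(ratio) per group, summed over all groups.
    return sum(
        (s - t) * math.log(s / t) for s, t in zip(scoring_pct, train_pct)
    )


train = [i / 1000 for i in range(1000)]
same = population_stability_index(train, train)
shifted = population_stability_index(train, [x + 0.3 for x in train])
print(same, shifted)
```

Scoring the training data against itself yields a PSI of zero, while shifting every score by 0.3 leaves several groups empty and pushes the PSI far above zero.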

A generic rule for deciding on model retraining based on PSI:

  1. PSI < 0.1: No change. You can continue using the existing model.
  2. PSI >= 0.1 but < 0.2: A slight change is required.
  3. PSI >= 0.2: A significant change is required. Ideally, you should stop using this model; retraining is required.
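
As a small illustration, the bands above can be encoded as a lookup. The function name `psi_action` and the returned strings are hypothetical, chosen only for this sketch; the cut-offs are the rule-of-thumb values from the list.

```python
def psi_action(psi):
    """Map a PSI value to the rule-of-thumb bands listed above."""
    if psi < 0.1:
        return "no change: keep using the existing model"
    if psi < 0.2:
        return "slight change required"
    return "significant change required: retrain the model"


for value in (0.05, 0.15, 0.30):
    print(value, "->", psi_action(value))
```

The boundaries are inclusive on the lower side, so a PSI of exactly 0.1 falls in the middle band and exactly 0.2 falls in the retraining band.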
