Kolmogorov-Smirnov Diagnostics

In predictive modeling, it is important to check whether a model can distinguish between events and non-events. The “Kolmogorov-Smirnov” (KS) statistic is a performance measure of a model's discriminatory power.

It is the maximum difference between the cumulative distribution of events and the cumulative distribution of non-events.
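
The definition can be sketched in a few lines of Python. This is a minimal illustration and not the product's implementation: the function name ks_statistic and the sample labels and scores are hypothetical, and the sketch assumes labels coded as 0/1 with higher scores meaning a higher probability of the event.

```python
def ks_statistic(labels, scores):
    """Largest gap between the cumulative share of events and the cumulative
    share of non-events, scanning records from highest to lowest score."""
    n_events = sum(labels)
    n_non_events = len(labels) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")

    # Rank records by score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    cum_events = 0
    cum_non_events = 0
    ks = 0.0
    for position, (score, label) in enumerate(ranked):
        if label == 1:
            cum_events += 1
        else:
            cum_non_events += 1
        # Evaluate the gap only after all records tied at this score are counted.
        is_last_of_tie = (position + 1 == len(ranked)
                          or ranked[position + 1][0] != score)
        if is_last_of_tie:
            # Signed difference (events minus non-events), as in the formula
            # used in this document; the classical two-sample KS test uses
            # the absolute difference.
            gap = cum_events / n_events - cum_non_events / n_non_events
            ks = max(ks, gap)
    return ks


# Hypothetical data: 4 events and 6 non-events with made-up scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(round(ks_statistic(labels, scores), 4))  # 0.5833
```

Here the gap peaks after the fourth record, where 75% of the events but only about 17% of the non-events have been covered.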

It is a very popular metric used in credit risk and response modeling.


To run Kolmogorov-Smirnov Diagnostics, select the binary target variable (coded as zero and one) and the predictor variables (numeric only), then select the function using the following path:

Machine Learning → Regression Analysis (Non-linear) → Kolmogorov-Smirnov Diagnostics

Application & Interpretation

Using the logistic model, each record is scored with a probability of the event. The complete sample is then sorted in decreasing order of probability and divided into 10 groups (deciles) or 20 groups (demi-deciles). The cumulative % of events and the cumulative % of non-events are calculated for each decile or demi-decile, and the KS for that group is the difference between the two. The KS for the overall population is calculated as follows:

KS statistic = max over i = 1 to n of { Cumulative % of responders in groups 1 to i - Cumulative % of non-responders in groups 1 to i }

Here n is the number of groups, and responders and non-responders are the events and non-events.

The higher the KS, the better the model separates events from non-events.

The following example illustrates how KS is calculated for a logistic model:
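
As a minimal sketch in Python, the calculation can be laid out as a KS table by decile. The decile counts below are invented purely for illustration (they do not come from a real model), and the function name ks_table is hypothetical rather than part of the product. The groups are assumed to be already sorted in decreasing order of predicted probability.

```python
def ks_table(events, non_events):
    """Build a KS table from per-group counts.

    events / non_events: counts per group, with group 1 holding the
    highest predicted probabilities. Returns (rows, ks, ks_group), where
    each row is (group, events, non_events, cum % events,
    cum % non-events, KS for the group).
    """
    total_events = sum(events)
    total_non_events = sum(non_events)
    rows = []
    cum_events = 0
    cum_non_events = 0
    for group, (e, ne) in enumerate(zip(events, non_events), start=1):
        cum_events += e
        cum_non_events += ne
        cum_event_pct = 100.0 * cum_events / total_events
        cum_non_event_pct = 100.0 * cum_non_events / total_non_events
        rows.append((group, e, ne, cum_event_pct, cum_non_event_pct,
                     cum_event_pct - cum_non_event_pct))
    # Overall KS is the largest per-group difference.
    ks_group, ks = max(((r[0], r[5]) for r in rows), key=lambda t: t[1])
    return rows, ks, ks_group


# Hypothetical counts: 1,000 records, 100 per decile, 100 events in total.
events = [30, 20, 16, 9, 9, 6, 4, 3, 2, 1]
non_events = [70, 80, 84, 91, 91, 94, 96, 97, 98, 99]

rows, ks, ks_group = ks_table(events, non_events)

print("Group  Events  Non-events  Cum % events  Cum % non-events     KS")
for group, e, ne, ce, cne, gap in rows:
    print(f"{group:>5}  {e:>6}  {ne:>10}  {ce:>12.1f}  {cne:>16.1f}  {gap:>5.1f}")
print(f"KS = {ks:.1f} at group {ks_group}")
```

With these made-up counts, the top three deciles capture 66% of the events but only 26% of the non-events, so the KS is 40.0 and is reached at the third decile.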

See Also

Log Odds Ratio, Odds Ratio, Logistic Regression, Hosmer-Lemeshow Goodness of Fit.