Kolmogorov-Smirnov Diagnostics

Analyttica Datalab
Jan 28, 2019


In predictive modeling, it is important to check whether a model can distinguish between events and non-events. The Kolmogorov-Smirnov (KS) statistic is a performance statistic that measures this discriminatory power.

It is the maximum difference between the cumulative distribution of events and the cumulative distribution of non-events when records are ranked by their predicted score.
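A minimal Python sketch of this definition follows, assuming that a higher score means a higher probability of event and that labels are coded 0/1. It evaluates the gap at every distinct score cutoff (no grouping into deciles), and reports KS in percentage points. The function name `ks_statistic` and the sample data are hypothetical, not part of the Analyttica platform.

```python
from itertools import groupby


def ks_statistic(scores, labels):
    """Return KS (percentage points): the maximum of cumulative % of events
    minus cumulative % of non-events, scanning from highest to lowest score.
    Hypothetical helper for illustration only."""
    total_events = sum(labels)
    total_non_events = len(labels) - total_events
    if total_events == 0 or total_non_events == 0:
        raise ValueError("labels must contain both events (1) and non-events (0)")

    # Rank records by score, highest probability first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    cum_events = cum_non_events = 0
    ks = 0.0
    # Records with tied scores are processed together, so no cutoff splits them.
    for _, tied in groupby(ranked, key=lambda pair: pair[0]):
        for _, label in tied:
            if label == 1:
                cum_events += 1
            else:
                cum_non_events += 1
        gap = 100 * (cum_events / total_events - cum_non_events / total_non_events)
        ks = max(ks, gap)
    return ks


# Hypothetical scores and 0/1 labels for eight records.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(ks_statistic(scores, labels))  # → 50.0
```

Scanning every distinct score is the limiting case of the grouped calculation, with one group per distinct score value.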

It is a widely used metric in credit risk and response modeling.

Input:

To run Kolmogorov-Smirnov Diagnostics, select the binary target variable (coded as 0 and 1) and the predictor variables (numeric only), then select the function using the following path:

Machine Learning → Regression Analysis (Non-linear) → Kolmogorov-Smirnov Diagnostics

Application & Interpretation

Each record is scored with its predicted probability of the event using the logistic model. The records are then sorted in decreasing order of probability and split into 10 groups (deciles) or 20 groups (demi-deciles) of roughly equal size. For each group, the cumulative % of events and the cumulative % of non-events are calculated, and the KS for that group is the difference between the two. The KS for the overall population is the maximum of these group-level differences:

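$$\text{KS} = \max_{i = 1, \dots, n} \left\{ \text{Cumulative \% of responders in groups } 1 \text{ to } i \;-\; \text{Cumulative \% of non-responders in groups } 1 \text{ to } i \right\}$$

where n is the number of groups (10 or 20), responders are events, and non-responders are non-events.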
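The grouped procedure can be sketched in Python as follows. This is an illustrative sketch, not the platform's implementation: the function name `ks_by_group` is hypothetical, groups are formed by splitting the sorted records into `n_groups` near-equal slices (so tied scores at a group boundary may be split), KS is reported in percentage points, and the 20 records are made-up data.

```python
def ks_by_group(scores, labels, n_groups=10):
    """Rank records by predicted probability (highest first), split them into
    n_groups near-equal groups (10 = deciles, 20 = demi-deciles), and return
    (per-group rows, overall KS). Hypothetical helper for illustration only."""
    total_events = sum(labels)
    total_non_events = len(labels) - total_events
    if total_events == 0 or total_non_events == 0:
        raise ValueError("labels must contain both events (1) and non-events (0)")
    if not 1 <= n_groups <= len(labels):
        raise ValueError("n_groups must be between 1 and the number of records")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)

    rows = []
    cum_events = cum_non_events = 0
    for i in range(n_groups):
        # Group i covers ranked records [start, end); sizes differ by at most one.
        start, end = i * n // n_groups, (i + 1) * n // n_groups
        group_events = sum(label for _, label in ranked[start:end])
        cum_events += group_events
        cum_non_events += (end - start) - group_events
        cum_pct_events = 100 * cum_events / total_events
        cum_pct_non_events = 100 * cum_non_events / total_non_events
        rows.append((i + 1, cum_pct_events, cum_pct_non_events,
                     cum_pct_events - cum_pct_non_events))

    # Overall KS = maximum difference across groups 1..n.
    overall_ks = max(row[3] for row in rows)
    return rows, overall_ks


# Hypothetical data: 20 records with probabilities 1.00 down to 0.05.
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
rows, ks = ks_by_group(scores, labels, n_groups=10)
for group, pct_ev, pct_nev, diff in rows:
    print(f"{group:>2}  {pct_ev:6.1f}  {pct_nev:6.1f}  {diff:6.1f}")
print(f"KS = {ks:.1f}")  # → KS = 58.3 (reached in decile 4)
```

The last group always shows a difference of 0, because both cumulative percentages reach 100% there; the KS is therefore never negative under this definition.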

The higher the KS, the better the model distinguishes events from non-events.
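To illustrate why a higher KS indicates a better model, the sketch below scores the same hypothetical labels with two hypothetical models and with a constant score. It uses a compact cutoff-based KS (percentage points); the function `ks` and all data are assumptions for illustration only.

```python
def ks(scores, labels):
    """KS in percentage points, taking the maximum over every distinct score cutoff."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for cutoff in set(scores):
        # Share of each class scored at or above the cutoff.
        pct_events = sum(s >= cutoff for s in events) / len(events)
        pct_non_events = sum(s >= cutoff for s in non_events) / len(non_events)
        best = max(best, 100 * (pct_events - pct_non_events))
    return best


labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # ranks every event above every non-event
model_b = [0.9, 0.3, 0.7, 0.8, 0.2, 0.1]  # mixes events and non-events
print(f"Model A KS = {ks(model_a, labels):.1f}")       # → Model A KS = 100.0
print(f"Model B KS = {ks(model_b, labels):.1f}")       # → Model B KS = 66.7
print(f"Constant KS = {ks([0.5] * 6, labels):.1f}")    # → Constant KS = 0.0
```

A model that perfectly separates the two classes reaches 100, while a score that carries no ranking information gives 0.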

The example below illustrates how KS is calculated for a logistic model:

See Also

Log Odds Ratio, Odds Ratio, Logistic Regression, Hosmer-Lemeshow Goodness of Fit.


Analyttica Datalab (www.analyttica.com) is a contextual Data Science (DS) & Machine Learning (ML) Platform Company.