Transparent ML for Enterprise Decisions — Linear Models

Greger Ottosson
4 min read · Jul 28, 2022


Note: this is Part II in a series of articles on transparent machine learning models. See also Part I — Introduction, Part III — Rule Sets and Part IV — ScoreCards.


Intro to Linear Models

Linear regression analysis has been used since 1805, when Legendre and Gauss invented the “least squares method”. A linear model makes a prediction y as a linear combination of input variables x:

y = c₀ + c₁·x₁ + c₂·x₂ + … + cₙ·xₙ

Let’s say you want to predict the price of a house. As input variables we might have:

  • number_of_rooms,
  • livable_area,
  • garden_size,
  • years_since_renovation,
  • has_pool (boolean),
  • has_security (boolean).

A linear model might look something like:

predicted_price = c₀ + c₁·number_of_rooms + c₂·livable_area + c₃·garden_size + c₄·years_since_renovation + c₅·has_pool + c₆·has_security
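
To make this concrete, here is a minimal sketch of fitting such a model with scikit-learn; the data is invented for illustration, and only the variable names come from the example above:

```python
# Minimal sketch: fit a linear model on the house features above.
# The data below is invented purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: number_of_rooms, livable_area, garden_size,
#          years_since_renovation, has_pool, has_security
X = np.array([
    [3, 120, 200,  5, 1, 0],
    [4, 150,  50, 10, 0, 1],
    [2,  80,   0,  2, 0, 0],
    [5, 200, 400, 20, 1, 1],
])
y = np.array([450_000, 480_000, 300_000, 700_000])  # observed sale prices

model = LinearRegression().fit(X, y)

# Interpretability: a single intercept (c0) and one coefficient per variable.
print(model.intercept_)  # c0
print(model.coef_)       # c1 .. c6
```

The learned intercept and coefficients are the c values in the formula above, which is what makes the model easy to inspect.
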
The strong points of linear models

Linear models have some nice properties that make them interpretable:

  • Linear effect: as a variable increases in value, so does the prediction (unless its coefficient c is negative, in which case the opposite is true).
    Example: an increase in livable_area will always increase the predicted price of the house, whereas an increase in years_since_renovation will always decrease it.
  • Additive: the contribution of each variable is independent of the others.
    Example: if number_of_rooms increases, the price increases, regardless of the other features.

The weaknesses of linear models

The properties that make linear models easy to interpret also mean they won't work for all types of predictions, because there are obvious limitations in what can be captured in additive, linear form.

  • Interacting variables: a house with both a big garden and a pool might be more valuable than indicated by separately adding up the values of the garden and the pool. In cases like these, additive models that weigh each variable separately will fail to capture the interaction.
  • Non-linear effects: going from a house with 2 rooms to 3 will typically increase the house value more than going from 4 rooms to 5. This effect of “diminishing returns” cannot be captured in a linear form with a single coefficient in front of number_of_rooms.

Improving the accuracy of linear models

While linear models have significant limitations, don't discard them just yet! Modest amounts of variable interaction and non-linearity can be handled with a few tricks (see the sketch after this list). For example:

  • For the garden and pool interaction, we can introduce a new variable called has_garden_and_pool, which is true if the garden is above a certain size and has_pool=true. Training the linear model will produce a suitably large coefficient for has_garden_and_pool, effectively capturing the added value of this combination.
  • Similarly, to capture how larger houses gain less value with each added room, a large_house variable can be added. Again, linear regression will leverage this variable by producing a (probably negative) coefficient to offset the diminishing effect of number_of_rooms.
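
As a sketch of these two tricks, here is how the engineered features could be derived with pandas. The threshold values are assumptions for illustration:

```python
# Sketch of the two engineered features in pandas.
# The 250 m2 garden threshold and the 4-room cut-off are assumptions.
import pandas as pd

df = pd.DataFrame({
    "number_of_rooms": [3, 5, 6],
    "garden_size":     [300, 0, 500],
    "has_pool":        [True, False, True],
})

# Interaction: true only when a big garden and a pool occur together.
df["has_garden_and_pool"] = (df["garden_size"] > 250) & df["has_pool"]

# Non-linearity: flag large houses so their (likely negative) coefficient
# can offset the diminishing effect of each additional room.
df["large_house"] = df["number_of_rooms"] > 4

print(df)
```

Once these columns exist, the linear model is trained exactly as before; the new coefficients absorb the interaction and the diminishing-returns effect.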

ML Lingo (optional): the tricks presented here are special cases of more generic ML techniques, namely feature engineering and piecewise functions/regression splines.

While it might seem complicated to create new features, it's worth noting that in many industry use cases, various metrics have evolved over time to capture non-linearities and variable interactions. Often you will find already-established terms and metrics that can be leveraged for more accurate ML models. Here are a few examples (with a derivation sketch after the list):

  • Debt/Income ratio (DTI) for mortgage applications
  • Claims in last 12 months for insurance fraud detection
  • Usage Level (high, medium, low) for churn prediction (computed from monthly usage)
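
For illustration, here is a small sketch of deriving two of these metrics as features; column names, units and cut-off values are assumptions:

```python
# Sketch of deriving two established metrics as model features.
# Column names, units and cut-offs are assumptions for illustration.
import pandas as pd

customers = pd.DataFrame({
    "monthly_debt":   [1500, 800, 2500],
    "monthly_income": [5000, 6000, 4000],
    "monthly_usage":  [42, 3, 18],
})

# Debt-to-income ratio, as used in mortgage applications.
customers["dti"] = customers["monthly_debt"] / customers["monthly_income"]

# Usage Level (low/medium/high) bucketed from monthly usage.
customers["usage_level"] = pd.cut(
    customers["monthly_usage"],
    bins=[0, 10, 30, float("inf")],
    labels=["low", "medium", "high"],
)

print(customers)
```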

Visualizing linear models

Conceptually, linear models are represented as “hyperplanes” in n-dimensional space, i.e. one dimension per variable. A simplified example for three dimensions is shown below. By selecting a number of rooms (x-axis) and garden size (z-axis), we can estimate the price of the house (y-axis).

A simplified linear model predicting the price of a house based only on garden size and number of rooms.

Linear models for classification

Linear models aren’t restricted to predicting numerical values (regression), as they can also be used for classification tasks. For example, let’s say we want to predict the likelyhood that a customer might churn in the next year, i.e. choose not to renew their subscription for one of our products. Again, in a simplified model, using only Income and Monthly Usage as input variables, we can predict probability of churn as a number between 0 and 1. Typically, we then interpret this through one or more “threshold rules”, taking different actions depending on the risk of churn we’re facing with this particular customer.

A simplified linear model predicting the likelihood of churn for a customer, based only on Income and Monthly Usage. In this case, 0.5 is considered the threshold for a positive churn classification.

Summary

In conclusion, while linear models aren't exactly novel and have some inherent limitations when it comes to expressing non-linearities and interactions between variables, they can still be useful for certain tasks, both for predicting a numerical value (regression) and for classification (answering yes/no).

For the other articles in this series, see Part I — Introduction, Part III — Rule Sets and Part IV — ScoreCards.

Greger works for IBM and is based in France. The above article is personal and does not necessarily represent IBM’s positions, strategies or opinions.
