ISLR: A Python Perspective — Part 1: A Refresher

Deepan Das
Aug 1, 2018 · 5 min read


We could begin right away with the technical aspects of the book, but I have decided to shed some light on the most essential concepts before diving into the code-based sections. This post discusses the driving concepts involved in the development of any statistical learning model. By the end of it, you will have the essential concepts that lead to the development of a suitable ML workflow for a specific task.

These concepts merely provide the wheels to the car we are about to build over the next few days!

It’s essential to know what task you are about to undertake before you start doing it.

Statistical Learning aims to identify and estimate the systematic information that a set of predictors can provide about a quantitative response. In its most general form, the relationship is written as

Y = f(X) + ε

Here, Y is the quantitative response, X is the set of predictors, and ε is a random error term that is independent of X and has mean zero.

The whole aim is to estimate the function f, which is fixed yet unknown. Estimating f serves two broad purposes:

Prediction:

This covers situations where a set of inputs X is readily available but the output Y needs to be obtained. We estimate a function that uses the predictors to produce a prediction Y'. The accuracy of Y' depends on the statistical learning model used to estimate the function (the reducible error) and on the inherent variability of the quantitative response Y arising from ε (the irreducible error).
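To make the two error sources concrete, here is a minimal simulation sketch, not taken from the book: the true f, the noise level, and all variable names below are assumptions chosen for illustration. Because f is known here, the reducible and irreducible parts can be measured directly.

```python
# A minimal sketch: simulate data from a known f so that the reducible
# and irreducible parts of the error can be measured separately.
# (The true f, noise level, and all names are illustrative assumptions.)
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def f(x):
    # the true, fixed but normally unknown function
    return 2.0 + 3.0 * x

n = 200
X = rng.uniform(0, 10, size=(n, 1))
eps = rng.normal(0, 2.0, size=n)      # irreducible noise, Var(eps) = 4
y = f(X[:, 0]) + eps

model = LinearRegression().fit(X, y)  # our estimate of f
y_hat = model.predict(X)

# The gap between f and our estimate is reducible; the noise is not.
reducible = np.mean((f(X[:, 0]) - y_hat) ** 2)
irreducible = np.var(eps)
print(f"reducible ~ {reducible:.3f}, irreducible ~ {irreducible:.3f}")
```

Here the linear model matches the true f, so the reducible part is near zero; swapping in a misspecified model would inflate it, while Var(ε) would stay fixed.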

Inference:

In some cases, our goal may not be to make predictions for Y but to understand the relationship between the predictors and the response, or, put simply, to understand how Y changes with X. This is often used to estimate how the output behaves when some of its predictors change. The nature of the association can then be determined through inference.
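As a sketch of an inference workflow (the data below are simulated, and the predictor names, coefficients, and the choice of statsmodels are my own assumptions, not the book's), the fitted coefficients and their p-values, rather than the predictions, are the objects of interest:

```python
# A sketch of inference on simulated data: we care about the fitted
# coefficients and p-values, not the predictions themselves.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
tv = rng.uniform(0, 100, n)        # hypothetical advertising budgets
radio = rng.uniform(0, 50, n)
sales = 5 + 0.05 * tv + 0.2 * radio + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([tv, radio]))
fit = sm.OLS(sales, X).fit()
print(fit.params)    # how much the response moves per unit of each predictor
print(fit.pvalues)   # which predictors show a statistically real association
```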

Prediction Accuracy and Model Interpretability

The Devil’s trade.

The function f can be estimated in either a parametric or a non-parametric fashion, depending on the needs. Across the methods analysed in the book, however, the models exhibit a trade-off between prediction accuracy and model interpretability.

An inflexible model like linear regression can be highly interpretable for an end user but will often provide less accurate results than a more flexible method like thin plate splines. Different models occupy different parts of the interpretability-accuracy space, and one might choose a specific model based on one's needs.

As the flexibility of a method increases, its interpretability decreases.

One may use a less flexible model like linear regression for simple inference tasks, while highly flexible methods can model the data closely and lead to excellent prediction results; a small comparison is sketched below.
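This sketch contrasts an inflexible and a flexible method on the same simulated nonlinear data; the true function, the sample sizes, and the choice of k-nearest neighbours as the flexible method are illustrative assumptions:

```python
# A sketch of the flexibility trade-off on simulated nonlinear data:
# the linear fit has a readable slope but higher error; KNN fits the
# curve better but offers no coefficient-style summary.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = 3 * np.sin(X[:, 0]) + rng.normal(0, 0.5, 300)
X_test = rng.uniform(-3, 3, size=(100, 1))
y_test = 3 * np.sin(X_test[:, 0]) + rng.normal(0, 0.5, 100)

linear = LinearRegression().fit(X, y)
knn = KNeighborsRegressor(n_neighbors=10).fit(X, y)

print("linear test MSE:", mean_squared_error(y_test, linear.predict(X_test)))
print("knn    test MSE:", mean_squared_error(y_test, knn.predict(X_test)))
print("interpretable slope:", linear.coef_[0])  # KNN has no such summary
```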

Regression and Classification

What’s the deal?

The variables that a model aims to predict can either be qualitative or quantitative. We tend to refer to problems with a quantitative response as Regression Problems and the ones with a qualitative response as Classification Problems.

[Image: regression vs. classification, taken from http://www.slideshare.net/datascienceth/machine-learning-in-image-processing]

For example, predicting a person's income from other known data is a regression problem, while predicting the person's gender is a classification problem. Classification can be thought of as drawing a wall between two distinct classes of objects, while regression attempts to fit a well-modeled locus that predicts the next response. Both are sketched below on simulated data.
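In this sketch the same simulated predictors feed a regression model and a classification model; the features, coefficients, and model choices are illustrative assumptions:

```python
# A sketch of the same predictors driving a quantitative response
# (regression) and a qualitative response (classification).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))          # e.g. age and years of education
income = 30 + 5 * X[:, 0] + 8 * X[:, 1] + rng.normal(0, 3, 400)
label = (X[:, 0] + rng.normal(0, 1, 400) > 0).astype(int)  # binary class

reg = LinearRegression().fit(X, income)   # regression problem
clf = LogisticRegression().fit(X, label)  # classification problem

x_new = [[0.5, 1.0]]
print("predicted income:", reg.predict(x_new))  # a number
print("predicted class:", clf.predict(x_new))   # a category
```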

The Bias-Variance Trade-Off

When a statistical learning model is evaluated, the extent to which the predicted response is close to the true response is usually quantified as an error term, and the most common among them is the mean squared error (MSE). The expected test MSE at a point x_0 can be decomposed into three fundamental quantities:

E[(y_0 − f̂(x_0))²] = Var(f̂(x_0)) + [Bias(f̂(x_0))]² + Var(ε)

Here f̂ denotes the estimate of f obtained from a training set. The left-hand side can be estimated by repeatedly estimating f using a large number of training sets and testing each estimate at x_0, as sketched below.

The overall test MSE can be computed by averaging the left-hand side over all possible values of x_0 in the test set.
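The following sketch runs exactly that procedure on simulated data (the true f, noise level, and sample sizes are illustrative assumptions): many training sets are drawn, a model is refit on each, and every fit is evaluated at a single point x_0.

```python
# A sketch of the decomposition: draw many training sets, refit the
# model each time, and evaluate every fit at one point x_0.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
f = np.sin                     # the true function (an assumption)
x0, sigma, n_sets = 2.0, 0.3, 1000

preds = np.empty(n_sets)
for i in range(n_sets):
    X = rng.uniform(0, 4, size=(50, 1))
    y = f(X[:, 0]) + rng.normal(0, sigma, 50)
    preds[i] = LinearRegression().fit(X, y).predict([[x0]])[0]

bias_sq = (preds.mean() - f(x0)) ** 2      # squared bias at x_0
variance = preds.var()                     # variance of the estimate at x_0
y0 = f(x0) + rng.normal(0, sigma, n_sets)  # fresh test responses at x_0
mse = np.mean((preds - y0) ** 2)
print(f"bias^2={bias_sq:.4f}  var={variance:.4f}  noise={sigma**2:.4f}")
print(f"MSE={mse:.4f}  ~  bias^2 + var + noise")
```

The linear model is deliberately misspecified for a sinusoidal f, so the squared bias term is clearly non-zero and the three pieces add up to the estimated MSE.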

Now, the bias term arises out of a discrepancy in the model: it is the inconsistency between the averaged estimate and the true function. In the graphic below, the average position of the thrown darts is a bit off from the desired bull's eye.

The variance, in turn, can be seen as the divergence of the estimates around their average value: small changes in the training data lead to large changes in f̂. Highly flexible methods tend to have higher variance.

[Image: bias-variance darts illustration, taken from https://elitedatascience.com/bias-variance-tradeoff]

As a general rule of thumb, as we begin using more flexible methods, the variance will increase and the bias will decrease, as the small simulation below illustrates.
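Extending the earlier simulation (using polynomial degree as the flexibility knob is my assumption, not the book's), squared bias falls and variance rises as the degree grows:

```python
# A sketch of the rule of thumb: refit polynomials of growing degree
# on many training sets and track bias and variance at one point x_0.
import numpy as np

rng = np.random.default_rng(5)
f = np.sin
x0, sigma = 2.0, 0.3

for degree in (1, 3, 9):          # increasing flexibility
    preds = []
    for _ in range(500):
        x = rng.uniform(0, 4, 50)
        y = f(x) + rng.normal(0, sigma, 50)
        coefs = np.polyfit(x, y, degree)     # fit a degree-d polynomial
        preds.append(np.polyval(coefs, x0))
    preds = np.array(preds)
    print(f"degree {degree}: bias^2={(preds.mean() - f(x0))**2:.4f}  "
          f"var={preds.var():.4f}")
```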

This post is loosely based on the second chapter of the book.

These concepts will make it much easier to build on the others described in the book. Also, do not forget to check out the video lessons while reading the book.
