The Elements of Statistical Learning 1

Introduction

袁晗 | Luo, Yuan Han
Nerd For Tech

--

I promise this will be the only boring picture

Recently I have been thinking about a new project to work on besides random bursts of data science projects and concepts (data science related, of course). I came across a very difficult yet necessary piece of work that I myself need to read: The Elements of Statistical Learning. After all the headaches, I wished someone had broken it down into shorter summaries in layman's terms with a fun and interactive demeanor. And then it dawned on me: that person could be me.

This book is basically about supervised learning. Unsupervised learning is not the main focus, so it will take a back seat until chapter 14. Since I assume people who are not interested in the field will not follow my blog, I also assume you understand supervised and unsupervised learning on a surface level. Therefore, I will skip the first half of the introduction, which gives examples of supervised and unsupervised learning problems, and jump to the more important content: how the book is structured, and how I will cover these topics.

Structure

The book is structured like a chain that connects all the ideas together. It begins with an overview of the supervised learning problem in chapter 2 to bring readers up to speed, followed by linear methods in chapters 3–4. The book believes a firm grasp of simple methods, such as linear methods, yields an understanding of the more complex ones in chapters 5–6. These complex methods, in turn, are building blocks for the high-dimensional learning techniques in later chapters. From here the book takes a break from looking at methods individually and focuses on methods as a whole, returning to individual methods shortly after.

Chapters 7 and 8 focus on model assessment and model inference respectively, followed by related procedures in chapter 10, particularly boosting. The best analogy I can come up with to contrast chapters 2–6 with chapters 7, 8, and 10 is the human body: chapters 2–6 are the internal organs, while 7, 8, and 10 are diet and performance. If you don't understand terms like assessment, model inference, and related procedures, then you are in the right place; otherwise you don't need to read this. With chapters 7, 8, and 10 completed, the book brings our focus back to individual methods, but of a different kind.

Chapters 9–13 focus on structured methods. And yes, I know my analogy broke down, but for a good reason: chapter 10 can be categorized either as an individual method or as methods as a whole. Structured methods divide further into regression, classification, and unsupervised learning: chapters 9–11 focus on regression, 12–13 focus on classification, and our unpopular cousin, chapter 14, focuses on unsupervised learning. Towards the end, the book shifts gears to more recent techniques: random forests and ensemble learning in chapters 15–16, graphical models in chapter 17, and finally high-dimensional problems in chapter 18. OK, we are done with how the book wants us to read it, but how I am going to blog it is a different story.

How I will cover these topics

I know the later chapters sound scary, especially chapter 18, but the complicated ideas are just a macro view of chains of simpler ideas from the earlier chapters. For example, high-dimensional problems are solved by nothing more than introducing new terms (or features) to differentiate the undifferentiable instances. What do I mean by that? Believe it or not, we all became familiar with the same technique when we learned to complete the square in our formative years.

Drawing a line between the blue '+' and red 'o' instances is impossible on the left side of the graph. But if we introduce a new feature, depth, the two groups can be separated by a line.
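To make this concrete, here is a minimal sketch in Python. The coordinates and the "depth" feature are my own made-up illustration, not data from the book: the red 'o' points sit near the origin and the blue '+' points sit on a ring around them, so no single straight line separates them in the original plane.

```python
# Hypothetical 2-D points: reds cluster at the center,
# blues surround them on a ring. No straight line in the
# (x, y) plane can put all reds on one side and all blues
# on the other.
reds  = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2), (-0.1, -0.3)]
blues = [(2.0, 0.1), (-1.9, 0.5), (0.3, 2.1), (-0.4, -2.0)]

def depth(point):
    """New feature: squared distance from the origin."""
    x, y = point
    return x * x + y * y

# In the lifted (x, y, depth) space, the flat plane
# depth = 1.0 separates the two groups perfectly.
threshold = 1.0
print(all(depth(p) < threshold for p in reds))   # reds below the plane
print(all(depth(p) > threshold for p in blues))  # blues above the plane
```

Running this prints `True` twice: once the extra feature is added, a simple linear boundary (a threshold on depth) does the job.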

Wow, completing the square, what a distant term. But don't worry, I will refresh your memory and explain when chapter 18 comes. For now, let me explain how I will structure this project.

My goal is to post a new blog summarizing a section of a chapter every two weeks. Every chapter has about 9–14 sections (hopefully I'll be done before your great-grandkids celebrate their 50th birthday). I have never undertaken something of this scale, so it's a trial-and-error process for me. That means I might shorten or extend the posting period depending on how complex the subject matter is. Since I want to keep posts short, difficult material might get broken into many parts while redundant sections will be condensed. The frequency of the summaries, therefore, will be based on complexity rather than strictly one section every two weeks. In sum, complex sections will get divided, and easy sections will get combined.

Thank you for following me this far; reward yourself a little for taking the first step of a thousand miles with me. See you in the next section.
