Why is generalization important in machine learning?

Aniket Kokate
Sep 9, 2018 · 4 min read

Before jumping straight to the point, let me ask you:

What is learning?

The definition I found on the internet was:

The knowledge or skills we gain through study or experience; with that experience, we gain expertise.

Rather than giving a typical textbook definition, I will explain it using an example from our ancestors, Homo erectus (no pun intended :p).

When Homo erectus saw fire for the first time, he thought it was a creature and tried to capture it. He grabbed it with his hand and (no surprise) burnt his hand. After repeating this multiple times, he learned that touching this creature caused him intense pain. He memorized this experience and concluded that he shouldn’t touch it.

So if we consider the figure above, we can say:

Experience — He got his hand burned after touching the fire.

Expertise — Don’t touch it.

In machine learning, we carry out this same process of converting experience into expertise, but with the help of a program.

From the example above, you guys must be thinking:

Memorization is the best form of learning

Well… that’s not entirely true.

Let me explain it using a study of a rat’s behavior.

It is well known that rats are difficult to poison.

Why?

Suppose you put poisonous food (called bait) near a rat.


If he notices that the food (which he is not used to) looks or smells different from the food he usually eats, he will take a small bite of this new (suspicious) food and wait a few hours.

If he feels sick after those few hours, he will associate the sickness with the new food and never touch it again.

This behavior is known as bait shyness.

At this point, you guys must be wondering what rats have to do with any of this.

By the end of this post, you will understand why this matters.

Suppose we are writing an email spam filter. We have a finite set of emails, and each email has its contents (text) and a label (spam or not spam).

In the real world, emails don’t come with these labels attached. So we have to write a program that learns from this finite set of emails to distinguish spam from non-spam.

How will you develop a program like that?

There is a very simple way of doing this.

Let’s say we write a program that memorizes the entire set of emails, the same way Homo erectus memorized his experience with fire.

This method will work correctly for the emails we have already seen (memorized), but it will fail as soon as we show the program an email it has never seen before.
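Here is a minimal sketch of a purely memorizing "spam filter," assuming a tiny, made-up set of labeled emails (the `TRAINING_EMAILS` data and the `memorizing_classifier` name are hypothetical and used only for illustration). The filter looks each email up verbatim, so it has no answer for anything outside its memory.

```python
# Hypothetical training data: a tiny, made-up set of labeled emails.
TRAINING_EMAILS = {
    "Deposit $$$ to proceed further": "spam",
    "Meeting moved to 3pm tomorrow": "not spam",
    "You won a prize! Deposit $$$ now": "spam",
    "Lunch on Friday?": "not spam",
}


def memorizing_classifier(email: str) -> str:
    """Return the memorized label, or "unknown" for any email not seen verbatim."""
    return TRAINING_EMAILS.get(email, "unknown")


print(memorizing_classifier("Meeting moved to 3pm tomorrow"))       # not spam
print(memorizing_classifier("Deposit $$$ to unlock your account"))  # unknown
```

The second email is obviously spam to a human, yet the program cannot say anything about it because it never saw that exact text.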

So if we want a program that can separate spam from any random batch of emails, it has to do more than memorize. It needs to understand what makes an email spam. Our program needs to find and learn patterns such as:

Are there any common words in the body of spam emails, e.g. “Deposit $$$ to proceed further”?

Does the subject of the email look suspicious?
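To make the first pattern concrete, here is a rough sketch of how a program might *find* words that are common in spam bodies, rather than being told them. It assumes a small, made-up dataset of `(subject, body, label)` triples (hypothetical, for illustration only) and simply keeps the words that appear in every spam body but in no legitimate one. Checking the subject line would work the same way, using the subject field instead of the body.

```python
from collections import Counter

# Hypothetical labeled emails as (subject, body, label); made up for illustration.
EMAILS = [
    ("URGENT!!!", "Deposit $$$ to proceed further", "spam"),
    ("You are a winner", "Deposit $$$ to claim your prize", "spam"),
    ("Team sync", "Meeting moved to 3pm tomorrow", "not spam"),
    ("Lunch", "Are we still on for lunch tomorrow", "not spam"),
]


def document_frequency(label: str) -> Counter:
    """Count how many bodies with the given label contain each word."""
    counts = Counter()
    for _subject, body, email_label in EMAILS:
        if email_label == label:
            counts.update(set(body.lower().split()))
    return counts


spam_counts = document_frequency("spam")
ham_counts = document_frequency("not spam")
n_spam = sum(1 for _s, _b, label in EMAILS if label == "spam")

# Words present in every spam body and absent from every legitimate body.
spam_only = sorted(
    word
    for word, count in spam_counts.items()
    if count == n_spam and ham_counts[word] == 0
)
print(spam_only)  # ['$$$', 'deposit']
```

Note that "to" appears in every spam body too, but it is filtered out because legitimate emails use it as well. The program has picked out a pattern instead of memorizing whole emails.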

Learning patterns like these, patterns that carry over to emails the program has never seen, is called generalization.

If we write a program that generalizes, it will be able to predict whether a given email (one it has never seen before) is spam or not.
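As one illustration of a program that generalizes, here is a minimal multinomial naive Bayes classifier with add-one smoothing. I am swapping in a standard technique for the sake of the example; the post itself does not name any algorithm. The training set and the function names (`train`, `predict`) are hypothetical. The classifier learns how often each word appears under each label, then scores emails it has never seen.

```python
import math
from collections import Counter

# Hypothetical training set of (text, label); made up for illustration only.
TRAIN = [
    ("deposit $$$ to proceed further", "spam"),
    ("deposit $$$ to claim your prize", "spam"),
    ("urgent winner claim your prize now", "spam"),
    ("meeting moved to 3pm tomorrow", "not spam"),
    ("are we still on for lunch tomorrow", "not spam"),
    ("please review the attached report", "not spam"),
]


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Learn per-label word counts, per-label email counts, and the vocabulary."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_emails in label_counts.items():
        counts = word_counts[label]
        n_words = sum(counts.values())
        score = math.log(n_emails / total)
        for word in tokenize(text):
            # Add-one smoothing so unseen words don't zero out the probability.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "deposit $$$ to unlock your account"))  # spam
print(predict(model, "lunch meeting tomorrow"))              # not spam
```

Neither test email appears in `TRAIN`, yet both get sensible labels, because the program learned which words tend to signal spam instead of memorizing whole emails.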

Machine learning is all about finding patterns in data and making predictions based on those patterns. The trickiest part is finding patterns that hold beyond the examples we have already seen. This is where generalization comes into the picture.
