Why is generalization important in machine learning?

Before jumping straight to the point, let me ask you:
What is learning?
The definition I found on the internet was:
The knowledge or skills we gain through study or experience, and with that experience comes expertise.

Rather than give a typical textbook definition, I will explain it using an example of our ancestor, Homo erectus (no pun intended :p).
When Homo erectus saw fire for the first time, he thought it was a creature and tried to capture it. He grabbed it with his hand and (it’s no surprise) burned it. After repeating this several times, he learned that touching this creature caused him intense pain. He memorized this and came to the conclusion that he shouldn’t touch it.
So from this example we can say:
Experience — He got his hand burned after touching the fire.
Expertise — Don’t touch it.
In machine learning, we carry out this process of converting experience into expertise with the help of a program.

From the example above, you guys must be thinking:
Memorization is the best form of learning.
Well… that’s not entirely true.
Let me explain it using a study of a rat’s behavior.
It is known that rats are difficult to poison.
Why?
When you put poisoned food (called bait) near a rat and he notices that it looks or smells different from the food he usually eats, he will take a small bite of this new (suspicious) food and wait for a few hours.
If he feels sick after those few hours, he will associate the sickness with the new food and never touch it again.
This behavior is called bait shyness.
At this point you guys must be wondering what any of this has to do with machine learning. By the end of this post you will understand why it was important.
Suppose we are writing an email spam filter. We have a finite set of emails, and each email has its contents (text) and a label (spam or not spam).

In the real world, emails don’t arrive with these labels attached. So you have to write a program that learns from this finite set of emails to tell spam from not spam.
How will you develop a program like that?
There is a very simple way of doing this.
Let’s say we write a program that memorizes the whole finite set of emails, the same way Homo erectus memorized his experience with fire.
This method will work correctly for the emails we have already seen (memorized), but it will fail when we show the program an email it has not seen before.
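To make this concrete, here is a minimal sketch of a memorizing "filter" in Python. The emails, the labels, and the name `memorizing_filter` are all made up for illustration; the point is only that an exact lookup has no answer for anything it has not stored.

```python
# Hypothetical toy data: the emails and labels are made up for illustration.
training_emails = {
    "Deposit $$$ to proceed further": "spam",
    "Meeting moved to 3pm tomorrow": "not spam",
    "You won a free prize, click here": "spam",
}

def memorizing_filter(email_text):
    """Return the stored label for an email seen before, otherwise None."""
    # Exact lookup: the program only "knows" the emails it has memorized.
    return training_emails.get(email_text)

print(memorizing_filter("Deposit $$$ to proceed further"))  # spam
print(memorizing_filter("Deposit $$$ now to proceed"))      # None
```

The second email is obviously a close cousin of the first, but since the text is not an exact match, the program has nothing to say about it.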
So if we want to write a program that picks out spam from a bunch of random emails, we need to do more than just memorize. We need to understand what makes an email spam. Our program needs to find and learn patterns like:
Are there any words common to the bodies of spam emails, e.g. “Deposit $$$ to proceed further”?
Does the subject of the email look suspicious?
The process of learning these kinds of patterns is called Generalization.
If we write a generalized program, it will be able to predict whether a given email (which the program has never seen before) is spam or not.
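As a rough sketch of the idea (and definitely not a real spam filter), the Python below learns which words show up in spam versus non-spam emails and uses those counts to label emails it has never seen. The training emails and the function names `tokenize`, `learn_word_counts`, and `classify` are assumptions made up for this example, and the word-count scoring is just one crude way to pick up a pattern.

```python
from collections import Counter

# Hypothetical labeled emails, made up for illustration.
training_emails = [
    ("Deposit $$$ to proceed further", "spam"),
    ("You won a prize, deposit $$$ today", "spam"),
    ("Meeting moved to 3pm tomorrow", "not spam"),
    ("Can you review my report today", "not spam"),
]

def tokenize(text):
    # Lowercase and split on whitespace; a real filter would do much more.
    return text.lower().split()

def learn_word_counts(emails):
    """Count how often each word appears in spam and in non-spam emails."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in emails:
        counts[label].update(tokenize(text))
    return counts

def classify(text, counts):
    """Label an email by whether its words were seen more in spam or non-spam."""
    words = tokenize(text)
    spam_score = sum(counts["spam"][w] for w in words)
    not_spam_score = sum(counts["not spam"][w] for w in words)
    # Ties (including emails with no known words) fall back to "not spam".
    return "spam" if spam_score > not_spam_score else "not spam"

counts = learn_word_counts(training_emails)
# Neither of these emails appears in the training set.
print(classify("Please deposit $$$ to claim it", counts))   # spam
print(classify("Report for the meeting tomorrow", counts))  # not spam
```

Neither test email was memorized, yet the program still gets a sensible answer because it learned something about the words, not the exact emails.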
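Machine learning is all about finding patterns in given data and making predictions based on them. The trickiest part is finding the pattern, and this is where generalization comes into the picture. That is also what the rat did: he didn’t just memorize one piece of bait, he learned to avoid food that looks or smells like it.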