How do machines really learn?

--

It’s just like how any of us learns.

There are three main ways in which a machine learns:

1. Show the machine some labeled examples and it will learn to generalize from their properties. For example, to teach the machine to tell cats from dogs, you gather a collection of images, each labeled "cat" or "dog", and show them to the machine. Internally, features like size, shape, and color are extracted from the images and used to learn the difference. This is called supervised learning.
2. Give it some examples without labels and ask it to group them based on their features. For example, if we give a kid a set of t-shirts in different colors and ask them to group the shirts, how would they go about it? They might group by color only, by size only (S, M, L, XL, etc.), by price, or by a combination of size, color, and price. Different groupings can give us different insights into the set of t-shirts, such as that red t-shirts in size M generally cost more than other colors. This is called unsupervised learning.
3. How do you learn to ride a bicycle? Do you watch YouTube videos of other people riding bicycles (as in point 1)? No. You learn by trial and error. Given enough trials, you figure out a way to keep yourself balanced. This is what the machine does too! It goes into unknown territory and learns to perform a task (play a game, drive a car, etc.) through trial and error. Every good action is rewarded and every bad action is penalized. Gradually, the machine learns to take good actions, and that’s when we say that the machine has learned by itself. This is known as reinforcement learning.
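
To make the three styles above a bit more concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the animal measurements are made up, the function names are hypothetical, and the techniques (a nearest-centroid classifier, a two-group k-means, and a greedy learner with optimistic starting estimates) are just simple stand-ins, not the only way to do any of this.

```python
# Made-up measurements (weight in kg, height in cm); purely illustrative.
animals = [(4.0, 25.0), (5.0, 23.0), (3.5, 24.0),
           (30.0, 60.0), (25.0, 55.0), (35.0, 65.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

def mean_point(points):
    """Average of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sq_dist(a, b):
    """Squared distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# 1. Supervised: labels are given, so learn one "average animal" per label.
def fit_centroids(points, labels):
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: mean_point(ps) for y, ps in groups.items()}

def predict(centroids, point):
    # Answer with the label whose average animal is closest.
    return min(centroids, key=lambda y: sq_dist(centroids[y], point))

# 2. Unsupervised: no labels; split the points into two groups (k-means, k=2).
def two_means(points, steps=10):
    centers = [points[0], points[-1]]  # simple deterministic starting guess
    for _ in range(steps):
        clusters = [[], []]
        for p in points:
            idx = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            clusters[idx].append(p)
        # Move each center to the middle of its group (keep it if the group is empty).
        centers = [mean_point(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# 3. Reinforcement: no examples at all; try actions and track the average reward.
def learn_by_trial(rewards, trials=20, optimism=10.0):
    # Optimistic starting estimates make the learner try every action at least once.
    estimates = {action: optimism for action in rewards}
    counts = {action: 0 for action in rewards}
    for _ in range(trials):
        action = max(estimates, key=estimates.get)  # pick the best-looking action
        reward = rewards[action]                    # feedback (fixed here for simplicity)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return max(estimates, key=estimates.get)

centroids = fit_centroids(animals, labels)
print(predict(centroids, (4.5, 24.0)))                   # a small, light animal
print([len(group) for group in two_means(animals)])      # sizes of the two groups found
print(learn_by_trial({"wobble": -1.0, "balance": 1.0}))  # hypothetical bicycle actions
```

Notice that only the first function ever sees the labels. The second has to discover the groups on its own, and the third only ever gets a reward or a penalty after acting.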

So, how does a machine really learn? Well, it needs incentives. The incentives, in this case, are the minimization of a loss (measured by a loss function) or the maximization of a reward. Think about how we learn as students. We learn by reading, writing, and observing in the classroom and then take tests. The teacher grades the tests and gives us marks (rewards). Our objective is to minimize the errors in the tests (or in other words, maximize the marks). We know we have learned the topics properly when we get the maximum marks in the tests. Machines typically do this through a process called gradient descent: the machine takes the test, measures its error, and adjusts itself a little in the direction that reduces the error, repeating until the error on the tests is as low as it can get (or the marks are as high as they can get).
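
As a minimal sketch of gradient descent, assume the machine has a single adjustable number `w` and a toy loss `(w - 3)^2`, so the "perfect marks" answer is `w = 3`. The loss, the starting point, and the learning rate are all assumptions chosen to keep the example tiny.

```python
def loss(w):
    """Toy loss: zero when w is exactly 3, larger the further away w is."""
    return (w - 3.0) ** 2

def gradient(w):
    """Slope of the loss at w; its sign says which way is uphill."""
    return 2.0 * (w - 3.0)

def gradient_descent(w=0.0, learning_rate=0.1, steps=100):
    for _ in range(steps):
        # Step a little in the downhill direction (opposite to the gradient).
        w -= learning_rate * gradient(w)
    return w

w = gradient_descent()
print(f"w = {w:.4f}, loss = {loss(w):.2e}")
```

Each step shrinks the distance to 3 by a fixed fraction, so `w` creeps toward the best answer rather than jumping there in one go.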

So that’s it. The process to teach a machine is simple:

1. Get data (or define a process to get data on the go).
2. Define an incentive (a loss function or a reward).
3. Take a step and calculate the error.
4. Learn from the mistake and adjust the next step based on the previous error.
5. Repeat steps 3 and 4 until the error stops decreasing.
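
The five steps above can be sketched as a small training loop. This is only an illustration under stated assumptions: the data points are made up to follow `y = 2x + 1`, the model is a straight line `w * x + b`, the incentive is the mean squared error, and names like `train` and `tolerance` are hypothetical.

```python
# Step 1: get data. Made-up points that follow y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Step 2: define the incentive. Mean squared error: lower is better.
def loss(w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(learning_rate=0.05, max_steps=5000, tolerance=1e-10):
    w, b = 0.0, 0.0  # start from an arbitrary guess
    n = len(xs)
    for _ in range(max_steps):
        # Step 3: take a step (make predictions) and calculate the error.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Step 4: learn from the mistake by moving w and b against the gradient.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # Step 5: repeat until the error is small enough.
        if loss(w, b) < tolerance:
            break
    return w, b

w, b = train()
print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss(w, b):.2e}")
```

With this data, the loop should end up close to `w = 2` and `b = 1`, which is the rule the points were generated from.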

So, who learned something new today? :)