Who’s afraid of Machine Learning? Part 2 : Creating a Machine That Can Learn
Intro to ML (for mobile developers)
Last post gave a general overview of ML and why our brain provides inspiration for it. (bit.ly/brittML-P1) But how do we make a machine do something similar to what our brain does?
This post will give a simplified explanation on how to create a model that enables computers to take data, make conclusions out of it, and improve the conclusion making ability (or: learning).
Hands down, my favorite food in the world is strawberries 🍓! I can eat them every day, all day, only them, and it would make me so happy! 😁
I want to write a program that will take an image and tell me if there’s a strawberry in it or something else. To simplify:
Is this a strawberry or not a strawberry?
If you were to teach a baby, who just came into the world, that an image shows a strawberry, how would you do it?
Showing the baby an image and saying "this is a strawberry, learn!" would not be so helpful.
You will probably say something like: “see this image? In the image, there’s a strawberry. Do you see how red it is? And its shape is round on the top and pointy at the bottom… And it has some small narrow leaves at its top… And it has a unique pattern of seeds all over it… that’s how you can tell it’s a strawberry.”
The baby would look at the image, notice the features you just described, and then draw a conclusion: “because of those features, those characteristics, this is a strawberry.”
When the next image appears, the baby will already know to look for those characteristics, spot them or their absence, and then make a brave conclusion: “this is a strawberry!” or “it’s not.”
Truth is, even if you didn’t break down the image into features, the baby’s brain would. In fact, either way, it would find more characteristics that you haven’t explicitly mentioned. This is how our brain works, and some of the magic it just knows how to do. 🔮
Let’s dive a little deeper into that inner conclusion-making process, and suggest how we can get a machine to do something similar:
Creating an Artificial Conclusion Making Model
The previous post mentioned that our brain works as a neural network. To create a similar model for a machine, we should create an Artificial Neural Network (ANN):
The input
An image, as stated before, is too complex to be used as input just like that. We would want to extract its features, just as we did for the baby. So our input will be not an image, but a list of features.
For simplicity, let’s focus on 3 features only: the red color, the seed pattern, and the top leaves. For each of those features, let’s dedicate an artificial neuron to represent it.
A neuron, as mentioned, is just a bit of data. Each neuron will get a number, a score, to hold. For instance: how red is the image? How much does it show a seedy pattern? How much do the top leaves look like a strawberry’s top leaves?
** The numbers in the picture are just some scores I made up for the sake of the example. Don’t dwell on what they actually mean. The concept is the important thing here: our input is a list of artificial neurons that represent features extracted from an image.
We can call this list of neurons the input layer.
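If it helps to see this in code, here’s a tiny sketch of such an input layer. The feature names and scores are made up for illustration, just like the numbers in the picture:

```python
# A made-up input layer: one score per feature, each between 0 and 1.
input_layer = {
    "redness": 0.9,       # how red is the image?
    "seed_pattern": 0.8,  # how strong is the seedy pattern?
    "top_leaves": 0.7,    # how strawberry-like are the top leaves?
}

# The ANN only ever sees the numbers, not the image itself.
print(list(input_layer.values()))  # → [0.9, 0.8, 0.7]
```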
Computing a middle layer
Next, we’ll create another layer of neurons. Each of them will hold another score.
Let’s see how we can give each of those neurons a score, by thinking through the first one:
We know the features of the image (red, pattern, top leaves), but we don’t really know how important each feature is for the final conclusion. Meaning, how important is the fact that the object in the image is red for deciding whether it’s a strawberry or not? How much do the top leaves matter? Well, there are images of strawberries without top leaves. A little less common, though, is a strawberry image where the object is not red…
In our brain, neurons are linked by physical connections. These connections can be physically stronger or weaker, and the neurons pulse (“fire”) more or less intensely when they pass the data between one another.
In the ANN, let’s represent the intensity, or the “importance”, of each neuron (or feature) with a number called a weight. I’ll write the weights on the edges, with the blue background.
In the beginning, the baby can’t be sure how important each feature is. At first, we’re just going to guess. So I’m making these numbers up, yes, but bear with me.
For each neuron in that middle layer, we will perform some computation with the features and the weights. To demonstrate, let’s take a simple linear equation: multiply each feature score by its weight, and sum the results.
Later, we can decide to tweak the score a little with some bias. Why? For now, just because it makes sense to me... Which bias to choose? For now, I’m just making up a number again (0.7). But bear with me 🙂
We get a final score and give it to the neuron to hold.
We do the same thing for each of the other neurons in the middle layer. The weights will be different, the bias may be different, and so will the resulting scores.
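The computation for a single middle-layer neuron can be sketched like this. Every number here is invented for illustration, like the ones in the text:

```python
# Feature scores from the input layer (made-up numbers).
features = [0.9, 0.8, 0.7]   # redness, seed pattern, top leaves

# Made-up weights: how "important" each feature is for this neuron.
weights = [0.8, 0.5, 0.2]

# The made-up bias that tweaks the score a little.
bias = 0.7

# Simple linear equation: multiply each feature by its weight,
# sum it all up, then add the bias.
score = sum(f * w for f, w in zip(features, weights)) + bias
print(round(score, 2))  # → 1.96
```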
The output
Ok, that’s nice, but where is the output? The output for us, humans, should be whether the image is of a strawberry or not.
As we already know, our ANN only knows how to work with neurons and weights, with bits of data and numbers. It doesn’t understand complex, human-understandable objects like images or labels. Let’s create another layer of neurons to represent the output: one neuron per label.
We’ll do basically the same thing as we just did to compute the middle layer, but now to compute the output layer:
- create weights (at first we make them up)
- conduct some computation
- get a result
- keep the result in the neuron
The only difference is that now the final result is not just a number: it represents the probability that this image is indeed a strawberry, or a different object, according to the label the neuron represents.
In this example, the neuron that represents a strawberry holds an 87% probability, so we can say that we’re 87% sure that this is a strawberry!
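Here’s one possible sketch of that output layer in code. The post doesn’t name a specific formula for turning raw scores into probabilities; a common choice is the softmax function, which is what I’m using here, and every score and weight below is made up:

```python
import math

# Made-up middle-layer scores.
middle = [1.96, 1.1, 0.4]

# Made-up weights for the two output neurons, one per label.
weights = {
    "strawberry":       [0.9, 0.6, 0.3],
    "not a strawberry": [0.1, 0.4, 0.7],
}

# Raw score per output neuron: the same weighted-sum computation as before.
raw = {label: sum(m * w for m, w in zip(middle, ws))
       for label, ws in weights.items()}

# Softmax: turn the raw scores into probabilities that sum to 1.
total = sum(math.exp(s) for s in raw.values())
probabilities = {label: math.exp(s) / total for label, s in raw.items()}

for label, p in probabilities.items():
    print(f"{label}: {p:.0%}")
```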
Applying the model
Let’s try to apply the same ANN on another image, for instance: an apple.
The weights and the equations we use are the same for any image we apply the model to. However, since the input is different, the computations, the middle layer scores, and the output layer will all be different.
In this example, our output layer shows 80% on the “not a strawberry” neuron.
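In code terms, the computation stays exactly the same; only the input changes. Again, every number here is invented for illustration:

```python
# The same made-up weights and bias as before -- the model doesn't
# change from image to image.
weights = [0.8, 0.5, 0.2]
bias = 0.7

def middle_neuron(features):
    """One middle-layer neuron: weighted sum of the features, plus bias."""
    return sum(f * w for f, w in zip(features, weights)) + bias

strawberry_features = [0.9, 0.8, 0.7]  # red, seedy, strawberry-like leaves
apple_features = [0.8, 0.1, 0.2]       # red, but no seeds or strawberry leaves

print(round(middle_neuron(strawberry_features), 2))  # → 1.96
print(round(middle_neuron(apple_features), 2))       # → 1.43
```

Same weights, same equation, different inputs, different scores: that is all “applying the model” means.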
Great! We have a model! This is the very basic concept of how machine learning models can be built. A little more accurately: this is a simplified variation of a deep learning model, which is one drop in the ocean of machine learning models.
You might say: “wait, but you just told us you made up a bunch of numbers! How could this give us something that we can trust?”
To better understand the “learning” process this model does, see you on the next post. bit.ly/brittML-3 ✨
Thank you for reading! ❤👏🍓