If AI were an ice cream sundae, deep learning would be the sprinkles and cherry on top. DL is the “fun stuff”, “the exciting part of AI”, and here’s why:
Artificial intelligence gives machines the ability to make decisions and think intelligently. One form of AI is the neural network, which is made up of many artificial neurons. Each neuron is basically a mathematical activation function. (Whatttttt 😮, don’t worry, I’ll explain this soon.) These are functions that take in any input (x) and decide what the artificial neuron outputs.
How Functions Function
An example of a very simple function is F(x) = x²
(F(x) means “a function of x”)
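Here’s that same function as a tiny Python sketch. Python is just my pick for illustration, and the name f is a stand-in for F:

```python
def f(x):
    """Square the input, i.e. F(x) = x squared."""
    return x ** 2


# Try a handful of inputs and print each input next to its output.
for x in [-2, -1, 0, 1, 2]:
    print(x, f(x))
```

Put a number in, get a number out. That’s all a function is.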
If we graphed it, we’d get a U-shaped curve (a parabola).
But… functions get super complex. Kinda like some people’s Starbucks orders complex
Some functions might look like:
Basically just super complex.
Activation functions just take in inputs (adding ’em all up) and determine the outputs based on some fancy math.
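To make that a bit more concrete, here’s a minimal sketch of an activation function in Python. It assumes a sigmoid, which is one common choice of “fancy math” (plenty of others exist), and the function names are made up for this example:

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def activate(inputs):
    """Add up all the inputs, then let the sigmoid decide the output."""
    total = sum(inputs)
    return sigmoid(total)


print(activate([0.5, 1.5]))    # a positive total gives an output above 0.5
print(activate([-3.0, 1.0]))   # a negative total gives an output below 0.5
```

A big positive total pushes the output towards 1, and a big negative total pushes it towards 0.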
Let’s dive a little deeper
To get a better understanding of exactly how these neurons operate, let’s use my dog.
A FUNctional Function Analogy
F(x) = pun game strong
If you don’t remember her from my classification algorithm article (read that here 🤩), meet Summer.
When I feed Summer (“artificial neuron”) food (“inputs”), her digestive system (“activation function”) will decide what it will output (“output”)…
Artificial Neuron: inputs + activation function → outputs
We can also add weights and biases to this whole equation.
Don’t know what weights and biases are??
Weights are like “highways” for the inputs. They change how much impact a given input has on the function’s result.
For example, Summer’s a little picky with her meals. How her food (the input) is served will determine the output.
When her food is served as kibble, Summer normally throws it up (it’s not up to her standards 🙄). When food’s served semi-moist, it is easily digested by Summer. If we feed her food in the form of raw food, the output is not pretty.
We can think of the form of food that we’re serving Summer as the “weight”: we can adjust the form (weight) of the food (input), which will slightly change Summer’s digestive reaction (activation function), which will determine the output.
We can go even further to alter Summer’s food output by adding a bias. A bias is a number which will somewhat change the activation function. It almost adds an opinion to the network.
For example, we can give Summer a digestive pill, which might help her digest certain types of food. So even if the weight of the input isn’t ideal (maybe we’re feeding her kibble), the pill adds “bias” to the activation function (Summer’s stomach) so she can digest the kibble (and not throw it up).
Changing the dosage of the pill (to help change up the outcome) or changing the brand/type of pill would be considered “adjusting the bias”.
The new equation for artificial neurons:
Artificial Neuron: inputs + activation function + weights + biases → outputs
** note: we’re actually doing fancy multiplication with all these values, but that’s an explanation for another time.
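If you’re curious what that fancy multiplication looks like, here’s a rough sketch. It assumes the usual recipe: multiply each input by its weight, add everything up, add the bias, then pass the result through the activation function. The sigmoid and the names neuron, inputs, weights, and bias are all assumptions for illustration:

```python
import math


def sigmoid(z):
    """Assumed activation function: squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, plus bias, then activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum + bias)


food = [1.0, 2.0]        # the inputs
form = [0.5, -0.25]      # the weights (how much each input counts)

print(neuron(food, form, bias=0.0))   # no pill
print(neuron(food, form, bias=2.0))   # same food, but with a "pill" (bias)
```

Same inputs, same weights, different bias, different output. That’s the pill doing its job.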
Let’s rewind so this makes more sense:
Summer, the dog, as a whole is like an artificial neuron; she is in charge of taking in inputs and coming up with an output.
She consumes food as an input. Different forms of food (dry, moist, semi-moist, raw, etc.) are like the weights: they deliver the same input (food) just in different forms/values. We can also give her a pill at different doses/brands/types, which’ll add a bias to Summer’s digestive system (i.e. her activation function).
Summer represents just 1 “neuron”.
DEEP Explanation of Deep Learning Networks
(Actually not that high level I just couldn’t think of a better title)
(Note: the whole goal of training in AI and machine learning is to find the optimal weights and biases, which combine the input data as best + accurately as possible.)
Deep learning is when you take a whole bunch of neurons and make layers out of them.
The job of each neuron in a layer is to combine the inputs in different ways to understand the optimal combination of inputs. We want to understand how we can combine inputs in the best way to predict the outcome.
When we get a bunch of neurons, they work together as a team to figure out how to combine the inputs.
For example, if we were trying to optimize Summer’s meal plan, it would probably take a long time to figure out the best weight (the form of food) and bias (type/dose of the pill) to help Summer digest her food. We could test a new combination every day on 1 Summer, but this would take a very long time, because we’d only be testing one possible combo of weights and biases per day, and there are tons of variations.
But what if we had 1,000 Summers? Every day we could test a new combination of weights and biases on each of the 1,000 Summers, therefore 1,000 different combinations per day. We would find the optimal version of Summer’s meal plan way faster.
This is similar to a “neural layer” (a bunch of artificial neurons stacked on top of each other).
When you take a bunch of neural layers and stack them on top of each other, you have a deep learning network.
Deep Learning is basically a large network of neurons that allows us to figure out the most optimal way to combine features/inputs (x) with weights and biases to come up with an outcome (y).
There are 3 steps to what deep learning does:
1. Predict y (output) based on x (input)
2. Calculate how wrong the prediction was
3. Readjust the weights and biases (aka train the network) based on how wrong it was, through backpropagation + gradient descent
So the DL network (or a “neural network”) starts with random biases and weights.
We let the network run by giving it inputs and allowing it to practice predicting. This is called “feeding forward”. Data moves from inputs through hidden layers and out through the output layer.
Since all the weights and biases are random, it probably won’t come up with a very accurate prediction.
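Here’s a small sketch of what feeding forward could look like with random weights and biases. The layer sizes (3 inputs, 4 hidden neurons, 1 output), the sigmoid activation, and every function name are assumptions picked for illustration:

```python
import math
import random


def sigmoid(z):
    """Assumed activation function: squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_inputs, n_neurons, rng):
    """Create a layer of neurons, each with random weights and a random bias."""
    return [
        {
            "weights": [rng.uniform(-1, 1) for _ in range(n_inputs)],
            "bias": rng.uniform(-1, 1),
        }
        for _ in range(n_neurons)
    ]


def layer_forward(layer, inputs):
    """Every neuron in the layer sees the same inputs and produces one output."""
    return [
        sigmoid(sum(w * x for w, x in zip(n["weights"], inputs)) + n["bias"])
        for n in layer
    ]


def feed_forward(network, inputs):
    """Pass the data through each layer in turn; one layer's outputs feed the next."""
    activations = inputs
    for layer in network:
        activations = layer_forward(layer, activations)
    return activations


rng = random.Random(0)  # fixed seed so the "random" start is repeatable
network = [make_layer(3, 4, rng), make_layer(4, 1, rng)]  # hidden layer, output layer
prediction = feed_forward(network, [0.2, 0.7, 0.1])
print(prediction)
```

The prediction is some number between 0 and 1, and since nothing has been trained yet, it’s basically a guess.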
Based on the prediction, we’ll calculate “loss”: which is essentially how wrong the network was.
Let’s say you were adding 4+6, and you got 12. You’d be off by 2, so your error would be “2”. If you added 4+6 and got 15, you’re still wrong, but you’re more wrong than before. 12 is closer to the actual sum (10) than 15, therefore you’re less wrong.
If you said 12, you’re 2 numbers off. But if you said 15, you’re 5 numbers off.
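In code, that “how many numbers off” idea can be sketched as an absolute error. This is only one simple way to measure loss (real networks often use other loss functions), and the function name is made up for this example:

```python
def absolute_error(prediction, target):
    """How far the prediction is from the correct answer, ignoring direction."""
    return abs(prediction - target)


target = 4 + 6  # the correct sum

print(absolute_error(12, target))  # 2
print(absolute_error(15, target))  # 5
```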
What we do in machine learning is calculate how wrong a machine is (i.e. its error) and adjust accordingly.
The more wrong you are, the larger, more drastic adjustments you’ll make.
Hmmm, let’s imagine you and your buddies were playing a fun game of “hot or cold”. An object is hidden in a room, and one person has to find the object. They’re given hints to where the object is in relation to their position.
When the person is far away from the object, they are “cold”, but when they’re close to an object their location is “hot”.
When someone is “cold” they’ll normally take a big step in some random direction. Since they’re not even close to the object, they might as well try a new spot far away to see if they become closer. When the person is near the object (aka “hot”) they’re going to make smaller adjustments in hopes to get even closer than they are.
When the error is large (we are “cold”), our adjustments are more drastic, because we’re going to need to take bigger steps to get to the correct location.
When the error is small (“hot”) our adjustments are going to be smaller because we’re already so close, and we wouldn’t want to take too big of a leap, and accidentally become “colder”.
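Here’s a little sketch of “hot or cold” in code. It assumes a squared error as the loss and a simple gradient descent update, where each step is proportional to how far off the guess is. The learning rate of 0.25 and all of the names are assumptions for illustration:

```python
def find_target(start, target, learning_rate=0.25, steps=6):
    """Walk a guess towards the target, taking smaller steps as the error shrinks."""
    guess = start
    step_sizes = []
    for _ in range(steps):
        error = guess - target
        # The gradient of the squared error (error ** 2) is 2 * error,
        # so a bigger error means a bigger step.
        step = learning_rate * 2 * error
        guess -= step
        step_sizes.append(abs(step))
    return guess, step_sizes


guess, step_sizes = find_target(start=0.0, target=10.0)
print(step_sizes)  # big steps while "cold", small steps once "hot"
print(guess)
```

One difference from the game: the machine doesn’t step in a random direction. The math tells it which way is warmer.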
Machines follow this exact same strategy when training.
What a coincidence! That’s our next section 😎
When we train an AI “model” we’re adjusting its parameters (weights and biases) to make the model more accurate.
We feed it training data so that the model can practice and make adjustments based on the loss function. (The larger the loss, the larger the adjustment).
Over time we feed it a TON of data so the machine can master its task.
“Practice makes Perfect” — Any math teacher ever trying to convince you to do your homework
The machine adjusts its parameters and tries to become a better version of itself at doing a specific task.
Once we’ve gone through many iterations of predict, error, train, repeat, congratulations: you’ve created a deep learning model!
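To see the whole predict, error, train, repeat cycle in one place, here’s a deliberately tiny sketch. It is not a deep network: it trains a single neuron with no activation function to learn the made-up rule y = 2x + 1. The data, the learning rate, the number of rounds, and the function names are all assumptions for illustration:

```python
import random


def predict(w, b, x):
    """Step 1: predict y from x using the current weight and bias."""
    return w * x + b


def train(data, learning_rate=0.05, epochs=1000, seed=0):
    """Repeat predict -> measure error -> adjust, many times over."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # start with random parameters
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            error = predict(w, b, x) - y            # step 2: how wrong were we?
            grad_w += 2 * error * x / len(data)     # slope of the mean squared error
            grad_b += 2 * error / len(data)
        # Step 3: the larger the error, the larger the adjustment.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


data = [(x, 2 * x + 1) for x in range(5)]  # hypothetical training data
w, b = train(data)
print(round(w, 3), round(b, 3))
```

After enough rounds, w and b should land very close to 2 and 1, the rule hiding in the data.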
Still a little fuzzy???
I’ll let you in on a little analogy that personally helped me understand deep learning.
Cool Deep Learning Analogy Time
Let’s pretend we’re training an algorithm to identify whether something is a face or not.
We’ll use a deep learning neural network to help with this task. The neural network is made up of 3 main types of layers: input, hidden, and output.
The input layer is where we feed our input (x). We pass this input through the many activation functions in the hidden layer. Finally, the output layer will spit out the result.
We can think of this neural network as a company (with employees, supervisors, and a CEO). And the task of this “company” is to correctly classify whether an image (input) is a face or not (output).
So, I knock on the company’s door and hand one of the employees a picture (this is like the “input”) and ask for them to classify if it’s a face or not. The employee says “sure give us a bit” and closes the door. The employees represent “neurons” in the hidden layer.
It’s called the hidden layer because the door is closed and I (the outsider, supplying the data) have no clue what’s going on within the company building.
But…(based on some secret sources) I can tell you what’s happening inside:
All the employees get together and start looking for different features on the image. They try to understand the input. All of the lines, shapes, and colours that make up the image.
Oh, and also all the people from the company are colour-blind and only see in grayscale.
Once each of the employees thinks they understand the image, they “activate” an output, such as seeing a thick line, noting a small circle, darker shaded areas, etc. These are “features of the input”.
All of the employees then run to all of the supervisors and give them their outputs. With the employees’ outputs, the supervisors will try to better understand the image. For example, one supervisor “neuron” might think that the circular shape is an eye. They might also see more complex features like noses and mouths, based on the outputs of the employee “neurons”.
Once the supervisors have extracted their features (such as left eye, nose, eyebrow etc), they bring their findings to the “big guy”, the CEO.
The CEO takes in all the outputs from the supervisors and, based on the supervisors’ features, has the final say on whether or not the image is of a face.
In the theoretical deep learning company, the CEO would come back to me (the person who supplied the input image) with the outcome (face or no face).
Let’s suppose the image I gave him was of my dog, and the CEO tells me his team is 70% certain it is a face. I would just look at him right in the eyes and say “Sorry Mr. CEO, you’re wrong: loss function = 0.7”.
0.7 because he wasn’t 100% wrong. The correct answer was 0% face, but he said 70% face.
The CEO becomes outraged. He is angry at the results of the prediction, so he’ll go back into the office building, and start yelling at the supervisors to “change their parameters” (weights and biases) so that they extract better features. Then the supervisors will yell at the employees to do the same.
This is backpropagation. Starting from the output layer and working backwards, we adjust the parameters (weights and biases) throughout the model.
Order of adjustment:
CEO → Supervisors → Employees
We repeat the whole “giving the company a picture” process (employees → supervisors → CEO), also known as feeding forward or predicting. Then we get the loss function value and adjust the model using backpropagation (or, if you’d prefer, angry CEOs yelling at their employees).
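For anyone who wants to peek at the yelling in code, here’s a minimal sketch of backpropagation on the smallest company imaginable: one employee (a hidden neuron) and one CEO (the output neuron). The sigmoid activation, the squared error loss, the starting parameters, and every name here are assumptions for illustration, and a real face classifier would be far bigger:

```python
import math


def sigmoid(z):
    """Assumed activation function: squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Feed forward: employee (hidden neuron) first, then the CEO (output neuron)."""
    h = sigmoid(params["w1"] * x + params["b1"])
    p = sigmoid(params["w2"] * h + params["b2"])
    return h, p


def backprop_step(params, x, y, learning_rate=0.5):
    """One round of predict, measure the loss, and adjust from the output backwards."""
    h, p = forward(params, x)
    loss = (p - y) ** 2

    # Start at the CEO: how much did the output neuron contribute to the error?
    delta_out = 2 * (p - y) * p * (1 - p)
    grads = {"w2": delta_out * h, "b2": delta_out}

    # Then pass the blame back to the employee, using the chain rule.
    delta_hidden = delta_out * params["w2"] * h * (1 - h)
    grads["w1"] = delta_hidden * x
    grads["b1"] = delta_hidden

    # Everyone adjusts their parameters by a small step against their gradient.
    for name in params:
        params[name] -= learning_rate * grads[name]
    return loss


params = {"w1": 0.5, "b1": 0.0, "w2": 1.0, "b2": 0.5}  # hypothetical starting values
# x = 1.0 stands in for the dog picture, y = 0.0 means "not a face".
losses = [backprop_step(params, x=1.0, y=0.0) for _ in range(100)]
print(losses[0], losses[-1])
```

Each round, the loss should come out a little smaller than the last, which is the company slowly learning that my dog is not a face.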
And that’s it. The sprinkles and cherries on top of the AI ice cream sundae.