Neural Networks in 5 Minutes

James Parkin
Published in The Startup
5 min read · Jan 27, 2021

Neural nets have exploded in popularity in recent years. Their uses range from self-driving cars to diagnosing skin cancers. I’m going to try to explain what they are in 5 minutes. To achieve this, we need to break them down into their components and tackle these smaller chunks. I’m a medical doctor, so this will be light on the maths. Phew…

Before I start the breakdown, it’s important to know the general use case for this kind of machine learning architecture. Typically, these kinds of AI solutions take in data (images are a classic example), then process, segment and classify it to output useful information to us humans. There are 2 distinct phases: training and testing.

In my opinion, these are the fundamental elements for understanding what’s going on in the world of neural networks:

Nodes

Weights/connections

Layers

Activation functions

Cost functions

Forwards and backwards propagation

Let’s start with an analogy. Imagine that training a neural network is like driving from your home to the shops and back… Humour me… On the way to the shops there are 2 sets of traffic lights. In this scenario, nodes are any stationary points on your journey: your home, both sets of lights and the shops are all nodes.

A picture of a car to keep you interested…

In order to get from one node to another (i.e. your home to the first set of lights) you must drive on the road. We can say that the road “connects” us to each stationary point (or node) in the journey. Now let’s say each road has its own speed limit. What do you think the speed limit represents… The speed limit is the weight of the connection.
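If a line of code helps, here’s the whole idea of a weighted connection in miniature (a toy Python sketch; every number is invented purely for illustration):

```python
# A connection carries a signal from one node to the next, scaled by
# the connection's weight (the road's speed limit). Numbers are made up.
signal_from_home = 1.0   # the value leaving the "home" node
speed_limit = 0.7        # the weight of the road (connection)

signal_at_lights = signal_from_home * speed_limit
print(signal_at_lights)  # 0.7 -- what arrives at the first set of lights
```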

Life doesn’t occur in a bubble. Whilst you’re driving to the shops, many others may also be driving to that same shop. To keep the analogy alive, let’s say 2 further people drive from their homes, through 2 sets of lights, and then reach the shops. The roads and traffic lights they pass through may be the same as, similar to, or completely different from the ones you encounter.

At each stationary point within this network of people driving to the shops, we have a set of nodes. We call these sets of nodes layers. Everyone’s home (or starting node) represents the input layer; it’s where we start our journey. The traffic lights represent the hidden layers, and the shops represent the final (output) layer.
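As a hedged sketch (NumPy, with layer sizes chosen purely to mirror the story, not any real architecture), the layers are groups of nodes, and the roads between them are matrices of weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes loosely mirroring the story: 3 homes (input layer),
# two sets of lights (two hidden layers of 4 nodes each), 1 shop (output).
n_in, n_h1, n_h2, n_out = 3, 4, 4, 1

# Each matrix holds the weights (speed limits) of the roads
# connecting one layer to the next, initialised randomly before training.
W1 = rng.normal(size=(n_h1, n_in))
W2 = rng.normal(size=(n_h2, n_h1))
W3 = rng.normal(size=(n_out, n_h2))
```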

Okay, so if you’ve managed to stay with me so far, then we have our simple neural net architecture. When all 3 individuals get in their cars and drive to the shops through their respective traffic lights, we have completed forward propagation. Forward propagation in the world of AI is when data (be it image data or another type) passes through the network and reaches the shops… Sorry, reaches the output.
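Continuing the NumPy sketch above (an illustration, not anyone’s production network), forward propagation is just each layer’s output feeding into the next until it reaches the output:

```python
def sigmoid(z):
    # A squashing function -- more on activations in a moment.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h1 = sigmoid(W1 @ x)     # homes -> first set of lights
    h2 = sigmoid(W2 @ h1)    # first -> second set of lights
    return sigmoid(W3 @ h2)  # second set of lights -> the shops

x = np.array([1.0, 0.5, -0.2])  # made-up input data
y_hat = forward(x)              # the network's estimate at the shops
```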

This is where the analogy starts to break down. At every traffic light (node within a hidden layer) the car will either pass through to the next set of traffic lights or it will be stopped. How do we decide which cars get to pass through each node? It’s the node’s activation function, of course… For all intents and purposes, this is a mathematical function that decides how much of the signal passes on, based on the kind of car and the speed (weight) it’s travelling at when it reaches the lights. Now our traffic lights are more like gates, letting some cars pass depending on their brand and speed (weight). And, in a true neural network, cars, with humans inside, can travel down multiple roads and be at multiple traffic lights at the same time… We now have a multiverse…
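Here are two common activation functions, sketched in Python (the incoming values are invented for illustration). The sigmoid squashes everything into the range 0 to 1; ReLU simply blocks anything negative:

```python
import numpy as np

def sigmoid(z):
    # Squashes any value into (0, 1): roughly, "how much gets through".
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive signals unchanged and blocks negative ones.
    return np.maximum(0.0, z)

incoming = np.array([-2.0, 0.0, 3.0])  # made-up weighted signals at a node
print(sigmoid(incoming))  # ~[0.12, 0.5, 0.95]
print(relu(incoming))     # [0.0, 0.0, 3.0]
```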

Neural Networks – an Intuition. Can you label the diagram?

The deterioration into quantum physics can be ignored for the sake of comprehension. We can think of the final (output) layer in our story as the collective state of all the cars arriving at the shops. As you can imagine, this depends heavily on the speed of each car and whether or not it makes it through its corresponding activation functions.

When we train a network like this to become optimised at a task, we have outcomes, or labels, that we expect for each set of cars that sets off together. We call these labels ground truth values. Once forward propagation has occurred and produced an estimate, we compare this estimate to the ground truth. The comparison is assessed by our penultimate component: the cost function.

The cost function is a fancy mathematical way of determining how different our estimate is from the ground truth. There are many different cost functions, but an admittedly simplistic explanation will suffice. Using our cost function, and some lovely differentiation, we drive back from the shops towards home, adjusting the speed limits as we go… Okay, no more car analogies. When we move back through the network we backpropagate, optimising our weights to minimise the cost function.
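To make that concrete, here’s a deliberately tiny, self-contained toy (one weight, squared error as the cost; not my exact setup, just an illustration of a single backpropagation step):

```python
# One input, one weight, one ground truth label. Numbers are made up.
x, ground_truth = 2.0, 1.0
w = 0.9                 # the current weight (speed limit)
learning_rate = 0.1

estimate = w * x                        # forward propagation
cost = (estimate - ground_truth) ** 2   # squared-error cost: how wrong are we?

# Differentiating the cost with respect to the weight:
# d(cost)/dw = 2 * (estimate - ground_truth) * x
gradient = 2 * (estimate - ground_truth) * x

w -= learning_rate * gradient  # nudge the speed limit to shrink the cost
print(w)  # 0.58 -- a better estimate than w = 0.9 for this training case
```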

My favourite cost function :)

And that’s it! We repeat this process for as many bits of data (training cases) as we possess, making small optimisations to our malleable parameters (weights) as we go. Over time our network becomes able to predict the output for new input data (i.e. our test set) with great accuracy. Neural networks recognise patterns in data that humans simply cannot comprehend. What’s even stranger is that, due to the nature of their black box structure, we often aren’t able to determine how they make such accurate predictions.
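Here’s a sketch of that repeated loop, again with a single made-up weight, where the pattern to learn is simply y = 2x:

```python
# A tiny, invented training set: the pattern to learn is y = 2x.
training_cases = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, learning_rate = 0.0, 0.01

for epoch in range(200):                  # many passes over the data
    for x, ground_truth in training_cases:
        estimate = w * x                  # forward propagation
        gradient = 2 * (estimate - ground_truth) * x
        w -= learning_rate * gradient     # a small optimisation each time

print(round(w, 3))  # ~2.0: the network has found the pattern
```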

Neural networks are achieving incredible feats in many disciplines and their application to problems in healthcare truly excites me. If you’ve learnt anything or enjoyed this article, consider subscribing to receive weekly articles.

If you’re interested in finding out more about AI in healthcare, read my article talking about some of my favourite examples here. Also, check out the YouTube version if you’re done reading for the day.

