# A Primer on Artificial Neural Networks

**Understanding the most transformative technology of the 21st century**

Earlier this year I attended the UK AI Summit in London as part of a delegation of Canadian AI startups. My company, Paladin AI, pitched at the StartUp Elevate event. In the few minutes allotted, I talked about how artificial intelligence will transform the aviation industry. In the near term, it will alleviate the pilot shortage by optimizing pilot training. Ultimately, we may see autonomous aircraft.

One thing was very clear about the conference: AI is big business. But what do companies and technology leaders mean when they say AI?

Let’s take a step back from the hype for a moment. Artificial Intelligence is broad and can mean different things to different people. However, over the last decade, the enormous progress made using deep learning has overshadowed most other work in AI. To understand deep learning, we first need to talk about neural networks.

## Inspired by biology

The key inspiration behind “artificial neural networks” is biological neural networks.

A neuron is a type of specialized cell for processing and transmitting electrochemical signals. It has a central nucleus and is connected to other neurons in a web, transmitting and receiving signals via its axon and dendrites.

Collectively these neurons form networks with specialized functions. The human brain has roughly 86 billion neurons, with on the order of 100 trillion connections between them.

At the very least, these neural networks are performing information processing. Different subnetworks or regions of the brain are responsible for different types of information processing: seeing, hearing, remembering, reasoning, etc.

The individual neuron can be considered the smallest piece in this vast information processing factory. Let’s think of a neuron as a little machine that takes several inputs and gives an output. For simplicity’s sake, let’s assume that the output is a number between 0 and 1. We need a mathematical function for transforming the inputs to a single number within this range. This is called the **activation function** and it’s best illustrated with an example.
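One common choice of activation function is the sigmoid, which squashes any weighted sum of inputs into the range (0, 1). A minimal sketch in Python:

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs map near 0, large positive inputs near 1.
print(sigmoid(-4.0))  # ≈ 0.018
print(sigmoid(0.0))   # 0.5
print(sigmoid(4.0))   # ≈ 0.982
```

The smooth, S-shaped curve is what lets the neuron output a graded “confidence” rather than a hard yes/no.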

## Example: Deciding whether or not to wear a sweater

As you’re getting dressed in the morning, you need to decide whether you should put on a sweater. The decision is multifactorial. You need to account for temperature, humidity, if there is a breeze, and whether it will be sunny or cloudy. Through a combination of your accumulated life experience, a quick look out the window, and a consultation of the weather report, you make a snap decision to put on a sweater. But how did you decide that?

If we were trying to capture that logic in a mathematical function, it would have 4 inputs:

- Temperature
- Humidity
- Wind speed
- Cloud cover

It would have only a single output: **yes** or **no** to the sweater. Maybe these are represented as **1** for **yes** and **0** for **no**.

## The “Perceptron”

What we’ve built here is called a perceptron: a single artificial neuron that takes one or more inputs and produces an output.

Following our intuition, we need a decision function that takes into account each of the inputs, gives each their relative weight, and makes a decision. Perhaps cloudiness doesn’t really matter to us, so we give it a very low weight, but temperature is a crucial factor, so we give greater weight to that variable.
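This weighted-sum-and-threshold idea can be sketched as a tiny perceptron in Python. The weights, bias, and units below are purely illustrative, not the “right” values for anyone’s climate:

```python
def sweater_perceptron(temperature_c, humidity_pct, wind_kmh, cloud_pct):
    """Return 1 (wear a sweater) or 0 (skip it)."""
    # Illustrative weights: colder and windier push toward "yes";
    # cloud cover barely matters, so its weight is tiny.
    weights = {
        "temperature": -0.5,   # higher temperature -> less need for a sweater
        "humidity":    -0.02,
        "wind":         0.1,
        "cloud":        0.01,
    }
    bias = 7.0  # shifts the decision boundary

    score = (weights["temperature"] * temperature_c
             + weights["humidity"] * humidity_pct
             + weights["wind"] * wind_kmh
             + weights["cloud"] * cloud_pct
             + bias)
    return 1 if score > 0 else 0

# A cold, windy day -> wear the sweater.
print(sweater_perceptron(temperature_c=5, humidity_pct=40, wind_kmh=20, cloud_pct=80))   # 1
# A hot, calm day -> skip it.
print(sweater_perceptron(temperature_c=28, humidity_pct=40, wind_kmh=5, cloud_pct=10))   # 0
```

Note how the low cloud weight (0.01) makes cloudiness nearly irrelevant, while the temperature weight dominates the decision.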

Based on your local climate and your accumulated experience, your brain has given each of those factors a certain weight, so the decision you make feels almost automatic.

Sometimes you make the wrong call: it was windier than expected and you caught a chill, so you revise your decision function for next time.

But how could we teach a computer to learn from repeated experiences?

## Training a neural network

The idea behind “training” a neural network to learn some kind of desired pattern is to update the weights on the inputs to every neuron. In our simple example above, our “sweater-decider” only had 4 weights to tune.

Initially our neurons will be pretty naive and make a lot of bad calls: the initial values of the weights are random. But given a set of inputs, if we know what the output *should* be, we can pass that information backwards through the network and adjust all the weights slightly in the direction of the desired answer.

The algorithm that achieves this is called backpropagation. While the algorithm was discovered by multiple researchers as far back as the 1960s, it wasn’t until a seminal paper by David Rumelhart, Geoffrey Hinton, and Ronald Williams in 1986 that the algorithm started to get more widespread attention. Today it is foundational to how deep neural networks of all kinds are trained. Michael Nielsen has a great chapter on backpropagation in his online book *Neural Networks and Deep Learning*.

The basic intuition is to compute the loss function: a measure of the difference between the predicted values and the ground-truth labels of the target variable. Then we take the derivative of the loss with respect to each weight in the network and use it to calculate an incremental weight update. We repeat this process iteratively until the loss function reaches a minimum, i.e. the neural network is mostly getting the right answers.
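Here is a minimal sketch of that loop for a single sigmoid neuron trained on a toy sweater dataset. The learning rate, data, and epoch count are all illustrative, and the gradient is written out by hand via the chain rule rather than computed by a library:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy dataset: (temperature_c, wind_kmh) -> 1 means "wear a sweater".
data = [
    ((5.0, 20.0), 1),
    ((8.0, 15.0), 1),
    ((25.0, 5.0), 0),
    ((30.0, 2.0), 0),
]

random.seed(0)
w = [random.uniform(-0.1, 0.1) for _ in range(2)]  # random initial weights
b = 0.0
lr = 0.01  # learning rate

for epoch in range(5000):
    for (x, target) in data:
        pred = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # Squared-error loss L = (pred - target)^2.
        # Chain rule: dL/dw_i = 2 * (pred - target) * pred * (1 - pred) * x_i
        grad = 2 * (pred - target) * pred * (1 - pred)
        w[0] -= lr * grad * x[0]
        w[1] -= lr * grad * x[1]
        b    -= lr * grad

# After training, cold/windy days should score near 1, hot/calm days near 0.
print(sigmoid(w[0] * 5 + w[1] * 20 + b))
print(sigmoid(w[0] * 30 + w[1] * 2 + b))
```

With only one neuron there is nothing to propagate *back* through; in a multi-layer network, backpropagation applies this same chain-rule bookkeeping layer by layer.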

## Reading Handwritten Digits

An early application of neural networks was in the recognition of handwritten digits. This is especially important when sorting the mail by postal code, for example the 5-digit ZIP codes used by the US Postal Service.

The networks used for this task, convolutional neural networks, perform a series of transformations of the input image via convolutions. A convolution is a blending of the original image with a small “kernel” that can help the neural network pick out patterns like sharp edges.
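A convolution slides the kernel across the image, computing a weighted sum of pixels at each position. A minimal pure-Python sketch with a simple vertical-edge kernel (the image and kernel values are illustrative):

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution (strictly, cross-correlation, as in most
    deep learning libraries): slide the kernel over the image and take a
    weighted sum at each position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            total = sum(image[i + di][j + dj] * kernel[di][dj]
                        for di in range(kh) for dj in range(kw))
            row.append(total)
        out.append(row)
    return out

# A tiny "image": dark (0) on the left, bright (9) in the rightmost column.
image = [
    [0, 0, 0, 0, 9],
    [0, 0, 0, 0, 9],
    [0, 0, 0, 0, 9],
]
# A vertical-edge kernel: responds where brightness changes left-to-right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(convolve2d(image, kernel))  # [[0, 0, 27]] -- only the edge position fires
```

The flat regions produce zeros while the position spanning the dark-to-bright transition produces a large response, which is exactly the “edge detector” behavior a trained convolutional layer can learn.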

Deep convolutional neural networks have now become foundational to a lot of work in computer vision. Autonomous vehicles depend on them to “see” obstacles, such as other cars, pedestrians, cyclists, and traffic signals.

## Deep Neural Networks

Today’s deep neural networks keep getting more and more sophisticated. Some image classification networks are dozens of layers deep and have millions of tunable parameters. This creates problems. The main drawback is that these networks require millions of examples in order to be trained, and training them comes at huge computational cost. Training can take days or weeks, running on many GPUs and using a lot of electricity.

## Deep learning at Paladin AI

At my company, deep neural networks are just one technology that we’re employing to optimize pilot training in aviation. We’re teaching them to recognize what pilots are doing when they’re flying the aircraft and how well they’re doing it.

While it’s likely that air transport will eventually become fully autonomous, that outlook neglects some of the main reasons for keeping pilots in the cockpit: maintaining situational awareness, reasoning through and solving complex problems, managing crew resources, communicating with air traffic control, and taking over full control when things go wrong. Neural pilot-assessment networks can also be used to evaluate the quality of autonomous agents.

Tomorrow’s pilots will certainly be different from today’s pilots. At Paladin AI, we believe AI can make air travel safer, and help humans master complex skills in less time. Today we’re teaching the AI. Tomorrow, maybe the AI will be teaching us.