Everything You Need to Know About Artificial Neural Networks

photo credit

The year 2015 was a monumental year in the field of artificial intelligence. Not only are computers learning more and learning faster, but we’re learning more about how to improve their systems. Everything is starting to align, and because of it we’re seeing strides we’ve never thought possible until now. We have programs that can tell stories about pictures. We have cars that are driving themselves. We even have programs that create art. If you want to read more about advancements in 2015, read this article. Here at Josh.ai, with AI technology becoming the core of just about everything we do, we think it’s important to understand some of the common terminology and to get a rough idea of how it all works.

What are they?

A lot of the advances in artificial intelligence are new statistical models, but the overwhelming majority of the advances are in a technology called artificial neural networks (ANN). If you’ve read anything about them before, you’ll have read that these ANNs are a very rough model of how the human brain is structured. Take note that there is a difference between artificial neural networks and neural networks. Though most people drop the artificial for the sake of brevity, the word artificial was prepended to the phrase so that people in computational neurobiology could still use the term neural network to refer to their work. Below is a diagram of actual neurons and synapses in the brain compared to artificial ones.


Fear not if the diagram doesn’t come through very clearly. What’s important to understand here is that in our ANNs we have these units of calculation called neurons. These artificial neurons are connected by synapses which are really just weighted values. What this means is that given a number, a neuron will perform some sort of calculation (for example the sigmoid function), and then the result of this calculation will be multiplied by a weight as it “travels.” The weighted result can sometimes be the output of your neural network, or as I’ll talk about soon, you can have more neurons configured in layers, which is the basic concept to an idea that we call deep learning.

Where do they come from?

Artificial neural networks are not a new concept. In fact, we didn’t even always call them neural networks and they certainly don’t look the same now as they did at their inception. Back during the 1960s we had what was called a perceptron. Perceptrons were made of McCulloch-Pitts neurons. We even had biased perceptrons, and ultimately people started creating multilayer perceptrons, which is synonymous with the general artificial neural network we hear about now.

A simple multilayer perceptron (photo credit)

But wait, if we’ve had neural networks since the 1960s, why are they just now getting huge? It’s a long story, and I encourage you to listen to this podcast episode to listen to the “fathers” of modern ANNs talk about their perspective of the topic. To quickly summarize, there’s a hand full of factors that kept ANNs from becoming more popular. We didn’t have the computer processing power and we didn’t have the data to train them. Using them was frowned upon due to them having a seemingly arbitrary ability to perform well. Each one of these factors is changing. Our computers are getting faster and more powerful, and with the internet, we have all kinds of data being shared for use.

How do they work?

You see, I mentioned above that the neurons and synapses perform calculations. The question on your mind should be: “How do they learn what calculations to perform?” Was I right? The answer is that we need to essentially ask them a large amount of questions, and provide them with answers. This is a field called supervised learning. With enough examples of question-answer pairs, the calculations and values stored at each neuron and synapse are slowly adjusted. Usually this is through a process called backpropagation.

A simple backpropagation algorithm (photo credit)

Imagine you’re walking down a sidewalk and you see a lamp post. You’ve never seen a lamp post before, so you walk right into it and say “ouch.” The next time you see a lamp post you scoot a few inches to the side and keep walking. This time your shoulder hits the lamp post and again you say “ouch.” The third time you see a lamp post, you move all the way over to ensure you don’t hit the lamp post. Except now something terrible has happened — now you’ve walked directly into the path of a mailbox, and you’ve never seen a mailbox before. You walk into it and the whole process happens again. Obviously, this is an oversimplification, but it is effectively what backpropogation does. An artificial neural network is given a multitude of examples and then it tries to get the same answer as the example given. When it is wrong, an error is calculated and the values at each neuron and synapse are propagated backwards through the ANN for the next time. This process takes a LOT of examples. For real world applications, the number of examples can be in the millions.

Now that we have an understanding of artificial neural networks and somewhat of an understanding in how they work, there’s another question that should be on your mind. How do we know how many neurons we need to use? And why did you bold the word layers earlier? Layers are just sets of neurons. We have an input layer which is the data we provide to the ANN. We have the hidden layers, which is where the magic happens. Lastly, we have the output layer, which is where the finished computations of the network are placed for us to use.


Layers themselves are just sets of neurons. In the early days of multilayer perceptrons, we originally thought that having just one input layer, one hidden layer, and one output layer was sufficient. It makes sense, right? Given some numbers, you just need one set of computations, and then you get an output. If your ANN wasn’t calculating the correct value, you just added more neurons to the single hidden layer. Eventually, we learned that in doing this we were really just creating a linear mapping from each input to the output. In other words, we learned that a certain input would always map to a certain output. We had no flexibility and really could only handle inputs we’d seen before. This was by no means what we wanted.

Now introduce deep learning, which is when we have more than one hidden layer. This is one of the reasons we have better ANNs now, because we need hundreds of nodes with tens if not more layers. This leads to a massive amount of variables that we need to keep track of at a time. Advances in parallel programming also allow us to run even larger ANNs in batches. Our artificial neural networks are now getting so large that we can no longer run a single epoch, which is an iteration through the entire network, at once. We need to do everything in batches which are just subsets of the entire network, and once we complete an entire epoch, then we apply the backpropagation.

What kinds are there?

Along with now using deep learning, it’s important to know that there are a multitude of different architectures of artificial neural networks. The typical ANN is setup in a way where each neuron is connected to every other neuron in the next layer. These are specifically called feed forward artificial neural networks (even though ANNs are generally all feed forward). We’ve learned that by connecting neurons to other neurons in certain patterns, we can get even better results in specific scenarios.

Recurrent Neural Networks

Recurrent Neural Networks (RNN) were created to address the flaw in artificial neural networks that didn’t make decisions based on previous knowledge. A typical ANN had learned to make decisions based on context in training, but once it was making decisions for use, the decisions were made independent of each other.

A recurrent neural network (photo credit)

When would we want something like this? Well, think about playing a game of Blackjack. If you were given a 4 and a 5 to start, you know that 2 low cards are out of the deck. Information like this could help you determine whether or not you should hit. RNNs are very useful in natural language processing since prior words or characters are useful in understanding the context of another word. There are plenty of different implementations, but the intention is always the same. We want to retain information. We can achieve this through having bi-directional RNNs, or we can implement a recurrent hidden layer that gets modified with each feedforward. If you want to learn more about RNNs, check out either this tutorial where you implement an RNN in Python or this blog post where uses for an RNN are more thoroughly explained.

An honorable mention goes to Memory Networks. The concept is that we need to retain more information than what an RNN or LSTM keeps if we want to understand something like a movie or book where a lot of events might occur that build on each other.

Sam walks into the kitchen.
Sam picks up an apple.
Sam walks into the bedroom.
Sam drops the apple.
Q: Where is the apple.
A: Bedroom
Sample taken from this paper.

Convolutional Neural Networks

Convolutional Neural Networks (CNN), sometimes called LeNets (named after Yann LeCun), are artificial neural networks where the connections between layers appear to be somewhat arbitrary. However, the reason for the synapses to be setup the way they are is to help reduce the number of parameters that need to be optimized. This is done by noting a certain symmetry in how the neurons are connected, and so you can essentially “re-use” neurons to have identical copies without necessarily needing the same number of synapses. CNNs are commonly used in working with images thanks to their ability to recognize patterns in surrounding pixels. There’s redundant information contained when you look at each individual pixel compared to its surrounding pixels, and you can actually compress some of this information thanks to their symmetrical properties. Sounds like the perfect situation for a CNN if you ask me. Christopher Olah has a great blog post about understanding CNNs as well as other types of ANNs which you can find here. Another great resource for understanding CNNs is this blog post.

Reinforcement Learning

The last ANN type that I’m going to talk about is the type called Reinforcement Learning. Reinforcement Learning is a generic term used for the behavior that computers exhibit when trying to maximize a certain reward, which means that it in itself isn’t an artificial neural network architecture. However, you can apply reinforcement learning or genetic algorithms to build an artificial neural network architecture that you might not have thought to use before. A great example and explanation can be found in this video, where YouTube user SethBling creates a reinforcement learning system that builds an artificial neural network architecture that plays a Mario game entirely on its own. Another successful example of reinforcement learning can be seen in this video where the company DeepMind was able to teach a program to master various Atari games.

Reinforcement learning (photo credit)


Now you should have a basic understanding of what’s going on with the state of the art work in artificial intelligence. Neural networks are powering just about everything we do, including language translation, animal recognition, picture captioning, text summarization and just about anything else you can think of. You’re sure to hear more about them in the future so it’s good that you understand them now!

This post was written by Aaron at Josh.ai. Previously, Aaron worked at Northrop Grumman before joining the Josh team where he works on natural language programming (NLP) and artificial intelligence (AI). Aaron is a skilled YoYo expert, loves video games and music, has been programming since middle school and recently turned 21.

Josh.ai is an AI agent for your home. If you’re interested in following Josh and getting early access to the beta, enter your email at https://josh.ai.

Like Josh on Facebook — http://facebook.com/joshdotai

Follow Josh on Twitter — http://twitter.com/joshdotai