Artificial Neural Networks for Machine Learning

Madison Schott
Published in Capital One Tech
6 min read · Apr 26, 2019


Part 5 of a Series on Introductory Machine Learning Algorithms


We’ve covered k-nearest neighbors, k-means clustering, naive Bayes classifiers, and random forest algorithms; today we’ll cover artificial neural networks.

Introduction

Artificial neural networks (ANNs) are a buzzword in machine learning right now, for both the technical expert and the everyday user. We see these algorithms at work every day: classifying security images, recognizing your face when you unlock your phone, and interpreting your voice commands. In today’s technology-driven world, we use neural networks constantly, in all kinds of places, and you probably don’t even realize it!

However, while the applications may be new, the technology itself has been around a while. ANNs date back to 1943, when Warren McCulloch and Walter Pitts’ paper “A Logical Calculus of the Ideas Immanent in Nervous Activity” described how networks of artificial neurons could be used to solve various logic problems. However, they were not widely used until a technique called backpropagation was popularized in the 1980s. This technique allows the network to adjust itself when the solution it produces is not the solution that was expected: the error is propagated backward through the layers, and the connections between neurons are nudged to reduce it. Backpropagation increased accuracy by allowing researchers to create self-adjusting networks that modify their own connections between neurons.

Today, ANNs are growing in popularity due to the increased amount of data and computing power available. In the past, limited data and limited computing power kept ANNs largely confined to academic settings. Now that we have more data than ever, and faster computer processing, we can properly apply ANNs in more practical settings.

You are probably wondering, “Well, what exactly is an artificial neural network?” They are systems modeled after the human brain, mimicking the ways we learn and make decisions. These networks consist of input and output layers, as well as hidden layers, similar to the neural networks in our brains.

Each of these layers is simply a group of neurons. The input layer represents the data that we feed the ANN. The output layer is where the results of the algorithm are displayed for us to examine. For larger, more complex problems, there are hidden layers between the input and output layers that perform additional computations.

ANNs with more than one hidden layer are classified as deep learning models, and their depth is the number of hidden layers they contain.

Pros:

  • High-performing.
  • Solves problems humans may not be able to conceptualize.
  • Can be used with regression and classification problems.
  • Can handle large amounts of data.

Cons:

  • “Black box” nature, meaning it’s hard for researchers to understand why ANNs make the conclusions that they do, since it’s hard to trace back the numeric values these models produce.
  • Longer time to train the model.
  • Requires lots of data, more than a typical algorithm.
  • Expensive due to the amount of computational power required.

ANNs are extremely powerful and accurate, especially with the amount of data we now have at our disposal. However, the models discussed here are still supervised learning models, meaning researchers must properly label the data in order to train the model and achieve results.

Where to Use Artificial Neural Networks

ANNs are used not only in cutting-edge machine learning applications, but also in applications that have been around for decades. An ANN named MADALINE was the first one ever applied to a real-world problem, back in 1959: it eliminated the echoes produced by telephone lines, using an adaptive layer to act as a sound filter.

One of the more recent uses of ANNs is in voice recognition technologies. For this application, the ANN must be trained to accurately understand what people with different voices and accents are saying. The model is trained on the mathematics behind sound waves and on what constitutes a certain word. Once the model is properly trained, you can feed it new data and let the hidden layers compute; these layers learn to prioritize the wave patterns that correspond to language. Once the hidden layers output accurate information, the model is ready to recognize the voice commands humans give it.

The Mathematics Behind Artificial Neural Networks

Today, there are multiple programs that can build out ANNs for you. When using them, the developer’s main task is to determine the number of layers, the number of neurons in each layer, the activation functions, and the number of epochs (full passes over the training data). This requires a deep understanding of linear algebra, but I will try to keep it simple enough to understand if you aren’t too familiar with this kind of math.

First, let’s start off with the mathematics behind a single neuron.

Each neuron has a column vector of weights (w) and a bias (b), both of which change as the neuron learns from the training data. Each time the model runs, z (the weighted sum of the inputs plus the bias, z = wᵀx + b) is recalculated. This result is then passed into an activation function.
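In code, a single neuron is just a dot product plus a bias, pushed through an activation. Here is a minimal sketch in Python with NumPy; the sigmoid activation and the specific weight values are illustrative choices, not from the article:

```python
import numpy as np

def neuron(x, w, b):
    # Weighted sum of inputs plus bias: z = w^T x + b
    z = np.dot(w, x) + b
    # Pass z through a sigmoid activation to get the neuron's output
    return 1.0 / (1.0 + np.exp(-z))

# A neuron with two inputs (weights and bias are arbitrary here)
x = np.array([0.5, -1.0])
w = np.array([0.8, 0.2])
b = 0.1
out = neuron(x, w, b)  # a value between 0 and 1
```

During training, it is w and b that get adjusted; x is fixed by the data.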

Now, let’s look at layers of neurons. To work out the mathematics of a layer, you collect the z values of its individual neurons and combine them into matrix equations.

This is pretty complicated if you aren’t familiar with matrices, so here is a visual reference:

Matrix formulation of a layer (source: https://towardsdatascience.com/https-medium-com-piotr-skalski92-deep-dive-into-deep-networks-math-17660bc376ba)
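The matrix idea can be sketched in a few lines: stacking every neuron’s weight vector as a row of a matrix W turns a whole layer into one matrix-vector product. This is a minimal NumPy illustration with arbitrary random weights:

```python
import numpy as np

def dense_layer(x, W, b):
    # Each row of W is one neuron's weight vector, so a single
    # matrix multiply computes z for every neuron at once: z = W x + b
    return W @ x + b

rng = np.random.default_rng(0)   # illustrative random weights
x = np.array([0.5, -1.0, 2.0])   # 3 inputs
W = rng.standard_normal((4, 3))  # a layer of 4 neurons, 3 weights each
b = np.zeros(4)                  # one bias per neuron
z = dense_layer(x, W, b)         # 4 z values, one per neuron
```

Chaining layers is just repeating this step, with each layer’s activated output becoming the next layer’s input.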

If you want an even deeper dive into linear algebra, read this.

ANNs depend highly on activation functions, which introduce non-linearity into the model and allow it to learn complex patterns in the data. Here are a few of the most popular ones that you can choose to implement:

Common activation functions (source: https://towardsdatascience.com/https-medium-com-piotr-skalski92-deep-dive-into-deep-networks-math-17660bc376ba)
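Several of these popular activation functions are simple to write down. These NumPy one-liners sketch three common choices (sigmoid, tanh, and ReLU):

```python
import numpy as np

def sigmoid(z):
    # Squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any input into (-1, 1), centered at zero
    return np.tanh(z)

def relu(z):
    # Passes positive values through unchanged, zeroes out negatives
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
s, t, r = sigmoid(z), tanh(z), relu(z)
```

Which one to use depends on the layer and the problem; ReLU is a common default for hidden layers, while sigmoid suits binary outputs.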

There is also something called a loss function, which measures how accurate the ANN’s results are and drives how the network converges. It quantifies how close the output layer’s predictions are to the labels in the training data.
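As a concrete example, here is mean squared error, one common loss function; the choice of MSE and the sample values are illustrative, not from the article:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean squared error: the average squared gap between predictions
    # and labels; smaller means the network is closer to the targets
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0])  # labels from the training data
y_pred = np.array([0.9, 0.2, 0.8])  # the network's output layer
loss = mse_loss(y_true, y_pred)
```

Training amounts to pushing this number down by adjusting the weights and biases.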

As you can see, the output of a neuron, and therefore of each layer of neurons, is determined by the weight and bias values of each neuron. To find these values we turn to gradient descent, a popular optimization algorithm rooted in calculus. The gradient of the loss function at each step tells us whether to increase or decrease our w and b values. This gradient is calculated using backpropagation, which I mentioned briefly in the introduction.
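Putting the pieces together, here is a toy gradient descent loop that trains a single sigmoid neuron on one example. All values are illustrative, and a real network repeats this across many neurons and many training examples:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example; inputs, label, and learning rate are
# illustrative values chosen for this sketch.
x, y = np.array([0.5, -1.0]), 1.0
w, b = np.zeros(2), 0.0
lr = 0.5  # learning rate: how big a step to take against the gradient

for _ in range(200):
    a = sigmoid(np.dot(w, x) + b)  # forward pass
    # Backpropagation for the squared-error loss L = (a - y)^2:
    # the chain rule gives dL/dz = 2 (a - y) * a (1 - a)
    dz = 2 * (a - y) * a * (1 - a)
    w -= lr * dz * x               # nudge the weights downhill
    b -= lr * dz                   # nudge the bias downhill

pred = sigmoid(np.dot(w, x) + b)   # the prediction moves toward y = 1
```

Each pass through the loop computes the gradient and steps w and b in the direction that shrinks the loss, which is exactly what backpropagation automates across many layers.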

Conclusion

As you can see, artificial neural networks rest on linear algebra and calculus, making them very mathematically focused. If you have a further interest in these models, study basic vectors, matrices, and derivatives; a solid understanding of these concepts is needed to build and implement an ANN, and it is very difficult to follow the formulas behind one without it.

Artificial neural networks will only continue to grow in popularity as we develop new technology and the amount of collected data and computing power increases. Developing a deeper understanding of this algorithm will be extremely important for those interested in the world of machine learning.

For more resources, check out some projects that use artificial neural networks.

DISCLOSURE STATEMENT: © 2019 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.
