Specifics of Artificial Neural Networks (ANNs), from Algorithm to Applications

Mehul Ved
6 min read · Apr 7, 2018


From applications to limitations, from the mathematics to the underlying algorithm, this blog intends to provide a basic understanding of Artificial Neural Networks (ANNs).

Some of the earliest learning algorithms we recognize today were intended to be computational models of biological learning, i.e., models of how learning happens or could happen in the brain. As a result, one of the names that Deep Learning has gone by is "Artificial Neural Networks" (ANNs).

From this perspective, Deep Learning models are engineered systems inspired by the biological brain.

Applications of Artificial Neural Networks

Some of the use cases of ANNs are listed below:

Classification

In Marketing: consumer spending pattern classification

In Defence: radar and sonar image classification

In Agriculture & Fishing: fruit and catch grading

In Medicine: ultrasound and electrocardiogram image classification, EEGs, medical diagnosis

Recognition and Identification

In General Computing and Telecommunications: Speech, vision and handwriting recognition

In Finance: Signature verification and bank note verification

Assessment

In Engineering: product inspection, monitoring and control

In Security: motion detection, surveillance image analysis and fingerprint matching

The Key Elements of Neural Networks

Each neuron within the network is usually a simple processing unit which takes one or more inputs and produces an output.

At each neuron, every input has an associated weight which modifies the strength of that input. The neuron adds together all the weighted inputs and calculates an output to be passed on.

The output is determined through a non-linear activation function. The activation function is usually a logistic function, which transforms the output to a number between 0 and 1. Other activation functions can also be useful; these are discussed in the activation functions sub-section below.
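For concreteness, here is a minimal sketch of one such neuron in Python (the input, weight and bias values are arbitrary illustrative numbers):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return sigmoid(np.dot(weights, inputs) + bias)

# A neuron with three inputs; all numbers are arbitrary.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
print(neuron(x, w, bias=0.1))   # a single output in (0, 1)
```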

Single-Layer Perceptron

Neural computing requires a number of neurons to be connected together into a "neural network". Neurons are arranged in layers; in the simplest case, the single-layer perceptron, one layer of neurons connects the inputs directly to the outputs.

Activation functions

The activation function is generally non-linear. Linear functions are limited because the output is simply proportional to the input.

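For reference, here is the logistic function described above, together with two other widely used activation functions (tanh and ReLU, listed as standard examples):

```latex
\sigma(z) = \frac{1}{1 + e^{-z}} \quad \text{logistic, output in } (0, 1)
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \quad \text{output in } (-1, 1)
\operatorname{ReLU}(z) = \max(0, z) \quad \text{output in } [0, \infty)
```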

Training Methods for Artificial Neural Networks (ANNs)

Supervised learning

In supervised training, both the inputs and the desired outputs are provided. The network processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing it to adjust the weights which control the network. This process occurs over and over as the weights are continually tweaked. The set of data which enables the training is called the training set. During the training of a network, the same set of data is processed many times as the connection weights are progressively refined.
Example architectures: Multi-Layer Perceptron

Unsupervised learning
In unsupervised training, the network is provided with inputs but not with desired outputs. The system itself must then decide what features it will use to group the input data. This is often referred to as self-organization or adaptation.
Example architectures: Kohonen self-organizing maps, Adaptive Resonance Theory (ART)

Specifics for Training a Neural Network

Step 1: Pick a Network Architecture (i.e. Connectivity Pattern between the neurons)

First, pick a network architecture: choose the layout of your neural network, including how many hidden units in each layer and how many layers in total you want to have.

  • Number of input units = dimension of features x(i)
  • Number of output units = number of classes
  • Number of hidden units per layer = usually, the more the better (this must be balanced against the cost of computation, which increases with the number of hidden units)
  • Defaults: 1 hidden layer. If you have more than 1 hidden layer, then it is recommended that you have the same number of units in every hidden layer (a concrete sketch of these choices follows this list)
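
As a concrete illustration of these choices, consider the hypothetical layout below for a small 10-class image classifier with 20x20-pixel inputs (all sizes are illustrative, not prescriptive):

```python
# Hypothetical architecture: 400 input features, one hidden layer, 10 classes.
input_units  = 400   # dimension of features x(i), e.g. 20x20 pixels
output_units = 10    # number of classes
hidden_units = 25    # per hidden layer; more units = more capacity, more compute

layer_sizes = [input_units, hidden_units, output_units]   # 1 hidden layer
```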

Step 2: Random Initialization of the Weights

Initializing all theta weights to zero does not work with neural networks: when we back-propagate, all nodes would repeatedly update to the same value. Instead, we can randomly initialize the weights of our Θ matrices using the following method:

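A minimal NumPy sketch of this scheme (the function name, layer sizes and default ϵ below are illustrative assumptions, not fixed rules):

```python
import numpy as np

def init_weights(units_in, units_out, epsilon=0.12):
    """Draw each weight uniformly from [-epsilon, +epsilon].

    The +1 column accounts for the bias unit of the incoming layer;
    epsilon = 0.12 is just a small illustrative default.
    """
    return np.random.rand(units_out, units_in + 1) * 2 * epsilon - epsilon

Theta1 = init_weights(400, 25)   # input layer  -> hidden layer
Theta2 = init_weights(25, 10)    # hidden layer -> output layer
```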

Hence, we initialize each Θ(l)ij to a random value in [−ϵ, ϵ]. Scaling a uniform draw as above guarantees that we get the desired bound.

Step 3: Implement Forward Propagation to get an initial prediction for any x(i)

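A sketch of vectorized forward propagation through the hypothetical 400-25-10 sigmoid network assumed in the earlier steps:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Thetas):
    """Propagate a single example x through the network layer by layer."""
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))   # prepend the bias unit
        a = sigmoid(Theta @ a)           # z = Theta * a, then activate
    return a                             # h_Theta(x): one value per output

# Weights shaped for the 400-25-10 layout, randomly initialized as in Step 2.
Thetas = [np.random.rand(25, 401) * 0.24 - 0.12,
          np.random.rand(10, 26) * 0.24 - 0.12]
x = np.random.rand(400)                      # a stand-in input example
print(forward_propagate(x, Thetas).shape)    # (10,): one value per class
```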

Step 4: Implement the Cost Function for the Artificial Neural Network (ANN)

Let’s first define a few variables that we will need to use:

  • L = total number of layers in the network
  • sl = number of units (not counting bias unit) in layer l
  • K = number of output units/classes

Recall that in neural networks, we may have many output nodes. We denote hΘ(x)k as the hypothesis value of the kth output unit.

The cost function of an ANN is represented as below:

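A common form of this cost function, consistent with the variables defined above, is the regularized cross-entropy (here m is the number of training examples and λ the regularization parameter, standard symbols not defined earlier in this post):

```latex
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
    \left[ y_k^{(i)} \log\!\left( (h_\Theta(x^{(i)}))_k \right)
         + (1 - y_k^{(i)}) \log\!\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right]
  + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
      \left( \Theta_{j,i}^{(l)} \right)^{2}
```

The double sum adds up the logistic-regression costs over each of the K output units; the triple sum regularizes all the individual Θ values, excluding the bias terms.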

Step 5: Implement back-propagation to compute partial derivatives

“Back-propagation” is neural-network terminology for the procedure that computes the partial derivatives of our cost function so that we can minimize it. Our goal is to compute “min J(Θ)”. That is, we want to minimize our cost function J using an optimal set of parameters in theta.

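A sketch of the back-propagation recurrences for the sigmoid network above. Here a(l) and z(l) denote the activations and weighted inputs of layer l, δ(l) is the "error" term of layer l, ⊙ is element-wise multiplication, and Δ(l) accumulates the gradient contributions over all m training examples (these symbols are standard but were not defined earlier in this post; bias-column bookkeeping is omitted for brevity):

```latex
\delta^{(L)} = a^{(L)} - y
\delta^{(l)} = \left( (\Theta^{(l)})^{T} \delta^{(l+1)} \right)
               \odot a^{(l)} \odot (1 - a^{(l)})
\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \left( a^{(l)} \right)^{T}
\frac{\partial J(\Theta)}{\partial \Theta^{(l)}_{ij}}
    = \frac{1}{m} \Delta^{(l)}_{ij} + \frac{\lambda}{m} \Theta^{(l)}_{ij}
      \qquad (j \ge 1;\ \text{the bias column } j = 0 \text{ is not regularized})
```

Note that a(l) ⊙ (1 − a(l)) is exactly the derivative g′(z(l)) of the logistic activation.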

Step 6: Use gradient checking to confirm that your back-propagation works. Then disable gradient checking.

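A minimal sketch of numerical gradient checking on an unrolled (flattened) parameter vector; the cost function J passed in is a stand-in for whatever cost your network computes:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Centered-difference approximation of the gradient of J at theta.

    Compare the result element-wise against the back-propagated
    gradient; the two should agree to several decimal places. This is
    far too slow for training, so disable it once back-prop is verified.
    """
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        theta[i] += eps
        J_plus = J(theta)
        theta[i] -= 2 * eps
        J_minus = J(theta)
        theta[i] += eps                          # restore original value
        grad[i] = (J_plus - J_minus) / (2 * eps)
    return grad

# Sanity check with a cost whose gradient is known: J = sum(theta^2).
theta = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda t: np.sum(t ** 2), theta))   # ~[2, -4, 6]
```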

Step 7: Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta

When we perform forward and back-propagation, we loop over every training example.

The following gives us an intuition of what is happening as we implement our neural network: gradient descent repeatedly takes small downhill steps that reduce the cost function J(Θ).
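Below is a compact end-to-end sketch combining forward propagation (Step 3), back-propagation (Step 5) and gradient descent (this step) on a tiny XOR problem; the layer sizes, random seed, learning rate and epoch count are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(A):
    """Prepend a column of ones (the bias units) to a batch of activations."""
    return np.hstack([np.ones((A.shape[0], 1)), A])

# Toy dataset: learn XOR with 2 inputs, 3 hidden units, 1 output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
Theta1 = rng.uniform(-0.5, 0.5, (3, 3))   # (hidden units, inputs + bias)
Theta2 = rng.uniform(-0.5, 0.5, (1, 4))   # (outputs, hidden units + bias)
alpha, m = 1.0, len(X)                    # learning rate, number of examples

for epoch in range(10000):
    # Step 3: forward propagation over the whole training set.
    a1 = add_bias(X)                          # (4, 3)
    a2 = add_bias(sigmoid(a1 @ Theta1.T))     # (4, 4)
    a3 = sigmoid(a2 @ Theta2.T)               # (4, 1)
    # Step 5: back-propagate the errors.
    d3 = a3 - y
    d2 = (d3 @ Theta2)[:, 1:] * a2[:, 1:] * (1 - a2[:, 1:])
    # Step 7: one gradient-descent step on each weight matrix.
    Theta2 -= alpha * (d3.T @ a2) / m
    Theta1 -= alpha * (d2.T @ a1) / m

print(np.round(a3.ravel(), 2))   # ideally close to [0, 1, 1, 0]
```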

Ideally, you want hΘ(x(i)) ≈ y(i). This will minimize our cost function. However, keep in mind that J(Θ) is not convex and thus we can end up in a local minimum instead.

Disadvantages of Neural Networks

Neural Nets are very general and can approximate complicated relationships. But they also come with disadvantages.

One disadvantage of neural nets is that the approximating models relating inputs and outputs are purely “black box” models, and they provide very little insight into what these models really do.

Also, the user of neural nets must make many modeling assumptions, such as the number of hidden layers and the number of hidden units in each hidden layer, with very little guidance on how to do this. It takes considerable experience to find the most appropriate representation.

Furthermore, back-propagation can be quite slow if the learning constant (the learning rate) is not correctly chosen.

Summary

Hope this blog post has helped you gain a basic understanding of Artificial Neural Networks: their applications, the algorithm specifics and the limitations.

Thanks for reading through this blog post. Any suggestions for further improving it are most welcome.
