Specifics of Artificial Neural Networks (ANNs), from Algorithm to Applications
This blog aims to provide a basic understanding of Artificial Neural Networks (ANNs): from applications to limitations, and from the mathematics to the underlying algorithm.
Some of the earliest learning algorithms we recognize today were intended to be computational models of biological learning, i.e. models of how learning happens or could happen in the brain. As a result, one of the names that deep learning has gone by is “Artificial Neural Networks” (ANNs).
The modern perspective on deep learning models is that they are engineered systems inspired by the biological brain.
Applications of Artificial Neural Networks
Some of the use cases of ANNs are listed below:
Classification
In Marketing: consumer spending pattern classification
In Defence: radar and sonar image classification
In Agriculture & Fishing: fruit and catch grading
In Medicine: ultrasound and electrocardiogram image classification, EEG analysis, medical diagnosis
Recognition and Identification
In General Computing and Telecommunications: speech, vision and handwriting recognition
In Finance: signature verification and bank note verification
Assessment
In Engineering: product inspection monitoring and control
In Security: motion detection, surveillance image analysis and fingerprint matching
The Key Elements of Neural Networks
Each neuron within the network is usually a simple processing unit which takes one or more inputs and produces an output.
At each neuron, every input has an associated weight which modifies the strength of that input. The neuron simply adds together all the weighted inputs and calculates an output to be passed on.
The output is determined through a non-linear activation function. The activation function is usually a logistic function that transforms the output to a number between 0 and 1. Other activation functions can also be useful; these are discussed in the next sub-section.
Neural computing requires a number of neurons to be connected together into a “neural network”. Neurons are arranged in layers.
Activation functions
The activation function is generally non-linear. Linear functions are limited because the output is simply proportional to the input; a stack of linear layers therefore collapses into a single linear transformation, no matter how many layers it has.
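To make this concrete, here is a minimal sketch of the logistic function and two common alternatives (the Python/NumPy implementation below is a sketch of mine, not code from the original post):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: like the sigmoid but maps into (-1, 1).
    return np.tanh(z)

def relu(z):
    # Rectified Linear Unit: zero for negative inputs, identity otherwise.
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # ~[0.119, 0.5, 0.881]
```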
Training Methods for Artificial Neural Networks (ANNs)
Supervised learning
In supervised training, both the inputs and the outputs are provided. The network processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing it to adjust the weights which control the network. This process repeats over and over as the weights are continually tweaked. The set of data which enables the training is called the training set. During training, the same set of data is processed many times as the connection weights are progressively refined. A toy version of this loop is sketched below.
Example architectures: Multi-Layer Perceptron
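As a toy illustration of that supervised loop, here is a single sigmoid neuron trained on labelled data (the data, network shape, and learning rate are all illustrative assumptions, not from the original post):

```python
import numpy as np

# A toy supervised loop: a single sigmoid neuron trained on labelled data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # inputs
y = (X.sum(axis=1) > 0).astype(float)       # desired outputs
w = rng.normal(size=3)                      # connection weights

for epoch in range(200):                    # the same data is processed many times
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w)))  # network's resulting outputs
    error = y_hat - y                       # compare against the desired outputs
    w -= 0.1 * (X.T @ error) / len(y)       # adjust the weights from the errors
```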
Unsupervised learning
In unsupervised training, the network is provided with inputs but not with desired outputs. The system itself must then decide what features it will use to group the input data. This is often referred to as self-organization or adaptation.
Example architectures: Kohonen self-organizing maps, Adaptive Resonance Theory (ART)
Specifics for Training a Neural Network
Step 1: Pick a Network Architecture (i.e. Connectivity Pattern between the neurons)
First, choose the layout of your neural network: how many layers in total, and how many hidden units in each layer. A minimal sketch of encoding these choices follows the guidelines below.
- Number of input units = dimension of features x(i)
- Number of output units = number of classes
- Number of hidden units per layer = usually, the more the better (this must be balanced against the cost of computation, which increases with more hidden units)
- Defaults: 1 hidden layer. If you have more than 1 hidden layer, it is recommended that you have the same number of units in every hidden layer
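For instance (the dimensions here are illustrative assumptions, not from the original post):

```python
# Encoding the architecture choices above as a list of layer sizes.
n_features = 400   # number of input units = dimension of features x(i)
n_classes = 10     # number of output units = number of classes
n_hidden = 25      # hidden units per layer: more is usually better,
                   # at a higher computational cost

layer_sizes = [n_features, n_hidden, n_classes]  # default: 1 hidden layer
# With more hidden layers, keep every hidden layer the same width:
# layer_sizes = [n_features, n_hidden, n_hidden, n_classes]
```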
Step 2: Random Initialization of the Weights
Initializing all theta weights to zero does not work with neural networks: when we back-propagate, all nodes update to the same value repeatedly. Instead, we randomly initialize each Θ(l)ij to a value in [−ϵ, ϵ] using the following method:

Θ(l)ij = 2ϵ · rand − ϵ, where rand is drawn uniformly from [0, 1]

This formula guarantees that every weight stays within the desired bound.
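A minimal sketch of this initialization (the layer sizes and the value of ϵ are illustrative assumptions):

```python
import numpy as np

EPSILON_INIT = 0.12  # the bound ϵ; illustrative value

def rand_initialize_weights(n_in, n_out):
    # Each weight is drawn uniformly from [-ϵ, ϵ].
    # The (n_in + 1) column accounts for the bias unit.
    return np.random.rand(n_out, n_in + 1) * 2 * EPSILON_INIT - EPSILON_INIT

Theta1 = rand_initialize_weights(400, 25)  # input layer  -> hidden layer
Theta2 = rand_initialize_weights(25, 10)   # hidden layer -> output layer
```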
Step 3: Implement Forward Propagation to get initial prediction for any x(i)
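A minimal sketch of forward propagation, assuming sigmoid activations throughout (the helper names are mine, not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(thetas, x):
    # Propagate one example x through every layer of the network.
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # prepend the bias unit
        a = sigmoid(theta @ a)          # weighted sum, then activation
    return a                            # hΘ(x): one activation per output unit

# e.g. with the Theta1, Theta2 initialized in Step 2:
# prediction = forward_propagate([Theta1, Theta2], x_i)
```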
Step 4: Implement the Cost Function for the Artificial Neural Network (ANN)
Let’s first define a few variables that we will need to use:
- L = total number of layers in the network
- sl = number of units (not counting bias unit) in layer l
- K = number of output units/classes
Recall that in neural networks, we may have many output nodes. We denote hΘ(x)k as the hypothesis that results in the kth output.
The cost function of an ANN is represented as below:
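Written out, for m training examples and regularization parameter λ, it is the standard regularized logistic cost over the variables defined above:

```latex
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ y_k^{(i)} \log\left( (h_\Theta(x^{(i)}))_k \right)
       + (1 - y_k^{(i)}) \log\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right]
  + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
    \left( \Theta_{j,i}^{(l)} \right)^2
```

The double sum adds up the logistic cost over each of the K output units for every training example; the triple sum regularizes every weight in the network except those multiplying the bias units.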
Step 5: Implement back-propagation to compute partial derivatives
“Back-propagation” is neural-network terminology for the procedure used to minimize our cost function: it computes the partial derivatives of J(Θ) with respect to every weight. Our goal is to compute “min J(Θ)”; that is, we want to minimize the cost function J by finding an optimal set of parameters in theta.
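A sketch of the per-example gradient computation, assuming the one-hidden-layer network and sigmoid activations from the earlier steps, and ignoring regularization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_single(Theta1, Theta2, x, y):
    # Forward pass, keeping every layer's values.
    a1 = np.concatenate(([1.0], x))            # input layer, with bias unit
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # hidden layer, with bias unit
    a3 = sigmoid(Theta2 @ a2)                  # output layer: hΘ(x)

    # Backward pass: output-layer error, then hidden-layer error.
    delta3 = a3 - y
    delta2 = (Theta2[:, 1:].T @ delta3) * sigmoid(z2) * (1.0 - sigmoid(z2))

    # Partial derivatives of the (unregularized) cost for each weight matrix.
    grad1 = np.outer(delta2, a1)
    grad2 = np.outer(delta3, a2)
    return grad1, grad2
```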
Step 6: Use gradient checking to confirm that your back-propagation works. Then disable gradient checking.
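A sketch of gradient checking by central differences (cost_fn here stands for any function mapping a parameter array to the scalar cost J):

```python
import numpy as np

def numerical_gradient(cost_fn, theta, eps=1e-4):
    # Approximate each partial derivative of cost_fn by central differences.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        perturb = np.zeros_like(theta)
        perturb.flat[i] = eps
        grad.flat[i] = (cost_fn(theta + perturb) - cost_fn(theta - perturb)) / (2 * eps)
    return grad

# The result should agree with back-propagation's gradient to several
# decimal places. Each check costs two full cost evaluations per
# parameter, which is why it must be disabled before real training.
```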
Step 7: Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta
When we perform forward and back propagation, we loop over every training example, accumulate the gradients, and then take a gradient step, as sketched below.
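A sketch of plain batch gradient descent, reusing backprop_single from Step 5 (X, Y, the initialized Theta1 and Theta2, and the learning rate alpha are illustrative assumptions):

```python
alpha = 0.1                                # learning rate; illustrative value
for iteration in range(500):
    Grad1 = np.zeros_like(Theta1)
    Grad2 = np.zeros_like(Theta2)
    for x, y in zip(X, Y):                 # loop over every training example
        g1, g2 = backprop_single(Theta1, Theta2, x, y)
        Grad1 += g1
        Grad2 += g2
    Theta1 -= alpha * Grad1 / len(X)       # gradient-descent update
    Theta2 -= alpha * Grad2 / len(X)
```

In practice, a built-in optimization routine (e.g. conjugate gradient or L-BFGS) is often used in place of this hand-rolled loop.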
Intuitively, here is what is happening as we implement our neural network: ideally, we want hΘ(x(i)) ≈ y(i), which minimizes our cost function. However, keep in mind that J(Θ) is not convex, so we can end up in a local minimum instead.
Disadvantages of Neural Networks
Neural nets are very general and can approximate complicated relationships, but they also come with disadvantages.
One disadvantage of neural nets is that the approximating models relating inputs and outputs are purely “black box” models, providing very little insight into what these models really do.
Also, the user of neural nets must make many modeling assumptions, such as the number of hidden layers and the number of hidden units in each hidden layer, with very little guidance on how to do this. It takes considerable experience to find the most appropriate representation.
Furthermore, back-propagation can be quite slow if the learning rate is not chosen well.
Summary
I hope this blog post has helped you gain a basic understanding of Artificial Neural Networks: their applications, the algorithm specifics, and the limitations.
Thanks for reading. Suggestions for further improving this post are cheerfully welcomed.