Introduction to Neural Networks
A first step towards deep learning: the brain of a machine
Nowadays, you have probably heard a lot about AI technology and its advances. One scientific idea behind AI is the Neural Network. Does it sound too scientific? Put it this way: the Neural Network is the core of Deep Learning, one of the most popular machine learning methods today. As the name ‘Deep Learning’ suggests, a machine, say a computer, is trained to learn things very deeply.
The concept of Deep Learning is to train a machine to imitate the neural network activity in a human brain. The method enables machines to make judgements similar to, or even better than, those humans make. A well-trained machine can produce highly accurate results with untiring power.
Before moving on to Deep Learning, let’s take a look at how the biological neural network (BNN) in a human brain works.
Biological Neural Network (BNN) in a Brain
Inside a human brain, there is a complex network of neurons (nerve cells). A brain contains about 100 billion neurons. Can you imagine how huge the network is? Neurons are linked together through synapses, the junctions that pass electrical signals from one neuron to another. A single neuron has, on average, about 7,000 synapses. A human brain is found to reach its maximum number of synapses, around 1 quadrillion, at about the age of three. The number then declines to 100–500 trillion synapses by adulthood.
Human learning occurs through brain activity inside this neural network. Information is constructed and organized through the exchange of electrical signals, measured in millivolts (mV), at the synapses. Based on how the biological neural network works, the Artificial Neural Network model was invented for Machine Learning.
Artificial Neural Network (ANN)
An artificial neural network (ANN) is an imitation of the biological neural network (BNN). The figure above demonstrates a single artificial neural unit. The neuron receives inputs (X1 to Xn) and computes an output, which is sent on to other neurons. In other words, the output of one neuron becomes an input to other neurons in the network.
To produce the output, each input is multiplied by a weight (w), and the weighted inputs are summed. A parameter called the ‘bias’ (b) is then added to the sum (see Figure 2 above), giving a = w1X1 + w2X2 + … + wnXn + b. This value ‘a’ is forwarded to the activation function to produce the output.
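For readers who like to see it in code, here is a minimal sketch of that computation in Python with NumPy (the input and weight values are arbitrary examples of mine, not taken from the figure):

```python
import numpy as np

# A single artificial neuron: a weighted sum of the inputs plus a bias.
x = np.array([0.5, -1.2, 3.0])   # inputs X1..Xn
w = np.array([0.8, 0.1, -0.4])   # one weight per input
b = 0.5                          # bias

a = np.dot(w, x) + b             # a = w1*X1 + w2*X2 + ... + wn*Xn + b
print(a)                         # this value goes to the activation function
```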
The activation function in the ANN is designed to resemble the action potential in the BNN. A stimulus in the BNN (Figure 3) can be compared to the value ‘a’ in the ANN (Figure 2). In the brain, when the stimulus exceeds a threshold, the signal is passed on to other neurons; when it stays below the threshold, the activity stops. The ANN activation function controls the output in a similar way. Note, however, that there are many different types of activation functions, chosen according to the expected output (see Figure 4).
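As a small illustration of those different types, here is a sketch of a few common activation functions (a hypothetical selection of mine; Figure 4 may show others):

```python
import numpy as np

def step(a):      # all-or-nothing, like the biological threshold
    return np.where(a > 0, 1.0, 0.0)

def sigmoid(a):   # squashes a into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):      # keeps positive values, zeroes out negative ones
    return np.maximum(0.0, a)

a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (step, sigmoid, relu):
    print(f.__name__, f(a))
```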
Perceptron
The Perceptron, invented in 1958, was the first Artificial Neural Network (ANN). The project was funded by the United States Office of Naval Research. Initially, the concept was implemented in software. After its early success, dedicated hardware was built to handle image recognition with just 400 pixels.
Figure 5 below shows the first perceptron hardware.
Figure 6 demonstrates the output of image classification produced by the Perceptron. It draws a line (linear classification) to separate the images into two classes, here ‘cats’ and ‘dogs’.
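The line comes from the classic perceptron learning rule. Below is a minimal sketch of that rule on a toy 2-D dataset (my own made-up points, not the cat and dog images from the figure):

```python
import numpy as np

# Perceptron learning rule on a tiny, linearly separable dataset.
X = np.array([[0.0, 0.0], [1.0, 0.2], [0.2, 1.0],    # class 0
              [2.0, 2.0], [2.5, 1.8], [1.8, 2.5]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        # Update only on mistakes: nudge the line toward the misclassified point.
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

print("learned line:", w, b)  # the decision boundary is w.x + b = 0
```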
Initially, the Perceptron was expected to solve complicated tasks. Unfortunately, it was only capable of producing a linear decision boundary, so it does not fit well with real-world data, which tend to be non-linear.
Figure 7 shows a classic example of a problem that a single linear function cannot solve. Simply put, the problem in the figure requires at least two lines to separate the black dots from the white dots (see the sketch below).
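Assuming the figure shows the classic XOR problem (the standard example of this limitation), here is a quick sketch: the same perceptron rule, run on the four XOR points, never classifies all of them correctly, because no single line separates the two classes:

```python
import numpy as np

# XOR: the output is 1 only when the two inputs differ.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(1000):
    for xi, yi in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

preds = [1 if np.dot(w, xi) + b > 0 else 0 for xi in X]
print("predictions:", preds, "targets:", list(y))  # never all four correct
```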
Sad but true: the Perceptron was not quite a success.
Its poorer-than-expected performance, combined with users’ inflated expectations of AI that could not be fulfilled, helped trigger a long AI Winter (late 1970s to early 1990s), a period in which AI research and development slowed down.
Multilayer Perceptron (MLP)
In an attempt to deal with the linear-output problem, the Multilayer Perceptron (MLP) was proposed in 1985. The MLP is an improved version of the Perceptron: to work more effectively with non-linear real-world data, multiple neurons and multiple hidden layers are added.
In the figure above, if we consider the output graphics, the different positions of the white and the yellow circles indicate errors, that is, the differences between the calculated and the desired outputs. To minimize these errors, the backpropagation (‘backprop’) algorithm is used. It adjusts the parameters by stepping backward through the network, layer by layer.
To identify how each parameter should be adjusted, the gradient descent algorithm is applied (see Figure 9). It uses partial derivatives of the error with respect to each parameter to find, layer by layer, the parameter values that minimize the error.
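Putting the two ideas together, here is a minimal sketch of a tiny MLP with one hidden layer, trained with hand-written backpropagation and gradient descent to solve the XOR problem that defeated the single-layer Perceptron (the layer sizes, learning rate and iteration count are arbitrary choices of mine):

```python
import numpy as np

# A 2-4-1 MLP trained with backpropagation and gradient descent on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.5

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for step in range(10000):
    # Forward pass: compute each layer's output.
    h = sigmoid(X @ W1 + b1)             # hidden layer
    out = sigmoid(h @ W2 + b2)           # output layer

    # Backward pass (backprop): push the error back, layer by layer.
    d_out = (out - y) * out * (1 - out)  # gradient at the output
    d_h = (d_out @ W2.T) * h * (1 - h)   # gradient at the hidden layer

    # Gradient descent: step each parameter against its partial derivative.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(0, keepdims=True)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```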
MLPs were popular for speech recognition, image recognition and machine translation applications. However, they consumed extremely large computational resources. When the Support Vector Machine (SVM), a newer algorithm that worked more efficiently, was introduced, it overshadowed the MLP.
Convolutional Neural Network (CNN)
Classical AI algorithms were designed to work with one-dimensional data. This is their main disadvantage when dealing with two-dimensional data such as images.
A classical AI algorithm would classify the two pictures below as two completely different pictures.
In fact, there is a relationship between them: they show the same object in different positions. To fix this disadvantage, the CNN was introduced in 1998.
The core concept of the CNN is ‘convolution’ (in practice, 2-D cross-correlation). It preserves the spatial relationships between pixels in two dimensions. As a result, the two pictures above are classified as two related pictures, not two different ones.
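Here is a minimal sketch of that 2-D cross-correlation operation, written from scratch in NumPy (the image and kernel are toy examples of mine):

```python
import numpy as np

def cross_correlate2d(image, kernel):
    # Slide the kernel over the image; each output pixel is the sum of the
    # element-wise products, so neighbouring pixels are combined together
    # and their 2-D relationships are preserved.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A bright square on a dark background...
img = np.zeros((6, 6))
img[2:4, 2:4] = 1.0
# ...and a simple vertical-edge kernel.
k = np.array([[1.0, -1.0],
              [1.0, -1.0]])
print(cross_correlate2d(img, k))  # responds wherever a vertical edge appears
```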
Figure 10 illustrates one of the first CNN models, LeNet-5. The model was used to classify handwritten digits, and the classification results achieved reasonably high accuracy. This revitalized the concept of the neural network.
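For a rough idea of what that architecture looks like today, here is a LeNet-5-style sketch in PyTorch. It is a modern approximation, not LeCun’s exact 1998 network, which differed in details such as its scaled tanh activations and RBF output layer:

```python
import torch
import torch.nn as nn

# A LeNet-5-style CNN for 32x32 grayscale digit images (10 classes).
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # C1: 6 feature maps of 28x28
    nn.Tanh(),
    nn.AvgPool2d(2),                  # S2: subsample to 14x14
    nn.Conv2d(6, 16, kernel_size=5),  # C3: 16 feature maps of 10x10
    nn.Tanh(),
    nn.AvgPool2d(2),                  # S4: subsample to 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),       # C5
    nn.Tanh(),
    nn.Linear(120, 84),               # F6
    nn.Tanh(),
    nn.Linear(84, 10),                # one score per digit class
)

x = torch.randn(1, 1, 32, 32)         # a dummy input image
print(model(x).shape)                 # torch.Size([1, 10])
```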
At present, advances in computer technology, high-performance graphics processing units and prominent AI research are speeding up the development of Deep Learning (multi-layered artificial neural networks). With its ability to handle complicated, huge amounts of data and produce highly accurate results, Deep Learning is applied in a vast range of AI applications. Who knows, one day a machine might be able to think like a human. That is the promising future of the Artificial Neural Network.
by Kriengkrai Jirawongaram <kk@jira.org>