STEP-BY-STEP ARTIFICIAL NEURAL NETWORKS

Mert Alacan
Yıldız Technical University - Sky Lab
6 min read · Mar 27, 2021

Neural network algorithms belong to deep learning, a subfield of machine learning, and these networks can be very deep, highly flexible, and very powerful. Data inputs flow into "hidden layers", where some seemingly magical mathematical operations are performed. If you use many hidden layers (e.g. 200), the model becomes a deep model. Every hidden layer contains neurons, and you can adjust the number of neurons. Because both the number of neurons and the number of hidden layers can be adjusted, we can say that neural networks are flexible. Some concepts may seem complex at first, but this post explains them in detail in the next parts.

These models are used for regression and classification problems. This post covers what an Artificial Neural Network is and what its components are, and explains the mathematics behind ANNs (Artificial Neural Networks). The mathematics is explained through a binary classification problem. I hope you enjoy reading this post. Let's begin.

1. WHAT IS AN ARTIFICIAL NEURAL NETWORK?

An Artificial Neural Network (ANN) is an information processing technology inspired by the human brain. An ANN imitates the processes of biological neural networks; in other words, it is a digital model of neuron cells. The neurons form various networks, and these networks can learn, memorize, and reveal the relationships in data.

2. COMPONENTS OF ANN

Before getting into the mathematics behind ANNs, you should know some concepts of this model. So, this part is quite significant for understanding ANNs.

2.1. Layers

Figure 2.1. Layers

In Figure 2.1, the layers are shown. The input layer takes the values of a data point, and every circle is called a node. In the hidden layers, the ANN computations are performed, and a result is produced at the output layer.

2.2. Weights and Bias

Weights and biases are the learnable parameters of the model. During training, these parameters are updated until appropriate values are found.

2.3. Activation Functions

With these functions, the output results of each layer are produced. In Artificial Neural Networks, several different activation functions are used, such as sigmoid, tanh, ReLU, and softmax.

2.3.1. Sigmoid Function

It is a logistic function, and it is frequently used in deep learning. Its values range between 0 and 1. For binary classification problems, if the output is between 0 and 0.5, the label of the data point is 0; otherwise it is 1.

Figure 2.2. Sigmoid Function

In Figure 2.2, the line chart of the sigmoid function is shown. The formula of this function is:

sigmoid(z) = 1 / (1 + e^(-z))

2.3.2. Tanh Function

Tanh (the hyperbolic tangent function) is a hyperbolic function, and it returns values between -1 and 1.

Figure 2.3. Tanh Function

In Figure 2.3, the line chart of the tanh function is shown. The formula of this function is:

tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

2.3.3. ReLU Function

ReLU (Rectified Linear Unit) is a piecewise linear function that returns values between 0 and infinity.

Figure 2.4. ReLU Function

In Figure 2.4, the line chart of the ReLU function is shown. It returns values from 0 to infinity because the function is defined as max(0, z): if z < 0, it returns 0; otherwise it returns z.

2.3.4. Softmax

This activation function is an important one. In this post we examine a binary classification problem, but if your data has multiclass labels, then softmax is used, because it produces a result for every label with probabilistic logic: the label with the highest probability is chosen.
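
To make these definitions concrete, here is a minimal sketch of the four activation functions in Python (using NumPy and these particular function names are my own assumptions, since the post itself defines the functions only mathematically):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real value into (0, 1)
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: squashes any real value into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Rectified Linear Unit: 0 for negative inputs, z otherwise
    return np.maximum(0, z)

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1;
    # subtracting the max is a standard numerical-stability trick
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

For example, sigmoid(0) returns 0.5, which is exactly the decision boundary mentioned above for binary classification.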

3. HOW DOES AN ARTIFICIAL NEURAL NETWORK WORK?

When an ANN model runs, it goes through several processes. In this part, the working principle of an ANN is shown. The model is explained over Figure 2.1.

3.1. Initializing Weights and Biases and Examining Forward Propagation

For every layer, some formulas are defined. Figure 2.1 has 2 hidden layers and 1 output layer; it also has 1 input layer. From the input layer to the output layer, the formulas are shown below.

Forward propagation for 2 hidden layers and 1 output layer:

z[1](m) = w[1] x(m) + b[1]
A[1](m) = relu(z[1](m))
z[2](m) = w[2] A[1](m) + b[2]
A[2](m) = relu(z[2](m))
z[3](m) = w[3] A[2](m) + b[3]
yhat(m) = A[3](m) = sigmoid(z[3](m))

Notations of the formulas:

· z[1](m) : Linear result computed from the input layer

· w : Weights

· b : Bias

· x(m) : Input values of data point m

· A[1](m) : Activation function result sent to the next layer

· z[2](m) : Linear result computed from the first hidden layer

· A[2](m) : Activation function result sent to the next layer

· z[3](m) : Linear result computed from the second hidden layer

· yhat(m) : Prediction result for data point m

All of these steps together are named forward propagation. In these formulas, the weight and bias values are first selected randomly. For the first step, we suppose that the weights are 0.01 and the biases are 0; initial values should be around 0. Each activation function produces a result and sends it from layer to layer. Finally, the last activation function produces the prediction results.

If we want to interpret the prediction results: the last activation function is sigmoid, and since our case is classification, results from 0 to 0.5 produce label 0 and results from 0.5 to 1 produce label 1. If your labels are binary, you can use sigmoid or tanh at the output. On the other hand, if your labels are multiclass, you should use the softmax activation function; here, we assume a binary classification problem. In addition, you can use the activation functions in a different order. For example, ReLU is used at A[1](m), but you may use tanh instead of ReLU.
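
Here is a minimal sketch of this forward pass for the architecture of Figure 2.1. NumPy, the variable names, and the layer sizes are my own assumptions for illustration; ReLU is used in the hidden layers and sigmoid at the output, as described above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def init_parameters(n_x, n_h1, n_h2):
    # Weights start as small random values around 0, biases start at 0,
    # as described above
    rng = np.random.default_rng(0)
    return {
        "w1": rng.standard_normal((n_h1, n_x)) * 0.01, "b1": np.zeros((n_h1, 1)),
        "w2": rng.standard_normal((n_h2, n_h1)) * 0.01, "b2": np.zeros((n_h2, 1)),
        "w3": rng.standard_normal((1, n_h2)) * 0.01, "b3": np.zeros((1, 1)),
    }

def forward_propagation(x, p):
    # x has shape (number of features, M samples);
    # ReLU in both hidden layers, sigmoid at the output
    z1 = p["w1"] @ x + p["b1"]
    a1 = relu(z1)                      # A[1](m)
    z2 = p["w2"] @ a1 + p["b2"]
    a2 = relu(z2)                      # A[2](m)
    z3 = p["w3"] @ a2 + p["b3"]
    yhat = sigmoid(z3)                 # prediction, yhat(m)
    cache = {"a1": a1, "a2": a2, "yhat": yhat}
    return yhat, cache
```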

3.2. Cost Function

After forward propagation, the ANN evaluates the cost function before passing to the backward propagation part. The cost function gives a numeric error value between the real values and the predicted values. The purpose of the ANN is to minimize this value: a decreasing cost value shows that the model is getting better. The cross-entropy loss function is one of the best examples. Its formula for binary classification, over M samples, is:

J = -(1/M) * Σ [ y(m) log(yhat(m)) + (1 - y(m)) log(1 - yhat(m)) ]
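
As a sketch, the same cost in Python (the small epsilon guard against log(0) is my own addition):

```python
import numpy as np

def cross_entropy_cost(yhat, y):
    # Average binary cross-entropy over the M samples;
    # eps keeps log() away from exactly 0
    M = y.shape[1]
    eps = 1e-12
    return -np.sum(y * np.log(yhat + eps)
                   + (1 - y) * np.log(1 - yhat + eps)) / M
```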

3.3. Logical Approach To Backward Propagation

Backward propagation is the way of adjusting the weights and biases to decrease the cost function. To adjust these parameters, the forward propagation formulas are differentiated with the chain rule, from yhat(m) back to z[1](m), that is, from the end of forward propagation back to its beginning. The gradient descent algorithm uses these derivatives. In short, through backward propagation, new weight and bias values are defined with gradient descent logic.
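
A sketch of these derivative computations, assuming the forward pass sketched in part 3.1 (the simplification dz3 = yhat - y is the standard result of combining the sigmoid output with the cross-entropy cost):

```python
import numpy as np

def backward_propagation(x, y, p, cache):
    # Gradients of the cost, computed from yhat(m) back to z[1](m)
    # with the chain rule
    M = x.shape[1]
    a1, a2, yhat = cache["a1"], cache["a2"], cache["yhat"]
    dz3 = yhat - y                        # output layer (sigmoid + cross-entropy)
    dw3 = dz3 @ a2.T / M
    db3 = dz3.sum(axis=1, keepdims=True) / M
    dz2 = (p["w3"].T @ dz3) * (a2 > 0)    # (a2 > 0) is the ReLU derivative
    dw2 = dz2 @ a1.T / M
    db2 = dz2.sum(axis=1, keepdims=True) / M
    dz1 = (p["w2"].T @ dz2) * (a1 > 0)
    dw1 = dz1 @ x.T / M
    db1 = dz1.sum(axis=1, keepdims=True) / M
    return {"dw1": dw1, "db1": db1, "dw2": dw2, "db2": db2,
            "dw3": dw3, "db3": db3}
```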

3.4. Update Parameters

After finding the derivatives that show how to decrease the cost, the parameters are updated. The updating process uses a simple formula for each weight and bias:

w = w - alpha * dw
b = b - alpha * db

where dw and db are the derivatives found by backward propagation.

Alpha represents the learning rate. This rate should be small, such as 0.001, 0.003, 0.01, or 0.03. If this value is too big, the model overshoots and moves away from the global minimum point quickly, and you cannot get good results.

For all layers, all weights (w[1], w[2], w[3]) and biases (b[1], b[2], b[3]) are updated with this general formula.
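
The same update as a sketch, reusing the parameter and gradient dictionaries from the earlier snippets:

```python
def update_parameters(p, grads, alpha=0.01):
    # Gradient descent step for every layer:
    # w := w - alpha * dw and b := b - alpha * db
    for l in (1, 2, 3):
        p[f"w{l}"] -= alpha * grads[f"dw{l}"]
        p[f"b{l}"] -= alpha * grads[f"db{l}"]
    return p
```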

3.5. ANN Model Loops

Parts 3.1 to 3.4 describe just one iteration of our ANN model. You can choose the number of iterations of the ANN model. Training an ANN contains the following steps:

1. Choose initial weight and bias values (first iteration only).

2. Run forward propagation and obtain the prediction results.

3. Compare the prediction values of forward propagation with the real values using the cost function.

4. With backward propagation, compute new weight and bias values.

5. Update these weight and bias values.

6. Repeat steps 2 to 5 for the chosen number of iterations; after the final iteration, the prediction results are fully defined.

Thus, an ANN model is formed by running all of these iterations.
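
As a minimal sketch, a training loop tying the earlier pieces together (it reuses init_parameters, forward_propagation, cross_entropy_cost, backward_propagation, and update_parameters from the previous snippets; the hidden layer sizes are illustrative):

```python
def train(x, y, n_h1=4, n_h2=3, alpha=0.01, iterations=1000):
    p = init_parameters(x.shape[0], n_h1, n_h2)        # step 1 (first iteration only)
    for i in range(iterations):
        yhat, cache = forward_propagation(x, p)        # step 2
        cost = cross_entropy_cost(yhat, y)             # step 3
        grads = backward_propagation(x, y, p, cache)   # step 4
        p = update_parameters(p, grads, alpha)         # step 5
        if i % 100 == 0:
            print(f"iteration {i}: cost = {cost:.4f}")
    return p
```

A steadily decreasing printed cost is the sign, from part 3.2, that the model is getting better.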

4. CONCLUSION

In conclusion, we examined one of the neural network models, the Artificial Neural Network. For classification problems, we saw the mathematical logic of ANNs. Along the way, we learned some concepts such as hidden layers, neurons, and activation functions. I hope this post added something to your knowledge of neural networks. Thank you for reading.

5. REFERENCES

  • https://en.wikipedia.org/wiki/Artificial_neural_network (accessed 14.03.2021)
  • Sharma S., Sharma S., Athaiya A., "Activation Functions in Neural Networks", International Journal of Engineering Applied Sciences and Technology, 2020, Vol. 4, Issue 12, ISSN 2455-2143, pp. 310-316
