What Is a Neural Network?

Published in CodeX · 8 min read · Jul 14, 2021

Introduction

Hi everyone, today I want to explain what a neural network (NN) is. Many people hear about NNs but don't know what they are, how they work, or what they are for. This story is for those people, and for newcomers to AI who have doubts about the NN concept.

A neural network is a structure that tries to simulate biological neurons and what they do: receive information about a stimulus (data, in the NN case), pass this information on through other neurons, and generate an adequate response to the stimulus.

Humans need data and experience to learn, so how can a NN learn? Easy: the neural network also needs data and experience to learn. This concept is explained later, in the training section.

The perceptron.

Before we dive deep into what a neural network is and how it works, we should know the concept of the perceptron. The perceptron is the basic unit of a NN, and we can combine it with other perceptrons to build a big neural network. A NN has many similarities with the structure of a brain: as we know, the basic unit of our brain is the neuron, and combining neurons creates a biological neural network.

The perceptron has several parts:

Inputs: Provide data to the output node.
Weights: Determine how much each input influences the final result. If an input has a high weight, its influence on the final result will be high. Initially, the NN doesn't know the appropriate values for the weights, so by default they are initialized randomly (we can modify this).
Output: Uses a mathematical function to determine the final result.
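
To make these three parts concrete, here is a minimal sketch of a single perceptron in plain Python. The function names, the bias term, the choice of a sigmoid as the output function, and the example values are all assumptions made for illustration; they are not taken from any library.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, passed through the output function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values: two inputs, where the second one has the larger weight
# and therefore the larger influence on the result.
output = perceptron(inputs=[1.0, 1.0], weights=[0.1, 2.0])
```

With these weights, changing the second input moves the output much more than changing the first one, which is what "a high weight means a high influence" looks like in practice.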

Example: Imagine that we want to train a NN to determine whether a person is healthy, and we have only two variables (the inputs): the person's sex and whether the person practices any sport. Once we have these values, we pass them to the output node, which makes the necessary calculations to determine whether the person is healthy. As we feed the output node with these two variables, the weights should be adjusted, because in this case sex shouldn't influence the final result.

Note: The weights are adjusted with the help of an algorithm named “backpropagation”. This is an advanced concept, but I will try to explain it in an easy way in the training section of this article. If you want to know more about this algorithm, I recommend reading some books about deep learning and watching some YouTube videos about it.

Neural Network.

Now that we know what a perceptron is and how it works, we are ready to see what a NN is. A NN is a set of fully connected perceptrons that work together to achieve a common goal.

A neural network is made of perceptrons fully connected to each other, and it has three main parts.

Input layer: The layer we get the inputs from. We have one input for each feature in our dataset.
Hidden layers: Their main purpose is to apply the weights to the inputs and propagate the result to the next layer, which could be another hidden layer or the output layer. There are no rules about the number of hidden layers or the number of nodes a NN should contain; we should adjust them according to our training results.

Note: With some activation functions, a value is only propagated to the next layer if it is high enough; otherwise it is not propagated. This can cause “dead” nodes, that is, nodes that don't contribute to the final result because their value is very close to zero. If we have too many dead nodes, the NN won't learn.

Output layer: Produces the output based on the output of the last hidden layer. We will have one node if we are facing a binary classification problem, but in a multiclass classification problem we have one node for each class. E.g., if we want to determine whether a person is healthy, we have one node, but if we have to determine which disease a person has, we would have one node for each disease.
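
The way data flows from the input layer, through a hidden layer, to the output layer can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper `dense`, the tiny layer sizes, the sigmoid function, and every weight value are made up for the example, and real libraries do the same work with optimized matrix operations.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: every node receives every input."""
    return [
        activation(sum(x * w for x, w in zip(inputs, node_weights)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # one row per hidden node
hidden_biases = [0.0, 0.0, 0.0]
output_weights = [[0.7, -0.5, 0.2]]                       # one row per output node
output_biases = [0.0]

features = [1.0, 0.0]  # the input layer: one value per feature
hidden = dense(features, hidden_weights, hidden_biases, sigmoid)
prediction = dense(hidden, output_weights, output_biases, sigmoid)
```

Here `prediction` holds a single value between 0 and 1, which matches the binary case described above (one output node).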

Building and training a Neural Network.

Congratulations, you already know what a neural network is and what it is for. The bad news is that you don't know how to build it and train it yet, but don't worry, I will try to show you in an easy way how you can do it. This part is only for people interested in how to program a NN, so I will assume that you know some Python programming and some machine learning concepts.

The easiest way to build a NN is to use a Python library, and the library we will use is TensorFlow. To use it, we need some imports before we start.

from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras import layers, Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow import keras
import tensorflow as tf

In the next example we will use the fashion_mnist dataset to train our model. It is a multiclass classification problem: the dataset contains images, and we will try to predict what type of garment each one shows.

We need four different sets: two for training and two for testing. The x sets contain the images from which we will extract the features, and the y sets contain the labels (a number that identifies the garment type).

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

Now that we have the data, it's time to start building the neural network. The first step is to create some variables that we will use later.

batch_size = 64
num_classes = 10
epochs = 300

batch_size: Number of samples used in each iteration through the neural network. In the first iteration we will use 64 samples, in the next iteration we will use another 64 samples (chosen at random), and so on. Powers of two are a common choice.
num_classes: Number of different classes that we have in our dataset.
epochs: Number of complete passes over the training data before we consider that our neural network is well trained. This value is a hyperparameter; we should adapt the number of epochs according to our results.
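
To see how batch_size and epochs relate, the following sketch counts how many weight updates these values imply. It assumes the 60,000 training images of fashion_mnist and the 20% validation split used in the training call of this article; the variable names `train_samples`, `steps_per_epoch`, and `total_updates` are made up for the example, and the exact rounding a library applies to the split may differ.

```python
import math

num_samples = 60000       # training images in fashion_mnist
validation_split = 0.2    # fraction held out for validation
batch_size = 64
epochs = 300

# Samples left for training after holding out the validation part.
validation_samples = round(num_samples * validation_split)
train_samples = num_samples - validation_samples

# One iteration processes one batch; one epoch processes every batch once.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs
```

With these numbers, one epoch is 750 iterations of 64 samples each, so 300 epochs means 225,000 weight updates in total.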

Note: The next code is necessary for the NN to work properly, but explaining what it does is out of the scope of this story.

# One-hot encoding of y
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# One array
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
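
For curious readers, the two ideas behind that preparation code (one-hot encoding and flattening) can be sketched on toy data in plain Python. The helpers `one_hot` and `flatten` and the 2x2 image are hypothetical; they only mimic, in a simplified way, what the library calls above do on the real arrays.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at the label's index and 0.0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def flatten(image):
    """Turn a 2D grid of pixels (a list of rows) into one flat list."""
    return [pixel for row in image for pixel in row]

# Label 3 out of 10 classes becomes a vector with a single 1.0 in position 3.
encoded = one_hot(3, num_classes=10)

# A made-up 2x2 image becomes one array of 4 values, in the same way that a
# 28x28 image becomes one array of 784 values.
tiny_image = [[0, 255], [128, 64]]
flat = flatten(tiny_image)
```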

Now it's time to build the neural network. To do it we will use the next code:

#Building the model
model = Sequential()
model.add(Dense(128, activation='sigmoid', input_shape=(784,)))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(64, activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))
model.summary()

We are using a sequential model, which means that we build the neural network as a stack of layers.

With the method .add(Dense()) we add a dense layer. A dense layer is a layer in which every node is connected to every node of the previous layer. But how many nodes does the layer contain? The first number inside the Dense() function determines the number of nodes. As we see, the first layer has a parameter named “input_shape”. This is how many nodes the input layer will contain: in the code we only build the hidden and output layers, and to build the input layer we just need to specify the input_shape parameter, with its size (784 in our case), in the first hidden layer. The parameter “activation” lets us select the mathematical function that determines how the weighted sum of the inputs is transformed into an output. We can use a different activation function in each layer, and we will choose one or another depending on our purpose or type of problem.
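
Because every node of a dense layer is connected to every node of the previous layer, the number of weights grows quickly. As a rough sketch, the following snippet counts the trainable parameters of the model above, assuming each node also has one bias value (the default for Dense layers). The helper `dense_params` is a made-up name for this example.

```python
def dense_params(num_inputs, num_nodes):
    """One weight per input per node, plus one bias per node."""
    return num_inputs * num_nodes + num_nodes

# Input size followed by the number of nodes of each Dense layer in the model.
layer_sizes = [784, 128, 128, 64, 10]

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = sum(per_layer)
```

The values in `per_layer` (100480, 16512, 8256 and 650, for a total of 125898) should match the parameter counts that model.summary() prints for each layer.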

You can see that the last layer has the variable “num_classes” as its number of nodes. This is because it is the output layer, and you know that in this layer we have one node for each class to predict. The activation function in this layer is different because the “softmax” function works well for multiclass problems.

Now it's time to configure the model. We will use the compile function to do it.
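
To show why softmax suits multiclass problems, here is a minimal sketch of the function in plain Python. It turns the raw scores of the output nodes into probabilities that add up to 1, so the node with the highest probability can be read as the predicted class. The scores used here are made up, and subtracting the maximum score is a common numerical-stability trick, not something specific to this article's model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores of three output nodes (one per class).
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```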

from tensorflow.keras.optimizers import SGD

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

Loss: This parameter allows us to select the loss function. The loss function calculates the difference between the real value and the predicted value (this difference is named the error value), and training tries to minimize this error to get an accurate model. When the output layer generates its result, we compare it with the desired result and the NN computes the error value for each output node. Based on these errors, the neural network propagates the values back to the previous layers and modifies the weights to get better results. This is the concept of backpropagation. In our case we use the “categorical_crossentropy” function because it is a standard choice for multiclass problems with one-hot encoded labels.
Optimizer: Determines the way in which the weights are modified.
Metrics: The way in which we will assess the performance of our model.
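
As an illustration of what the loss measures, the next sketch computes categorical cross-entropy for a single sample in plain Python. The function below is a simplified stand-in written for this example (the small `eps` guard against log(0) is an assumption), and the predictions are made-up numbers.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: minus the sum of true * log(predicted)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]           # one-hot label: the sample belongs to class 1
good_prediction = [0.1, 0.8, 0.1]  # most probability on the right class
bad_prediction = [0.7, 0.2, 0.1]   # most probability on a wrong class

good_loss = categorical_crossentropy(y_true, good_prediction)
bad_loss = categorical_crossentropy(y_true, bad_prediction)
```

The good prediction gets a loss of about 0.22 and the bad one about 1.61, so minimizing the loss pushes the network toward putting more probability on the correct class.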

Finally, it's time to train the model and observe the results. We will use the fit function to start training the NN.

history_sigmoid = model.fit(x_train, y_train,
                            batch_size=batch_size, epochs=epochs, verbose=1,
                            validation_split=0.2)

We are using the training set, with its labels (saved in y_train), to train the model. The “verbose” parameter is used to display the training progress of our NN, and “validation_split” is used to select the percentage of samples held out to validate the model. In our case we are using 20% of the samples in x_train for validation. The command output will be similar to this:

During training, the loss should decrease step by step and the accuracy should increase. Ideally, at the end of training the accuracy should be close to 1 and the loss close to 0.
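
The object returned by fit keeps the per-epoch values in a dictionary (history_sigmoid.history), with one list per metric such as "loss" and "val_loss". As a small sketch of how that dictionary could help when adapting the number of epochs, the hypothetical helper below finds the epoch with the lowest validation loss. The numbers in `example_history` are invented only to show the shape of the data; they are not results from this model.

```python
def best_epoch(history, metric="val_loss"):
    """Return the 1-based epoch with the lowest value of the given metric."""
    values = history[metric]
    return values.index(min(values)) + 1

# Made-up numbers, only to show the shape of the dictionary.
example_history = {
    "loss": [0.9, 0.6, 0.5, 0.4],
    "val_loss": [0.8, 0.6, 0.55, 0.6],
}
epoch = best_epoch(example_history)
```

In this toy example the training loss keeps falling while the validation loss is lowest at epoch 3, which would suggest that more epochs are not helping.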

Evaluating the model.

Well, you know how to train the NN, but you don't know yet whether the network is working well or whether it will predict the results correctly.

A simple way to assess the performance of our NN is with a confusion matrix. This matrix shows us how many predictions are correct and how many are wrong.

The code to do it is the following:

from sklearn.metrics import confusion_matrix

pred = model.predict(x_test)  # Our predictions
confusion_matrix(y_test.argmax(axis=1), pred.argmax(axis=1))

And the result will be something like this:

The matrix has one row and one column for each class to predict. Each row corresponds to a real label and each column to a predicted label: the first row belongs to the first label, the first column also belongs to the first label, and so on. The correctly predicted values are those on the diagonal; the values off the diagonal are wrong predictions.
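
To see how such a matrix is filled in, here is a minimal hand-written version in plain Python, together with the accuracy computed from its diagonal. The helper names and the six toy labels are assumptions for illustration; in practice the scikit-learn function used above does this for you.

```python
def build_confusion_matrix(y_true, y_pred, num_classes):
    """Rows are real labels, columns are predicted labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Fraction of predictions that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

matrix = build_confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = accuracy_from_matrix(matrix)
```

In this toy case four of the six predictions land on the diagonal, and the two off-diagonal entries show exactly which classes were confused with each other.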

Conclusion

In this post I have tried to explain in an easy and understandable way what a NN is and how it works. This is only a gentle introduction to these algorithms. If you want to know more about this fascinating field of AI, you should start reading books, taking courses, reading other posts, writing your own conclusions, and so on. I hope I have helped you understand NNs a little bit better. Thank you for reading.