Published in Analytics Vidhya

What is a Neural Network?

Neural networks are one of the most powerful and widely used approaches in artificial intelligence. But what exactly are they? And how do they work? Let me explain it in plain English.

What is a Neural Network?

A neural network is a collection of connected nodes called neurons.

What is a Neuron?

A neuron is a node that has one or more inputs and a single output, as shown in Figure 1. Three actions take place inside of a neuron:

  • A weight is associated with each input to amplify or de-amplify it;
  • All weighted inputs are summed;
  • The sum is used as an input for an activation function which determines the final output.
Figure 1. A neuron is like an equation: it associates a weight with each of its inputs and calculates an output by adding the weighted inputs and applying an activation function.
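
The three actions listed above can be sketched in a few lines of Java. This is a minimal illustration, not the code used later in the article: the class name `NeuronSketch`, the method `output`, and the threshold activation in `main` are assumptions made for this example.

```java
import java.util.function.DoubleUnaryOperator;

// Minimal sketch of a single neuron (names are illustrative assumptions).
class NeuronSketch {

    // Weights each input, sums the weighted inputs, and applies the activation function.
    static double output(double[] inputs, double[] weights, DoubleUnaryOperator activation) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException("inputs and weights must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i]; // amplify or de-amplify each input
        }
        return activation.applyAsDouble(sum);
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, 0.0};
        double[] weights = {0.5, -0.3}; // arbitrary example weights
        // Placeholder activation: a simple threshold at zero.
        DoubleUnaryOperator threshold = z -> z > 0.0 ? 1.0 : 0.0;
        System.out.println(output(inputs, weights, threshold));
    }
}
```

Passing the activation as a parameter keeps the weighted sum separate from the choice of activation function.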

What is an Activation Function?

An activation function is a mathematical equation that outputs a small value for small inputs and a larger value if its inputs exceed a threshold. An example of an activation function commonly used is the sigmoid function shown in Figure 2.

Figure 2. The sigmoid function is commonly used as the neuron’s activation function.

The idea is quite simple: near zero, a small change in the input causes a significant change in the output, while for very large or very small (strongly negative) inputs, the output barely changes.
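
To see this behavior in numbers, here is a small sketch that evaluates the sigmoid function, 1 / (1 + e^(-x)), at a few points. The class name `SigmoidSketch` and the sample points are assumptions chosen for illustration.

```java
// Sketch: evaluate the sigmoid function at a few sample points.
class SigmoidSketch {

    // Sigmoid squashes any real number into the open interval (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double[] samples = {-10.0, -1.0, 0.0, 1.0, 10.0};
        for (double x : samples) {
            System.out.printf("sigmoid(%5.1f) = %.5f%n", x, sigmoid(x));
        }
    }
}
```

Moving the input from 0 to 1 changes the output far more than moving it from 9 to 10, which is the flattening visible at both ends of the curve.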

How Are Neurons Connected?

The output of one neuron can be used as an input to other neurons. Typically, neurons are aggregated into layers. A layer is a general term that applies to a collection of nodes operating together at a specific depth within the neural network. Outputs travel from the first layer to the last layer. As shown in Figure 3, there is typically an input layer, one or more middle layers (called hidden layers), and an output layer.

  • The input layer contains only data. There are no working neurons there.
  • The hidden layer(s) is/are where the learning occurs — later, we will review how.
  • The output layer contains neurons that calculate the final output.
Figure 3. A neural network with two inputs in an input layer, two hidden layers, and one neuron in the output layer.

The number of input and output neurons is dependent upon the problem at hand. The number of hidden neurons is often the sum of input and output neurons, but it is not a rule.

How Do They Work?

Neural networks help us to classify information. They are trained (they learn) by processing examples, each containing a known input and output. The goal of the training process is to calculate values for the weights associated with each input in each neuron. Once we train the neural network, i.e., once we have calculated values for all the weights, we can use the neural network to map new, unseen inputs to an output.


The Hello-World example for neural networks is usually implementing a neural network to recognize the XOR operator. The neural network for this has

  • two inputs,
  • one output, and
  • one hidden layer with three neurons, which is the sum of input and output neurons, as suggested above.

Figure 4 shows our neural network, along with the input data that we will use to train the network and the known outputs.

Figure 4. Our neural network for calculating the XOR operator.

The first step with a neural network is to initialize weights. What options do we have?

  • Initialize with zeros only: it would be a poor strategy 😳. Remember, the weights will be multiplied by the inputs, so with weights equal to zero, the inputs no longer play a role, and the neural network cannot learn properly.
  • Initialize weights randomly: it is a bit naïve, but it works well in most cases. Let’s use this approach for our example.
  • Advanced strategies are available.

Thus, we are going to initialize the nine weight values in our neural network with random values.
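
A minimal sketch of random initialization follows. The range [-1, 1), the class name `WeightInitSketch`, and the helper `randomWeights` are assumptions for this example; the article only says the values are random.

```java
import java.util.Random;

// Sketch: fill an array of weights with random values (range is an assumption).
class WeightInitSketch {

    // Returns `count` weights drawn uniformly from [-1, 1).
    static double[] randomWeights(int count, Random random) {
        double[] weights = new double[count];
        for (int i = 0; i < count; i++) {
            weights[i] = random.nextDouble() * 2.0 - 1.0;
        }
        return weights;
    }

    public static void main(String[] args) {
        // Nine weights, as in the example network (2x3 hidden + 3x1 output).
        double[] weights = randomWeights(9, new Random());
        for (double w : weights) {
            System.out.println(w);
        }
    }
}
```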

Forward propagation is a fancy name for providing the network with one input and observing the output. We start at the input layer and calculate the outputs for the hidden layer. The results are passed forward to the next layer. Then, we calculate the output in the output layer using the outputs from the hidden layer as inputs. Figure 5 shows the maths. It is just linear algebra. That’s it.

Figure 5. Calculate the outputs for each neuron, starting at the input layer and move forward; use the outputs from the neurons in one layer as the inputs for the neurons in the next one.
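
In symbols, the forward pass for a network with two inputs, three hidden neurons, and one output neuron can be sketched as follows. The notation is an assumption made for this sketch and may differ from the figure: x_i are the inputs, h_j the hidden-layer outputs, w_{ji} and v_j the weights, and σ the sigmoid function. Bias terms, which appear later in the code, are left out here.

```latex
\begin{align*}
h_j &= \sigma\Big(\sum_{i=1}^{2} w_{ji}\, x_i\Big), \qquad j = 1, 2, 3 \\
\hat{y} &= \sigma\Big(\sum_{j=1}^{3} v_j\, h_j\Big) \\
\sigma(z) &= \frac{1}{1 + e^{-z}}
\end{align*}
```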

The error is calculated as the difference between the known output and the calculated output (output₃ in our example). Error values are commonly squared to remove negative signs and give more weight to larger differences. Dividing by two does not change which weights minimize the error, and it will be helpful later because it makes the derivative more straightforward.

Figure 6. Error is calculated as the difference between the known output and the calculated output.

If the neural network has more than one node in the output layer, the error is calculated as the sum of all the partial errors.
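
Written out, the squared error with the division by two looks like this. The symbols are assumptions for this sketch (y is the known output, ŷ the calculated output, k indexes the output neurons), and the numbers in the worked example are made up for illustration.

```latex
\begin{align*}
E &= \tfrac{1}{2}\,(y - \hat{y})^2
  && \text{one output neuron} \\
E &= \sum_{k} \tfrac{1}{2}\,(y_k - \hat{y}_k)^2
  && \text{several output neurons} \\
E &= \tfrac{1}{2}\,(1 - 0.6)^2 = \tfrac{1}{2}\,(0.16) = 0.08
  && \text{example: } y = 1,\ \hat{y} = 0.6
\end{align*}
```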

Since we are using random values for the weights, our output will probably have a high error. We need to reduce the error. The only way to reduce the error is to change the calculated value. And the only way to change the calculated value is by modifying the values of the weights. A proper adjustment of the weights ensures that the subsequent output will be closer to the expected output. This process is repeated until we are satisfied that the network produces results close enough to the known output.

How do we modify the values of the weights so that the error is reduced?

Short answer: use the gradient descent algorithm, first suggested in 1847. It applies multivariable calculus, specifically partial derivatives. The derivative of the error function with respect to each weight tells us in which direction to adjust that weight. Before it is subtracted from the weight, the derivative is multiplied by a selected number (called the learning rate) that controls the size of each update, so that the new weight moves toward a lower error without overshooting. The learning rate is a small positive value, often in the range between 0.0 and 1.0.
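
Before applying it to a network, gradient descent can be illustrated on a toy problem with a single variable. The function f(w) = (w - 3)², the starting point, and the learning rate of 0.1 are all assumptions chosen for this sketch; they are not part of the XOR example.

```java
// Sketch: gradient descent on the toy function f(w) = (w - 3)^2.
class GradientDescentSketch {

    // Derivative of f(w) = (w - 3)^2 with respect to w.
    static double derivative(double w) {
        return 2.0 * (w - 3.0);
    }

    // Repeatedly moves w against the derivative, scaled by the learning rate.
    static double minimize(double start, double learningRate, int steps) {
        double w = start;
        for (int i = 0; i < steps; i++) {
            w -= learningRate * derivative(w);
        }
        return w;
    }

    public static void main(String[] args) {
        // Starting from 0, w moves toward 3, where f reaches its minimum.
        System.out.println(minimize(0.0, 0.1, 100));
    }
}
```

Each step subtracts the derivative multiplied by the learning rate, which is the same update rule applied to every weight in the network.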

To calculate the partial derivatives with respect to the weights, we need the derivative of the error function and the derivative of the sigmoid function. Figure 7 shows the general equation for the weights update and one example solving the equation for the weight W₆ — the weight of the first input for the neuron in the output layer.

The calculus chain rule principle is applied to compute the derivative of the composite function. Be aware that calculations are similar but not the same for neurons in the output layer and neurons in the hidden layer.

Figure 7. Equations to: update the weight values (in red), error and derivative of the error (in gray), and calculated output (sigmoid function) derivative.
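
As a sketch of how the chain rule plays out for W₆, assume the error E = ½(y − o)² and a sigmoid output neuron as described above. The symbols are assumptions for this derivation and may differ from the figure: o is the calculated output, net is the weighted sum fed into the sigmoid, h₁ is the hidden-layer output that W₆ multiplies, and η is the learning rate.

```latex
\begin{align*}
\frac{\partial E}{\partial W_6}
  &= \frac{\partial E}{\partial o}\cdot
     \frac{\partial o}{\partial \mathit{net}}\cdot
     \frac{\partial \mathit{net}}{\partial W_6} \\
  &= -(y - o)\cdot o\,(1 - o)\cdot h_1 \\[4pt]
W_6 &\leftarrow W_6 - \eta\,\frac{\partial E}{\partial W_6}
   = W_6 + \eta\,(y - o)\, o\,(1 - o)\, h_1
\end{align*}
```

The middle factor uses the derivative of the sigmoid, σ'(z) = σ(z)(1 − σ(z)).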

So, we start with random weight values, then:

  • We calculate outputs for all neurons using the math in Figure 5 (forward propagation) and the difference between the calculated output and the known output (error).
  • If the difference is greater than what we are willing to accept, we calculate new weight values (backward propagation).

These two activities repeat until we reduce the error to an acceptable value. An acceptable error could be anywhere between 0 and 0.05.

Coding the Example

Let us see how the four steps described above look in code. We are going to implement a simple neural network in Java. I do not want to reinvent the wheel; just show the nuts and bolts to understand how things work.

First, the attributes:

  • A constant value to define the learning rate that we will be using;
  • Three variables to store the total number of nodes that we will have in each layer — we will create a neural network with two nodes in the input layer, three in a hidden layer, and one in the output layer.
  • Three arrays to store weights values, bias values, and the output of each neuron.

We will create a neural network with six nodes, and we will need nine weights and four bias values for the hidden and output layer nodes.

Figure 8. — attributes in the class

We can use a constructor to initialize the arrays and put initial values in weights and bias. Remember that, initially, they are just random values. Lines 11 and 13 do the initialization.

Figure 9. — constructor

We need to solve the equations shown in Figure 5. Thus, let us create a method for that. Notice that the inputs are handled as nodes (in the input layer), but we do not calculate output values for them. We calculate outputs for the nodes in the hidden layer and the nodes in the output layer. The output is calculated by multiplying each weight value by its input value, summing them all, and applying the activation function. We use sigmoid as the activation function, and we create a sigmoid method just to keep the separation of concerns. Nothing complex here: it is basically an implementation of the linear algebra described in Figure 5. We will run this for every single set of input values; thus, it will run 4 times, with {0,0}, {0,1}, {1,0}, and {1,1}.

Figure 10. — forward propagation, i.e., calculate the output for each neuron in hidden and output layer

In our example, with only one neuron in the output layer, the error calculation is pretty straightforward. But, let us generalize the idea in our code by creating an implementation that can be used with one or more neurons in the output layer. This implementation is shown in Figure 11.

Figure 11. — error calculation
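
A generalized error calculation could look like the following sketch. It is not the author's original method: the class name `ErrorSketch` and the method `totalError` are assumptions, and it simply sums ½ (known − calculated)² over all output neurons.

```java
// Sketch: squared error summed over one or more output neurons.
class ErrorSketch {

    // Sums 1/2 * (expected - actual)^2 over every output neuron.
    static double totalError(double[] expected, double[] actual) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("expected and actual must have the same length");
        }
        double total = 0.0;
        for (int k = 0; k < expected.length; k++) {
            double difference = expected[k] - actual[k];
            total += 0.5 * difference * difference;
        }
        return total;
    }

    public static void main(String[] args) {
        // One output neuron: known output 1.0, calculated output 0.6.
        System.out.println(totalError(new double[]{1.0}, new double[]{0.6}));
    }
}
```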

Finally, let us create the learning part: a method that implements the math responsible for updating the values of the weights. The multivariable calculus lives there. This method is run for every single set of known output values; therefore, it will run 4 times, with {0.0}, {1.0}, {1.0}, and {0.0}.

Figure 12. — backward propagation, i.e., calculate new values for weights and bias value in all neurons

We have all the parts; it is time to put them together and run our implementation. Take a look at the main() method for our class, as a summary:

  • Training data (input and known output) are represented in two arrays.
  • A neural network object is created with two inputs, three nodes in a hidden layer, and one node in the output layer.
  • Forward propagation, error calculation, and backward propagation are run 10,000 times.
Figure 13. — main method
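
To show how the pieces described above could fit together, here is a compact, self-contained sketch of a 2-3-1 network trained on XOR. It is not the author's original source: the class name `XorNetworkSketch`, the learning rate of 0.5, the seed, the weight range, and the array layout are all assumptions made for this example.

```java
import java.util.Random;

// Sketch of a 2-3-1 network for XOR (nine weights, four biases).
class XorNetworkSketch {
    static final double LEARNING_RATE = 0.5; // assumed value
    static final int INPUTS = 2;
    static final int HIDDEN = 3;

    final double[][] hiddenWeights = new double[HIDDEN][INPUTS];
    final double[] hiddenBias = new double[HIDDEN];
    final double[] outputWeights = new double[HIDDEN];
    double outputBias;
    final double[] hiddenOut = new double[HIDDEN];
    double output;

    // Initializes all weights and biases with random values in [-1, 1).
    XorNetworkSketch(long seed) {
        Random random = new Random(seed);
        for (int j = 0; j < HIDDEN; j++) {
            for (int i = 0; i < INPUTS; i++) {
                hiddenWeights[j][i] = random.nextDouble() * 2.0 - 1.0;
            }
            hiddenBias[j] = random.nextDouble() * 2.0 - 1.0;
            outputWeights[j] = random.nextDouble() * 2.0 - 1.0;
        }
        outputBias = random.nextDouble() * 2.0 - 1.0;
    }

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Forward propagation: hidden layer first, then the output neuron.
    double forward(double[] in) {
        double outSum = outputBias;
        for (int j = 0; j < HIDDEN; j++) {
            double sum = hiddenBias[j];
            for (int i = 0; i < INPUTS; i++) {
                sum += hiddenWeights[j][i] * in[i];
            }
            hiddenOut[j] = sigmoid(sum);
            outSum += outputWeights[j] * hiddenOut[j];
        }
        output = sigmoid(outSum);
        return output;
    }

    // Backward propagation for the sample last passed to forward().
    void backward(double[] in, double target) {
        double deltaOut = (output - target) * output * (1.0 - output);
        for (int j = 0; j < HIDDEN; j++) {
            // Compute the hidden delta with the output weight before updating it.
            double deltaHidden = deltaOut * outputWeights[j] * hiddenOut[j] * (1.0 - hiddenOut[j]);
            outputWeights[j] -= LEARNING_RATE * deltaOut * hiddenOut[j];
            for (int i = 0; i < INPUTS; i++) {
                hiddenWeights[j][i] -= LEARNING_RATE * deltaHidden * in[i];
            }
            hiddenBias[j] -= LEARNING_RATE * deltaHidden;
        }
        outputBias -= LEARNING_RATE * deltaOut;
    }

    // One pass over all samples; returns the mean of 1/2 * (target - output)^2.
    double trainEpoch(double[][] inputs, double[] targets) {
        double total = 0.0;
        for (int n = 0; n < inputs.length; n++) {
            double out = forward(inputs[n]);
            total += 0.5 * (targets[n] - out) * (targets[n] - out);
            backward(inputs[n], targets[n]);
        }
        return total / inputs.length;
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] targets = {0, 1, 1, 0};
        XorNetworkSketch net = new XorNetworkSketch(42L);
        for (int epoch = 0; epoch < 10_000; epoch++) {
            double error = net.trainEpoch(inputs, targets);
            if (epoch % 1000 == 0) {
                System.out.printf("epoch %d, error %.4f%n", epoch, error);
            }
        }
        for (double[] in : inputs) {
            System.out.printf("%.0f XOR %.0f -> %.3f%n", in[0], in[1], net.forward(in));
        }
    }
}
```

The weights are updated after every sample rather than once per pass; with a different seed or learning rate, the number of iterations needed to reach an acceptable error may change.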

Finally, let us try our neural network. After 10,000 iterations, our neural network is alive and working with acceptable performance. Figure 14 shows how the error decreases. The X-axis represents the iteration number (0 to 10,000), and the Y-axis is the mean square error as calculated in lines 18 and 23 of the main() method shown in Figure 13.

Figure 14. Errors per iteration. A total of 10,000 iterations are run. The error drops from 0.4242 to 0.0116.

Not bad for ~100 lines of code (you can download the complete source code from my GitHub repository). However, we could have done the same with ~10 lines of code using a library. One such library is Eclipse Deeplearning4j, an open-source, distributed deep-learning library written for Java. With a library, we can solve more complex problems, such as training a neural network for image classification. The number of inputs will increase, the training data set will be much larger (than our four lines for XOR), and we will need more than one hidden layer. But that is another story. Thanks for reading. Feel free to leave your feedback and reviews below.


Do you want to learn more about the details? Review here the derivative of the sigmoid function; review here the chain rule in calculus; review here the gradient descent definition; and here a detailed description of the maths behind backward propagation.



Javier Gonzalez

Software Engineer. Teaching Professor. Intelligent Systems, Emotion AI, CS Education. ACM Distinguished Speaker