Image designed in Photoshop representing a neural network and a brain. The open-source image can be found for free on Pixabay.
Source: Pixabay

Machine Learning, Scholarly, Tutorial

Building Neural Networks with Python Code and Math in Detail — II

The second part of our tutorial on neural networks from scratch. From the math behind them to step-by-step implementation case studies in Python. Launch the samples on Google Colab.

Towards AI Team
Jun 30 · 21 min read

Author(s): Pratik Shukla, Roberto Iriondo

Last updated: June 30, 2020

In the first part of our tutorial on neural networks, we explained the basic concepts about neural networks, from the math behind them to implementing neural networks in Python without any hidden layers. We showed how to make satisfactory predictions even in scenarios where we did not use any hidden layers. However, there are several limitations to single-layer neural networks.

In this tutorial, we will dive deeper into the limitations and advantages of using neural networks in machine learning. We will show how to implement neural nets with hidden layers and how these lead to a higher accuracy rate in our predictions, along with implementation samples in Python on Google Colab.

Index:

  1. Limitations and advantages of neural networks.
  2. How to select the number of neurons in a hidden layer.
  3. The general structure of an artificial neural network (ANN).
  4. Implementation of a multilayer neural network in Python.
  5. Comparison with a single-layer neural network.
  6. Non-linearly separable data with a neural network.
  7. Conclusion.

1. Limitations and Advantages of Neural Networks

Limitations of single-layer neural networks:

  • They can only predict linearly separable data. If we have non-linear data, then training our single-layer neural network will lead to low prediction accuracy.
  • Decision boundaries of single-layer neural networks must be hyperplanes: if our data is distributed in 3 dimensions, then the decision boundary must be a 2-dimensional plane.
Figure 0: An example of non-linearly separable data.

To overcome such limitations, we use hidden layers in our neural networks.

Advantages of single-layer neural networks:

  • Single-layer neural networks take less time to train compared to multilayer neural networks.
  • Single-layer neural networks have explicit links to statistical models.
  • The outputs in single-layer neural networks are weighted sums of the inputs, which means the output of a single-layer neural network is easy to interpret.

Advantages of multilayer neural networks:

  • They can be used to classify non-linearly separable data.
  • Multilayer neural networks are more reliable compared to single-layer neural networks.

2. How to select the number of neurons in a hidden layer?

  • The number of hidden nodes should be less than twice the number of nodes in the input layer.

For example: If we have 2 input nodes, then our hidden nodes should be less than 4.

a. 2 inputs, 4 hidden nodes:

Figure 1: A neural net with 2 inputs, and 4 hidden nodes.

b. 2 inputs, 3 hidden nodes:

Figure 2: A neural net with 2 inputs, and 3 hidden nodes.

c. 2 inputs, 2 hidden nodes:

Figure 3: A neural network with 2 inputs, and 2 hidden nodes.

d. 2 inputs, 1 hidden node:

Figure 4: A neural net with 2 inputs, and 1 hidden node.
  • The number of hidden nodes should be 2/3 the number of input nodes, plus the number of output nodes.

For example: If we have 2 input nodes and 1 output node then the hidden nodes should be = floor(2*2/3 + 1) = 2

a. 2 inputs, 2 hidden nodes:

Figure 5: A neural net with 2 inputs, and 2 hidden nodes.
  • The number of hidden nodes should be between the number of output nodes and the number of input nodes.

For example: If we have 3 input nodes and 2 output nodes, then the hidden nodes should be between 2 and 3.

a. 3 inputs, 2 hidden nodes, 2 outputs:

Figure 6: A neural net with 3 inputs, 2 hidden nodes, and 2 outputs.

b. 3 inputs, 3 hidden nodes, 2 outputs:

Figure 7: A neural net with 3 inputs, 3 hidden nodes, and 2 outputs.
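The three rules of thumb above can be condensed into a small helper. This is our own illustrative function, not code from the tutorial's listings:

```python
import math

def hidden_size_candidates(n_inputs, n_outputs):
    """Candidate hidden-layer sizes from the three rules of thumb."""
    # Rule 1: fewer than twice the number of input nodes.
    upper_bound = 2 * n_inputs - 1
    # Rule 2: 2/3 the number of input nodes, plus the number of output nodes.
    two_thirds = math.floor(2 * n_inputs / 3 + n_outputs)
    # Rule 3: between the number of output nodes and the number of input nodes.
    in_between = list(range(min(n_inputs, n_outputs), max(n_inputs, n_outputs) + 1))
    return upper_bound, two_thirds, in_between

print(hidden_size_candidates(2, 1))  # (3, 2, [1, 2])
print(hidden_size_candidates(3, 2))  # (5, 4, [2, 3])
```

For 2 inputs and 1 output, the rules give at most 3 hidden nodes, exactly 2 by the two-thirds rule, and 1 to 2 by the last rule, matching the examples above.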

How many weight values do we need?

  1. For the hidden layer: number of input nodes * number of hidden layer nodes.
  2. For the output layer: number of hidden layer nodes * number of outputs.
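Applying this counting to the network used in the implementation below (2 input nodes, 3 hidden nodes, 1 output node) gives the totals quoted in the code comments later in this article:

```python
n_inputs, n_hidden, n_outputs = 2, 3, 1

hidden_weights = n_inputs * n_hidden    # 2 * 3 = 6 weights feeding the hidden layer
output_weights = n_hidden * n_outputs   # 3 * 1 = 3 weights feeding the output layer
total_weights = hidden_weights + output_weights

print(total_weights)  # 9
```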

3. The General Structure of an Artificial Neural Network (ANN):

Figure 8: General structure for an artificial neural network with three layers, an input layer, a hidden layer, and an output layer.

A summary of how an artificial neural network is trained:

  1. Add bias (if required).
  2. Assign random weights in the hidden layer and the output layer.
  3. Run the code for training.
  4. Find the error in prediction.
  5. Update the weight values of the hidden layer and the output layer using the gradient descent algorithm.
  6. Repeat the training phase with updated weights.
  7. Make predictions.

Execution of multilayer neural networks:

Execution in phase-1:

We first define some terms we are going to use in these derivatives:

Figure 9: Defining our derivatives.

a. Finding the first derivative:

Figure 10: Finding the first derivative.

b. Finding the second derivative:

Figure 11: Finding the second derivative.

c. Finding the third derivative:

Figure 12: Finding the third derivative.

Notice that we already derived these derivatives in the first part of our tutorial.
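In our notation, with $E$ the error, $out_o$ and $in_o$ the output layer's output and input, $out_h$ the hidden layer's output, $w_o$ the output-layer weights, and $t$ the target output, the three derivatives chain together as:

```latex
\frac{\partial E}{\partial w_o}
  = \frac{\partial E}{\partial out_o}
    \cdot \frac{\partial out_o}{\partial in_o}
    \cdot \frac{\partial in_o}{\partial w_o}
  = (out_o - t) \cdot \sigma'(in_o) \cdot out_h
```

These three factors correspond exactly to the variables derror_douto, douto_dino, and dino_dwo in the implementation below.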

Execution in phase-2:

Represented as:

Figure 13: Finding the updated weights for the hidden layer.

a. Finding the first derivative:

Here we are going to use the chain rule to find the derivative.

Figure 14: Finding the first derivative.

Using the chain rule again.

Figure 15: Applying the chain rule once again.

The step below is similar to what we did in the first part of our tutorial on neural networks.

Figure 16: Expanding our result for the first derivative, resulting in the output weight.

b. Finding the second derivative:

Figure 17: Finding the second derivative.

c. Finding the third derivative:

Figure 18: Finding the third derivative.

4. Implementation of a multilayer neural network in Python

📚 Multilayer neural network: a neural network with a hidden layer. 📚 For more definitions, check out our article on terminology in machine learning.

Below, we are going to implement the “OR” gate without a bias value. As we will see, adding a hidden layer to a neural network helps us achieve higher accuracy in our models.

Representation:

Figure 19: The OR gate.

Truth-Table:

Figure 20: Input features.

Neural Network:

Figure 21: Neural network.

Graphical representation:

Figure 22: Inputs on the graph, notice that the same color dots have the same output.

Implementation in Python:

a. Import required libraries:

Figure 23: Importing NumPy.

b. Define input features:

Next, we take the input values on which we want to train our neural network. We can see that we have taken two input features. In real data sets, the number of input features is usually much higher.

Figure 24: Assigning input values to train our neural net.

c. Define target output values:

For each set of input features, we want a specific output, called the target output. We are going to train the model so that it gives us the target output for our input features.

Figure 25: Defining our target output, and reshaping our target output into a vector.

d. Assign random weights:

Next, we are going to assign random weights to the input features. Note that our model is going to modify these weight values to be optimal. At this point, we are taking these values randomly. Here we have two layers, so we have to assign weights for them separately.

The other variable is the learning rate. We are going to use the learning rate (LR) in a gradient descent algorithm to update the weight values. Generally, we keep LR as low as possible so that we can achieve a minimal error rate.

Figure 26: Defining the weights for our neural net, along with our learning rate (LR).

e. Sigmoid function:

Once we have our weight values and input features, we are going to send them to the main function that predicts the output. Notice that our input features and weight values can be anything, but here we want to classify data, so we need the output to be between 0 and 1. For such an output, we are going to use a sigmoid function.

Figure 27: Applying our sigmoid function.

f. Sigmoid function derivative:

In a gradient descent algorithm, we need the derivative of the sigmoid function.

Figure 28: Applying a derivation to our sigmoid function.
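In code, the two functions from Figures 27 and 28 look as follows; this mirrors the full listing later in the article:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the (0, 1) range.
    return 1 / (1 + np.exp(-x))

def sigmoid_der(x):
    # Derivative of the sigmoid, needed by gradient descent.
    return sigmoid(x) * (1 - sigmoid(x))

print(sigmoid(0))      # 0.5
print(sigmoid_der(0))  # 0.25
```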

g. The main logic for predicting output and updating the weight values:

We are going to understand the following code step-by-step.

Figure 29: Phase 1 of training on our neural network.
Figure 30: Phase 2 of training on our neural network.

How does it work?

a. First, we take the input features and the current weight values for each layer.

b. Next, we find the input for the hidden layer, defined by the following formula:

Figure 31: Finding the input for our neural network’s hidden layer.

We can also represent it with matrices to understand it better.

The first matrix here is input features with size (4*2), and the second matrix is weight values for a hidden layer with size (2*3). So the resultant matrix will be of size (4*3).

The intuition behind the final matrix size:

The row size of the final matrix is the same as the row size of the first matrix, and the column size of the final matrix is the same as the column size of the second matrix in multiplication (dot product).

In the representation below, each of those boxes represents a value.

Figure 32: Matrix value representation.
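We can verify the shape rule with NumPy; the zero-filled arrays below are placeholders used only to check shapes:

```python
import numpy as np

input_features = np.zeros((4, 2))  # 4 samples, 2 input features
weight_hidden = np.zeros((2, 3))   # 2 inputs feeding 3 hidden nodes

input_hidden = np.dot(input_features, weight_hidden)
print(input_hidden.shape)  # (4, 3)
```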

c. Afterward, the hidden layer takes this input and calculates its output by applying a sigmoid function. Below is the output of the hidden layer:

Figure 33: Output of our hidden layer.

d. Next, we multiply the output of the hidden layer with the weight of the output layer:

Figure 34: Formula representing the output of our hidden layer, with the weight of the output layer.

The first matrix shows the output of the hidden layer, which has a size of (4*3). The second matrix represents the weight values of the output layer, which has a size of (3*1).

Figure 35: Representation of the hidden layer, and our output layer.

e. Afterward, we calculate the output of the output layer by applying a sigmoid function. It can also be represented in matrix form as follows.

Figure 36: Output of our layer, after a sigmoid function.

f. Now that we have our predicted output, we find the mean squared error between the target output and the predicted output.

Figure 37: Finding the mean squared error between our target output and our predicted output.
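Numerically, this error is half the squared difference between the target and predicted outputs, summed over the samples. The predicted values below are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

target_output = np.array([[0.], [1.], [1.], [1.]])
output_op = np.array([[0.1], [0.9], [0.8], [0.95]])  # hypothetical predictions

error_out = (1 / 2) * np.power(output_op - target_output, 2)
print(error_out.sum())  # 0.03125
```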

g. Next, we begin the first phase of training. In this step, we update the weight values for the output layer. We need to find out how much the output weights affect the error value. To update the weights, we use a gradient descent algorithm. Notice that we have already found the derivatives we will use during the training phase.

Figure 38: Updating the weight values for our output layer.

g.a. Matrix representation of the first derivative. Matrix size (4*1).

derror_douto = output_op - target_output

Figure 39: First derivative matrix representation.

g.b. Matrix representation of the second derivative. Matrix size (4*1).

douto_dino = sigmoid_der(input_op)

Figure 40: Second derivative matrix representation.

g.c. Matrix representation of the third derivative. Matrix size (4*3).

dino_dwo = output_hidden

Figure 41: Third derivative matrix representation.

g.d. Matrix representation of transpose of dino_dwo. Matrix size (3*4).

Figure 42: Matrix representation of our variable dino_dwo, see the implementation for details.

g.e. Now, we are going to find the final matrix of output weight. For a detailed explanation of this step, please check out our previous tutorial. The matrix size will be (3*1), which is the same as the output_weight matrix.

Figure 43: Final matrix of the output weight.

Hence, we have successfully found the derivative values. Next, we update the weight values accordingly with the help of a gradient descent algorithm.

Nonetheless, we also have to find the derivative for phase-2. Let’s first find that, and then we will update the weights for both layers in the end.

h. Phase 2: updating the weights in the hidden layer.

Since we have already discussed how we derived the derivative values, we are just going to see matrix representation for each of them to understand it better. Our goal here is to find the weight matrix for the hidden layer, which is of size (2*3).

h.a. Matrix representation for the first derivative.

derror_dino = derror_douto * douto_dino

Figure 44: Matrix representation of the first derivative.

h.b. Matrix representation for the second derivative.

dino_douth = weight_output

Figure 45: Matrix representation of the second derivative.

h.c. Matrix representation for the third derivative.

derror_douth = np.dot(derror_dino , dino_douth.T)

Figure 46: Matrix representation of the third derivative.

h.d. Matrix representation for the fourth derivative.

douth_dinh = sigmoid_der(input_hidden)

Figure 47: Matrix representation of the fourth derivative.

h.e. Matrix representation for the fifth derivative.

dinh_dwh = input_features

Figure 48: Matrix representation of the fifth derivative.

h.f. Matrix representation for the sixth derivative.

derror_dwh = np.dot(dinh_dwh.T, douth_dinh * derror_douth)

Figure 49: Matrix representation of the sixth derivative.

Notice that our goal was to find a hidden weight matrix with the size of (2*3), and we have successfully found it.

h.g. Updating the weight values:

We will use the gradient descent algorithm to update the values. It takes three parameters.

  1. The original weight: we already have it.
  2. The learning rate (LR): we assigned it the value of 0.05.
  3. The derivative: found in the previous steps.

Gradient descent algorithm:

Figure 50: Formula for a gradient descent algorithm.

Since we have all of our parameter values, this will be a straightforward operation. First, we are updating the weight values for the output layer, and then we are updating the weight values for the hidden layer.
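For instance, updating the output-layer weights from Figure 26 with a hypothetical derivative matrix is a single line:

```python
import numpy as np

lr = 0.05
weight_output = np.array([[0.7], [0.8], [0.9]])
derror_dwo = np.array([[0.02], [-0.01], [0.03]])  # hypothetical derivative values

# new weight = old weight - learning rate * derivative
weight_output -= lr * derror_dwo
print(weight_output)
```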

i. Final weight values:

Below, we show the updated weight values for both layers; our predictions are based on these values.

Figure 51: Displaying the final hidden layer weight values.
Figure 52: Displaying the final output layer weight values.

j. Making predictions:

j.a. Prediction for (1,1).

Target output = 1

Explanation:

First of all, we take the input values for which we want to predict the output. The “result1” variable stores the dot product of the input values and the hidden layer weights. We obtain the hidden layer’s output by applying a sigmoid function; it is stored in the “result2” variable and serves as the input for the output layer. We then calculate the input for the output layer by multiplying “result2” with the output layer weights, and take its sigmoid value to find the final output.

Figure 53: Printing our results for target output = 1.

Notice that the predicted output is very close to 1. So we have managed to make accurate predictions.
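The four prediction steps can be wrapped into a small helper; the `predict` name is ours, and the weights below are the untrained starting values from Figure 26, used only to show the mechanics:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict(single_point, weight_hidden, weight_output):
    result1 = np.dot(single_point, weight_hidden)  # input for the hidden layer
    result2 = sigmoid(result1)                     # output of the hidden layer
    result3 = np.dot(result2, weight_output)       # input for the output layer
    return sigmoid(result3)                        # final predicted output

# Untrained starting weights, for illustration only:
weight_hidden = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
weight_output = np.array([[0.7], [0.8], [0.9]])
print(predict(np.array([1, 1]), weight_hidden, weight_output))
```

With the trained weight values in place of these starting values, the same four steps produce the predictions shown in the figures.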

j.b. Prediction for (0,0).

Target output = 0

Figure 54: Printing our results for target output = 0.

Note that the predicted output is very close to 0, which shows that our model predicts accurately.

k. Final error value:

After 200,000 iterations, we have our final error value — the lower the error, the higher the accuracy of the model.

Figure 55: Displaying the final error value after 200,000 iterations.

As shown above, we can see that the error value is 0.0000000189. This value is the final error value in prediction after 200,000 iterations.

Putting it all together:

# Import required libraries :
import numpy as np

# Define input features :
input_features = np.array([[0,0],[0,1],[1,0],[1,1]])
print(input_features.shape)
print(input_features)

# Define target output :
target_output = np.array([[0,1,1,1]])

# Reshaping our target output into vector :
target_output = target_output.reshape(4,1)
print(target_output.shape)
print(target_output)

# Define weights :
# 6 for hidden layer
# 3 for output layer
# 9 total
weight_hidden = np.array([[0.1,0.2,0.3],
                          [0.4,0.5,0.6]])
weight_output = np.array([[0.7],[0.8],[0.9]])

# Learning Rate :
lr = 0.05

# Sigmoid function :
def sigmoid(x):
    return 1/(1+np.exp(-x))

# Derivative of sigmoid function :
def sigmoid_der(x):
    return sigmoid(x)*(1-sigmoid(x))

for epoch in range(200000):
    # Input for hidden layer :
    input_hidden = np.dot(input_features, weight_hidden)

    # Output from hidden layer :
    output_hidden = sigmoid(input_hidden)

    # Input for output layer :
    input_op = np.dot(output_hidden, weight_output)

    # Output from output layer :
    output_op = sigmoid(input_op)

    #==========================================================
    # Phase 1

    # Calculating Mean Squared Error :
    error_out = ((1 / 2) * (np.power((output_op - target_output), 2)))
    print(error_out.sum())

    # Derivatives for phase 1 :
    derror_douto = output_op - target_output
    douto_dino = sigmoid_der(input_op)
    dino_dwo = output_hidden
    derror_dwo = np.dot(dino_dwo.T, derror_douto * douto_dino)

    #===========================================================
    # Phase 2
    # derror_w1 = derror_douth * douth_dinh * dinh_dw1
    # derror_douth = derror_dino * dino_douth

    # Derivatives for phase 2 :
    derror_dino = derror_douto * douto_dino
    dino_douth = weight_output
    derror_douth = np.dot(derror_dino, dino_douth.T)
    douth_dinh = sigmoid_der(input_hidden)
    dinh_dwh = input_features
    derror_wh = np.dot(dinh_dwh.T, douth_dinh * derror_douth)

    # Update weights :
    weight_hidden -= lr * derror_wh
    weight_output -= lr * derror_dwo

# Final hidden layer weight values :
print(weight_hidden)

# Final output layer weight values :
print(weight_output)

# Predictions :

# Taking inputs :
single_point = np.array([1,1])
# 1st step :
result1 = np.dot(single_point, weight_hidden)
# 2nd step :
result2 = sigmoid(result1)
# 3rd step :
result3 = np.dot(result2, weight_output)
# 4th step :
result4 = sigmoid(result3)
print(result4)

#=================================================
# Taking inputs :
single_point = np.array([0,0])
result1 = np.dot(single_point, weight_hidden)
result2 = sigmoid(result1)
result3 = np.dot(result2, weight_output)
result4 = sigmoid(result3)
print(result4)

#=====================================================
# Taking inputs :
single_point = np.array([1,0])
result1 = np.dot(single_point, weight_hidden)
result2 = sigmoid(result1)
result3 = np.dot(result2, weight_output)
result4 = sigmoid(result3)
print(result4)

Below, notice that the data we used in this example was linearly separable, meaning that a single straight line can separate the outputs with value 1 from the outputs with value 0.

Figure 56: Graph showing data being linearly separable, allowing us to classify outputs with value 1 or value 0.

Launch it on Google Colab:


5. Comparison with a single-layer neural network

The final error value for the following code is:

Figure 57: Displaying the final error value.

As we can see, the error value is far higher than the error we found in our neural network implementation with a hidden layer, which is one of the main reasons to use hidden layers in a neural network.

# Import required libraries :
import numpy as np

# Define input features :
input_features = np.array([[0,0],[0,1],[1,0],[1,1]])
print(input_features.shape)
print(input_features)

# Define target output :
target_output = np.array([[0,1,1,1]])

# Reshaping our target output into vector :
target_output = target_output.reshape(4,1)
print(target_output.shape)
print(target_output)

# Define weights :
weights = np.array([[0.1],[0.2]])
print(weights.shape)
print(weights)

# Define learning rate :
lr = 0.05

# Sigmoid function :
def sigmoid(x):
    return 1/(1+np.exp(-x))

# Derivative of sigmoid function :
def sigmoid_der(x):
    return sigmoid(x)*(1-sigmoid(x))

# Main logic for neural network :
# Running our code 10000 times :
for epoch in range(10000):
    inputs = input_features

    # Feedforward input :
    pred_in = np.dot(inputs, weights)

    # Feedforward output :
    pred_out = sigmoid(pred_in)

    # Backpropagation : calculating error
    error = pred_out - target_output
    x = error.sum()
    print(x)

    # Calculating derivatives :
    dcost_dpred = error
    dpred_dz = sigmoid_der(pred_out)

    # Multiplying individual derivatives :
    z_delta = dcost_dpred * dpred_dz

    # Multiplying with the 3rd individual derivative :
    inputs = input_features.T
    weights -= lr * np.dot(inputs, z_delta)

# Predictions :

# Taking inputs :
single_point = np.array([1,0])
# 1st step :
result1 = np.dot(single_point, weights)
# 2nd step :
result2 = sigmoid(result1)
# Print final result :
print(result2)

#====================================
# Taking inputs :
single_point = np.array([0,0])
result1 = np.dot(single_point, weights)
result2 = sigmoid(result1)
print(result2)

#===================================
# Taking inputs :
single_point = np.array([1,1])
result1 = np.dot(single_point, weights)
result2 = sigmoid(result1)
print(result2)

Launch it on Google Colab:


6. Non-linearly separable data with a neural network

Input Table:

Figure 58: Input features.

Graphical representation of data points:

Figure 59: Coordinate plane with input points.

As figure 59 shows, we have 2 inputs and 1 output. In this example, we are going to use 4 hidden perceptrons. The red dots have an output value of 0, and the black dots have an output value of 1. Therefore, we cannot simply classify them using a single straight line.

Neural Network:

Figure 60: An artificial neural network.

Implementation in Python:

a. Import required libraries:

Figure 61: Importing NumPy with Python.

b. Define input features:

Figure 62: Defining our input features.

c. Define the target output:

Figure 63: Defining our target output.

d. Assign random weight values:

In figure 64, notice that we are using NumPy’s random function to generate random values.

numpy.random.rand(x, y): here, x is the number of rows and y is the number of columns. It generates values over [0, 1), which means 0 is included but 1 is excluded.

Figure 64: Generating random values with NumPy’s np.random.rand.
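For this network (2 inputs, 4 hidden nodes, 1 output), the weight matrices are generated as:

```python
import numpy as np

weight_hidden = np.random.rand(2, 4)  # 2 rows, 4 columns
weight_output = np.random.rand(4, 1)  # 4 rows, 1 column

print(weight_hidden.shape)  # (2, 4)
# Every generated value lies in [0, 1): 0 is included, 1 is not.
print((weight_hidden >= 0).all() and (weight_hidden < 1).all())  # True
```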

e. Sigmoid function:

Figure 65: Defining our sigmoid function.

f. Finding the derivative of the sigmoid function:

Figure 66: Finding the derivative of our sigmoid function.

g. Training our neural network:

Figure 67: Phase 1 of training on our neural net.
Figure 68: Phase 2 of training on our neural network.

h. Weight values of hidden layer:

Figure 69: Displaying the final values of our weights in the hidden layer.

i. Weight values of output layer:

Figure 70: Displaying the final weight values for our output layer.

j. Final error value:

After training our model for 200,000 iterations, we finally achieved a low error value.

Figure 71: Low error value of the model trained during 200,000 iterations.

k. Making predictions from the trained model:

k.a. Predicting output for (0.5, 2).

Figure 72: Predicting our results for (0.5, 2).

The predicted output is closer to 1.

k.b. Predicting output for (0, -1)

Figure 73: Predicting our results for (0, -1).

The predicted output is very near to 0.

k.c. Predicting output for (0, 5)

Figure 74: Predicting our results for (0, 5).

The predicted output is close to 1.

k.d. Predicting output for (1, 1.2)

Figure 75: Predicting our results for (1, 1.2).

The predicted output is close to 0.

Based on the output values, our model has done an excellent job of predicting the values.

We can separate our data in the following way as shown in Figure 76. Note that this is not the only possible way to separate these values.

Figure 76: Possible ways of separating our values.

To conclude, adding a hidden layer to our neural network helps us reduce the error rate when the data is not linearly separable. Even though training takes longer, our goal is to make high-accuracy predictions, and that goal is met.
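This conclusion can be checked directly. Without a hidden layer (and with no bias term), the input (0, 0) always produces sigmoid(0) = 0.5, so a single-layer model's summed error on this data can never drop below 0.5 * (0.5)^2 = 0.125. The sketch below, with our own seed and iteration count, trains such a single-layer model and confirms it stays stuck above that floor:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_der(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Same XOR-style data as in the article:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0, 1, 1, 0]]).T

rng = np.random.default_rng(0)
w = rng.random((2, 1))      # single layer: no hidden weights, no bias
lr = 0.05

for epoch in range(20000):
    pred = sigmoid(X @ w)
    grad = X.T @ ((pred - y) * sigmoid_der(X @ w))
    w -= lr * grad

final_error = 0.5 * np.sum((sigmoid(X @ w) - y) ** 2)
print(final_error)          # remains above the 0.125 floor
```

The hidden-layer network in the full program, by contrast, keeps driving its error down over the 200,000 iterations.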

Putting it all together:

# Import required libraries:
import numpy as np

# Define input features:
input_features = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(input_features.shape)
print(input_features)

# Define target output:
target_output = np.array([[0, 1, 1, 0]])

# Reshape our target output into a column vector:
target_output = target_output.reshape(4, 1)
print(target_output.shape)
print(target_output)

# Define weights:
# 8 for hidden layer (2 x 4)
# 4 for output layer (4 x 1)
# 12 total
weight_hidden = np.random.rand(2, 4)
weight_output = np.random.rand(4, 1)

# Learning rate:
lr = 0.05

# Sigmoid function:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid function:
def sigmoid_der(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Main logic:
for epoch in range(200000):
    # Input for hidden layer:
    input_hidden = np.dot(input_features, weight_hidden)

    # Output from hidden layer:
    output_hidden = sigmoid(input_hidden)

    # Input for output layer:
    input_op = np.dot(output_hidden, weight_output)

    # Output from output layer:
    output_op = sigmoid(input_op)

    # ====================================================================
    # Phase 1

    # Calculating mean squared error:
    error_out = (1 / 2) * np.power(output_op - target_output, 2)
    print(error_out.sum())

    # Derivatives for phase 1:
    derror_douto = output_op - target_output
    douto_dino = sigmoid_der(input_op)
    dino_dwo = output_hidden

    derror_dwo = np.dot(dino_dwo.T, derror_douto * douto_dino)

    # ====================================================================
    # Phase 2
    # derror_dwh = derror_douth * douth_dinh * dinh_dwh
    # derror_douth = derror_dino * dino_douth

    # Derivatives for phase 2:
    derror_dino = derror_douto * douto_dino
    dino_douth = weight_output
    derror_douth = np.dot(derror_dino, dino_douth.T)
    douth_dinh = sigmoid_der(input_hidden)
    dinh_dwh = input_features
    derror_dwh = np.dot(dinh_dwh.T, douth_dinh * derror_douth)

    # Update weights:
    weight_hidden -= lr * derror_dwh
    weight_output -= lr * derror_dwo

# Final values of weights in the hidden layer:
print(weight_hidden)

# Final values of weights in the output layer:
print(weight_output)

# Taking inputs:
single_point = np.array([0, -1])
# 1st step:
result1 = np.dot(single_point, weight_hidden)
# 2nd step:
result2 = sigmoid(result1)
# 3rd step:
result3 = np.dot(result2, weight_output)
# 4th step:
result4 = sigmoid(result3)
print(result4)

# Taking inputs:
single_point = np.array([0, 5])
# 1st step:
result1 = np.dot(single_point, weight_hidden)
# 2nd step:
result2 = sigmoid(result1)
# 3rd step:
result3 = np.dot(result2, weight_output)
# 4th step:
result4 = sigmoid(result3)
print(result4)

# Taking inputs:
single_point = np.array([1, 1.2])
# 1st step:
result1 = np.dot(single_point, weight_hidden)
# 2nd step:
result2 = sigmoid(result1)
# 3rd step:
result3 = np.dot(result2, weight_output)
# 4th step:
result4 = sigmoid(result3)
print(result4)
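Since the forward pass is just two matrix products, the three test points at the end can also be pushed through in a single batched call. The weights here are random stand-ins for the trained `weight_hidden` and `weight_output`:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Random stand-ins with the same shapes as the trained weights:
rng = np.random.default_rng(1)
weight_hidden = rng.random((2, 4))
weight_output = rng.random((4, 1))

# All test points stacked into one (3, 2) array:
points = np.array([[0, -1], [0, 5], [1, 1.2]])

# Batched forward pass: (3, 2) @ (2, 4) -> (3, 4), then (3, 4) @ (4, 1) -> (3, 1)
preds = sigmoid(sigmoid(points @ weight_hidden) @ weight_output)
print(preds)  # one prediction per row, each between 0 and 1
```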

Launch it on Google Colab:


7. Conclusion

  • Neural networks store what they learn in the network itself (its weights), rather than in a separate database.
  • They learn from examples and can then predict outputs for similar, unseen inputs.
  • If part of the network fails, the remaining neurons can often still produce a usable output.
  • Neural networks can perform many computations in parallel.

DISCLAIMER: The views expressed in this article are those of the author(s) and do not represent the views of Carnegie Mellon University, nor other companies (directly or indirectly) associated with the author(s). These writings do not intend to be final products, yet rather a reflection of current thinking, along with being a catalyst for discussion and improvement.

Published via Towards AI

Citation

Shukla, et al., “Building Neural Networks with Python Code and Math in Detail — II”, Towards AI, 2020

BibTeX citation:

@article{pratik_iriondo_2020,
  title={Building Neural Networks with Python Code and Math in Detail — II},
  url={https://towardsai.net/building-neural-nets-with-python},
  journal={Towards AI},
  publisher={Towards AI Co.},
  author={Shukla, Pratik and Iriondo, Roberto},
  year={2020},
  month={Jun}
}

📚 Are you new to machine learning? Check out an overview of machine learning algorithms for beginners with code examples in Python 📚

References:

[2] Neural Networks from Scratch with Python Code and Math in Detail — I, Pratik Shukla, Roberto Iriondo, https://towardsai.net/neural-networks-with-python


Towards AI Team

Written by

Publishing the Best of Tech, Science, and The Future | For Authors → https://towardsai.net/contribute | Subscribe→ https://towardsai.net/subscribe — @Towards_AI

Towards AI — Multidisciplinary Science Journal

Towards AI is a world’s leading multidisciplinary science journal. Towards AI publishes the best of tech, science, and the future. Read by thought-leaders and decision-makers around the world.
