What is a Neural Network?
Neural networks, also known as artificial neural networks (ANNs), are a type of machine learning model that powers deep learning algorithms. Their name and structure are inspired by the human brain, and they mimic the way biological neurons signal one another. Neural networks rely on training data to learn and improve their accuracy over time. Once trained, these learning algorithms become effective tools in computer science and artificial intelligence.
Let’s explore how a neural network works.
Consider every node to be a separate linear regression model, with input data (X), weights (w), a bias (b), and an output (Z). The formula looks like this:
Z = w · X + b
Forward propagation is the method by which a neural network produces an output for a specific input. The final layer’s output is also referred to as the neural network’s prediction. We will go through how we evaluate predictions later in this article. These evaluations tell us whether or not our neural network needs to be improved.
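As a small illustration of the per-node computation described above, here is a minimal NumPy sketch. The values of `X`, `w`, `b`, `W` and `b_vec` are made up for this example and do not come from the article's model.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration.
X = np.array([0.5, -1.0, 2.0])   # input features
w = np.array([0.2, 0.4, -0.1])   # one weight per input
b = 0.1                          # bias

# A single node: weighted sum of the inputs plus the bias.
Z = np.dot(w, X) + b

# A layer of two nodes stacks the weight vectors into a matrix,
# so the whole layer is one matrix-vector product.
W = np.array([[0.2, 0.4, -0.1],
              [0.0, 1.0, 0.5]])
b_vec = np.array([0.1, -0.2])
Z_layer = W @ X + b_vec

print(Z, Z_layer)
```

The first entry of `Z_layer` matches `Z` because the first row of `W` is the same weight vector `w`.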
We compute the cost function immediately after the final layer generates its output. The cost function measures how far our neural network’s predictions are from the desired values: its value expresses the discrepancy between the true value and the predicted value.
Here is one such cost function, the cross-entropy loss:
J = -(1/m) Σ yi log(yhat), summed over the m training examples
Here, 1/m averages the loss over the m examples, yi represents the actual output, and log(yhat) is the logarithm of the predicted output. The most interesting thing about this loss function is the leading negative sign. Since the logarithm of a value between 0 and 1 is negative, the minus sign offsets it and makes the loss positive.
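To make the cost function concrete, here is a minimal sketch of a cross-entropy computation. It assumes one-hot labels and predicted probabilities with one row per example; the function name `cross_entropy` and the sample arrays are hypothetical. Note that it sums over classes and averages over examples, so it differs by a constant factor from a version that uses `np.mean` over all entries.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-8):
    """Cross-entropy averaged over m examples.

    y_true: one-hot labels, shape (m, classes)
    y_pred: predicted probabilities, shape (m, classes)
    eps:    small constant that avoids log(0)
    """
    m = y_true.shape[0]
    return -np.sum(y_true * np.log(y_pred + eps)) / m

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
# Confident and correct predictions.
good = np.array([[0.9, 0.05, 0.05],
                 [0.1, 0.8, 0.1]])
# Confident but wrong predictions.
bad = np.array([[0.1, 0.8, 0.1],
                [0.9, 0.05, 0.05]])

loss_good = cross_entropy(y_true, good)
loss_bad = cross_entropy(y_true, bad)
print(loss_good, loss_bad)
```

The loss is small when the probability assigned to the true class is close to 1 and grows quickly as that probability approaches 0.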
Backpropagation
As a machine-learning algorithm, backpropagation performs a backward pass to adjust a neural network model’s parameters, aiming to minimize the loss.
Let’s see how it computes gradients.
Gradient Descent
The parameters of the neural network are updated using gradient descent. This algorithm modifies the weights and biases of each layer in the network according to how they affect the cost function. Backpropagation is used to calculate how much each weight and bias in the network contributes to the cost function.
We take the derivative of the loss function with respect to (w, b) to compute the gradients used for loss optimization. Here J is the loss function, and the gradient with respect to w points in the direction of steepest increase of J in parameter space, so we step in the opposite direction. The goal is to reach the global minimum.
Update gradients
The update rule is w_new = w - alpha * dl/dw. Here alpha is the learning rate, w is the old weight, and dl/dw is the derivative of the loss with respect to the weight. The bias is updated in the same way.
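The update step can be tried out on a toy problem. The sketch below is an assumption-laden example, not the article's network: it minimizes the made-up loss J(w) = (w - 3)^2, whose minimum is at w = 3, by repeatedly subtracting the learning rate times the gradient.

```python
def grad_J(w):
    # Derivative of the toy loss J(w) = (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

alpha = 0.1   # learning rate
w = 0.0       # initial weight

for step in range(100):
    # Gradient descent update: move against the gradient.
    w = w - alpha * grad_J(w)

print(w)  # very close to 3.0
```

With a learning rate that is too large (for this toy loss, above 1.0) the same loop would diverge, which is why alpha usually needs tuning.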
Let’s implement these ideas in code.
Initialize the parameters for each layer of the network. You are free to draw the initial weights from different distributions, for example a uniform or a normal distribution. Here I use NumPy’s randn, which samples from the standard normal distribution, and scale the weights by the square root of the previous layer’s size.
import numpy as np

def initialize_parameters(self):
    np.random.seed(42)
    for l in range(1, len(self.layers_size)):
        # Standard normal weights scaled by 1 / sqrt(size of previous layer)
        self.parameters["W" + str(l)] = np.random.randn(
            self.layers_size[l], self.layers_size[l - 1]
        ) / np.sqrt(self.layers_size[l - 1])
        # Biases start at zero
        self.parameters["b" + str(l)] = np.zeros((self.layers_size[l], 1))
Forward Propagation
def forward(self, X):
    store = {}  # cache of activations, weights and pre-activations for backprop
    A = X.T  # columns are examples
    # Hidden layers use the sigmoid activation
    for l in range(self.L - 1):
        Z = self.parameters["W" + str(l + 1)].dot(A) + self.parameters["b" + str(l + 1)]
        A = self.sigmoid(Z)
        store["A" + str(l + 1)] = A
        store["W" + str(l + 1)] = self.parameters["W" + str(l + 1)]
        store["Z" + str(l + 1)] = Z
    # Output layer uses softmax to produce class probabilities
    Z = self.parameters["W" + str(self.L)].dot(A) + self.parameters["b" + str(self.L)]
    A = self.softmax(Z)
    store["A" + str(self.L)] = A
    store["W" + str(self.L)] = self.parameters["W" + str(self.L)]
    store["Z" + str(self.L)] = Z
    return A, store
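The forward pass calls `self.sigmoid` and `self.softmax`, which are not shown in this article. A minimal sketch of what such helpers might look like follows, written as standalone functions and assuming, as in the forward pass, that each column of `Z` is one example. The max subtraction in `softmax` is a common numerical-stability trick and is my assumption, not necessarily what the full code does.

```python
import numpy as np

def sigmoid(Z):
    # Squashes every entry of Z into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-Z))

def softmax(Z):
    # Z has shape (classes, batch). Subtracting the column-wise max
    # avoids overflow in exp without changing the result.
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)

# Toy pre-activations: 3 classes, 2 examples.
Z = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [0.5, -1.0]])
A = softmax(Z)
print(A.sum(axis=0))  # each column sums to 1
```

Each column of `A` can be read as a probability distribution over the classes for one example.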
Backpropagation
def backward(self, X, Y, store):
    derivatives = {}
    store["A0"] = X.T
    # Output layer: with softmax and cross-entropy, dL/dZ simplifies to A - Y
    A = store["A" + str(self.L)]
    dZ = A - Y.T
    dW = dZ.dot(store["A" + str(self.L - 1)].T) / self.batch
    db = np.sum(dZ, axis=1, keepdims=True) / self.batch
    dAPrev = store["W" + str(self.L)].T.dot(dZ)
    derivatives["dW" + str(self.L)] = dW
    derivatives["db" + str(self.L)] = db
    # Hidden layers, from the last one back to the first
    for l in range(self.L - 1, 0, -1):
        dZ = dAPrev * self.sigmoid_derivative(store["Z" + str(l)])
        dW = 1. / self.batch * dZ.dot(store["A" + str(l - 1)].T)
        db = 1. / self.batch * np.sum(dZ, axis=1, keepdims=True)
        if l > 1:
            dAPrev = store["W" + str(l)].T.dot(dZ)
        derivatives["dW" + str(l)] = dW
        derivatives["db" + str(l)] = db
    return derivatives
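Two details of the backward pass can be checked in isolation. The sketch below first gives a possible form of the `sigmoid_derivative` helper (which is not shown in this article, so its exact definition is an assumption), and then uses finite differences to confirm that the gradient of the cross-entropy loss with respect to the softmax input is the predicted probabilities minus the one-hot label. All values are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def loss(z, y):
    # Cross-entropy of a single example with one-hot label y.
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.5, -1.2, 2.0])   # toy pre-activations for 3 classes
y = np.array([0.0, 1.0, 0.0])    # true class is index 1

# Analytic gradient used in the backward pass.
analytic = softmax(z) - y

# Numerical gradient by central differences.
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.size):
    step = np.zeros_like(z)
    step[i] = eps
    numeric[i] = (loss(z + step, y) - loss(z - step, y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)  # tiny, so the two gradients agree
```

A check like this is a cheap way to catch sign or transpose mistakes before training a full network.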
Update gradients with mini batch
A mini-batch is a fixed number of training examples that is smaller than the full dataset. In each iteration we train the network on a different group of samples until all samples in the dataset have been used. You are free to choose the batch size. In practice it is common to pick a power of 2, such as 32 or 64.
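The training loop in this article relies on a `create_mini_batches` helper that is not shown. The following is a minimal sketch of how such a helper might be written, assuming the rows of `X` and `Y` are examples; the shuffling strategy and the handling of the final, smaller batch are my assumptions.

```python
import numpy as np

def create_mini_batches(X, Y, batch_size):
    """Shuffle the rows of X and Y together and split them into mini-batches."""
    m = X.shape[0]
    permutation = np.random.permutation(m)
    X_shuffled, Y_shuffled = X[permutation], Y[permutation]

    mini_batches = []
    for start in range(0, m, batch_size):
        end = start + batch_size
        # Slicing past the end is safe, so the last batch may be smaller.
        mini_batches.append((X_shuffled[start:end], Y_shuffled[start:end]))
    return mini_batches

# Toy data: 10 examples with 2 features each, and one label column.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10).reshape(10, 1)
batches = create_mini_batches(X, Y, batch_size=4)
print([xb.shape[0] for xb, _ in batches])  # [4, 4, 2]
```

Because the last batch can be smaller than `batch_size`, code that divides gradients by a fixed batch size will slightly under-weight that batch. Dropping the remainder or dividing by the actual batch length are two possible ways to handle this.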
def fit(self, X, Y, learning_rate=1, n_iterations=10, batch=32):
    np.random.seed(1)
    self.batch = batch
    for loop in range(n_iterations):
        mini_batches = self.create_mini_batches(X, Y, self.batch)
        loss = 0
        acc = 0
        for mini_batch in mini_batches:
            X_mini, y_mini = mini_batch
            A, store = self.forward(X_mini)
            # Categorical cross-entropy; A.T holds the predicted probabilities
            loss += -1 * np.mean(y_mini * np.log(A.T + 1e-8))
            derivatives = self.backward(X_mini, y_mini, store)
            # Gradient descent step for every layer
            for l in range(1, self.L + 1):
                self.parameters["W" + str(l)] = (
                    self.parameters["W" + str(l)] - learning_rate * derivatives["dW" + str(l)])
                self.parameters["b" + str(l)] = (
                    self.parameters["b" + str(l)] - learning_rate * derivatives["db" + str(l)])
            acc += self.predict(X_mini, y_mini)
        self.costs.append(loss)
        print("Epoch", loop + 1, "steps", len(mini_batches),
              "Train loss:", "{:.4f}".format(loss / len(mini_batches)),
              "Train acc:", "{:.4f}".format(acc / len(mini_batches)))
Run and see output
train_x, test_x, train_y, test_y = load_mnist()
layers_dims = [10, 10]
ann = ANN(layers_dims, train_x.shape[1])
ann.fit(train_x, train_y, learning_rate=.1, n_iterations=100, batch=64)
I am showing only the last few steps of the training output, since the model runs for a large number of iterations.
After completing the training, let’s plot the loss to see how it changes over each epoch.
Sanity check with a single image prediction
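The `predict` method used during training is not shown in this article. As a rough sketch of the idea, and assuming one-hot labels and a softmax output with one column per example, a prediction can be read off with `argmax`. The function names `predict_class` and `accuracy` and the sample probabilities are hypothetical.

```python
import numpy as np

def predict_class(probs):
    """Most likely class for each column of a (classes, batch) probability array."""
    return np.argmax(probs, axis=0)

def accuracy(probs, Y_onehot):
    """Fraction of examples whose predicted class matches the one-hot label.

    probs:    (classes, batch) softmax output
    Y_onehot: (batch, classes) one-hot labels
    """
    actual = np.argmax(Y_onehot, axis=1)
    return np.mean(predict_class(probs) == actual)

# A single "image": the network puts most of its probability on class 1.
single = np.array([[0.05], [0.9], [0.05]])
label = predict_class(single)[0]

# A small batch: 3 classes (rows) by 3 examples (columns).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.3, 0.6]])
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0]])
acc = accuracy(probs, Y)
print(label, acc)
```

Here the first two examples are classified correctly and the third is not, so the accuracy is two thirds.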
Our model performs pretty well: as you can see, it reaches 92.3% test accuracy with a train accuracy of 95.45%. There is plenty of room to improve this model and get higher accuracy. Let me know your approach to getting better performance.
Thank you for reading. Here is the full code.
References
MIT Deep Learning 6.S191, http://introtodeeplearning.com