Deep Neural Networks. Practice. Part 1.

Machine Learning & Data Science A-Z Guide.

Dmytro Nasyrov
Pharos Production
5 min read · Jun 16, 2017


Drop us a message if you’re interested in Blockchain and FinTech software development, or just say hi at Pharos Production Inc.

In the previous two articles, we looked through the basic DNN theory: gradient descent, forward and backward propagation, and a few other things. This time we will implement all of that theory in Python.

This material is based on the Udacity Self-Driving Car Engineer Nanodegree — absolutely amazing learning stuff. I highly recommend enrolling in their courses right now. PS: No, they didn’t pay me :)

Preparation

We start by importing libraries and preparing the data. We will use NumPy, the Boston house-price dataset (regression) from scikit-learn, and a couple of utilities to rearrange the data: shuffle and resample.

Required libraries
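The imports are minimal. Something like this should do (note that load_boston was removed in scikit-learn 1.2, so this assumes the 2017-era API the article was written against):

import numpy as np
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2+
from sklearn.utils import shuffle, resample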
Data prep
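Loading the dataset is a one-liner plus two lookups, roughly:

# Load the Boston house-price dataset as a dict-like Bunch.
data = load_boston()
X_ = data['data']    # features
y_ = data['target']  # regression targets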

scikit-learn has already prepared two arrays for us:

  • ‘data’ — the data to learn
  • ‘target’ — the regression targets

Then we normalize the data, that is, adjust the values to a notionally common scale. We use the standard score here because the whole population is known, and it works well for normally distributed data.

Normalization — Standard Score
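In code, the standard score z = (x - mean) / std is one line of NumPy, applied column by column:

# Standard score: center each feature on its mean and scale it
# by its standard deviation, column by column.
X_ = (X_ - np.mean(X_, axis=0)) / np.std(X_, axis=0)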

Next, we take the features from the array. X_.shape shows (506, 13), so we have 13 features and 506 training samples. We also define 10 perceptrons in the hidden layer. Then we initialize the weights with uniformly distributed random values and the biases with zeros. W and b with index 1 are the weights and biases between the input and hidden layers; with index 2, between the hidden and output layers.
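A sketch of that setup (the uniform range of -1 to 1 is an assumption; any small random initialization works here):

n_features = X_.shape[1]  # 13
n_hidden = 10             # perceptrons in the hidden layer

# Weights: small random values; biases: zeros.
W1_ = np.random.uniform(-1.0, 1.0, (n_features, n_hidden))
b1_ = np.zeros(n_hidden)
W2_ = np.random.uniform(-1.0, 1.0, (n_hidden, 1))
b2_ = np.zeros(1)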

Next, we define the inputs and combine them into a dictionary. We should also define the hyperparameters of our network, where m is the number of samples and steps_per_epoch is the number of batches we sample from the input data in each epoch.

Inputs
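Roughly like this, assuming the Input, Linear, Sigmoid and MSE node classes defined further down:

# Placeholders for the data and the trainable parameters.
X, y = Input(), Input()
W1, b1 = Input(), Input()
W2, b2 = Input(), Input()

# The network itself: linear -> sigmoid -> linear, scored with MSE.
l1 = Linear(X, W1, b1)
s1 = Sigmoid(l1)
l2 = Linear(s1, W2, b2)
cost = MSE(y, l2)

# Map every Input node to its initial value.
feed_dict = {X: X_, y: y_, W1: W1_, b1: b1_, W2: W2_, b2: b2_}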
Hyperparameters
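The exact values below are illustrative assumptions, not tuned numbers:

epochs = 10
m = X_.shape[0]                    # number of training samples, 506
batch_size = 11
steps_per_epoch = m // batch_size  # batches sampled per epoch
learning_rate = 0.01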

Let’s define the basic operations. We have covered all of them in previous articles, so they are all familiar except topological_sort. Topological sort treats the set of operations as a Directed Acyclic Graph and orders them according to Kahn’s Algorithm.

Operations
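The driver that ties the operations together is just a pair of loops over the topologically sorted graph; a minimal sketch:

def forward_and_backward(graph):
    # Forward pass: visit the nodes in topological order.
    for n in graph:
        n.forward()

    # Backward pass: visit the same nodes in reverse order.
    for n in graph[::-1]:
        n.backward()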

The last part is the calculation itself. We run N epochs, each time computing the loss and new weights and biases. We randomly split the samples into batches, run forward and backward propagation, apply a stochastic gradient descent update, and accumulate the loss. With each epoch the loss becomes smaller and smaller.

Calculations
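Put together, the training loop might look like this (resample comes from sklearn.utils; sgd_update and topological_sort are sketched below):

graph = topological_sort(feed_dict)
trainables = [W1, b1, W2, b2]

for i in range(epochs):
    loss = 0
    for j in range(steps_per_epoch):
        # Randomly sample a batch of examples.
        X_batch, y_batch = resample(X_, y_, n_samples=batch_size)
        X.value = X_batch
        y.value = y_batch

        # Forward and backward passes, then one SGD step.
        forward_and_backward(graph)
        sgd_update(trainables, learning_rate)

        # The cost node is the last one in the sorted graph.
        loss += graph[-1].value

    print("Epoch: {}, Loss: {:.3f}".format(i + 1, loss / steps_per_epoch))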
Result

Stochastic Gradient Descent

Its implementation is quite straightforward.

SGD
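One small step against the gradient for every trainable parameter:

def sgd_update(trainables, learning_rate=1e-2):
    # Move each parameter a small step against its gradient.
    for t in trainables:
        t.value -= learning_rate * t.gradients[t]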

Forward and Backward propagations

We define forward and backward passes in each node, so they differ from operation to operation. For example, the forward pass of an input node computes nothing.

Topological Sort

Kahn’s Algorithm:

  1. Initialize sorted list to be empty, and a counter to 0
  2. Compute the indegrees of all nodes
  3. Store all nodes with indegree 0 in a queue
  4. While the queue is not empty
  5. Get a node U and put it in the sorted list. Increment the counter.
  6. For all edges (U, V) decrement the indegree of V, and put V in the queue if the updated indegree is 0.
  7. If the counter is not equal to the number of nodes, there is a cycle.
Topological sort
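A sketch of Kahn’s Algorithm applied to our nodes; as a convenience it also assigns each Input node its value from feed_dict while sorting:

def topological_sort(feed_dict):
    input_nodes = [n for n in feed_dict.keys()]

    # Discover every node and record its in/out edges.
    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    # Kahn's Algorithm: repeatedly take a node with indegree 0.
    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()
        if isinstance(n, Input):
            n.value = feed_dict[n]
        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # A node with no remaining incoming edges is ready.
            if len(G[m]['in']) == 0:
                S.add(m)
    return L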

Mean Square Error

MSE
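The cost is C = (1/m) * sum((y - a)^2), so its gradients with respect to y and a are (2/m)(y - a) and -(2/m)(y - a). A sketch:

class MSE(Node):
    def __init__(self, y, a):
        Node.__init__(self, [y, a])

    def forward(self):
        # Reshape to (m, 1) to avoid broadcasting surprises.
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        self.m = self.inbound_nodes[0].value.shape[0]
        self.diff = y - a
        self.value = np.mean(self.diff ** 2)

    def backward(self):
        # dC/dy = (2/m)(y - a), dC/da = -(2/m)(y - a)
        self.gradients[self.inbound_nodes[0]] = (2 / self.m) * self.diff
        self.gradients[self.inbound_nodes[1]] = (-2 / self.m) * self.diff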

Sigmoid (Logistic Function)

Sigmoid
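sigmoid(x) = 1 / (1 + e^-x), and its derivative is conveniently sigmoid(x) * (1 - sigmoid(x)). A sketch:

class Sigmoid(Node):
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        # sigmoid(x) = 1 / (1 + e^-x)
        return 1. / (1. + np.exp(-x))

    def forward(self):
        self.value = self._sigmoid(self.inbound_nodes[0].value)

    def backward(self):
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        for n in self.outbound_nodes:
            # Chain rule: derivative of the sigmoid times the
            # gradient flowing in from the output node.
            grad_cost = n.gradients[self]
            self.gradients[self.inbound_nodes[0]] += self.value * (1 - self.value) * grad_cost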

Linear Transformation

Linear
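The forward pass is y = XW + b; the backward pass distributes the incoming gradient to X, W and b. A sketch:

class Linear(Node):
    def __init__(self, X, W, b):
        Node.__init__(self, [X, W, b])

    def forward(self):
        # y = XW + b
        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value
        self.value = np.dot(X, W) + b

    def backward(self):
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]
            # dC/dX = grad * W^T, dC/dW = X^T * grad,
            # dC/db = grad summed over the batch.
            self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
            self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
            self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)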

Input Node

Take a look: the input node’s forward pass does nothing because there is nothing to compute (its value is set from feed_dict), but its backward method collects gradients from the output nodes.
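A sketch of the input node:

class Input(Node):
    def __init__(self):
        # An Input node has no inbound nodes.
        Node.__init__(self)

    def forward(self):
        # Nothing to compute: value is assigned from feed_dict.
        pass

    def backward(self):
        # An Input may represent a trainable parameter, so sum
        # the gradients flowing in from the output nodes.
        self.gradients = {self: 0}
        for n in self.outbound_nodes:
            self.gradients[self] += n.gradients[self]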

Node

And the last one is the parent class of all nodes. It defines the interface for all child classes. We initialize every node with a list of input nodes, an empty value, and an empty list of output nodes. When we link a node to its inputs, we loop over them, adding the current node to their outputs.
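A sketch of the base class:

class Node(object):
    def __init__(self, inbound_nodes=None):
        # Nodes this node takes its values from.
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []
        # Nodes that consume this node's value.
        self.outbound_nodes = []
        self.value = None
        self.gradients = {}
        # Link this node to its inputs: register it as their output.
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)

    def forward(self):
        raise NotImplementedError

    def backward(self):
        raise NotImplementedError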

Done

You can find all the source code in our GitHub repo.

Thanks for reading!
