Deep Neural Network in (Nearly) Naked Python

Ben Grainger
Oct 28, 2020 · 5 min read

When I decided I wanted to understand neural networks (NNs), I thought that each new NN was crafted from scratch with a unique structure. I was disappointed to find out that NNs (outside of research into novel algorithms) are generally built using specialised libraries like TensorFlow and PyTorch, which allow complex neural networks to be set up in just a few lines of code. At the time, it seemed to me that this took all the fun out of it.

Since then I have learned that it would be absurd to code each NN from scratch: partly because it would be pointless to rewrite code that can be generalized for later use, and partly because libraries like TensorFlow utilise CUDA, which lets the GPU's many threads perform matrix computations efficiently.

Even so, I still wanted to code my own NN using (nearly) base Python.

For this project I wanted the code to be as simple as possible (so no complex class hierarchies!) just to prove to myself how mechanical these algorithms are. The only exception was NumPy, which I used so that the program didn't run for longer than the universe has existed.

I went with the classic and simple dataset of handwritten digits, as I'm not trying to break any paradigms or predict the future; I just wanted my code to work as I intended. Each image gave me 400 input features, one for the greyscale value of every pixel, and there were 5,000 images in total.

The only complexity I added was redundant hidden layers. For the task of identifying digits I could have got away with one hidden layer, but this is an experiment, so why not have four layers with many more weights than I could possibly need? Strangely, it did not increase the complexity of the code at all.

The Code:

I started by defining the activation functions. I chose ReLU for the hidden layers and sigmoid for the output layer, with a softmax applied on top:

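In rough outline (the exact names and array conventions may differ slightly from my original notebook, and I've kept the ReLU derivative here because backpropagation will need it later), they look something like this:

```python
import numpy as np

def relu(z):
    # ReLU for the hidden layers: max(0, z), element-wise
    return np.maximum(0, z)

def relu_derivative(z):
    # Gradient of ReLU: 1 where z > 0, otherwise 0
    return (z > 0).astype(float)

def sigmoid(z):
    # Sigmoid squashes the output layer into (0, 1)
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Softmax turns each example's output column into a probability
    # distribution (subtracting the max keeps the exponentials stable)
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)
```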

Next I wrote a function to initialize random weights:

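Roughly, it fills a weight matrix with small random values and pairs it with a zero bias vector (the scaling factor here is illustrative):

```python
def initialize_weights(rows, cols, scale=0.01):
    # Small random values break the symmetry between neurons;
    # the biases can simply start at zero
    W = np.random.randn(rows, cols) * scale
    b = np.zeros((rows, 1))
    return W, b
```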

and then I initialized the weights and biases:

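Calling that helper once per layer gives the full set of parameters. The hidden-layer sizes below are just the ones I picked for this sketch; the 400 inputs are the pixels and the 10 outputs are the digit classes:

```python
# 400 pixel inputs, four hidden layers, 10 output classes (digits 0-9)
layer_sizes = [400, 100, 100, 100, 100, 10]

weights, biases = [], []
for i in range(len(layer_sizes) - 1):
    W, b = initialize_weights(layer_sizes[i + 1], layer_sizes[i])
    weights.append(W)
    biases.append(b)
```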

Next, the cost function:

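A sketch, assuming a cross-entropy cost with an L2 regularization term (the regularization is what I later reduce) and one-hot labels Y:

```python
def compute_cost(A, Y, weights, lam=1.0):
    # Cross-entropy between the predictions A and the one-hot labels Y,
    # averaged over the m training examples
    m = Y.shape[1]
    eps = 1e-8  # avoid log(0)
    cross_entropy = -np.sum(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps)) / m
    # L2 regularization penalises large weights
    l2 = (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)
    return cross_entropy + l2
```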

Forward propagation actually came down to a single line of code that needed to be repeated for every layer:

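That line is just "multiply by the weights, add the bias, apply the activation". Repeated over the layers (ReLU in the hidden layers, sigmoid at the end, leaving the softmax out of this sketch for brevity), it looks roughly like:

```python
def forward_propagation(X, weights, biases):
    # X has shape (400, m): one column per training example
    activations = [X]
    pre_activations = []
    A = X
    for i, (W, b) in enumerate(zip(weights, biases)):
        Z = W @ A + b  # the one line that does the work
        A = sigmoid(Z) if i == len(weights) - 1 else relu(Z)
        pre_activations.append(Z)
        activations.append(A)
    return activations, pre_activations
```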

The thing that made me nervous was backpropagation; I never feel confident that I have fully grasped the concept. However, coding it this way showed me that it is essentially the same four lines of code for every layer:

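A sketch of those four lines, working backwards from the output; this assumes the cross-entropy/sigmoid pairing above, so the error at the last layer is simply A − Y:

```python
def backward_propagation(activations, pre_activations, Y, weights, lam=1.0):
    m = Y.shape[1]
    grads_W = [None] * len(weights)
    grads_b = [None] * len(weights)

    # For cross-entropy with a sigmoid output, the output-layer error is A - Y
    dZ = activations[-1] - Y
    for i in reversed(range(len(weights))):
        # The same four lines for every layer:
        grads_W[i] = (dZ @ activations[i].T) / m + (lam / m) * weights[i]
        grads_b[i] = np.sum(dZ, axis=1, keepdims=True) / m
        if i > 0:
            dA = weights[i].T @ dZ
            dZ = dA * relu_derivative(pre_activations[i - 1])
    return grads_W, grads_b
```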

Then I used the derivatives calculated during backpropagation to update the parameters:

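With the gradients in hand, the update step is plain gradient descent, something like:

```python
def update_parameters(weights, biases, grads_W, grads_b, learning_rate=0.1):
    # Step each parameter a little in the direction that lowers the cost
    for i in range(len(weights)):
        weights[i] -= learning_rate * grads_W[i]
        biases[i] -= learning_rate * grads_b[i]
    return weights, biases
```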

And finally, I put it all together in a surprisingly small number of lines:

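Roughly, the training loop just repeats forward propagation, the cost, backpropagation, and the update:

```python
def train(X, Y, weights, biases, iterations=1000, learning_rate=0.1, lam=1.0):
    for it in range(iterations):
        activations, pre_activations = forward_propagation(X, weights, biases)
        cost = compute_cost(activations[-1], Y, weights, lam)
        grads_W, grads_b = backward_propagation(activations, pre_activations, Y, weights, lam)
        weights, biases = update_parameters(weights, biases, grads_W, grads_b, learning_rate)
        if it % 100 == 0:
            print(f"iteration {it}: cost {cost:.4f}")
    return weights, biases
```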

And it worked!


Kind of…


Conclusion

So. The code works. Well, the cost is reducing… I could let this run for another 5,000 iterations to make sure the optimum has been reached. However, it took 5 minutes to learn this much. Realistically, this is not a good way to build a model. Bugs are no easy fix, optimizing the algorithm is way more complicated than it needs to be, and crucially every iteration takes an age. So, in conclusion, I'm really glad we have libraries like TensorFlow today, and I look forward to trying to master it in the future.

Update: getting it to learn

I decided that I actually wanted this to work, so I updated a few features. After a few tries I realized that I was experiencing vanishing gradients. I therefore had to update my weight initialization function. Oops!

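The fix was to scale the initial weights to the size of the layer feeding into them, something in the spirit of He initialization (the standard recipe for ReLU layers):

```python
def initialize_weights(rows, cols):
    # He-style initialization: scaling by sqrt(2 / fan_in) keeps the activations
    # from shrinking layer after layer, which is what caused the vanishing gradients
    W = np.random.randn(rows, cols) * np.sqrt(2 / cols)
    b = np.zeros((rows, 1))
    return W, b
```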

I also reduced the number of weights to a sensible number so that each pass wasn't unnecessarily long:

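Shrinking the hidden layers is just a matter of changing the size list; the sizes below are only an example of the idea:

```python
# Far fewer weights: each forward/backward pass is now much cheaper
layer_sizes = [400, 25, 25, 25, 25, 10]
```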

And I removed the softmax function, as it wasn't helping anything.

Lastly, I reduced the strength of the regularization term.

This is a reminder that even if the implementation of neural nets is relatively straightforward, getting them to actually learn is a little more finicky and requires some tinkering.

