Only Numpy: Why we need Activation Function (Non-Linearity), in Deep Neural Network — With Interactive Code

Jae Duk Seo
Published in The Startup
3 min read · Jan 16, 2018

So for today, I don’t want to do anything overly complicated, rather a simple proof with code. Let’s get right into it: why do we need an activation function?

Non-Linearity

That’s it, we need an activation function for the above reason. If you want to see why in detail, read the blog post.

Network Architecture + Forward Feed Process

So there are two things to note here:
1. We are using the IDEN activation function. As seen above, it just returns whatever the input was, and its derivative is just 1. Below is a Python implementation.

def identity_act(x):
    # identity activation: returns the input unchanged
    return x

def d_identity_act(x):
    # derivative of the identity activation is 1 everywhere
    return 1

2. As seen on the right, it is a standard neural network, nothing fancy.

As seen above, we are using the L2 cost function.

Also, as seen above, the dimensions of the three weight matrices are (3*4), (4*10), and (10*1), and our input matrix is shown below.

So if we do the math:
Layer_1 = x.dot(w1) → (4*3)(3*4) → (4*4)
Layer_1_act = IDEN(Layer_1) → (4*4)
Layer_2 = Layer_1_act.dot(w2) → (4*4)(4*10) → (4*10)
Layer_2_act = IDEN(Layer_2) → (4*10)
Layer_3 = Layer_2_act.dot(w3) → (4*10)(10*1) → (4*1)
Layer_3_act = IDEN(Layer_3) → (4*1)
Cost = np.square(Layer_3_act - Y).sum() * 0.5
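
For reference, here is a minimal NumPy sketch of that forward pass, reusing identity_act from above. The concrete values of the input x and the label Y, and the random weight initialization, are my assumptions here, since the post only shows them as an image.

import numpy as np

np.random.seed(1)

# assumed input (4*3) and labels (4*1); the actual values in the post are shown as an image
x = np.array([[0., 0., 1.],
              [0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 1.]])
Y = np.array([[0.],
              [1.],
              [1.],
              [0.]])

# weights with the dimensions stated above: (3*4), (4*10), (10*1)
w1 = np.random.randn(3, 4)
w2 = np.random.randn(4, 10)
w3 = np.random.randn(10, 1)

Layer_1 = x.dot(w1)                  # (4*3)(3*4) → (4*4)
Layer_1_act = identity_act(Layer_1)  # (4*4)

Layer_2 = Layer_1_act.dot(w2)        # (4*4)(4*10) → (4*10)
Layer_2_act = identity_act(Layer_2)  # (4*10)

Layer_3 = Layer_2_act.dot(w3)        # (4*10)(10*1) → (4*1)
Layer_3_act = identity_act(Layer_3)  # (4*1)

Cost = np.square(Layer_3_act - Y).sum() * 0.5   # L2 cost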

Back Propagation

Standard back propagation with vectorization, nothing special. However, please note the places marked in red → those are where we take the derivative of our activation function IDEN, and it gives 1.
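
Continuing the sketch above, a rough version of that vectorized back propagation looks like this; the calls to d_identity_act() are exactly the red-marked places, and each one simply returns 1, so the gradient passes through unchanged. The learning rate value is an assumption.

grad_3_part_1 = Layer_3_act - Y                  # derivative of the L2 cost
grad_3_part_2 = d_identity_act(Layer_3)          # derivative of IDEN → 1
grad_3 = Layer_2_act.T.dot(grad_3_part_1 * grad_3_part_2)

grad_2_part_1 = (grad_3_part_1 * grad_3_part_2).dot(w3.T)
grad_2_part_2 = d_identity_act(Layer_2)          # again just 1
grad_2 = Layer_1_act.T.dot(grad_2_part_1 * grad_2_part_2)

grad_1_part_1 = (grad_2_part_1 * grad_2_part_2).dot(w2.T)
grad_1_part_2 = d_identity_act(Layer_1)          # again just 1
grad_1 = x.T.dot(grad_1_part_1 * grad_1_part_2)

learning_rate = 0.01                             # assumed value
w1 = w1 - learning_rate * grad_1
w2 = w2 - learning_rate * grad_2
w3 = w3 - learning_rate * grad_3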

Forward Feed Version 2

As seen above, since our activation function is linear, we can use a neat trick to collapse the WHOLE network into one simple line of math!
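
Concretely, since IDEN(z) = z, the whole forward feed collapses into a single matrix product (a short sketch of the algebra, using the same names as above):

Layer_3_act = IDEN(IDEN(IDEN(x.dot(w1)).dot(w2)).dot(w3))
            = x.dot(w1).dot(w2).dot(w3)
            = x.dot(w1.dot(w2).dot(w3))
            = x.dot(K), where K = w1.dot(w2).dot(w3) → (3*4)(4*10)(10*1) → (3*1)

In other words, the three-layer network is exactly equivalent to a single linear map, which is why stacking more layers without a non-linear activation adds no expressive power.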

Interactive Code

Here is the link to the code.

Now let’s take a look at each part, one by one.

Above is the standard forward feed and back propagation, nothing special, and below are the results.

As seen, 100% accuracy (when rounded). Now let’s get into the fun stuff: let’s calculate the K value.

As seen above, we can calculate the K value with a simple dot product, and just performing x.dot(K) gives us the same results!
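
A minimal sketch of that check, reusing the trained w1, w2, and w3 from the network above (the variable names other than K are my own):

K = w1.dot(w2).dot(w3)                # (3*4)(4*10)(10*1) → (3*1)

full_network_output = identity_act(identity_act(identity_act(x.dot(w1)).dot(w2)).dot(w3))
one_line_output = x.dot(K)            # (4*3)(3*1) → (4*1)

# the single matrix K reproduces the full network's output
print(np.allclose(full_network_output, one_line_output))   # True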

Final Words

If any errors are found, please email me at jae.duk.seo@gmail.com.

Meanwhile, follow me on my Twitter here, visit my website, or check my YouTube channel for more content. I also derived back propagation on a simple RNN here if you are interested.

