Build LeNet from Scratch

Gary (Chang, Chih-Chun) · Published in Deep Learning · 3 min read · May 18, 2018

LeNet is a classic convolutional neural network that successfully recognizes handwritten digits. In this example, I built the network from scratch using only the Python library NumPy.

Convolutional Layers

“The Conv layer is the core building block of a Convolutional Network that does most of the computational heavy lifting.”

In this layer, several filters slide over different positions of the input data and amplify features according to the filters' contents. For example, a sharpening filter extracts the high-frequency components of the input image, which allows us to implement edge detection. In this project, I used the same concept to create hybrid images that combine the high- and low-frequency components of two different images. For more details, see the project's GitHub page.
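To make the filtering idea concrete, here is a quick sketch (mine, not part of the original code) that applies a standard 3x3 sharpening kernel to a grayscale image with plain NumPy; the input img is just a random placeholder.

import numpy as np

# Standard 3x3 sharpening kernel: boosts the center pixel relative to its neighbors.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float32)

def filter2d(img, kernel):
    # valid-mode 2D correlation of a grayscale image with a small kernel
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

img = np.random.rand(28, 28).astype(np.float32)  # placeholder grayscale image
sharpened = filter2d(img, sharpen)               # high frequencies amplified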

This is a demonstration of what convolutional layers do, from Stanford's CS231n.
import numpy as np

class Convolution2D:

    def forward(self, inputs):
        # weight size: (F, C, K, K)
        # bias size:   (F,)
        # input size:  (C, W, H)
        # output size: (F, WW, HH)
        C = inputs.shape[0]
        W = inputs.shape[1] + 2 * self.p
        H = inputs.shape[2] + 2 * self.p

        # do zero padding if needed
        self.inputs = np.zeros((C, W, H))
        for c in range(C):
            self.inputs[c, :, :] = self.zero_padding(inputs[c, :, :], self.p)

        # compute output (feature map) size
        WW = (W - self.K) // self.s + 1
        HH = (H - self.K) // self.s + 1
        feature_maps = np.zeros((self.F, WW, HH))

        # slide each filter over the padded input and take the weighted sum
        for f in range(self.F):
            for w in range(WW):
                for h in range(HH):
                    feature_maps[f, w, h] = np.sum(
                        self.inputs[:,
                                    w * self.s:w * self.s + self.K,
                                    h * self.s:h * self.s + self.K] *
                        self.weights[f, :, :, :]
                    ) + self.bias[f]

        return feature_maps

See this blog for more details.
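The snippet above only shows the forward pass; the constructor and the zero_padding helper are not included in the post. The usage sketch below fills them in with assumed shapes (hypothetical names, possibly different from the full repository) just to show that conv1 in the architecture below maps a 1x28x28 input to 6x28x28.

import numpy as np

# Hypothetical subclass that supplies the missing pieces for demonstration only.
class Convolution2DSketch(Convolution2D):
    def __init__(self, C, F, K, s, p):
        self.F, self.K, self.s, self.p = F, K, s, p
        self.weights = np.random.randn(F, C, K, K) * 0.01
        self.bias = np.zeros(F)

    def zero_padding(self, x, p):
        # pad a single (W, H) channel with p zeros on each side
        return np.pad(x, ((p, p), (p, p)), mode='constant')

conv1 = Convolution2DSketch(C=1, F=6, K=5, s=1, p=2)
x = np.random.rand(1, 28, 28)   # one-channel 28x28 input
print(conv1.forward(x).shape)   # (6, 28, 28), matching conv1 in the architecture below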

Max-Pooling Layers

It is common to insert a max-pooling layer between successive convolutional layers to reduce the number of parameters and the amount of computation in the network, and to filter out some noise. Intuitively, we downsample the output volume of the convolutional layers because we only want the most significant feature at each position.

This is a demonstration of max-pooling layers, from Stanford's CS231n.

“Left: In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice that the volume depth is preserved. Right: The most common downsampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (little 2x2 square).”

class Maxpooling2D:

    def forward(self, inputs):
        self.inputs = inputs
        C, W, H = inputs.shape

        # compute output size
        new_width = (W - self.pool) // self.s + 1
        new_height = (H - self.pool) // self.s + 1
        out = np.zeros((C, new_width, new_height))

        for c in range(C):
            for w in range(new_width):
                for h in range(new_height):
                    # retrieve the max value within each pooling window
                    out[c, w, h] = np.max(
                        self.inputs[c,
                                    w * self.s:w * self.s + self.pool,
                                    h * self.s:h * self.s + self.pool])
        return out
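Again, the constructor is not shown, so the usage sketch below simply sets the pooling size and stride by hand (an assumption; the repository may define them differently) and pools a tiny 4x4 input.

import numpy as np

pool2 = Maxpooling2D()
pool2.pool, pool2.s = 2, 2              # 2x2 window, stride 2

x = np.arange(16, dtype=float).reshape(1, 4, 4)
print(pool2.forward(x))
# [[[ 5.  7.]
#   [13. 15.]]]  -- each output is the max of one 2x2 block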

Fully-Connected Layers

Neurons have full connections to all activations (the flattened outputs) in the previous layer, as in regular neural networks. Fully-connected layers play the role of the classifier in a convolutional neural network.

class FullyConnected:

    def forward(self, inputs):
        self.inputs = inputs
        return np.dot(self.inputs, self.weights) + self.bias.T
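Here is a small usage sketch for the fully-connected layer; the weight and bias shapes are my assumptions, chosen to match the fc6 layer (120 -> 84) in the architecture below.

import numpy as np

fc6 = FullyConnected()
fc6.weights = np.random.randn(120, 84) * 0.01     # assumed (in_features, out_features)
fc6.bias = np.zeros((84, 1))                      # assumed column vector, hence bias.T

flat = np.random.rand(120, 1, 1).reshape(1, 120)  # flatten conv5's 120x1x1 output
print(fc6.forward(flat).shape)                    # (1, 84)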

Architecture

This is the architecture in my code.

input: 1x28x28
conv1: f(6x5x5)@s1p2 -> 6x28x28
maxpool2: p(2x2) -> 6x14x14
conv3: f(16x5x5)@s1 -> 16x10x10
maxpool4: p(2x2) -> 16x5x5
conv5: f(120x5x5)@s1 -> 120x1x1
fc6: 120 -> 84
fc7: 84 -> 10
softmax: 10 -> 10
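The spatial sizes in this table follow directly from the output-size formulas used in the convolution and pooling layers above. The short walk-through below (a sketch, not from the original code) reproduces them.

def conv_out(n, k, stride, pad):
    # output size of a convolution: (n + 2*pad - k) // stride + 1
    return (n + 2 * pad - k) // stride + 1

def pool_out(n, pool, stride):
    # output size of max pooling: (n - pool) // stride + 1
    return (n - pool) // stride + 1

n = 28
n = conv_out(n, k=5, stride=1, pad=2)   # conv1:    28 -> 28 (6 maps)
n = pool_out(n, pool=2, stride=2)       # maxpool2: 28 -> 14
n = conv_out(n, k=5, stride=1, pad=0)   # conv3:    14 -> 10 (16 maps)
n = pool_out(n, pool=2, stride=2)       # maxpool4: 10 -> 5
n = conv_out(n, k=5, stride=1, pad=0)   # conv5:     5 -> 1  (120 maps)
print(n)  # 1 -> flatten to 120, then fc6 (120 -> 84), fc7 (84 -> 10), softmax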

Result:

Complete code: https://github.com/gary30404/convolutional-neural-network-from-scratch-python

If you like this article and find it useful, please support it with 👏.
