Build LeNet from Scratch
LeNet is a classic convolutional neural network that successfully predicts handwritten digits. In this example, I build the network from scratch using only the Python library NumPy.
Convolutional Layers
“The Conv layer is the core building block of a Convolutional Network that does most of the computational heavy lifting.”
In this layer, several filters slide over different locations of the input data, amplifying features according to the filters' contents. For example, a sharpen filter extracts the high-frequency components of the input image, which allows us to implement edge detection. In an earlier project, I used this idea to produce hybrid images containing the high- and low-frequency components of two different images; for more details, see the project GitHub page.
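To make the filtering idea concrete, here is a minimal sketch of sliding a 3x3 sharpen kernel over an image. The helper name `cross_correlate2d` and the sample image are my own illustration, not part of the project code:

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide a kernel over a 2-D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of one kh x kw window
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A common 3x3 sharpen kernel: boosts the center pixel relative to its neighbors.
sharpen = np.array([[ 0., -1.,  0.],
                    [-1.,  5., -1.],
                    [ 0., -1.,  0.]])

image = np.arange(25, dtype=float).reshape(5, 5)
print(cross_correlate2d(image, sharpen).shape)  # (3, 3)
```

Since the kernel weights sum to 1, flat regions pass through unchanged while edges are amplified, which is exactly the high-frequency extraction described above.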
```python
class Convolution2D:
    def forward(self, inputs):
        # weight size: (F, C, K, K)
        # bias size:   (F)
        # input size:  (C, W, H)
        # output size: (F, WW, HH)
        C = inputs.shape[0]
        W = inputs.shape[1] + 2 * self.p
        H = inputs.shape[2] + 2 * self.p
        # do zero padding if needed
        self.inputs = np.zeros((C, W, H))
        for c in range(inputs.shape[0]):
            self.inputs[c, :, :] = self.zero_padding(inputs[c, :, :], self.p)
        # compute output (feature map) size
        WW = (W - self.K) // self.s + 1
        HH = (H - self.K) // self.s + 1
        feature_maps = np.zeros((self.F, WW, HH))
        for f in range(self.F):
            for w in range(WW):
                for h in range(HH):
                    # weight and sum over one K x K window across all channels
                    feature_maps[f, w, h] = np.sum(
                        self.inputs[:, w*self.s:w*self.s + self.K,
                                       h*self.s:h*self.s + self.K] *
                        self.weights[f, :, :, :]
                    ) + self.bias[f]
        return feature_maps
```
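A quick way to sanity-check the layer is to run it on a LeNet-sized input. Since the article shows only `forward`, the `__init__` and `zero_padding` below are assumptions filled in just for this self-contained sketch:

```python
import numpy as np

class Convolution2D:
    """Minimal harness; __init__ and zero_padding are assumptions --
    the article shows only the forward pass."""
    def __init__(self, C, F, K, s=1, p=0):
        self.F, self.K, self.s, self.p = F, K, s, p
        self.weights = np.random.randn(F, C, K, K) * 0.01
        self.bias = np.zeros(F)

    def zero_padding(self, channel, p):
        # pad a single 2-D channel with p zeros on every side
        return np.pad(channel, p, mode='constant')

    def forward(self, inputs):
        C = inputs.shape[0]
        W = inputs.shape[1] + 2 * self.p
        H = inputs.shape[2] + 2 * self.p
        self.inputs = np.zeros((C, W, H))
        for c in range(C):
            self.inputs[c] = self.zero_padding(inputs[c], self.p)
        WW = (W - self.K) // self.s + 1
        HH = (H - self.K) // self.s + 1
        feature_maps = np.zeros((self.F, WW, HH))
        for f in range(self.F):
            for w in range(WW):
                for h in range(HH):
                    feature_maps[f, w, h] = np.sum(
                        self.inputs[:, w*self.s:w*self.s + self.K,
                                       h*self.s:h*self.s + self.K] *
                        self.weights[f]) + self.bias[f]
        return feature_maps

# conv1 of LeNet: 1x28x28 input, six 5x5 filters, stride 1, padding 2
conv = Convolution2D(C=1, F=6, K=5, s=1, p=2)
out = conv.forward(np.random.randn(1, 28, 28))
print(out.shape)  # (6, 28, 28)
```

With padding 2, the 5x5 filter keeps the spatial size at 28x28, matching the architecture table later in the article.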
See this blog for more details.
Max-Pooling Layers
It is common to insert a max-pooling layer between successive convolutional layers in order to reduce the number of parameters and the amount of computation in the network, and to suppress some noise. Intuitively, we downsample the output volume of the convolutional layers because we only want the most significant feature at each position.
“In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into an output volume of size [112x112x64]. Notice that the volume depth is preserved. The most common downsampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (a little 2x2 square).”
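The "max over 4 numbers" idea can be sketched in a few lines of NumPy. This uses a reshape trick rather than the article's loop implementation, purely to show what 2x2/stride-2 pooling computes:

```python
import numpy as np

# 2x2 max pooling with stride 2: each output cell keeps only the largest
# of 4 input values, halving both width and height.
x = np.array([[1., 3., 2., 4.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])

# Split the 4x4 array into 2x2 blocks, then take the max inside each block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 8.]
               #  [3. 4.]]
```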
```python
class Maxpooling2D:
    def forward(self, inputs):
        self.inputs = inputs
        C, W, H = inputs.shape
        new_width = (W - self.pool) // self.s + 1
        new_height = (H - self.pool) // self.s + 1
        out = np.zeros((C, new_width, new_height))
        for c in range(C):
            for w in range(new_width):
                for h in range(new_height):
                    # retrieve max value according to the pooling size
                    out[c, w, h] = np.max(
                        self.inputs[c,
                                    w*self.s:w*self.s + self.pool,
                                    h*self.s:h*self.s + self.pool])
        return out
```
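As with the convolutional layer, a small harness confirms the output shape. The `__init__` supplying the pooling size and stride is an assumption, since the article shows only `forward`:

```python
import numpy as np

class Maxpooling2D:
    """Minimal harness; __init__ is an assumption -- the article
    shows only the forward pass."""
    def __init__(self, pool=2, s=2):
        self.pool, self.s = pool, s

    def forward(self, inputs):
        self.inputs = inputs
        C, W, H = inputs.shape
        new_width = (W - self.pool) // self.s + 1
        new_height = (H - self.pool) // self.s + 1
        out = np.zeros((C, new_width, new_height))
        for c in range(C):
            for w in range(new_width):
                for h in range(new_height):
                    out[c, w, h] = np.max(
                        self.inputs[c, w*self.s:w*self.s + self.pool,
                                       h*self.s:h*self.s + self.pool])
        return out

# maxpool2 of LeNet: 6x28x28 -> 6x14x14 with a 2x2 window, stride 2
pool = Maxpooling2D(pool=2, s=2)
out = pool.forward(np.random.randn(6, 28, 28))
print(out.shape)  # (6, 14, 14)
```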
Fully-Connected Layers
Neurons have full connections to all activations in the previous layer (the outputs are flattened first), as in regular neural networks. Fully-connected layers play the role of the classifier in a convolutional neural network.
```python
class FullyConnected:
    def forward(self, inputs):
        # flattened inputs times the weight matrix, plus the bias
        self.inputs = inputs
        return np.dot(self.inputs, self.weights) + self.bias.T
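The layer is just a matrix multiply plus a bias. The shapes below are my own illustration, chosen to mirror fc6 of the architecture (120 inputs, 84 outputs); the column-vector bias is a guess consistent with the `self.bias.T` above:

```python
import numpy as np

rng = np.random.default_rng(0)
inputs  = rng.standard_normal((1, 120))        # flattened conv5 output
weights = rng.standard_normal((120, 84)) * 0.01
bias    = np.zeros((84, 1))                    # column vector, hence the .T

out = np.dot(inputs, weights) + bias.T         # bias.T broadcasts to (1, 84)
print(out.shape)  # (1, 84)
```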
Architecture
This is the architecture in my code.
```
input:    1x28x28
conv1:    f(6x5x5)@s1,p2 -> 6x28x28
maxpool2: p(2x2)         -> 6x14x14
conv3:    f(16x5x5)@s1   -> 16x10x10
maxpool4: p(2x2)         -> 16x5x5
conv5:    f(120x5x5)@s1  -> 120x1x1
fc6:      120 -> 84
fc7:      84  -> 10
softmax:  10  -> 10
```
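The spatial sizes in this table can be verified with the standard output-size formula used by both the convolutional and pooling layers. The helper name `conv_out` is my own; the pooling layers are assumed to use stride 2, which the 28 -> 14 and 10 -> 5 transitions imply:

```python
def conv_out(size, k, s=1, p=0):
    """Spatial output size of a k x k window with stride s and padding p."""
    return (size + 2 * p - k) // s + 1

# Walk the architecture above, tracking one spatial dimension.
size = 28
size = conv_out(size, k=5, s=1, p=2)  # conv1    -> 28
size = conv_out(size, k=2, s=2)       # maxpool2 -> 14
size = conv_out(size, k=5, s=1)       # conv3    -> 10
size = conv_out(size, k=2, s=2)       # maxpool4 -> 5
size = conv_out(size, k=5, s=1)       # conv5    -> 1
print(size)  # 1
```

The 120x1x1 output of conv5 flattens to a 120-vector, which is exactly the input width of fc6.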
Complete code: https://github.com/gary30404/convolutional-neural-network-from-scratch-python
If you like this article and consider it useful for you, please support it with 👏.