Convolutional Neural Networks using TensorFlow

Ankit Mittal
Nov 27, 2017

I played around with the CIFAR10 dataset and am documenting the results here, going from 63% test accuracy to a little under 80%.

Dataset: CIFAR10 consists of 60k 32x32 color images in 10 classes. The dataset is perfectly balanced, with 6k images per class; 50k of the images form the training set and 10k the test set. The goal of this exercise was to build a model, train it on the training images, and check how well it does on the test data.
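If you want to poke at the data yourself, here is a minimal sketch. It uses tf.keras.datasets for convenience; the code later in this post uses a custom CifarDataManager instead, but the underlying data is identical.

import numpy as np
import tensorflow as tf

# Download CIFAR10 and confirm the split and class balance described above.
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.cifar10.load_data()
print(train_x.shape)                   # (50000, 32, 32, 3)
print(test_x.shape)                    # (10000, 32, 32, 3)
print(np.bincount(train_y.flatten()))  # 5000 images per class in the training set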

These are the classes and some sample images.

Thinking about the problem, the first step was to look for inspiration, which I found in the LeNet architecture applied to MNIST. That looked like a good starting point. Here is a refresher on a basic LeNet.

LeNet Architecture:

  1. Convolution layer with 32 filters, ReLU and 2x2 max pooling.
  2. Another convolution layer with 64 filters, ReLU and 2x2 max pooling.
  3. Fully connected layer with 1024 neurons, followed by dropout with 0.5 keep probability.
  4. Another FC layer, with 10 outputs this time.

This straight-out-of-the-box CNN gives 66% accuracy on the test set. Not bad at all. Let's try to improve it.
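Before the code, a quick check on shapes: each 2x2 max pool with stride 2 halves the spatial dimensions, which explains the reshape that shows up in the graph below.

side = 32                      # CIFAR10 images are 32x32
side //= 2                     # 16x16 after the first conv + pool
side //= 2                     # 8x8 after the second conv + pool
flat_size = side * side * 64   # 64 filters in the second conv layer
print(flat_size)               # 4096, i.e. 8*8*64, the size used in the reshape below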

import numpy as np
import tensorflow as tf

# Helper functions for instantiating layers. Used across all architectures.
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

def conv_layer(input, shape):
    W = weight_variable(shape)
    b = bias_variable([shape[3]])
    return tf.nn.relu(conv2d(input, W) + b)

def full_layer(input, size):
    in_size = int(input.get_shape()[1])
    W = weight_variable([in_size, size])
    b = bias_variable([size])
    return tf.add(tf.matmul(input, W), b)

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)

conv1 = max_pool(conv_layer(x, shape=[5, 5, 3, 32]))
conv2 = max_pool(conv_layer(conv1, shape=[5, 5, 32, 64]))
conv2_flat = tf.reshape(conv2, [-1, 8*8*64])  # two 2x2 pools: 32x32 -> 8x8

full_1 = tf.nn.relu(full_layer(conv2_flat, 1024))
full_1_dropout = tf.nn.dropout(full_1, keep_prob=keep_prob)

y_conv = full_layer(full_1_dropout, 10)

# training
cross_entropy = tf.losses.softmax_cross_entropy(y, y_conv)
optimizer = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy)

accuracy = tf.reduce_mean(
    tf.cast(
        tf.equal(
            tf.argmax(y, 1),
            tf.argmax(y_conv, 1),
        ),
        tf.float32,
    ),
) * 100

LeNet Code

MINI_BATCH_SIZE = 512
STEPS = 55000

# cifar is a CifarDataManager instance (defined in the notebook) that exposes
# the training and test splits with a next_batch() helper.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(STEPS):
        batch = cifar.train.next_batch(MINI_BATCH_SIZE)
        sess.run(optimizer, feed_dict={
            x: batch[0], y: batch[1], keep_prob: 0.5
        })

        if step % 500 == 0:
            acc = sess.run(accuracy, feed_dict={
                x: batch[0], y: batch[1], keep_prob: 1.0
            })
            print("Train accuracy for step {} is {}".format(step, acc))

    # Evaluate in 10 chunks of 1000 test images to keep memory usage down,
    # then average the per-chunk accuracies.
    test_x = cifar.test.images.reshape(10, 1000, 32, 32, 3)
    test_y = cifar.test.labels.reshape(10, 1000, 10)
    test_acc = np.mean([sess.run(accuracy, feed_dict={
        x: test_x[i], y: test_y[i], keep_prob: 1.0
    }) for i in range(10)])
    print("Accuracy is {}".format(test_acc))

In the next step, I added one more convolution layer, with 128 filters, before the fully connected layers.

This took the test accuracy straight to 73%. Not bad, let's continue.
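The full listing appears a bit further down, but the addition itself boils down to one extra layer in the graph plus a smaller reshape (a third 2x2 pool halves 8x8 down to 4x4):

conv3 = max_pool(conv_layer(conv2, shape=[5, 5, 64, 128]))  # new 128-filter layer
conv3_flat = tf.reshape(conv3, [-1, 4*4*128])               # 4x4 maps after three pools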

Next, I tried removing max pooling completely from all the convolution layers. This did not help and brought the accuracy back down to 66%. Of course, max pooling is there for a reason.
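For reference, the no-pooling variant looked roughly like this (a sketch, not the exact code from that run). With SAME padding and nothing downsampling, the feature maps stay 32x32 all the way through, so the flattened vector balloons to 32*32*128 values.

conv1 = conv_layer(x, shape=[5, 5, 3, 32])
conv2 = conv_layer(conv1, shape=[5, 5, 32, 64])
conv3 = conv_layer(conv2, shape=[5, 5, 64, 128])
conv3_flat = tf.reshape(conv3, [-1, 32*32*128])   # 131072 features per image
conv3_drop = tf.nn.dropout(conv3_flat, keep_prob=keep_prob)

The listing below is the with-pooling, three-layer version that reached 73%.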

MINI_BATCH_SIZE = 512
STEPS = 60000
cifar = CifarDataManager()  # wraps the CIFAR10 batches (defined in the notebook)

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)

conv1 = max_pool(conv_layer(x, shape=[5, 5, 3, 32]))
conv2 = max_pool(conv_layer(conv1, shape=[5, 5, 32, 64]))
conv3 = max_pool(conv_layer(conv2, shape=[5, 5, 64, 128]))
conv3_flat = tf.reshape(conv3, [-1, 4*4*128])  # three 2x2 pools: 32x32 -> 4x4
conv3_drop = tf.nn.dropout(conv3_flat, keep_prob=keep_prob)

full_1 = tf.nn.relu(full_layer(conv3_drop, 1024))
full_1_dropout = tf.nn.dropout(full_1, keep_prob=keep_prob)

y_conv = full_layer(full_1_dropout, 10)

# training
cross_entropy = tf.losses.softmax_cross_entropy(y, y_conv)
optimizer = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy)

accuracy = tf.reduce_mean(
    tf.cast(
        tf.equal(
            tf.argmax(y, 1),
            tf.argmax(y_conv, 1),
        ),
        tf.float32,
    ),
) * 100

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(STEPS):
        batch = cifar.train.next_batch(MINI_BATCH_SIZE)
        sess.run(optimizer, feed_dict={
            x: batch[0], y: batch[1], keep_prob: 0.5
        })

        if step % 1000 == 0:
            acc = sess.run(accuracy, feed_dict={
                x: batch[0], y: batch[1], keep_prob: 1.0
            })
            print("Train accuracy for step {} is {}".format(step, acc))

    # Same chunked evaluation as before: 10 chunks of 1000 test images.
    test_x = cifar.test.images.reshape(10, 1000, 32, 32, 3)
    test_y = cifar.test.labels.reshape(10, 1000, 10)
    test_acc = np.mean([sess.run(accuracy, feed_dict={
        x: test_x[i], y: test_y[i], keep_prob: 1.0
    }) for i in range(10)])
    print("Accuracy is {}".format(test_acc))

Next, I checked out a completely different CNN topology that came after LeNet. It is called AlexNet and came out of G. Hinton's lab in Toronto. It was a groundbreaking paper in 2012, cutting the ImageNet top-5 error rate from about 26% to about 16%. The difference from LeNet is that instead of using a single convolution layer followed immediately by a max pooling layer, you use a stack of convolution layers as a block and then max pool.
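In code, the idea is a pattern like the one below, repeated three times in the final architecture (with 30, 50 and 80 filters in my version): stack a few convolutions, then pool once.

# One convolution "block": three stacked 3x3 convolutions, then a single
# 2x2 max pool and dropout, instead of pooling after every convolution.
block = conv_layer(x, shape=[3, 3, 3, 30])
block = conv_layer(block, shape=[3, 3, 30, 30])
block = max_pool(conv_layer(block, shape=[3, 3, 30, 30]))
block = tf.nn.dropout(block, keep_prob=keep_prob)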

So I changed the three convolution layers in the previous architecture to blocks of convolutions, each block containing three convolution layers followed by dropout and max pooling. This takes a bit longer to run than the previous topologies (about 2 hours on an Azure NC6 instance with a K80 GPU; I used the Deep Learning Virtual Machine on Azure for training). It jumped the test accuracy from the previous 73% to 79.8%, a good steep jump.

MINI_BATCH_SIZE = 512
STEPS = 50000
from random import randint

def conv_layer(input, shape):
    W = weight_variable(shape)
    b = bias_variable([shape[3]])
    return tf.nn.relu(conv2d(input, W) + b)

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)

# Block 1: three 3x3 convolutions with 30 filters, then a 2x2 max pool and dropout.
conv1_1 = conv_layer(x, shape=[3, 3, 3, 30])
conv1_2 = conv_layer(conv1_1, shape=[3, 3, 30, 30])
conv1_pool = max_pool(conv_layer(conv1_2, shape=[3, 3, 30, 30]))
conv1_drop = tf.nn.dropout(conv1_pool, keep_prob=keep_prob)

# Block 2: three 3x3 convolutions with 50 filters, then a 2x2 max pool and dropout.
conv2_1 = conv_layer(conv1_drop, shape=[3, 3, 30, 50])
conv2_2 = conv_layer(conv2_1, shape=[3, 3, 50, 50])
conv2_pool = max_pool(conv_layer(conv2_2, shape=[3, 3, 50, 50]))
conv2_drop = tf.nn.dropout(conv2_pool, keep_prob=keep_prob)

# Block 3: three 3x3 convolutions with 80 filters, then an 8x8 max pool that
# collapses the remaining 8x8 feature maps to 1x1, leaving 80 features.
conv3_1 = conv_layer(conv2_drop, shape=[3, 3, 50, 80])
conv3_2 = conv_layer(conv3_1, shape=[3, 3, 80, 80])
conv3_pool = tf.nn.max_pool(
    conv_layer(conv3_2, shape=[3, 3, 80, 80]),
    ksize=[1, 8, 8, 1],
    strides=[1, 8, 8, 1],
    padding='SAME',
)
conv3_flat = tf.reshape(conv3_pool, [-1, 80])
conv3_drop = tf.nn.dropout(conv3_flat, keep_prob=keep_prob)

full1 = tf.nn.relu(full_layer(conv3_drop, 500))
full1_drop = tf.nn.dropout(full1, keep_prob=keep_prob)
y_conv = full_layer(full1_drop, 10)

optimizer = tf.train.AdamOptimizer(0.0001).minimize(
    tf.losses.softmax_cross_entropy(y, y_conv)
)

accuracy = tf.reduce_mean(
    tf.cast(
        tf.equal(
            tf.argmax(y, 1),
            tf.argmax(y_conv, 1),
        ),
        tf.float32,
    ),
) * 100

test_x = cifar.test.images.reshape(10, 1000, 32, 32, 3)
test_y = cifar.test.labels.reshape(10, 1000, 10)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(STEPS):
        batch = cifar.train.next_batch(MINI_BATCH_SIZE)
        sess.run(optimizer, feed_dict={
            x: batch[0], y: batch[1], keep_prob: 0.75
        })

        if step % 1000 == 0:
            train_accuracy = round(sess.run(accuracy, feed_dict={
                x: batch[0], y: batch[1], keep_prob: 1.0
            }), 2)

            # Spot-check one random chunk of 1000 test images during training.
            i = randint(0, 9)
            test_accuracy = round(sess.run(accuracy, feed_dict={
                x: test_x[i], y: test_y[i], keep_prob: 1.0
            }), 2)

            print("Step {}: Train accuracy is {:.2f}, test accuracy is {:.2f}".format(
                step,
                train_accuracy,
                test_accuracy,
            ))

    print("--------------------------------------")
    # Final evaluation over all 10 test chunks.
    final_acc = np.mean([sess.run(accuracy, feed_dict={
        x: test_x[i], y: test_y[i], keep_prob: 1.0
    }) for i in range(10)])
    print("Final Accuracy is {}".format(final_acc))

Here is the link to all the code — https://github.com/ankitml/tensorflow-exercises/blob/master/CIFARCNN.ipynb

PS: The state of the art for CIFAR10 is 96.5%, published in 2015. Here is the leaderboard for CIFAR10.

Top published results for CIFAR10.
