# Character Recognition using TensorFlow

TensorFlow is an open-source library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while edges represent the multidimensional arrays (called tensors) communicated between them. TensorFlow was originally developed by researchers on the Google Brain team within Google's Machine Intelligence research organisation. It lets us quickly construct and deploy an architecture to one or more GPUs or CPUs on a desktop, server, or mobile device with a single API. TensorFlow is mainly used for deep neural network research, but it is general enough to be used for many other purposes.

Our goal is to develop a neural network in TensorFlow that can recognize Devanagari characters from images. We have already seen how to develop a neural network from scratch for poker hand prediction here.

The data consists of two parts, one for training and one for testing. We have images of size 320×320, each of which contains a character. The output is an integer between 0 and 103 (both inclusive) representing the character.

The general architecture is to have one input node per pixel and 104 output nodes. If the i-th output node outputs 1, the network has recognized the image as the i-th character. With a large enough number of layers and enough neurons, this should give us good results, as in the previous case.
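To make this output convention concrete, here is a small NumPy sketch (the `one_hot` and `decode` helpers are illustrative only, not part of the network code):

```python
import numpy as np

def one_hot(label, n_classes=104):
    # Encode an integer label (0-103) as a one-hot vector.
    vec = np.zeros(n_classes)
    vec[label] = 1.0
    return vec

def decode(output):
    # The predicted character is the index of the largest output node.
    return int(np.argmax(output))

encoded = one_hot(42)
print(decode(encoded))  # → 42
```

Training targets are one-hot encoded the same way, so comparing `argmax` of the prediction against `argmax` of the target tells us whether the network got the character right.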

As it turns out, this architecture is bad for several reasons. Firstly, an input of size 320×320 is not recommended, as it makes the network very large, which takes an enormous amount of time to train. Secondly, a large number of layers again takes an unreasonable amount of time to train. Thirdly, if we look at the input, the images are mostly white with very thin black lines forming the character. This means most of the inputs are 1 and extremely few are 0. This kind of input is not very good: we want a substantial fraction of the input neurons to represent the actual character, not a sea of identical 1s.

We apply the following transformations to the input. First we invert the images, interchanging the 1s and 0s. Then we apply a Gaussian blur to increase the thickness of the character (so that more of the input represents the character). Finally we resize the image to reduce the size of the network; we end up with 80×80 images.

Here is the Python code for the input transformation:

```python
from skimage import io, filters
from skimage import transform
import numpy
from scipy import misc
import time

train_size = 17205
test_size = 1829

# start_time = datetime.now()

for i in range(train_size):
    buf = './train/{}.png'.format(i)
    buf1 = './train_small/{}.png'.format(i)
    image = io.imread(buf)
    image = numpy.invert(image)
    image = filters.gaussian(image, 5)
    image = transform.resize(image, (80, 80))
    misc.imsave(buf1, image)

for i in range(test_size):
    buf = './valid/{}.png'.format(i)
    buf1 = './valid_small/{}.png'.format(i)
    image = io.imread(buf)
    image = numpy.invert(image)
    image = filters.gaussian(image, 5)
    image = transform.resize(image, (80, 80))
    misc.imsave(buf1, image)

# print("Time taken:", datetime.now() - start_time)
```

Next we develop an architecture for the neural network. As a general rule, when we don't have a specific architecture in mind, we set the number of input nodes equal to the number of features, set the number of output nodes equal to the number of classes, and decrease the number of neurons by a factor of 2 or 4 in each new layer.
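This rule of thumb can be sketched as follows (the `layer_sizes` helper is hypothetical, just to illustrate the shrink-by-a-factor idea):

```python
def layer_sizes(n_input, n_classes, factor=4):
    # Shrink each successive layer by `factor` until the next step
    # would drop below the number of output classes.
    sizes = [n_input]
    while sizes[-1] // factor > n_classes:
        sizes.append(sizes[-1] // factor)
    sizes.append(n_classes)
    return sizes

print(layer_sizes(6400, 104))  # → [6400, 1600, 400, 104]
```

For our 80×80 inputs and 104 classes this suggests hidden layers of roughly 1600 and 400 neurons; the exact sizes are then tuned by experiment.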

With some experiments we find that a network with 3 or more hidden layers takes too long to train on a PC, so we stick with a two-hidden-layer architecture. Next we vary the other parameters and check the accuracy. An architecture with 1024 neurons in the first hidden layer and 256 neurons in the second gives good results. We stick with the well-known sigmoid activation function.
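The sigmoid activation squashes each neuron's weighted input into the range (0, 1); in NumPy terms, a toy sketch of what `tf.nn.sigmoid` computes:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + e^(-z)), applied elementwise.
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # near 0, exactly 0.5, near 1
```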

We also need a good error function. Again, as a general rule, we take the cross-entropy error regularised with the L2 norms of the weights and biases. This configuration gives us about 70% accuracy.
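A NumPy sketch of such an error function (the `regularised_cross_entropy` helper is illustrative only; the actual cost is built with TensorFlow ops in the code that follows):

```python
import numpy as np

def regularised_cross_entropy(pred, y_onehot, weights, beta=0.001):
    # Cross-entropy between the predicted distribution and the one-hot target,
    # plus an L2 penalty on every weight matrix (tf.nn.l2_loss is sum(w^2)/2).
    ce = -np.sum(y_onehot * np.log(pred + 1e-10))
    l2 = sum(np.sum(w ** 2) / 2.0 for w in weights)
    return ce + beta * l2

# Toy example with 4 classes: a confident, correct prediction gives a small cost.
y_true = np.array([0.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.05, 0.05, 0.85, 0.05])
print(regularised_cross_entropy(y_pred, y_true, []))
```

The small constant inside the log keeps the cost finite when a predicted probability is exactly zero, and the `beta` factor controls how strongly large weights are penalised.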

The Python code for the architecture is:

```python
from __future__ import print_function

import tensorflow as tf
import numpy
from skimage import io

direc = "./train_small/"
valdirec = "./valid_small/"

# Parameters
learning_rate = 0.001
training_epochs = 500
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 1024  # 1st layer number of features
n_hidden_2 = 256   # 2nd layer number of features
n_input = 6400     # 80x80
n_classes = 104    # total classes (characters 0-103)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# Caching x values for training data
xhat = numpy.zeros([17205, n_input])
for i in range(17205):
    image = io.imread(direc + str(i) + ".png")
    image = numpy.reshape(image, n_input)
    image = image.astype(float)
    image = image / numpy.max(image)
    xhat[i] = image

# Caching x values for validation data
xvhat = numpy.zeros([1829, n_input])
for i in range(1829):
    image = io.imread(valdirec + str(i) + ".png")
    image = numpy.reshape(image, n_input)
    image = image.astype(float)
    image = image / numpy.max(image)
    xvhat[i] = image

# Caching y values for training data
yhat = numpy.zeros([17205])
with open(direc + "labels.txt") as fp:
    for k, line in enumerate(fp):
        yhat[k] = int(line)

# Caching y values for validation data
yvhat = numpy.zeros([1829])
with open(valdirec + "labels.txt") as fp:
    for k, line in enumerate(fp):
        yvhat[k] = int(line)

# Input function returning the i-th training batch as (x, one-hot y)
def input_next(i, batch_size):
    ret_x = numpy.zeros((batch_size, n_input))
    ret_y = numpy.zeros((batch_size, n_classes))
    for j in range(i*batch_size, (i+1)*batch_size):
        ret_x[j - i*batch_size] = xhat[j]
        ret_y[j - i*batch_size][int(yhat[j])] = 1  # y is cached in yhat
    return ret_x, ret_y.astype(float)

# Input function returning the i-th validation batch
def input_valid(i, batch_size):
    ret_x = numpy.zeros((batch_size, n_input))
    ret_y = numpy.zeros((batch_size, n_classes))
    for j in range(i*batch_size, (i+1)*batch_size):
        ret_x[j - i*batch_size] = xvhat[j]
        ret_y[j - i*batch_size][int(yvhat[j])] = 1  # valid y is cached in yvhat
    return ret_x, ret_y.astype(float)

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with sigmoid activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.sigmoid(layer_1)
    # Hidden layer with sigmoid activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.sigmoid(layer_2)
    # Output layer with softmax activation
    out_layer = tf.nn.softmax(tf.matmul(layer_2, weights['out']) + biases['out'])
    return out_layer

# Store layers' weights & biases
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0.01)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.01)),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes], stddev=0.01))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1], stddev=0.01)),
    'b2': tf.Variable(tf.random_normal([n_hidden_2], stddev=0.01)),
    'out': tf.Variable(tf.random_normal([n_classes], stddev=0.01))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer: cross-entropy plus L2 regularisation
# cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred + 1e-6, y))
cost = -tf.reduce_sum(y*tf.log(pred + 1e-10)) + 0.001*(
    tf.nn.l2_loss(weights['h1']) + tf.nn.l2_loss(weights['h2'])
    + tf.nn.l2_loss(weights['out']) + tf.nn.l2_loss(biases['b1'])
    + tf.nn.l2_loss(biases['b2']) + tf.nn.l2_loss(biases['out']))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

saver = tf.train.Saver()

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(17205/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = input_next(i, batch_size)  # next batch_size inputs
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Batched alternative:
    # total_valid = int(1829/batch_size)
    # accu = 0.0
    # for i in range(total_valid):
    #     valid_x, valid_y = input_valid(i, batch_size)
    #     accu += accuracy.eval({x: valid_x, y: valid_y})
    valid_x, valid_y = input_valid(0, 1829)
    accu = accuracy.eval({x: valid_x, y: valid_y})
    print("Accuracy:", accu)
    saver.save(sess, "./model.ckpt")
```

The input and validation data can be obtained from here.

The time taken to write and deploy this network was very small (excluding debugging time), so TensorFlow makes it easy to quickly test and deploy models. We finally managed to achieve 72% accuracy with a small network.

Happy Coding!