OCR: Part 2 — OCR using CNN

Vijendra Singh
2 min read · Jul 19, 2018


In the last part (part 1) of this series, we saw how to generate a sample dataset for OCR using CNN. In this part, we will implement a CNN for OCR using TensorFlow. Training and testing will be done using the data generated in the last part. We will skip the preamble and jump straight to the key parts; you can check the source code for the details.

Input:

Data we generated in the last part had the following specification:

  1. Image size: 32, 256, 1
  2. Minimum number of characters in an image: 3
  3. Maximum number of characters in an image: 8
  4. Length of the label (max_char): 16
  5. Number of possible values each character in the string can take (all_char): 63
  6. Font: cv2.FONT_HERSHEY_SIMPLEX
  7. Font size: 0.7 to 1
  8. Font thickness: 1 to 3
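
To make the label format concrete, here is a minimal sketch of how a string could be turned into a fixed-length one-hot label. The alphabet is an assumption: 62 alphanumeric characters plus one hypothetical padding symbol happen to give 63 classes, but the actual character set used in part 1 may differ. The helper name encode_label is also made up for illustration, and the sketch stores one row per character position, which may be the transpose of the layout used in the real dataset.

```python
import string

# Assumption: 62 alphanumeric characters plus one padding symbol give the
# 63 classes (all_char) listed above; the real alphabet may differ.
PAD = "_"  # hypothetical padding symbol
ALPHABET = string.digits + string.ascii_letters + PAD
MAX_CHAR = 16  # label length from the specification above


def encode_label(text, max_char=MAX_CHAR, alphabet=ALPHABET):
    """Return a max_char x len(alphabet) one-hot matrix (list of lists)."""
    if len(text) > max_char:
        raise ValueError("text is longer than max_char")
    # Pad short strings so every label has exactly max_char positions.
    padded = text.ljust(max_char, PAD)
    one_hot = []
    for ch in padded:
        row = [0] * len(alphabet)
        row[alphabet.index(ch)] = 1  # raises ValueError for unknown characters
        one_hot.append(row)
    return one_hot


label = encode_label("Ab3")
print(len(label), len(label[0]))  # 16 63
```

Padding every label to 16 positions is what lets a network with a fixed-size output handle strings of 3 to 8 characters.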

CNN architecture:

After experimenting a bit with the CNN architecture, I found that the following architecture worked best in this case:

CNN Architecture

The CNN architecture consists of 4 convolutional layers. The layers have been designed so that the final output has dimensions 16x63, which lets us treat each row (1x63) as the probabilities of the different possible characters for that position of the output label.
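
Reading a prediction off the 16x63 output can be sketched as follows: take the highest-scoring class in each row and map it back to a character. The alphabet and the padding symbol are assumptions made for illustration (62 alphanumeric characters plus one padding class), and decode_output is a hypothetical helper, not part of the original source code.

```python
import string

PAD = "_"  # hypothetical padding symbol (assumption)
ALPHABET = string.digits + string.ascii_letters + PAD  # 63 assumed classes


def decode_output(scores, alphabet=ALPHABET, pad=PAD):
    """Pick the highest-scoring class in each row and drop trailing padding."""
    chars = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        chars.append(alphabet[best])
    return "".join(chars).rstrip(pad)


# Toy 16x63 score matrix that spells "Hi7" followed by padding.
target = "Hi7".ljust(16, PAD)
scores = [[1.0 if c == t else 0.0 for c in ALPHABET] for t in target]
print(decode_output(scores))  # Hi7
```

Because the argmax is unchanged by softmax, the same decoding works on raw logits as well as on probabilities.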

Loss:

The multi-class loss is the softmax cross-entropy computed for each character position in the output label and then averaged over all positions. The code snippet below makes this concrete:

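import tensorflow as tf  # written against the TensorFlow 1.x API

def multi_loss(logits, labels, batch_size, max_char):
    '''
    Cross-entropy loss for multi-class, multi-character output.
    '''
    loss = 0
    for i in range(max_char):
        # Softmax cross-entropy for character position i, averaged over the batch.
        loss += tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(
                logits=logits[:, i, :], labels=labels[:, :, i]),
            name='cross_entropy_loss_mean')
    # Average over all character positions.
    loss /= max_char
    tf.add_to_collection('losses', loss)
    # Sum everything registered in the 'losses' collection.
    total_loss = tf.add_n(tf.get_collection('losses'), name='total_loss')
    tf.add_to_collection('losses', total_loss)
    return total_loss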
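
As a sanity check on what this loss computes, here is a small framework-free sketch of the same arithmetic in plain Python. It is not the author's implementation: the function names are hypothetical, and for readability both logits and labels are assumed to be nested lists shaped [batch][max_char][all_char], which differs from the label layout indexed in the TensorFlow code.

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_sum) for x, t in zip(logits, one_hot))


def multi_loss_reference(logits, labels):
    """Average the per-position batch-mean cross-entropy over all positions."""
    batch_size = len(logits)
    max_char = len(logits[0])
    total = 0.0
    for i in range(max_char):
        # Mean over the batch for character position i.
        total += sum(
            softmax_cross_entropy(logits[b][i], labels[b][i])
            for b in range(batch_size)
        ) / batch_size
    return total / max_char


# Tiny example: batch of 1, 2 character positions, 3 classes.
logits = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
labels = [[[1, 0, 0], [0, 0, 1]]]
loss = multi_loss_reference(logits, labels)
print(abs(loss - math.log(3)) < 1e-9)  # True: uniform logits give log(3)
```

With uniform logits every class gets probability 1/3, so each position contributes log(3); a well-trained network should push this value toward zero.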

Result:

After training the CNN architecture discussed above, we get results similar to what is shown below:

Test data prediction

Now you can go ahead and try implementing it on your own (highly recommended), or you can directly check the source code, which is available HERE.

In the next part, we will try something better: segmenting the characters from a given screenshot and then recognizing each character to read a complete paragraph. Give it a try if you are interested.
