Adversarial Robustness of Optical Character Recognition (OCR)

Sharon Qian
Sep 19, 2019 · 4 min read


Building a simple black-box attack with Adversarial Robustness 360 Toolbox (ART)

Posted by Sharon Qian (Harvard) and Beat Buesser (IBM)

Adversarial Robustness 360 Toolbox (ART)

Background

Over the past few years, there has been a wealth of research on adversarial attacks against neural network models trained for image classification, showing that almost imperceptible alterations to images can easily fool a trained deep neural network. These attacks can be categorized into white-box attacks, where the attacker has complete knowledge of the attacked classifier model, and black-box attacks, where the attacker only has access to the classification output that the model produces for a given input, with many grey-box scenarios in between.

This tutorial shows how to use the Adversarial Robustness 360 Toolbox (ART) [1] to create a black-box attack against a trained classifier. Specifically, we show how to use ART’s implementation of the HopSkipJump attack [2] to generate adversarial examples that fool the popular Tesseract Optical Character Recognition engine [3]. The full example code of this tutorial can be found in ART’s GitHub repository [4].

Create a Classifier

We start by wrapping Tesseract into ART’s BlackBoxClassifier, which lets us connect anything callable from a Python function to the tools in ART. Here we will use it to run text recognition on images of the words “dissent” and “assent”. The BlackBoxClassifier only requires a predict function, the shape of the input images, and the number of possible classes.

To demonstrate the versatility of the BlackBoxClassifier, we will query Tesseract via a command-line call from Python. The prediction function below wraps this command-line call and maps Tesseract’s output to 3 possible classes: “dissent”, “assent”, or “other”. This reduces the model to a simple 3-class classification problem.
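The snippets in this post assume a few imports that are not shown explicitly. A minimal setup, sketched here for completeness (module paths follow ART as of 2019; newer ART releases move these classes to art.estimators.classification and art.attacks.evasion), could look like this:

# assumed setup for the snippets below (not shown in the original post)
import os

import imageio
import matplotlib.pyplot as plt
import numpy as np

from art.attacks import HopSkipJump
from art.classifiers import BlackBoxClassifier
from art.utils import to_categorical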

# predict function to call tesseract from the command line and convert
# its output to a one-hot encoding
def predict(x):
    out_label = []
    for x_i in x:
        # save image as intermediate png
        imageio.imsave('tmp.png', x_i)

        # run tesseract on the intermediate image
        os.system("tesseract tmp.png out")

        # read the recognized text
        with open("out.txt", "r") as file:
            out_string = file.read().strip()

        # convert to categorical
        if out_string == 'dissent':
            out_label.append(0)
        elif out_string == 'assent':
            out_label.append(1)
        else:
            out_label.append(2)

    return to_categorical(out_label, 3)

Next, we will read the initial image image_init, showing the word we would like Tesseract to predict (“assent”), and the target image image_target, showing the word we would like a human reader to see on the adversarial image (“dissent”).

image_init = imageio.imread('assent.png')
image_target = imageio.imread('dissent.png')

Now, we are ready to create the ART BlackBoxClassifier.

classifier = BlackBoxClassifier(predict, image_target.shape, 3, clip_values = (0, 255))

Test the Classifier

We can use the BlackBoxClassifier to make predictions and check whether the wrapped Tesseract predicts both benign images correctly.
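The print statements below map the predicted class index back to a word using a small label_dict, which is not defined in the snippets shown in this post. A definition consistent with the predict function above would be:

# mapping from class index back to the word recognized by Tesseract
label_dict = {0: 'dissent', 1: 'assent', 2: 'other'}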

# the image the adversarial example should resemble ("dissent")
plt.imshow(image_target)
plt.show()
print('Tesseract output is: ' + label_dict[np.argmax(classifier.predict(np.array([image_target])))])

# the image providing the label ("assent") we want the adversarial example to keep
plt.imshow(image_init)
plt.show()
print('Tesseract output is: ' + label_dict[np.argmax(classifier.predict(np.array([image_init])))])

Generate Adversarial Examples on Optical Character Recognition (OCR)

Now we create an adversarial example with ART’s implementation of the HopSkipJump attack, querying Tesseract for predictions until an image that clearly reads “dissent” is recognized as “assent”.

attack = HopSkipJump(classifier=classifier, targeted=True, norm=2, max_iter=0, max_eval=1000, init_eval=10)
iter_step = 10
x_adv = np.array([image_init])
for i in range(13):
    x_adv = attack.generate(x=np.array([image_target]), x_adv_init=x_adv, y=to_categorical([1], 3))
    if i % 3 == 0:
        print("Adversarial image at step %d." % (i * iter_step),
              "L2 error", np.linalg.norm(np.reshape(x_adv[0] - image_target, [-1])),
              "and Tesseract output %s." % label_dict[np.argmax(classifier.predict(x_adv)[0])])
        plt.imshow(x_adv[0])
        plt.show(block=False)
    attack.max_iter = iter_step

The resulting image clearly shows “dissent”, but Tesseract outputs “assent” as its prediction, showing that we have successfully fooled the classifier!
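As a small optional follow-up (not part of the original notebook), we can save the final adversarial image and re-check Tesseract’s prediction on it, for example:

# save the final adversarial image and re-check Tesseract's prediction
# ('adversarial.png' is an arbitrary file name chosen for this sketch)
imageio.imsave('adversarial.png', x_adv[0].astype(np.uint8))
print('Tesseract output is: ' + label_dict[np.argmax(classifier.predict(x_adv)[0])])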

For more examples, check out the notebooks and demos in ART’s GitHub repository [1], where you can also explore the toolbox’s other capabilities.

References

[1] https://github.com/IBM/adversarial-robustness-toolbox

[2] https://arxiv.org/abs/1904.02144

[3] https://github.com/tesseract-ocr/

[4] https://github.com/IBM/adversarial-robustness-toolbox/blob/master/notebooks/classifier_blackbox_tesseract.ipynb
