# Solving CAPTCHA with TensorFlow and Go

## A robot that proves it’s not a robot way better than I do

Jan 8, 2018 · 6 min read

Before the Blacklight event at DevFest Siberia, I was tasked with play-testing a specific challenge to prove that it’s possible to crack in a reasonable time. That challenge was one I face occasionally in my everyday life, not always with success: solving a bunch of captchas.

At DevFest, people had to break into a room without surveillance cameras capturing the break-in attempt. Disabling the camera required entering a four-digit security PIN into a CAPTCHA-protected form.

Provided were a TensorFlow SavedModel in the binary ProtoBuf format, trained to recognize that particular captcha (tensorflow-savedmodel-captcha.zip, 27.7 MB), and a link to the camera interface.

Being handed a TensorFlow model means doing some TensorFlow!

# A Few Words about TensorFlow

TensorFlow is an open-source software library for Machine Intelligence, used mainly for machine learning applications such as neural networks.

TensorFlow runs computations involving tensors. There are many sources to understand what a Tensor is, and this article is definitely not a sufficient one: it only has the bare minimum to make sense of what the code does. Tensors are awesome and complex mathematical objects, and I encourage you to take the time to learn more about them.

For our purposes, here is the explanation from the TensorFlow website:

A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.

A tensor is defined by the data type of the values it holds and its shape, which is the number of dimensions and number of values per dimension.
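To make "shape" concrete, here is a minimal sketch in plain Go. It does not use the TensorFlow API: `shapeOf` is a hypothetical helper written for this illustration, and it assumes rectangular nested slices.

```go
package main

import (
	"fmt"
	"reflect"
)

// shapeOf returns the number of values along each dimension of a nested
// slice or array. It assumes the value is rectangular (every row has the
// same length) and only inspects the first element at each level.
func shapeOf(v interface{}) []int {
	shape := []int{}
	val := reflect.ValueOf(v)
	for val.Kind() == reflect.Slice || val.Kind() == reflect.Array {
		shape = append(shape, val.Len())
		if val.Len() == 0 {
			break
		}
		val = val.Index(0)
	}
	return shape
}

func main() {
	scalar := float32(3)                        // rank 0: a single value
	vector := []float32{1, 2, 3}                // rank 1: three values
	matrix := [][]float32{{1, 2, 3}, {4, 5, 6}} // rank 2: two rows of three

	for _, v := range []interface{}{scalar, vector, matrix} {
		s := shapeOf(v)
		fmt.Printf("type %T, rank %d, shape %v\n", v, len(s), s)
	}
}
```

The data type (`float32` here) and the shape (`[2 3]` for the matrix) together are what the quote above calls the definition of a tensor.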

The “Flow” part of TensorFlow describes how the graph (model) is essentially a set of nodes (operations), and the data (tensors) “flows” through those nodes, undergoing mathematical manipulation along the way. You can look at, and evaluate, any node of the graph.

# A Few Words about TensorFlow + Go

On the official TensorFlow website, you can find a page dedicated to Go describing the recommended use:

TensorFlow provides APIs for use in Go programs. These APIs are particularly well-suited to loading models created in Python and executing them within a Go application.

It also warns that the TensorFlow Go API is not covered by the TensorFlow API stability guarantees, although as of the date of this post everything is still working as expected.

When going to the package page, there are two more warnings:

1. The API defined in this package is not stable and can change without notice.
2. The package path is awkward and can change in the future as well: .

In theory, the Go APIs for TensorFlow are powerful enough to do anything you can do from the Python APIs, including training (Asim Shankar has a good example of training a model in Go using a graph written in Python). In practice, some workflows, particularly those for model construction, are very low level and certainly not as convenient as in Python.

For now, it generally makes sense to define the model in TensorFlow for Python, export it, and then use the Go APIs for inference or training. So while Go might not be your first choice for working with TensorFlow, they do play nice together when using existing models.

Note: thanks to Asim Shankar from the TensorFlow team for pointing out that it is possible to train models with Go. We will collaborate further to make the documentation around this more accessible.

# Let’s Break In

The interface I was facing seemed pretty close to your regular captcha-protected form. My TO DOs were:

1. Inspect the model.
7. Iterate over all possible PIN codes.

## 1. Inspect

“SavedModel is the universal serialization format for TensorFlow models.” (TensorFlow documentation)

We don’t know anything about the model, so our first step would be figuring out the input and output nodes for prediction. The SavedModel format allows storing this information, called a signature, as metadata; the tool for inspecting signatures is SavedModel CLI. Here’s the command and its output:

```
$ saved_model_cli show --dir ./tensorflow_savedmodel_captcha --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_STRING
        shape: unknown_rank
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_STRING
        shape: unknown_rank
  Method name is: tensorflow/serving/predict
```

What we learn from this are the node names used with the tag `serve`:

• input: ,
• output: .

## 2. Start

Let’s define constants for the camera URL and for the error messages we already know:
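As a minimal sketch, such constants could look like the block below. Every value in it is a placeholder: the URL and the message texts are hypothetical, not the real ones from the challenge.

```go
package main

import "fmt"

// All values below are placeholders for illustration only; the real camera
// URL and the real error texts come from the challenge itself.
const (
	cameraURL       = "http://camera.example/" // hypothetical address
	msgWrongCaptcha = "Invalid captcha"        // hypothetical error text
	msgWrongPIN     = "Invalid PIN"            // hypothetical error text
)

func main() {
	fmt.Println("target:", cameraURL)
	fmt.Println("known responses:", msgWrongCaptcha, "/", msgWrongPIN)
}
```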

We’ll add log file support to our function to make sure everything is logged properly:

And implement a function for our login attempts. If the response is unknown, we’ll output it in full. Otherwise we’ll add a short entry to either the logfile or stdout.

Now let’s load the model with `LoadSavedModel`. The full signature is:

```go
func LoadSavedModel(exportDir string, tags []string, options *SessionOptions) (*SavedModel, error)
```

The function takes three arguments: path, tags and session options. Explaining tags and options could easily take up the entire post and shift the focus, so for now it is enough to know that `serve` is the tag used to serve TensorFlow models, and that session options are not required in our case.

## 4. Fetch

Now let’s get the captcha image.

In order to keep the session with the generated captcha, we will first open a cookie jar. Even though it’s the first time I am writing about cookies publicly, I will spare cookie jokes as part of the Christmas spirit.
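Here is a minimal sketch of that idea using only the standard library. The helper names, the `/captcha.png` path and the `session` cookie name are assumptions, and `fakeCamera` is a local stand-in server so the example can run without the real camera.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newClient returns an HTTP client whose cookie jar keeps the session
// cookie between requests.
func newClient() (*http.Client, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	return &http.Client{Jar: jar}, nil
}

// fetchCaptcha downloads the captcha image as raw bytes.
func fetchCaptcha(c *http.Client, url string) ([]byte, error) {
	resp, err := c.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// fakeCamera is a stand-in server: it sets a session cookie on the first
// visit and answers differently once the client sends that cookie back.
func fakeCamera() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := r.Cookie("session"); err != nil {
			http.SetCookie(w, &http.Cookie{Name: "session", Value: "s1", Path: "/"})
			io.WriteString(w, "new-session")
			return
		}
		io.WriteString(w, "same-session")
	}))
}

func main() {
	srv := fakeCamera()
	defer srv.Close()

	client, err := newClient()
	if err != nil {
		fmt.Println("creating client:", err)
		return
	}
	for i := 1; i <= 2; i++ {
		body, err := fetchCaptcha(client, srv.URL+"/captcha.png") // hypothetical path
		if err != nil {
			fmt.Println("fetching captcha:", err)
			return
		}
		fmt.Printf("request %d: %s\n", i, body)
	}
}
```

The same client has to be reused for the later POST, otherwise the server would not associate the submitted captcha text with the image it generated.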

## 5. Predict

To run a prediction we need to supply inputs, called feeds (operations to feed our data to, mapped to tensors containing the data), and outputs, called fetches (operations to fetch the data from).

We only have one feed (input):

• the feed operation is ,
• the feed tensor is a string containing the CAPTCHA image as bytes.

Likewise, there is only one fetch (output): .

After we run the model with our feeds and fetches, we receive the output — the captcha prediction.

Once the values — the PIN code and the captcha prediction — are there, let’s POST the request to disable the camera.

If the captcha prediction failed, we retry with the same PIN code.

The function checks and reports whether the response is one of the known error messages, which I’ve found out manually by guessing PIN codes and making both correct and incorrect captcha predictions.

## 7. Iterate

The PIN code only has 4 digits, so we’ll just go over all 10,000 combinations. Each iteration also needs the loaded model for the prediction step, and of course some logging.
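The loop itself can be as small as the sketch below. The `attempt` callback is a hypothetical stand-in for "fetch a captcha, predict it, POST the form"; here it just compares against a fixed PIN so the example runs on its own.

```go
package main

import "fmt"

// bruteForce tries every 4-digit PIN from "0000" to "9999" in order and
// returns the first one for which attempt reports success.
func bruteForce(attempt func(pin string) bool) (string, bool) {
	for n := 0; n < 10000; n++ {
		pin := fmt.Sprintf("%04d", n) // zero-padded, so 42 becomes "0042"
		if attempt(pin) {
			return pin, true
		}
	}
	return "", false
}

func main() {
	tries := 0
	pin, ok := bruteForce(func(pin string) bool {
		tries++
		return pin == "0042" // stand-in for a real login attempt
	})
	fmt.Println(pin, ok, tries) // prints: 0042 true 43
}
```

The zero padding matters: without `%04d`, PINs below 1000 would be submitted with fewer than four digits.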

# To Wrap This Up

The full code after everything is composed together is available on GitHub: Pisush/break-captcha-tensorflow.

TensorFlow has many models which can all be used with Go. There is a great list of those at tensorflow/models.

The pre-trained model provided for the challenge is based on emedvedev/attention-ocr.

Online challenges can be an awesome way to learn, whether it’s coding, security or sports. The combination of putting in practice your knowledge and having a mission creates a fun environment where you can work on improving your skills. Consider joining such a challenge as your new year’s resolution.

Thanks a lot to Ed for reviewing this PR and to Asim Shankar from the TensorFlow team for his input on the TensorFlow + Go part of this article.

Written by

## Blacklight

#### Stories behind Blacklight, an alternate reality game focused on programming and information security.
