The Doodling Workshop #2

Recognise your handwriting

manu rink
6 min read · Aug 3, 2017

A comment of interest:

If (you just want to know how handwriting recognition works) {
//go on reading
break
} else {
//have a look at the first part of the series first
https://medium.com/@codeprincess/the-doodling-workshop-1-ae955e351f7b
}

When I got my first document scanner well over 20 years ago, it came with a bunch of different applications. One of them would “read” the scanned document and extract the text into a .txt file. This worked… well, let’s say some text was recognised.

By now, printed text recognition can be considered a solved problem. Most common fonts are recognised easily, so extracting text from scanned documents and from images works great these days.

What is not even nearly solved is the challenge of handwritten text. There are so many different handwriting styles and flavours. So many, in fact, that there are people out there claiming they can read someone’s character from his or her handwriting.

A couple of operating systems have their own rudimentary handwriting recognition built in by now. iOS 11 presented great progress, e.g. in the Notes app, where your handwriting is immediately “translated” into text. And Windows 10 can do handwritten text recognition when you are in tablet mode with no keyboard attached. So there is a lot going on in the field of “More personal computing” to make input into machines easier and more natural for humans. And besides speech, handwriting is one of those inputs.

Microsoft Cognitive Services — Handwriting Recognition

I’ve been working with different areas of the Cognitive Services for over a year now and recently stumbled across a rather new and little-known API within the Computer Vision branch: Handwriting Recognition.

After I got my first own doodling app with the Apple Pencil running, I wanted to see whether it is possible to code “immediate handwriting recognition” in my own app — like Apple presented it for the Notes app.

First I had to get an understanding of how the APIs need to be used, because there is an OCR call, a handwriting call and another one for fetching results.

The best way is to use the handwriting detection via the recognizeText API.

https://[location].api.cognitive.microsoft.com/vision/v1.0/recognizeText[?handwriting]

It’s a POST request and just needs two pieces of information:

  • an image as an octet stream OR a web link attached to the body
  • an optional URL parameter “handwriting=true/false”. It’s set to true by default, but if you set it to false, the standard OCR API is called under the hood instead. This is a great switch if you are using the API for both handwriting and printed text in images (see the little sketch below).
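To make that switch a bit more tangible, here is a tiny Swift sketch of the two endpoint variants (the region in the URL is just a placeholder, use the one your own Computer Vision resource lives in):

import Foundation

// Placeholder region; replace it with the one your Computer Vision resource was created in.
let base = "https://westeurope.api.cognitive.microsoft.com/vision/v1.0/recognizeText"

let handwritingURL = URL(string: base + "?handwriting=true")!  // handwriting recognition
let printedTextURL = URL(string: base + "?handwriting=false")! // standard OCR under the hood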

As the text recognition can take a while, the API returns an Operation-Location field with an attached URL. With this URL we call another API, textOperations, and ask for the results. The URL is already filled with the necessary info for location and the operationId, so we can directly use it as a GET request for fetching our results.

https://[location].api.cognitive.microsoft.com/vision/v1.0/textOperations/{operationId}

If the recognition has already finished processing, the call returns a status of succeeded (the other possible states being not started, running and failed) and the recognised text of our image (or an empty array if nothing was detected).
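A minimal Swift sketch of that results call could look like the following. The JSON field names and the casing of the status value reflect the response format as I encountered it, so double-check them against the current documentation:

import Foundation

// operationLocation is the URL taken from the Operation-Location header of the first call,
// and <your-key> is again your Computer Vision subscription key.
var resultRequest = URLRequest(url: URL(string: operationLocation)!)
resultRequest.setValue("<your-key>", forHTTPHeaderField: "Ocp-Apim-Subscription-Key")

URLSession.shared.dataTask(with: resultRequest) { data, _, _ in
    guard let data = data,
          let json = (try? JSONSerialization.jsonObject(with: data)) as? [String: Any],
          let status = json["status"] as? String else { return }

    if status == "Succeeded",
       let result = json["recognitionResult"] as? [String: Any],
       let lines = result["lines"] as? [[String: Any]] {
        // Glue the text of all recognised lines together.
        let text = lines.compactMap { $0["text"] as? String }.joined(separator: " ")
        print(text)
    }
}.resume()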

And that’s basically all. After retrieving the result we display it to the user.

Get into the coding

The workflow I implemented is the following:

  • Start a timer and track if the user is drawing with the pencil
  • Track the minimum and maximum x- and y-coordinates of the current writing
  • If the user pauses for over one second, we consider the writing done
  • Then we draw a rectangle around the current drawing and paste this very area into a new image
  • This new image is sent to the Cognitive Services Handwriting Recognition call
  • After getting the operation-location from the call’s response we wait for two seconds
  • And then we do the results call to fetch the results for our operation
  • If the recognised text is not empty we display it right under the handwritten rectangle. Otherwise we just use the content “Text not recognised”

Sounds like a lot, but it isn’t. Let’s start with the…

Handwriting timer & min/max coordinate tracking

First we start a timer to be able to track where the user is currently drawing. For each generated touch we calculate the current min and max coordinate values, so that we can draw a rectangle around the area later on.

As soon as the user stops drawing for over a second, the drawDoodlingRect method is called with the previously calculated min/max values.
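The full implementation is in the playground on GitHub; as a rough sketch (the type and method names below are mine, not necessarily the ones from the repo), the tracking part could look like this:

import UIKit

final class DoodleTracker {
    private(set) var minPoint = CGPoint(x: CGFloat.greatestFiniteMagnitude, y: CGFloat.greatestFiniteMagnitude)
    private(set) var maxPoint = CGPoint(x: -CGFloat.greatestFiniteMagnitude, y: -CGFloat.greatestFiniteMagnitude)
    private var lastTouch: Date?
    private var timer: Timer?

    // Called for every generated touch (e.g. from touchesMoved): widen the bounding box.
    func track(_ point: CGPoint) {
        minPoint.x = min(minPoint.x, point.x); minPoint.y = min(minPoint.y, point.y)
        maxPoint.x = max(maxPoint.x, point.x); maxPoint.y = max(maxPoint.y, point.y)
        lastTouch = Date()
    }

    // A repeating timer fires onPause once the user has stopped drawing for more
    // than a second, handing over the rectangle around the current writing.
    func start(onPause: @escaping (CGRect) -> Void) {
        timer = Timer.scheduledTimer(withTimeInterval: 0.5, repeats: true) { [weak self] _ in
            guard let self = self, let last = self.lastTouch,
                  Date().timeIntervalSince(last) > 1.0 else { return }
            let rect = CGRect(x: self.minPoint.x, y: self.minPoint.y,
                              width: self.maxPoint.x - self.minPoint.x,
                              height: self.maxPoint.y - self.minPoint.y)
            onPause(rect) // e.g. call drawDoodlingRect for this area
            self.lastTouch = nil // and wait for the next writing cycle
        }
    }
}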

Note that a new context is created every time a cycle is finished. So as soon as the frame rectangle is drawn around the doodling, the APIs are called for the recognition, and we already prepare our canvas for the next drawing.

The drawDoodlingRect(context) function now creates the rectangle around the currently drawn area and simply paints it onto the current image context. This is still the same context on which the whole drawing was done.

So the resetDoodleRect() function starts a new context and drawDoodlingRect(context) finishes it. This cycle gets repeated over and over again.

The drawTextRect(context, rect) function just draws a rectangle right under the doodling rectangle to display the recognised text later — if any is returned by the API call :)
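Pieced together, and heavily simplified, the cycle could look roughly like this. The canvasSize, imageView and doodleRect values below are stand-ins for the state the real app keeps around; the playground on GitHub remains the reference:

import UIKit

// Stand-in state for what the real view controller keeps around.
let canvasSize = CGSize(width: 1024, height: 768)
let imageView = UIImageView(frame: CGRect(origin: .zero, size: canvasSize))
var doodleRect = CGRect(x: 100, y: 100, width: 300, height: 80) // filled from the min/max tracking

func resetDoodleRect() {
    // Open a fresh image context for the next writing cycle and redraw what is already there.
    UIGraphicsBeginImageContextWithOptions(canvasSize, false, 0)
    imageView.image?.draw(in: CGRect(origin: .zero, size: canvasSize))
}

func drawDoodlingRect(context: CGContext) {
    // Frame the area the user has just written in ...
    context.setStrokeColor(UIColor.lightGray.cgColor)
    context.stroke(doodleRect, width: 1)
    drawTextRect(context: context, rect: doodleRect)

    // ... then grab the finished drawing and close the context.
    imageView.image = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()

    // Cropping the framed area out of the image and calling the recognition API happens here.
}

func drawTextRect(context: CGContext, rect: CGRect) {
    // An empty box right under the doodling rect; the recognised text is drawn into it later.
    let textRect = CGRect(x: rect.minX, y: rect.maxY, width: rect.width, height: 40)
    context.stroke(textRect, width: 1)
}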

So this is everything on the UI side. Now let’s have a look at the last missing piece, the …

Data and REST API call handling

First we call the Handwriting Recognition API and attach our image as an octet stream to the body. We then take the operation-location from the response and use this as the URL for our next request, which fetches the recognised text for our image.

As already mentioned above, the two calls are necessary because the recognition may take a while. With a single synchronous call, the calling side of the request would probably run into timeouts.

So what do the calls themselves look like? There is no magic here, just plain REST API calls. And because retrieveResultsForOcrOperation is super simple, I’ll just show you the slightly more complex one for getting your image recognised.
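For illustration, here is a condensed sketch of what that recognition call can look like. The function name, the region in the URL and the <your-key> placeholder are mine; the playground’s actual implementation differs in the details:

import Foundation

// Sends the doodle image data to the handwriting recognition endpoint and hands back
// the Operation-Location URL, which is polled for the recognised text afterwards.
func recognizeHandwriting(imageData: Data, completion: @escaping (String?) -> Void) {
    let endpoint = "https://westeurope.api.cognitive.microsoft.com/vision/v1.0/recognizeText?handwriting=true"
    guard let url = URL(string: endpoint) else { completion(nil); return }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")
    request.setValue("<your-key>", forHTTPHeaderField: "Ocp-Apim-Subscription-Key")
    request.httpBody = imageData

    URLSession.shared.dataTask(with: request) { _, response, _ in
        // On success the service answers with the Operation-Location header.
        let headers = (response as? HTTPURLResponse)?.allHeaderFields
        completion(headers?["Operation-Location"] as? String)
    }.resume()
}

retrieveResultsForOcrOperation then does little more than the GET request shown earlier against exactly that URL.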

If this is too little insight for you, just have a look at the full-blown implementation, which can be found on GitHub in the playgrounds repo. All the code needed to run this feature out of the box is located there.

If you prefer the ready-to-install iOS app over a playground, you can get the source code right here.

… but the key!

After trying to run the code you might have noticed that the API call doesn’t go through properly. The reason is: you need a key to be able to use the Cognitive Services Computer Vision APIs. But don’t worry. You’ll have one generated in no time, and there is a free trial tier, too.

Just visit the Getting Started page here and, right after your API key has been generated, add it at line 69 in the Cognitive Services.swift file, or just add it directly in your own implementation when adding the Ocp-Apim-Subscription-Key field to your request header.

Now you should be good to go and have fun with handwriting!
https://medium.com/@codeprincess/the-doodling-workshop-3-70d8e360956a
