How to make a $1 Unistroke Gesture Recognizer in Swift

Published in

breakfastcode

6 min readOct 15, 2015

Although many modern smartphone and tablet display offers opportunities for using pen and finger in their user interfaces, implementing gesture recognition has been largely a privilege of AI/pattern matching experts, not general software programmers neither human-computer interaction designers (HCI).

There are different algorithms developed over the time which are able to recognise and translate gestures or symbols in something we can use to provide a new kind of interaction to our end users. Some of these are complex to write, some others may require a capable hardware (although modern devices maybe better than any low-end consumer PC); today I would to discuss about $1, an easy, cheap and usable almost anywhere algorithm which is able to provide from 97% to 99% of accuracy comparing to other noble ones like DTW or Rubine (depending by the number of templates provided for each symbol).
Moreover with medium-speed gesture, in which users balanced speed and accuracy of the drawn, gestures were better recognized than slow or fast gestures for all three recognizers.

Easy and cheap it provide from 97% to 99% of accuracy in less than 100 lines of code.

$1 Recognizer was developed by Jacob O. Wobbrok, Andrew D. Wilson and Yang Li and it is discussed in this paper from University of Washington (obviously, this post is heavily based on it). In this post I’ll talk about the main principles of the algorithm and I’ll propose a pure Swift implementation (future posts will include some variants which are able to support multistroke gestures).

In this article I’ll explain how to make an unistroke gesture recognizer with a sample iOS application; that’s the result:

$1 Recognizer is very simple; in fact is involving only a basic geometry and trigonometry knowledge and the length of the code is less than 100 lines. It also support configurable rotation, scale and position invariance while it does not require any training session to become usable.

There are several variants of this algorithm; soon as I can I’ll discuss another variant, called $P recognizer: $P can recognize gestures made with any number of strokes (while $1 is single stroke) in any order or direction with low storage cost (whereas $N, another variant, had high storage costs since it internally represented all the same variations that $P’s matching algorithm can handle seamlessly).

The algorithm in detail

What’s a gesture? Basically it’s a set of points (we call it candidate points or C); our scope is to find which set of previously recorded template points (we call it Ti) it’s close to our input.
Both candidate and template points must be recorded and sampled at a specified rate (in order to limit the number of points and so the data to manage) by some kind of sensing hardware and software (in our case we obviously use an iOS device).
This fact coupled with human errors mean that points in similar C and Ti will rarely “line up” so as to be easily comparable.

The example below describe two pairs of gestures, a “pigtail” and an “x” symbol which are composed by different points and with a different size; moreover the pigtail may be rotated by 90° CW degree and it will be similar to the x symbol.

*(From the paper)* Two pairs of fast (~600 ms) gestures made by a subject with a stylus. The number of points in corresponding sections are labeled. Clearly, a 1:1 comparison of points is insufficient.

With these constraints in mind, clearly any 1:1 comparison of the points is insufficient.
So $1 algorithm must be also resistant to variations of sampling operation and, moreover, should support any rotation/scale and position invariance.

Sampling data points

As we said above movement speed have a clear effect on the number of input points in a gesture; this is not only an issue related to our choosed sample rate, but it is also dependent from the input hardware: with a fast gesture we’ll have a minor number of sampled points than we may have with a slow gesture (see the picture below).

In this example you can see the different number of points sampled by the hardware with a fast and a slow gesture. In order to make them comparable we need to resample our input points array.

So, in order to make two gesture comparable we first resample our gestures points array; for an original M points array we get a N equidistantly spaced points array; a balanced value of N is 64; it guarantees a gret result without a big loss of precisions or an huge amount of data to manage.

To resample we first calculate the total length of the M point path; dividing this length by (N-1) gives the length of each increment INC, between N points.
Then the path is stepped through such that when the distance covered exceeds I, a new point is added through linear interpolation.
At the end of this step we’ll have a N point path and we can measure the distance from our input array and any other templates array.

That’s the swift code (StrokePoint data type is similar to CGPoint but uses double in order to get max precision).

This is the result of a sample:

Get the best rotation angle to align with templates

At this time we have two paths of ordered points (our input array and, for example the n-th template array); there is not any closed form solution to determine the angle to which one set points should be rotated to best align with the other one.
$1 recognizer than iterate over the space of possible angles to search the best alignment between two point-paths.
While this kind of operation maybe prohibitive for other algorithms, even naïvely rotating the gesture by 1° for 360° is fast enough even with 30 gesture templates; however the algorithm itself try to avoid this brute force approach and do some “rotation tricks” that makes the search for optional angle a faster process.

*Rotating a triangle so that its “indicative angle” is at 0° (straight right). This approximates finding the best angular match.*

The first step is to search gesture’s indicative angle, which it’s defined as the angle formed between the centroid of the gesture and the gesture’s first point.

Then we rotate the gesture so this angle is at 0°.

Scale & Translate

Our next step is to scale (non-uniformly) the gesture to a reference square: this allow the algorithm to rotate the candidate about its centroid and safely assume that changes in pairwise point-distances between the candidate and the template are due only to rotation, not to aspect ratio.
At the end of the scale the gesture itself is translated to put the centroid at the origin point (0,0).

That’s the code portion used:

Find the optimal angle and get the best score

The final step of this algorithm compare each stored template with our input path-points.
This comparison is made using a path-distance function which return the distance between our two path-points; the result of this function is then converted to a score point by another function.
Input gesture may be rotated in order to find the best angular alignment.

Complete project

$1 Project in Swift is available via github at https://github.com/malcommac/SwiftUnistroke; code is compatible with Swift2 and includes a simple iOS application which is able to recognize a set of specified template points. You are free to use it in your projects (both open source and commercial ones), make your pull requests or issues.