Machine Learning in iOS — for the noob
After working on a couple of occasions with handwritten recognition I’m in total awe of this piece of technology. Just by sending an image to a REST endpoint, waiting for the magic to happen and then getting back a bunch of JSON data with your recognised text in it as the response — just wow.
But I already “awed” a lot about this in a blog post a couple of months ago. For everybody who is interested in it and doesn’t know how to get started — look right here.
So where is the link between handwritten recognition and machine learning — for the noobs? Easy! All the recognition stuff only works online. So if you use this feature with a bad network connection in place, you will experience at least long waiting times and in the worst case just timeouts on your requests. Moreover, having no network connection at all is also likely in certain scenarios. What do you do then? Have all of the recognition in place offline — bingo!
Before we start a quick side note to the impatient reader.
The whole project can be found on GitHub, including all the machine learning and model building stuff. So if you are more into looking into code instead of reading explanations, visit this repo, and see you soon :)
To be able to offer offline handwritten text recognition, you might have already assumed that there needs to be a bunch of logic in place — packed into your application. And this is exactly what this article is about.
We will build our own handwritten text recognition model, convert it to a CoreML model and use it in a native iOS app to detect handwritten numbers — OFFLINE!
For two reasons we will just focus on handwritten number recognition:
- Because the current Computer Vision OCR API can’t do it properly. For proper recognition there must always be a context in place (like grammar, a certain format, …)
- To make things easier for this proof of concept — keep things small and easy (and believe me, it will already be complicated enough) :D
Great objective I would say, so let’s start!
CREATE AND TRAIN A MODEL
Machine Learning is first of all about data. If you have valid data that can be categorised nicely, you are already on the winning path. If you don’t have it — you are doomed. In that case you have to take a couple of steps back and get a good data set to start with.
In our case we have a nice set of handwritten numbers: the digits dataset that ships with scikit-learn, which contains 1,797 images of numbers at 8x8 pixels (not to be confused with the much bigger MNIST database at http://yann.lecun.com/exdb/mnist/, which contains an incredible 60,000 images at 28x28 pixels). So with this set we will work and train our machine learning model. But we don’t have any model yet?!
The model shall not be our problem. There are a lot of different algorithms out there that can help us build a valid model out of our data.
We’ll pick the … SVM! A Support Vector Machine! Why this one**? Because a colleague told me to. And because the example we’ll be using from the scikit-learn library told us to. The what? Let’s stop shortly to put things into the correct order.
Let’s visit http://scikit-learn.org/. This page offers a lot of Python based examples on how to solve problems with machine learning. Which is great!
So I searched for “hand-written digits” and found a finished Python script on how to use an SVM (remember, the Support Vector Machine) on the number problem. With a machine, or rather an algorithm, like this we can classify things. At its core the SVM is a binary classifier: it decides whether a certain thing belongs to group A or not. By combining several of these binary decisions, scikit-learn can also handle all ten of our digit classes. So instead of a vague “this might be A, but maybe C”, it tells you “this is A, and not B, C or D”.
As we have a look at the sample Python code, and depending on how good you are with it (I wasn’t and still am not — remember — noob!), the whole process is not that complicated. And don’t be afraid if you don’t understand every last detail of it. We will just run it and see if the result fits our needs. In case it doesn’t, we will come back and tweak our model. But first — let’s go down the easy and straight path to machine learning.
I want to point out a couple of things to get some orientation on what is happening in this script.
digits = datasets.load_digits()
First we load the digits data set that comes bundled with scikit-learn and save it into the variable digits.
classifier = svm.SVC(gamma=0.001)
Here we are creating an SVC — a Support Vector Classifier — which under the hood uses a Support Vector Machine for its work. It’s basically the skeleton of our model, which we will now train for our needs with the big set of digits.
Then we train — or fit — the classifier on the first half of the samples, keeping the second half for testing:
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
And after all that we have to save, or better to say dump, the model somehow, which is done with this piece of code. The format is called Pickle and it’s a binary representation of the model itself.
with open('mymodel.pkl', 'wb') as file:
    pickle.dump(classifier, file, protocol=pickle.HIGHEST_PROTOCOL)
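Pieced together, a minimal runnable version of the whole script could look like this — a sketch following the scikit-learn example, where n_samples and the gamma value are taken from it:

```python
import pickle

from sklearn import datasets, svm

# Load the 8x8 digit images bundled with scikit-learn
digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each 8x8 image into a 64-value vector
data = digits.images.reshape((n_samples, -1))

# Create the Support Vector Classifier and train it
# on the first half of the samples
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])

# Sanity check: predict the second half and measure accuracy
predicted = classifier.predict(data[n_samples // 2:])
accuracy = (predicted == digits.target[n_samples // 2:]).mean()

# Dump the trained model in Pickle format
with open('mymodel.pkl', 'wb') as file:
    pickle.dump(classifier, file, protocol=pickle.HIGHEST_PROTOCOL)
```

Running this should report well over 90% accuracy on the held-out half — good enough for our little experiment.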
Done! We now have a Pickle formatted model which should be able to detect handwritten numbers from images. A little step is missing though.
** If you are not a data scientist and don’t know exactly which algorithm will fit best, have a look at this cheat sheet to get a first idea. The rest is then a bit of trial and error, experimenting with what fits best on your set of data.
We need to convert it for use in our iOS application. Why? Because Xcode only accepts CoreML models, not Pickle formatted ones. But no worries, the conversion is quite easy. We again use Python for this task and it’s done with a few lines of code.
We are done with the machine learning part. Easy, right?
All the noobs — including me — are cheering right now :)
Weeee’ve done machiiiiine learniiiiing!
The integration of our CoreML model into Xcode is quite easy. Just drag and drop it as a resource, don’t forget to copy it, and you are good to go.
But now a bit of work lies between our model and the predictions we want to make with it. Why, you ask? Can’t I just cut out images of my own handwriting and ask the model what’s on them? Basically, yes. But not with our model. Because our model doesn’t take pictures as input — it takes an 8x8 multiarray of Doubles. You can find out what the model accepts as input and offers as output by selecting it in Xcode and having a look at its properties.
What we have to do
To convert our handwritten numbers to 8x8 multiarrays we need a bit of math, but not that much. Here is what we essentially need to do:
- Get the handwritten number
- Save it into a UIImage
- Cut the UIImage into an 8x8 grid
- Look at every tiny piece
- Get its color info and the alpha for the tiny piece
- Multiply the resulting value by 16
- Save this value to the corresponding index in our multiarray
- Give the multiarray to the model for prediction
- And print out the result of it
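In Python terms (the app itself does all of this in Swift), the downscaling step looks roughly like this. Assume alpha is a square grid of alpha values between 0.0 and 1.0 sampled from the cut-out UIImage; the function name and helper are purely illustrative:

```python
def image_to_multiarray(alpha, grid=8):
    """Average an n x n grid of alpha values (0.0-1.0) down to
    grid x grid cells and stretch each cell to the 0-16 pixel
    scale the digits model was trained on."""
    n = len(alpha)
    cell = n // grid  # pixels per cell, assuming n is divisible by grid
    result = []
    for row in range(grid):
        for col in range(grid):
            # Average all alpha values inside this cell...
            total = 0.0
            for y in range(row * cell, (row + 1) * cell):
                for x in range(col * cell, (col + 1) * cell):
                    total += alpha[y][x]
            mean = total / (cell * cell)
            # ...and scale it to the model's 0-16 range
            result.append(mean * 16)
    return result

# A fully opaque 16x16 "image" collapses to 64 cells of value 16
sample = [[1.0] * 16 for _ in range(16)]
values = image_to_multiarray(sample)
```

The 64 resulting values are exactly what gets written into the MLMultiArray before asking the model for a prediction.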
So let’s look into three little details which might be interesting.
First, the retrieval of the currently written number. I check each touch that is generated (see func drawStroke) while writing with the pencil and save the lowest x/y coordinates (top left) as well as the highest x/y coordinates (lower right), to know exactly what extent the latest writing has. After an “oh, the user stopped writing” timeout (see the trackTimer in func setup) I draw the rectangle (see func drawDoodlingRect) and save exactly this area of the canvas into a UIImage (see func fetchOCRText). And this image is the basis for the data we need to ask our model about.
Second, the translation of the image into an 8x8 multiarray. The code consists of exactly the steps listed above. All we do here is cut a piece out of our image (the number we’ve just written and cut out), have a look at its color, get the alpha and save it, with a little modification, to the multiarray. Doing this 8x8 — or better to say 64 — times, we will have traced the whole image and can ask the model for its guess on it.
Third, the model prediction. It’s just one line of code. We ask the model for a prediction and get a number back. 8x8 multiarray in, number out. And this number is, in the best case, exactly what we wrote down seconds ago :)
After a bit of testing, the screen of the Scribble app will look like this :) We get our prediction back from the model and print it right on the canvas under our written number.
We’ve done it — We can recognise (most) handwritten numbers — offline!
IN THE END
… This is not the end!
Obviously. There are a couple of problems with the model.
First of all, the training images were all taken from U.S. people. You guys write numbers in a slightly different way than we Europeans do, and this already has a significant effect on the prediction quality of the model. So the model would need additional training with handwritten numbers from, for example, Germans.
Another means of optimisation would be to use a different algorithm than the SVM. Maybe a neural network would fit better? That would be the next step for this first prototype.
And the last thing, but this is a restriction of CoreML itself: you can’t give feedback to the model. It’s there. It’s trained. It’s read-only, or better to say prediction-only. To improve the model in this scenario we would have to save the feedback somewhere, run a separate new training cycle and update the model in the app afterwards — quite a struggle.
Further, keep in mind that depending on what you train the model with, it can get quite big. And as it is embedded in your app, your app will gain some data weight too.
It’s on you! Play around, build something awesome. Dive into Python and don’t be afraid. It’s not that complicated to write your scripts with it. And getting into the ML stuff may seem very hard too. But you already managed to get this good at what you are doing. Maybe this is just the next challenge you will master.
Take your time :) And take this for a good start.
PS: And I’d love to hear how you are doing :)