Image Recognition with Owl
How can a computer take an image and answer questions like “what is in this picture? A cat, dog, or something else?”
In the last few years the field of machine learning has made tremendous progress on this difficult problem. In particular, Deep Neural Networks (DNNs) can now achieve strong performance on visual recognition tasks, matching or exceeding human performance in some domains.
InceptionV3 is one of Google’s latest efforts in image recognition. It is trained for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) using the 2012 data. This is a standard task in computer vision, where models try to classify entire images into 1000 classes, such as “Zebra”, “Dalmatian”, and “Dishwasher”. Compared with previous DNN models, InceptionV3 has one of the most complex network architectures in computer vision.
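Image classifiers such as InceptionV3 typically end with a softmax layer, which turns the network’s raw scores for the 1000 classes into probabilities that sum to 1. Those probabilities are what later appear as percentages in the prediction output. The sketch below is a numerically stable softmax in plain OCaml, without Owl, so it only illustrates the idea; the function name softmax and the toy scores are our own assumptions, not part of the Owl example.

```ocaml
(* Numerically stable softmax: subtracting the largest score before
   calling [exp] avoids overflow without changing the result. *)
let softmax (scores : float array) : float array =
  let m = Array.fold_left max neg_infinity scores in
  let exps = Array.map (fun s -> exp (s -. m)) scores in
  let total = Array.fold_left ( +. ) 0. exps in
  Array.map (fun e -> e /. total) exps

let () =
  (* Made-up raw scores for a toy 3-class problem. *)
  let probs = softmax [| 2.0; 1.0; 0.1 |] in
  Array.iteri
    (fun i p -> Printf.printf "class %d: %.2f%%\n" i (100. *. p))
    probs
```

With these toy scores the program prints 65.90%, 24.24%, and 9.86%: the highest score gets most of the probability mass, but every class keeps a non-zero share.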
So you want to do it… with OCaml?
There are many good deep learning frameworks for image classification, such as TensorFlow, Caffe, and Torch. But what if your language of choice is a functional programming language such as OCaml? It has long been thought that OCaml is not suitable for advanced computational tasks like machine learning. Now we have Owl.
Owl is an emerging numerical library for scientific computing and engineering. It is developed in OCaml and inherits the language’s powerful features, such as static type checking, a powerful module system, and superior runtime efficiency. Owl lets you write succinct, type-safe numerical applications in a functional language without sacrificing performance, significantly reducing the cost of moving from prototype to production.
Owl provides a fully functional neural network module for deep learning applications. Compared with existing deep learning platforms, Owl benefits from some desirable properties of OCaml: speed, and strong static typing with type inference. Moreover, as you’ll soon see, Owl code is highly expressive: you can build a working image classification application with short, elegant code.
Preparation: Install Owl
First, you need to install OCaml. The most convenient way is through your system package manager. However, Owl requires OCaml version 4.04.0 or later, which your package manager may not provide yet. In that case, please compile OCaml from source. After installing OCaml, you can install OPAM, the OCaml package manager. Again, the recommended way is to install it from source. For your reference, here is a Dockerfile for these two steps.
Installing Owl is easy. You can build it from source, install it with opam, or try all the up-to-date features with Docker. Please see the installation guide for details. Note that the Owl package on opam lags behind the master branch and misses many new features, so I cannot guarantee that the code in this article will run smoothly with it.
Before compiling Owl from source, you need to install some external libraries and OCaml packages. Note that one of these packages, eigen, also lags behind its master branch when installed from opam, so please install it from source if you can.
Of course, the most convenient way to experiment with Owl is Docker. All you need to do is pull the image, start a container, and play with it.
Let it roll!
Enough of these boring installation steps. Forget any hello-world code. Let’s do the image classification, right here, right now!
owl -run 6dfed11c521fb2cd286f2519fb88d3bf
That’s it. This one-liner is all you need to see an image classification example in action. Here is the output:
Top 5 Predictions:
Prediction #0 (96.20%) : giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
Prediction #1 (0.12%) : lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens
Prediction #2 (0.06%) : space shuttle
Prediction #3 (0.04%) : soccer ball
Prediction #4 (0.03%) : indri, indris, Indri indri, Indri brevicaudatus
That is the classification result for this image:
The code and images used in this example are included in this Gist. Let’s walk through this simple example:
You need 5 steps to do image classification with InceptionV3:
1. Import external code and libraries using #zoo "gist-id", which lets you use code modules defined in other Gists. Here we use the LoadImage and InceptionV3 modules: the former reads the input image, and the latter defines the InceptionV3 network architecture and loads the network’s weights. The downloaded code is cached locally.
2. Load the InceptionV3 model with one line of code.
3. Load your image into Owl as an N-dimensional array. Note that because image processing support in OCaml is limited, we currently only support input images in .ppm format, and the image must be 299x299 pixels. You can use ImageMagick to convert your image (the ! in the geometry tells ImageMagick to ignore the original aspect ratio and produce exactly 299x299; it is escaped with a backslash for the shell):
   convert -resize 299x299\! input.png input.ppm
4. Run inference with the neural network model on the input image, then decode the result to get the top-N predictions (N defaults to 5) in human-readable form. The output is an array of tuples; each tuple consists of a string describing the predicted class and a float between 0 and 100 giving the probability, as a percentage, that the input image belongs to that class.
5. If you want, you can pretty-print the result on your screen.
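Step 3 requires a 299x299 .ppm file, so it can be worth checking an image before handing it to the model. The sketch below is plain OCaml, independent of Owl and of the LoadImage module, and the names next_token, ppm_size, and check_inception_input are our own. It reads only the header of a binary PPM (P6) file, which starts with the magic string P6 followed by the width, height, and maximum color value, possibly interleaved with comment lines that begin with #.

```ocaml
(* Read the next whitespace-separated token from a PPM header,
   skipping comment lines that start with '#'. *)
let next_token ic =
  let buf = Buffer.create 8 in
  let rec skip () =
    match input_char ic with
    | ' ' | '\t' | '\n' | '\r' -> skip ()
    | '#' -> ignore (input_line ic); skip ()
    | c -> c
  in
  let rec read c =
    Buffer.add_char buf c;
    match input_char ic with
    | ' ' | '\t' | '\n' | '\r' -> ()
    | exception End_of_file -> ()
    | c' -> read c'
  in
  read (skip ());
  Buffer.contents buf

(* Return (width, height) of a binary PPM ("P6") file; the channel is
   closed whether or not parsing succeeds. *)
let ppm_size path =
  let ic = open_in_bin path in
  let result =
    try
      if next_token ic <> "P6" then
        failwith (path ^ ": not a binary PPM (P6) file");
      let w = int_of_string (next_token ic) in
      let h = int_of_string (next_token ic) in
      Ok (w, h)
    with e -> Error e
  in
  close_in ic;
  match result with Ok size -> size | Error e -> raise e

(* Report whether an image already has the 299x299 size used in step 3. *)
let check_inception_input path =
  match ppm_size path with
  | 299, 299 -> Printf.printf "%s: OK (299x299)\n" path
  | w, h -> Printf.printf "%s: %dx%d, resize it to 299x299 first\n" path w h
```

For example, after running the convert command from step 3, check_inception_input "input.ppm" should report that the file is OK. Only the header is parsed, so the check is cheap even for large files.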
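Steps 4 and 5 turn the network output into top-N (label, percentage) pairs and print them. The gist’s own decoding code is not reproduced here, so the following is only a plain-OCaml sketch of what such a step could look like, assuming the model yields one probability per class and the class labels are available as a string array. The names decode_top and print_top are hypothetical, and the probabilities in the demo are made up. The printed layout mirrors the sample output earlier in this article.

```ocaml
(* Hypothetical helper (not the gist's API): pair each class probability
   with its label, sort in descending order, keep the first [top] entries,
   and scale each probability from [0, 1] to a percentage in [0, 100].
   [labels] must have at least as many entries as [probs]. *)
let decode_top ?(top = 5) (labels : string array) (probs : float array) :
    (string * float) array =
  let pairs = Array.mapi (fun i p -> (labels.(i), 100. *. p)) probs in
  (* Highest probability first; ties keep their original order. *)
  Array.stable_sort (fun (_, a) (_, b) -> compare b a) pairs;
  Array.sub pairs 0 (min top (Array.length pairs))

(* Print predictions as "Prediction #i (xx.xx%) : label". *)
let print_top (preds : (string * float) array) =
  Printf.printf "Top %d Predictions:\n" (Array.length preds);
  Array.iteri
    (fun i (label, pct) ->
      Printf.printf "Prediction #%d (%.2f%%) : %s\n" i pct label)
    preds

let () =
  (* Made-up probabilities for a toy 4-class model. *)
  let labels = [| "dishwasher"; "giant panda"; "zebra"; "soccer ball" |] in
  let probs = [| 0.05; 0.80; 0.10; 0.05 |] in
  print_top (decode_top ~top:3 labels probs)
```

Keeping decoding and printing in separate functions means the (label, percentage) array can also be sent elsewhere, for example to a web front end, without going through the console output.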
Of course, if you don’t want to use the owl command, you can always copy the example code from this gist and run it any way you prefer.
If you are not interested in installing anything, no problem! Here is a web-based demo of this image classification application powered by Owl. Please feel free to play with it; the server won’t store your image. And if you are keen to protect the privacy of your personal data (I could not agree with you more), you should definitely pull the code here and quickly build a local image processing service, without worrying about your images being seen by anybody else!
Want to Know More?
We do have more! I suggest reading the code that constructs the whole InceptionV3 network in this gist. Even if you are not familiar with Owl or OCaml, you may be surprised to see that a network containing 313 neuron nodes can be constructed in only about 150 lines of code, and this is one of the most complex neural networks in computer vision. For smaller tasks, such as the common handwritten digit recognition task, you can construct a good deep neural network model in only 9 lines of code! If you are interested, please check the Owl source code for more examples.
The best way to start learning Owl is with the tutorials, keeping utop at hand so you can learn by doing.