The First Holographic OCR Scanner

Developing a Text Recognition Engine on the Microsoft HoloLens

Anyline
5 min read · Jan 31, 2017

In October 2016, we presented our very first HoloLens prototype at the AWE in Berlin. We demonstrated how combining OCR technology with sensor data can lead to very powerful applications in Augmented Reality!

Developing such an application, however, was a challenging task. The HoloLens is unlike any other smart glasses: it offers a level of augmentation that no other device has yet achieved. On the other hand, Microsoft’s design guidelines are rather sparse, and there are certainly no guidelines on how OCR scanning should look and behave on this device.

If you are curious how we managed to develop the first holographic OCR scanner, and what you should be aware of when building one, continue reading!

UX Design Considerations

To create an intuitive and convenient user experience, it’s important to tackle a few design problems from the user’s point of view.

The picture above shows what you see when looking through the HoloLens with our OCR scanner running. Note how the rectangle separates the region of interest from the rest of the scenery, and how a holographic bubble is placed in front of it, showing a capture of the frame and the text result. Keep this image in mind for the upcoming points:

The integrated webcam doesn’t see what you see

The HoloLens is a wearable device that sits on your head, so all of its sensors and cameras sit slightly above eye level. Because every person’s head and face are shaped differently, the camera most likely won’t point at exactly the same spot as your eyes. For precise OCR scanning, however, it is crucial to align the augmented region of interest with what the camera actually sees, and to present that information to the user in a natural and precise way.
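To get a feeling for how much this offset matters, here is a minimal sketch in Python. It assumes a simple pinhole camera mounted a few centimetres above the eye and looking in the same direction. The function name and all numbers (5 cm offset, 1000 px focal length) are illustrative assumptions, not measured HoloLens values and not code from our prototype.

```python
def parallax_offset_px(camera_above_eye_m, target_distance_m, focal_length_px):
    """Vertical shift, in pixels, between the camera's image centre and the
    point the eye is looking at, under a simple pinhole model.

    Assumes the camera's optical axis is parallel to the line of sight and
    the camera sits `camera_above_eye_m` metres above the eye.
    """
    if target_distance_m <= 0:
        raise ValueError("target_distance_m must be positive")
    # Pinhole projection: image offset = focal length * (height / depth).
    return focal_length_px * camera_above_eye_m / target_distance_m


if __name__ == "__main__":
    # Illustrative values only (hypothetical offset and focal length).
    for distance in (0.3, 0.5, 2.0):
        shift = parallax_offset_px(0.05, distance, 1000.0)
        print(f"{distance:.1f} m -> {shift:.0f} px")
```

Under these assumptions the shift grows quickly at close range, which is exactly where text scanning happens. That is why the region of interest has to be corrected for the gaze distance rather than drawn at a fixed screen position.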

The human eyes can’t focus on multiple distances

This might sound trivial, but believe me, it’s not. Let’s refer to the transparent hologram floating in front of you as shown in the picture: Your only task is just to gaze through it. You can either focus your eyes on that hologram or on the scenery behind it. If the hologram is not placed right in front of the scenery, you will have double vision of either the hologram or the background, and it will be insufferable to look through.

Nearby fixed augmentations are annoying

They just are. The scan rectangle you see in the image above is designed to always position itself at the distance you are gazing at. This is very similar to the standard HoloLens cursor. The significant difference is that it always appears at the same apparent size, because the region of interest (ROI) the camera sees covers a fixed part of the camera image and is therefore independent of depth.

At first, it was designed to float in front of the user at a fixed distance where scanning would work best (around 30–40 cm). It quickly became annoying because you couldn’t get it out of your way.
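The constant apparent size of the scan rectangle can be sketched with a little trigonometry: if the ROI covers a fixed angle of the camera’s view, the rectangle’s physical size has to grow linearly with the gaze distance. The Python snippet below is only an illustration. The function name and the angles (20° by 8°) are assumptions, not the values used in our prototype.

```python
import math


def rect_size_at_distance(distance_m, h_angle_deg, v_angle_deg):
    """Physical width and height (in metres) a rectangle needs at
    `distance_m` so that it always spans the same viewing angles."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    width = 2.0 * distance_m * math.tan(math.radians(h_angle_deg) / 2.0)
    height = 2.0 * distance_m * math.tan(math.radians(v_angle_deg) / 2.0)
    return width, height


if __name__ == "__main__":
    # Hypothetical ROI spanning 20 x 8 degrees of the view.
    for d in (0.35, 1.0, 3.0):
        w, h = rect_size_at_distance(d, 20.0, 8.0)
        print(f"gaze distance {d:.2f} m -> rectangle {w:.3f} m x {h:.3f} m")
```

In an engine like Unity, the same idea would amount to moving the rectangle to the gaze hit point and scaling it proportionally to the distance each frame.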

Incorrect augmentation is stressful

While developing the OCR prototype, I had to wear the HoloLens for up to several hours a day, so I can tell you a few important things from my own experience. Make sure that the holograms you produce blend into the real world in a fairly natural way. If you look at a real object and a hologram is rendered “behind” that object, the visual impression makes no sense to your brain. This honestly messes with you even after you have taken off the HoloLens.

Developing for the HoloLens

Let’s move on to the development part. I’ll give you a brief overview of the choices you have for coding such an application and of how our OCR technology was actually integrated.

For simple business applications, deciding which approach to use on the HoloLens is still pretty easy: simply create a UWP app that renders its content in XAML. But what about more visually challenging applications, like 3D holograms?

Unity vs. DirectX

There are two ways to create a 3D holographic app in C# for the HoloLens. Either you create a Unity project and write your application logic in Unity’s C# scripts, or you create a Holographic App directly in Visual Studio 2015 and consume the Windows Holographic API through SharpDX, a DirectX wrapper for C#.

If you plan to create a game or a visually ambitious application, I recommend using Unity. In fact, for 3D applications I would almost always recommend Unity, because dealing with the DirectX APIs directly is a huge pain. I’d only skip Unity if your 3D app consists mostly of business logic or communication and has little to nothing visual.

Mixing XAML and Holographic Views

As you might know, the Microsoft HoloLens runs on Windows 10. Therefore, most UWP apps can easily run within a holographic window on the HoloLens. The Anyline Windows SDK itself is designed for UWP and is integrated into UWP apps as a XAML View component, which is very similar to integrating the Anyline Android SDK.

The easiest and fastest way to bring Anyline and HoloLens together was to create a Unity/VS project template that uses both a 2D XAML View and a Holographic View at the same time. The Anyline SDK itself runs within the (hidden) XAML window, and all the 3D rendering happens in the Holographic View.

This way, without having to change a single line of code in our SDK, it could be integrated into the HoloLens prototype seamlessly.

This approach was chosen because there was very little time to develop a prototype before the AWE in Berlin. For future applications, the approach will be to create a Unity plugin and directly consume the Anyline OCR API from Unity’s C# scripts.

Wrapping it up

I hope that this article gave you a brief insight into some of the challenges you might face and the options you’ll have as a HoloLens developer. The technology is still pretty new, but the holographic API is really easy to use and well documented.

We’re very interested to hear and learn from your experiences with development for Microsoft HoloLens! Feel free to get in touch with us any time via Facebook, Twitter or simply via hello@anyline.io! :) This post was originally published on http://blog.anyline.io


Anyline

Text Recognition for your mobile device! Stories about mobile OCR, Computer Vision and Interactive Mobile Marketing! https://www.anyline.io