Reflections on HoloScribe (HoloLens Development with Unity, Google Cloud Platform, and Microsoft Cognitive Services)

Heather Kemp
9 min read · Dec 18, 2018

--

This year, I had the incredible experience of programming for the Microsoft HoloLens in a hackathon project. Working with one of these devices was like a dream come true, and so I wanted to take the chance to write up my thoughts about the device and its interfacing with Unity, Google Cloud Platform, and Microsoft Cognitive Services.

What is the HoloLens?

For those of you who haven’t heard of the HoloLens, it’s Microsoft’s premier mixed reality headset. According to Microsoft:

Microsoft HoloLens is the first self-contained, holographic computer, enabling you to engage with your digital content and interact with holograms in the world around you.

The Microsoft HoloLens Kit

The HoloScribe Project

In the future, XR will be a part of our daily lives, ingrained deeply into our world. A part of this future will be the ability to translate between languages with ease with a simple word or the click of a button. Enter HoloScribe, a HoloLens application which allows users to translate words or phrases from one language to another with simple voice commands. Future extensions include a noun parser that renders images of the nouns, so that users can also see what the words are describing.

— HoloScribe Description

HoloScribe Promotional Poster 1 and Concept Poster

The HoloScribe project was created in under 48 hours for the Microsoft HoloLens using Unity and Visual Studio for development in C#. The hackathon project was supported by Google Cloud Platform (GCP) services, using the Cloud Vision OCR and Google Translate offerings through a REST API, also written in C#. We tested this REST API with Postman requests, while the Unity project was tested live on the HoloLens.
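To make the architecture concrete, here is a minimal sketch of the OCR-then-translate chain a backend like ours performs, assuming the public REST endpoints for Cloud Vision (`images:annotate` with `TEXT_DETECTION`) and Translate v2. The API key, class, and method names are placeholders for illustration, not our actual hackathon code.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class GcpPipeline
{
    const string ApiKey = "YOUR_API_KEY"; // placeholder
    static readonly HttpClient Http = new HttpClient();

    // Send a base64-encoded image to Cloud Vision's TEXT_DETECTION feature.
    static async Task<string> DetectTextAsync(byte[] imageBytes)
    {
        string body = "{\"requests\":[{\"image\":{\"content\":\""
            + Convert.ToBase64String(imageBytes)
            + "\"},\"features\":[{\"type\":\"TEXT_DETECTION\"}]}]}";
        var resp = await Http.PostAsync(
            "https://vision.googleapis.com/v1/images:annotate?key=" + ApiKey,
            new StringContent(body, Encoding.UTF8, "application/json"));
        return await resp.Content.ReadAsStringAsync(); // JSON with textAnnotations
    }

    // Translate the detected text; the source language is auto-detected
    // when none is specified, matching our (Detect Language) -> English flow.
    static async Task<string> TranslateAsync(string text, string target)
    {
        string body = "{\"q\":\"" + text + "\",\"target\":\"" + target + "\"}";
        var resp = await Http.PostAsync(
            "https://translation.googleapis.com/language/translate/v2?key=" + ApiKey,
            new StringContent(body, Encoding.UTF8, "application/json"));
        return await resp.Content.ReadAsStringAsync(); // JSON with translations[]
    }
}
```

The HoloLens app only needs to capture a photo and call a thin wrapper around these two requests, which is what made Postman a convenient way to test the REST layer independently of the headset.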

The original project used GCP’s services suite because an additional prize category was available for it, but our team later went back and investigated the Microsoft Cognitive Services (MCS) solutions that were in our original plans before the category was revealed. A comparison of both suites is included in this article as well.

HoloLens Pros

The HoloLens device itself has many advantages and perks to its name, many of which go along with its current reputation. Of course, that reputation itself could be considered a pro, as the device and the XR buzzword tend to draw interest right away.

In terms of the hardware itself, the device is lightweight and highly adjustable, allowing for extended comfortable use, even with glasses. With proper adjustment, it never felt too tight around the head or nose. While demoing headsets like the Oculus or Vive, I often found the devices wouldn’t fit around people’s heads depending on their hair or accessories, but this was never a problem with the HoloLens.

This comfortable experience was, of course, heightened by the fact that the device is a standalone mobile experience, with none of the wires the aforementioned VR headsets require. For interactions, the HoloLens also relies entirely on built-in services: there was no need for controllers, and all interactions used the built-in gesture recognition and voice services. Given our goal of a product seamlessly integrated into everyday society, this worked exceptionally well for us, as we could just walk around the building scanning things with voice or gesture commands. Especially with the voice commands, we never felt it interfered with our experience or made us stand out to those around us.

The thing that mattered most to us for our project, though, was how the experience felt. This was the first time most of our team had even used the HoloLens, and the one member who had experienced it before had only used it for a basic tutorial-esque application. The team was, instead, more extensively versed in the VR field, so we wanted to see how that experience translated. The quality of the visuals produced by the HoloLens was fantastic, to say the least. There were never any blurred lines or jagged edges, and the transition during movement was seamless. Furthermore, the spatial awareness and tracking were superb. Other devices we’d seen couldn’t keep track of exactly where an item was, causing it to jitter or drift as the user moved, but the HoloLens kept track of an object’s location not only while it was in view, but also after it had left the screen.

HoloLens Cons

An analysis wouldn’t be complete without viewing what went wrong during our project’s development.

The most regrettable thing about the HoloLens is that the ‘wow factor’ doesn’t impact everyone. In fact, it had almost the opposite effect on some people. Not a single judge at the hackathon wanted to try on the device, making an immersive demonstration of our work nearly impossible when it hinged on use of the device. When we asked why they didn’t want to, they often shrugged and said they ‘just didn’t believe in XR’. Interestingly enough, all of the other students and participants were begging for a slot to try out our project, but the more experienced visitors had little to no interest.

This, of course, leads to our first observed weakness. When operating the HoloLens, there’s no way to supervise what someone else is experiencing, due to its standalone nature. VR applications usually let you mirror the view on the computer they’re connected to, a feature we found missing on the HoloLens. This made it, again, exceedingly difficult to properly demo to the judges and any other passersby, and to give feedback on proper use to the people who did try the HoloLens.

While we were waiting for the judges to arrive, we found the device being put to sleep every minute or so when it wasn’t on someone’s head, likely to cool down. This made it very difficult to keep a demo always ready: the Unity player required the device to be booted when it was deployed, and the application seemed to disappear once the device went to sleep, meaning someone always had to be wearing it to keep it prepared. Doing this turned out to be a mistake, as only a few minutes into our demonstration, the screen for the right eye burnt out. If the device was turned off for a few minutes, it occasionally came back on, but only for a short while before burning out again. Without both lenses, the device caused intense headaches and nausea for whoever was using it, even for teammates who never got nauseous in VR applications. This nausea was also present at the start for many users as they struggled to interact with the view, which they felt was limited and unintuitive for interactions.

We also found Unity’s support for gesture recognition unintuitive, and thus weren’t able to implement it within the given time frame. This really limited our interaction options: we originally wanted a drop-down for translation options, which we instead switched to a menu-bar-like setup with a few basic language choices. For the mechanic of scanning words, we had to rely on the voice recognition feature instead, but this feature perceives noise almost too well, especially when demoing in a noisy auditorium. It picked up every bit of sound in the surrounding area, meaning the device couldn’t interpret simple commands like ‘Scan’ unless they were nearly shouted in a clear accent. The participants we demoed to didn’t understand how clear the commands needed to be, so it often seemed like the application simply didn’t work.
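For reference, the ‘Scan’ hookup can be done with Unity’s built-in `KeywordRecognizer`, which is what HoloLens voice commands in Unity typically go through. This is a minimal sketch under that assumption; the class and handler names are illustrative, not our exact code.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

public class ScanCommand : MonoBehaviour
{
    private KeywordRecognizer recognizer;

    void Start()
    {
        // Listen for the single "Scan" keyword. Note the recognizer hears
        // everything in range, which is why a noisy auditorium was a problem.
        recognizer = new KeywordRecognizer(new[] { "Scan" });
        recognizer.OnPhraseRecognized += OnPhraseRecognized;
        recognizer.Start();
    }

    private void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        if (args.text == "Scan")
        {
            // Capture a photo here and send it to the translation REST API.
            Debug.Log("Scan recognized with confidence " + args.confidence);
        }
    }

    void OnDestroy()
    {
        if (recognizer != null && recognizer.IsRunning) recognizer.Stop();
        recognizer?.Dispose();
    }
}
```

The recognizer also reports a confidence level per phrase, which could in principle be used to reject low-confidence matches rather than treating every near-match as a command.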

In the end, the most fundamental con of the HoloLens was its accessibility to the general public. The device we used was borrowed from a research group we had worked with in the past. Without this connection, we never would have had access to such a device, given its price. Development also requires fairly intensive hardware and software specs: only one of our machines could handle it. Once we did get that machine set up, though, everything ran smoothly.

Google Cloud Platform Versus Microsoft Cognitive Services on Azure

As mentioned earlier, we initially developed the project with the GCP suite simply because there was an additional prize category for it. It worked well for our purposes and the demonstration, but afterwards we wanted to compare it to the Microsoft suite of products, since the HoloLens is, after all, another Microsoft product.

In our demonstrations, we found the GCP solution seemed to work for only a few handwriting styles. We had native speakers of several languages, including Spanish, Chinese, Japanese, French, German, English, and Hindi, write out different phrases to test the translation, but only the German and English samples went through. We then had these same people re-write the samples from the other languages, which we doubted would work since the writers didn’t know those languages or symbols, but, surprisingly, their writing was read more often than the native speakers’ handwriting, though the translation was off for obvious reasons.

To remove the variability of handwriting, we decided to try the solution with a printed sample. We used the promo banner of the HoloScribe project as our primary sample.

HoloScribe Promo Banner

Our focus for this test was on the bottom words, specifically

“Demo space moved since HoloScribe includes voice recognition! Judges, please see team member for location details!”

The result of scanning this banner, even repeatedly, was always

“Came space moved since Holo Scribe includes voice recognition! ges, please see team member for location details!”

HoloScribe Promotional Poster 2, including screenshot of the GCP translation

The (Detect Language) -> English translation itself was almost perfect, but the actual scanning process clearly left room for improvement.

Compared to the GCP solution, we didn’t find much of a difference in the quality of the scanning part of the service, perhaps because this relies primarily on the HoloLens camera and the user’s ability to take appropriate pictures. The greatest difference we found was in the translation step. While Google offered more languages to translate to and from, the translations Microsoft offered were, according to native speakers, often more accurate and more properly phrased. If we were to redo this project, we would go back to our original plan of using MCS for the REST API.
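Swapping the translation step over to MCS would mean calling the Translator Text REST API (v3) instead of Google Translate. Here is a hedged sketch of what that call looks like, assuming the standard endpoint; the subscription key, region, and class name are placeholders.

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class McsTranslate
{
    static readonly HttpClient Http = new HttpClient();

    static async Task<string> TranslateAsync(string text, string to)
    {
        var request = new HttpRequestMessage(HttpMethod.Post,
            "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=" + to);
        request.Headers.Add("Ocp-Apim-Subscription-Key", "YOUR_SUBSCRIPTION_KEY"); // placeholder
        request.Headers.Add("Ocp-Apim-Subscription-Region", "YOUR_REGION");        // placeholder
        // The body is a JSON array of { "Text": ... } objects; the source
        // language is auto-detected when no "from" parameter is supplied,
        // matching our (Detect Language) -> English flow.
        request.Content = new StringContent(
            "[{\"Text\":\"" + text + "\"}]", Encoding.UTF8, "application/json");
        var resp = await Http.SendAsync(request);
        return await resp.Content.ReadAsStringAsync(); // JSON with translations[]
    }
}
```

Because the OCR step would stay the same, this is close to a drop-in change in the REST layer, which is part of why redoing the project with MCS seems feasible.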

Conclusion

Due to our time constraints, we found ourselves missing a few features we wanted to implement. One such feature was the noun parser, which would visualize the nouns so that, even with possible translation errors, people could still understand the intent of the entire sentence. We also would like to have implemented gesture recognition as a backup to scanning in case the audio didn’t work, which it very well might not in the ‘real world’ settings we intended the product for.

While it was not our original focus, we are also looking to pivot the direction of our project in the future. After speaking with an accessibility expert about our project, we found that an accessibility-focused HoloScribe would be immensely useful. By adding text-to-speech to the translation features, blind users would be able to read and interact with the world around them in real time, regardless of the language.

While our project was, at the end of the day, a success to us, and we as developers loved the experience, we disappointingly found that the HoloLens doesn’t seem ready for public consumption, simply due to the public’s perception of it combined with the overheating issue we faced on our device. In the future, if the overheating were fixed and people became more accepting of XR as a whole, the HoloLens could certainly be the future, thanks to its seamless integration into the world around us and its boundless application possibilities.
