Augmenting Reality with the Overlay of AI

Sam Bobo
Speaking Artificially
7 min read · Sep 29, 2023
Imagined by Bing Image Creator powered by DALL-E

The year was 2014. I was a senior in university, enrolled in a Digital Innovation class. My professor, a serial innovator, employed a unique spin on project-based learning. Our task: within the span of a single semester, design three innovative projects using digital technology. The projects were medal-ranked gold, silver, and bronze, requiring a working Proof of Concept (“POC”), a clickthrough demo, and a concept overview, respectively. Given that this was a business school course filled with students who thought primarily in Excel formulas rather than Python code, it was certainly a daunting task. Simultaneously, I was enrolled in a forecasting class where we started learning about neural networks, prior to any explosion in Artificial Intelligence… look at the world now!

My concept: fusing the augmented reality of Google Glass with the software functionality of Google Translate. Note: this was a silver-level project, so the concept manifested as a clickthrough demonstration. It entailed a user wearing Google Glass at the top of a ski slope, looking out onto the horizon. The wearer utters the phrase “Hey Google, quiz me in Spanish.” Instantly, rectangular bubbles with question marks overlay ordinary objects such as “snow,” “tree,” “mountain,” “skis,” and so on. The user points to an object in view and utters “nieve” to guess the Spanish term, getting instant feedback on whether the word was correct. Effectively, this was an early concept using machine translation and augmented reality to build a language-learning application.

Certainly, by the standards of today’s AI era, that was a “command and control” style application, and it followed the aforementioned use case for AI (read the piece below to learn about others)… and this was 2014! Fast forward nine years to 2023 and, like warp speed, sheer compute power, available data, and algorithmic efficiency have given birth to large language models and generative AI.

What prompted this piece? Technology analysts such as Ben Thompson have been writing about the fusion of AI with hardware as a game-changer. At this moment, Jony Ive and Sam Altman are collaborating on hardware to pair with ChatGPT, and Mark Zuckerberg and Meta are discussing smart glasses with embedded AI. That is simply the start.

Virtual Reality or Augmented Reality?

Within the technology realm, there has always been a debate over whether AR or VR is the dominant play. Augmented Reality overlays graphics and text onto the wearer’s real-world field of view, in actual reality. Conversely, Virtual Reality obstructs the wearer’s entire field of vision and seeks to immerse them in an entirely new universe. Between the two sits Mixed Reality, whereby the wearer can toggle between both modes seamlessly, a distinction that long caused confusion in the market until the term “mixed reality” was coined.

Virtual Reality has always been rooted in video games, education, and the “metaverse” as full-immersion experiences: one can exercise by hitting moving targets in a video game, teleport back in time and across lands to learn about historical places (as Ubisoft has done), or conduct virtual meetings with a feeling of togetherness. Aside from the hardware (battery, GPUs, wired versus wireless, etc.), the largest obstacles for VR are the screen’s refresh rate and ensuring the wearer does not get motion sickness. Personally, I find those use cases prime for VR, which is why I understand Zuckerberg’s play for the metaverse; after all, who wouldn’t want to get together virtually with friends and play in a fictitious world? I digress.

Augmented Reality, on the other hand, has always rooted itself in use cases pertaining to safety. AR glasses can give workers in dangerous situations (confined spaces, hazardous environments, etc.) the ability to view vital job information while keeping their hands free to work safely. The largest obstacles for AR have typically been the glasses hardware itself and its adoption, compounded by the market’s fascination with VR.

Personally, I have long been an AR advocate and prefer AR over VR, most notably because of the information overlay. Hence the second part of this piece. Artificial Intelligence is about augmenting human intelligence. AI systems predict, command-and-control, perform Q&A, automate self-service, and much more, all underpinned by transformation services such as speech-to-text, text-to-speech, natural language understanding, text-to-video, and text-to-image. Artificial Intelligence is prime for Augmented Reality in the way that it “augments” our “reality” and the way we interact with the world. Returning to the opener of this piece: simply the ability to translate the text around you and learn on the fly is a compelling use case for learning, travel, and much more!

Barriers to AI-AR Adoption

There are three obstacles to realizing AI in Augmented Reality hardware: (1) the interaction mechanism, (2) the prompts, and (3) mainstream adoption.

Interaction Mechanism

Yes, humans naturally converse, and the personification of AI has biased us toward starting with that interaction model. Iterative design via LLMs is a “conversation” in which we prompt the system to make modifications. It’s “natural,” but we can do better! Simply put, conversing with or “command and controlling” a system would be a limitation, as humans would walk around commanding an AI rather than conversing with one another. In other words, we would trade the societal plague of walking around with our heads down, staring at our iPhones, for sounding like crazy people commanding our own personal assistants all the time.

I call instead for ambient computing, where computer vision constantly utilizes the user’s surroundings as context to power the request or prompt (more on that next). Ambient computing would remove the constant need for command and control and shift the focus toward delivering information at the right time.
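As a minimal sketch of what that could look like, consider the loop below. Everything here is hypothetical: detect_objects stands in for a real computer-vision model and ask_llm for a real LLM endpoint; neither is an actual API.

```python
# Hypothetical ambient-computing loop: the wearer's surroundings are folded
# into every request as implicit context, so no explicit setup is needed.

def detect_objects(frame) -> list[str]:
    # Stand-in for an object-detection model running on the glasses.
    return ["snow", "pine tree", "ski lift"]

def ask_llm(prompt: str) -> str:
    # Stand-in for a call to a real LLM endpoint.
    return f"[answer grounded in: {prompt}]"

def on_utterance(frame, utterance: str) -> str:
    """Answer a spoken request with the current scene as implicit context."""
    scene = ", ".join(detect_objects(frame))
    return ask_llm(f"Scene around the wearer: {scene}. Request: {utterance}")

print(on_utterance(frame=None, utterance="Quiz me in Spanish."))
```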

Additionally, our glasses should not be speaking to us and projecting sound constantly; the world would be too noisy. Instead, we need an additional layer of hardware, such as headphones, to deliver the relevant information directly into our ears.

The Prompts

Artificial Intelligence requires immense training to achieve specific use cases. Recall that there are three layers: (1) the base/foundation layer, which contains the language modeling and base knowledge; (2) the domain layer, which specializes the model in a particular field; and (3) the proprietary and personalization layer, which attunes the model to the unique situation of a person or company.

Prompt engineering has evolved as a career path, applying transfer learning, retrieval-augmented generation (RAG), and other mechanisms to attune LLMs to specific use cases. While there is a drive toward Artificial General Intelligence (AGI), society is certainly far from it, and AI cannot be a specialist at EVERYTHING.
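To make RAG concrete, here is a toy sketch. The word-overlap retriever is an illustrative stand-in, not a production approach; real systems rank documents by embedding similarity over a vector store.

```python
# Toy RAG sketch: retrieve the most relevant snippet, then prepend it to the
# prompt so the model can answer from knowledge it was never trained on.

documents = [
    "Confined-space procedures require a gas check before entry.",
    "Nieve is the Spanish noun for snow; nevar is the verb 'to snow'.",
]

def retrieve(query: str) -> str:
    # Toy retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    return max(documents, key=lambda d: len(words & set(d.lower().split())))

def build_prompt(query: str) -> str:
    return (
        f"Context: {retrieve(query)}\n"
        f"Answer the question using the context.\n"
        f"Question: {query}"
    )

print(build_prompt("What is the Spanish word for snow?"))
```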

Secondly, most LLMs today require contextual tokens to guide them. For example, instructive prompting gives the LLM a role (health practitioner, teacher, etc.), a situation, an expected input, and an expected output; zero-, one-, and few-shot prompting requires worked examples; and so on. This level of prompting is not familiar to society at large. To power AR scenarios, users cannot be required to set context themselves.
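As an illustration of that burden, a well-formed instructive, few-shot prompt might look like the following; the wording is my own sketch, not a canonical template.

```python
# Sketch of an instructive, few-shot prompt: role, situation, output format,
# and worked examples all must be spelled out before the actual question.
prompt = """You are a Spanish teacher (role) quizzing a skier on vocabulary (situation).
Given an English word, reply with only the Spanish translation (expected output).

English: tree -> Spanish: árbol
English: mountain -> Spanish: montaña

English: snow -> Spanish:"""

print(prompt)  # an AR wearer should never have to author this by hand
```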

Mainstream Adoption

Ads for monetization, trust, and privacy are just a few of the debated topics in the AI realm hindering its adoption and development. All else being equal (i.e., setting aside the aforementioned obstacles), another hurdle will be price. Augmented reality requires new hardware, and that typically comes at a cost; new hardware instantly limits the total serviceable market simply based on means. I chuckled at Google Cardboard, which effectively turned a cardboard box and an Android phone into VR, but it worked.

Next, if my hypothesis is true, then this AR system would need to travel on your person constantly to overcome being a novelty or a technological fad.

The Case for Apple

Google (Bard), Microsoft (OpenAI), and Amazon (Claude) all show the major hyperscalers’ entry into LLMs, whether homegrown or through partners. Apple, conversely, has not entered the LLM race but is meticulously and strategically working on its integration. Apple has two competitive advantages in the AI-AR market: (1) Apple seamlessly fuses hardware and software into compelling individual products and a larger ecosystem of integrated products, and (2) the iPhone is a dominant phone in the market that travels on one’s person constantly. Yes, I agree that Vision Pro might take a long time to gain adoption, but that is VR, not AR. Should Apple harness its hardware prowess for AR (including AirPods for audio) and combine it with homegrown or partnered AI capabilities, it could make a major play in this emerging space.

In summary, I make the following points:

(1) Augmented Reality is the prime market for infusing AI into hardware, with incredible everyday benefits

(2) There are a number of obstacles to achieving mainstream adoption

(3) Apple (plus a partner) is primed to capitalize on this

Thank you for reading! Let me know your thoughts!


Sam Bobo
Speaking Artificially

Product Manager of Artificial Intelligence, Conversational AI, and Enterprise Transformation | Former IBM Watson | https://www.linkedin.com/in/sambobo/