Foresights into Apple glassOS

Jurriaan Mous · Published in Mac O'Clock · Aug 5, 2020 · 12 min read
[Image: glasses on top of a Magic Trackpad]

Apple has launched multiple new form factors over recent decades: the Apple TV, iPhone, iPad, Apple Watch, AirPods, and, rumored to be next, an AR headset that overlays UI on your surroundings. Apple is not alone in this; Microsoft, Google, Samsung, and Huawei are also working on AR. But let's focus on Apple for now.

What hints can we already see for these new glasses in the current software development kits? And what will you be able to do with the glasses?

AR needs a new UI 🎛

All current Apple devices are based either on a 2D screen with a touch or mouse UI, or on voice and audio through Siri. To make it easy for themselves, Apple could simply show some floating 2D information in front of your eyes.

But they will likely go further than that. Real augmented reality needs to intermix with our surroundings, and for that the entire UI has to be rethought. The device needs to understand the 3D environment to show information in a sensible location, and you need a way to interact with UI that lives in the environment around you.

This shift of the UI into the 3D environment is probably the most difficult UI shift yet. Bigger than going from text-based UIs to the monochrome 2D graphical user interface of the Xerox Star, Lisa, and Macintosh. And bigger than the shift to a touch-based user interface in the original iPhone. Apple needs to go to the next level in hardware, software, and developer tools to process the environment and present UI spatially. And as is publicly visible in its software development kits, Apple is fully aware of this and has already been working on full environmental AR for multiple years.

Glasses need Apps 🛠

Another reason to release the software frameworks early is to allow developers to create the first AR content. This way developers can learn the trade as the tools mature, and Apple has apps to highlight the future hardware. As the frameworks matured, these developers were often brought on the WWDC stage to showcase AR framework enhancements. Having the frameworks out there also creates the opportunity for new talent to emerge. Apple needs that talent to jump-start an app store for this new form factor.

Augmenting the visual world ✨

Apple is working on a powerful software framework called ARKit to develop augmented reality experiences. It enables iOS devices with recent chips to track the 3D environment with the camera, CPU/GPU, motion sensors, and, if present, lidar. On top of this, they are building RealityKit, their own engine to place objects in the visual world and interact with them. And to make it easier to develop experiences, Apple is working on Reality Composer, an app to compose environments with 3D objects, audio, triggers, and animations. It is worth checking out a quick demo video or an extended demo to see what the app is capable of. Let's look deeper into ARKit and how it has evolved throughout the years to support richer AR experiences.

Check out the links below to see the frameworks in action.

ARKit 🧰

Apple first introduced ARKit with iOS 11 on June 5th, 2017. It started simple, enabling developers to place 3D objects on top of a 2D plane while the framework took care of the lighting. Over the years Apple added functions like 2D image and 3D object tracking, occlusion of objects by people, motion capture of human bodies, tracking of multiple faces, collaborative AR sessions, and use of multiple cameras. Especially in the latest versions, Apple made big jumps with Location Anchors and scene geometry.
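
As a rough illustration of that starting point, here is a minimal RealityKit sketch (names like PlaneDemoViewController are mine) that anchors a simple virtual box on the first horizontal plane ARKit detects, with the framework handling lighting:

```swift
import UIKit
import RealityKit

// A minimal sketch: place a virtual box on the first detected horizontal plane.
// PlaneDemoViewController is an illustrative name, not from Apple's samples.
final class PlaneDemoViewController: UIViewController {
    let arView = ARView(frame: .zero)

    override func viewDidLoad() {
        super.viewDidLoad()
        arView.frame = view.bounds
        arView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        view.addSubview(arView)

        // An anchor that attaches its children to any horizontal plane ARKit finds.
        let planeAnchor = AnchorEntity(plane: .horizontal)

        // A simple 10 cm box; RealityKit takes care of the lighting.
        let box = ModelEntity(
            mesh: .generateBox(size: 0.1),
            materials: [SimpleMaterial(color: .systemBlue, isMetallic: false)]
        )
        planeAnchor.addChild(box)
        arView.scene.addAnchor(planeAnchor)
    }
}
```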

Location Anchors allow developers to place content at a precise location in the real world using visual localization with the camera. The device uses GPS and then refines its position with a 3D environment map of locations Apple has mapped. This enables new experiences tailored exactly to a specific location. At the start it is only available in select US areas like the San Francisco Bay Area, New York, Miami, Los Angeles, and Chicago, with more cities coming before the release of iOS 14.
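
In code, Location Anchors surface as ARGeoAnchor in ARKit 4. A hedged sketch, with placeholder coordinates rather than anything from this article, could look like this:

```swift
import ARKit
import CoreLocation

// A hedged sketch of ARKit 4 geo tracking: check availability at the current
// location, then pin content to a latitude/longitude. The coordinates are
// placeholders.
func startGeoTracking(in session: ARSession) {
    ARGeoTrackingConfiguration.checkAvailability { available, error in
        guard available else {
            print("Geo tracking unavailable:", error?.localizedDescription ?? "unsupported area")
            return
        }
        session.run(ARGeoTrackingConfiguration())

        // Anchor content at a precise real-world coordinate.
        let coordinate = CLLocationCoordinate2D(latitude: 37.7954, longitude: -122.3937)
        session.add(anchor: ARGeoAnchor(coordinate: coordinate))
    }
}
```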

Scene geometry interprets the environment and turns it into a 3D mesh so that 3D content can interact with it. For example, in the RealityKit talk they show a bug that walks over a tree and later explodes, its pieces falling realistically over real-world stairs. This functionality lets 3D content interact with your surroundings, another essential building block for a good AR experience.
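
A minimal sketch of how an app might opt into scene geometry today, assuming a lidar-equipped device and a RealityKit ARView, looks roughly like this:

```swift
import ARKit
import RealityKit

// A minimal sketch: ask ARKit for a mesh of the surroundings (needs a device
// with lidar) and let RealityKit use that mesh for occlusion and physics, so
// virtual content can hide behind and collide with real-world surfaces.
func enableSceneGeometry(on arView: ARView) {
    guard ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) else {
        print("Scene reconstruction requires a lidar-equipped device")
        return
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.sceneReconstruction = .mesh

    arView.environment.sceneUnderstanding.options.insert(.occlusion)
    arView.environment.sceneUnderstanding.options.insert(.physics)
    arView.session.run(configuration)
}
```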

ARKit is largely powered by integrating hardware and software in a way that only a company like Apple can. The framework is designed from the chip up. To interpret the real world, Apple extensively uses the Neural Engine, a dedicated block in its chips that accelerates the machine learning needed to understand the surroundings. On top of that, it uses factory-calibrated cameras, gyroscopes, accelerometers, and the image processing engines in Apple's silicon. And on the 2020 iPad Pro, and likely future iPhones, Apple also includes a lidar sensor to help interpret the surroundings.

All these technologies already enable great AR experiences.

The sounds of an augmented world 🔊

We expect sound in our surroundings to come from a real object. If a cat meows, we can hear roughly where the cat is. To make an AR experience realistic, we need to hear sound coming from the right direction even while we move around our environment.

Apple has solved these issues with the AirPods Pro. They already include a Transparency mode that lets the sounds of your surroundings through, so current reality can be augmented. And at WWDC 2020 Apple introduced Spatial Audio: the AirPods track your head movements so sound can be placed at a 3D location.
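
Apple's Spatial Audio itself is handled by the system, but iOS 14 also exposes the underlying head tracking to apps through CoreMotion's CMHeadphoneMotionManager. A hedged sketch of reading that data, which an app could feed into its own 3D audio positioning:

```swift
import CoreMotion

// A hedged sketch, not Apple's own Spatial Audio pipeline: read head motion
// from supported AirPods and use it to keep virtual sound sources fixed in
// the room. Requires the motion usage permission in Info.plist.
let headphoneMotion = CMHeadphoneMotionManager()

func startHeadTracking() {
    guard headphoneMotion.isDeviceMotionAvailable else {
        print("Headphone motion data is not available")
        return
    }
    headphoneMotion.startDeviceMotionUpdates(to: .main) { motion, _ in
        guard let attitude = motion?.attitude else { return }
        // Roll, pitch, and yaw of the listener's head, ready to feed into a
        // 3D audio engine so sounds stay put while the head turns.
        print("roll \(attitude.roll), pitch \(attitude.pitch), yaw \(attitude.yaw)")
    }
}
```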

Interacting with the AR world 🤹

A good UI needs intuitive interaction methods. This was solved on the Mac with the mouse and keyboard, and on the iPhone with the finger and touch. But how do you interact with an enriched reality, especially when it is the world around you? Maybe multiple methods can be combined.

Siri and Voice Control 🗣

Siri can already resolve many information queries. Ask for a route and the glasses could show arrows for directions. Ask to compose a message and the glasses could show the message while you talk. With the recent offline Siri support, powered by the Neural Engine, this should be quicker and more fluent than before. Siri could be triggered with the "Hey Siri" keyword or with a tap on the glasses, similar to the AirPods.

But this is not all. Lately, Apple seems to have invested a lot in Voice Control, which lets users control macOS entirely with their voice. Imagine interacting with the virtual elements around you through similar voice commands.

Eye-tracking 👁

Point some sensors at the eyes and the glasses could know exactly what you are looking at. With a deliberate blink, you could select UI options. Eye sensors could also help decide which content needs to be rendered and which parts of the surroundings need to be resolved in higher detail. With the attention awareness detection in Face ID, Apple's software is already capable of detecting whether the eyes are open and where they are looking.
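
As a stand-in for future glasses hardware, today's face-tracking API already exposes similar signals. A hedged sketch using ARKit's ARFaceAnchor, with the blink threshold chosen arbitrarily:

```swift
import ARKit

// A hedged sketch using the phone's face-tracking session as a stand-in for
// glasses hardware: ARFaceAnchor reports a look-at point and blink strength.
// The 0.9 blink threshold is an arbitrary choice for illustration.
final class GazeDelegate: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let faceAnchor as ARFaceAnchor in anchors {
            // Where the eyes are estimated to be looking, in face coordinates.
            let lookAt = faceAnchor.lookAtPoint

            // Blend shapes report blink strength from 0 to 1; a deliberate
            // blink of both eyes could act as a selection gesture.
            let leftBlink = faceAnchor.blendShapes[.eyeBlinkLeft]?.floatValue ?? 0
            let rightBlink = faceAnchor.blendShapes[.eyeBlinkRight]?.floatValue ?? 0
            if leftBlink > 0.9 && rightBlink > 0.9 {
                print("Blink-select near \(lookAt)")
            }
        }
    }
}
```

Running this requires an ARFaceTrackingConfiguration session on a Face ID-capable device.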

But eye-tracking is also essential for positioning virtual content at the right place in the real world. When the pupil moves, real-world objects move relative to the glasses. And if the image is perhaps projected directly onto the iris, the device also needs to know where the eye is.

Hand pose tracking 👋

The most natural way for people to interact with the physical world is with their hands. Apple surely wants to add hand detection so users can point at objects or grab them. It seems the software needed for this is already in development.

iOS 14 introduces hand pose tracking. It lets the device know the position of each finger and lets developers trigger actions based on finger positions. In one example, the presenter writes text in mid-air, triggered by bringing his thumb and index finger together.
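
That pinch gesture maps onto the new Vision API fairly directly. A hedged sketch, with the distance and confidence thresholds picked arbitrarily:

```swift
import Vision
import CoreVideo
import CoreGraphics

// A hedged sketch of the Vision hand pose request: detect a pinch by checking
// how close the thumb tip and index finger tip are. The distance and
// confidence thresholds are arbitrary illustrative values.
func detectPinch(in pixelBuffer: CVPixelBuffer) {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: [:])
    do {
        try handler.perform([request])
        guard let observation = request.results?.first else { return }

        // Joint positions come back as normalized image points with confidence.
        let thumbTip = try observation.recognizedPoint(.thumbTip)
        let indexTip = try observation.recognizedPoint(.indexTip)
        guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else { return }

        let distance = hypot(thumbTip.location.x - indexTip.location.x,
                             thumbTip.location.y - indexTip.location.y)
        if distance < 0.05 {
            print("Pinch detected, start drawing")
        }
    } catch {
        print("Hand pose detection failed:", error)
    }
}
```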

How can these glasses detect the hands wherever they are? Apple certainly needs cameras for an AR experience, but will they have a wide enough field of view to always see the hands? Maybe a future accessory can help track them.

Exact positional awareness of objects 📍

iOS is already capable of recognizing and tracking 2D and 3D visual objects with the camera. But with the iPhone 11, Apple introduced an ultra-wideband chip (the U1), capable of tracking the position and orientation of objects that carry tracking chips of their own. It is rumored that Apple will release AirTags that can be attached to real objects so devices can track them.

With the exact position of these objects, it becomes possible to overlay them with an augmented UI. It would also let developers use physical objects as interactive elements, with their orientation and position acting as controls in an app. For example, one object could serve as the base of a virtual object you can reposition, another as a rotatable knob for interacting with an element. Or the objects could be the pieces of an enriched board game.

With iOS 14 it is possible to develop for the ultra-wideband chip, and Apple shows a nice demo of the capabilities in the talk introducing the framework.
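
The framework in question is Nearby Interaction. A hedged sketch of its basic shape, assuming the peer's discovery token has already been exchanged over some other channel such as MultipeerConnectivity:

```swift
import NearbyInteraction

// A hedged sketch of Nearby Interaction: after two devices exchange discovery
// tokens over some other channel (MultipeerConnectivity, for example), each
// session reports a live distance and direction to the peer.
final class NearbyObjectTracker: NSObject, NISessionDelegate {
    let session = NISession()

    override init() {
        super.init()
        session.delegate = self
    }

    func start(with peerToken: NIDiscoveryToken) {
        session.run(NINearbyPeerConfiguration(peerToken: peerToken))
    }

    func session(_ session: NISession, didUpdate nearbyObjects: [NINearbyObject]) {
        for object in nearbyObjects {
            // Distance in meters and a unit direction vector, when available.
            let distance = object.distance.map { String(format: "%.2f m", $0) } ?? "unknown"
            print("Peer at \(distance), direction \(String(describing: object.direction))")
        }
    }
}
```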

What tasks are great for the AR Glasses? 💡

For the glasses to be a useful addition to our lives, they need to do tasks in ways that no other current device can. Current devices pull your attention away from your surroundings because you need to look at a screen. AR could let you visually enhance your surroundings while keeping your attention on your task. Can we imagine tasks that require full attention in the real world while you also need extra information?

Finding new routes and exploring places 🗺

One key task for AR glasses is to help you navigate. While navigating you don't want to be distracted; you want your full attention on moving around. What if, when you start a route in a map, you see lines appear along your path? With the precise location determination in the latest ARKit, this should be an easy task. And what if this experience could be richer? Travel apps could offer tours with overlays and voiceovers giving rich explanations on site. It should then be no surprise that Maps in iOS 14 includes a new feature called Guides, enabling these kinds of experiences.

The guide experience could be enriched with 3D animations of how a location looked in the past. Imagine visiting the Forum Romanum and the Colosseum in Rome and seeing how they looked in the past while you are there.

Suddenly it makes a lot of sense why Apple invests so much in its own maps. They are needed for the earlier-mentioned Location Anchors and form the basic building block for rich location-based experiences.

Quick overview of relevant information 📊

With iOS 14 Apple introduces widgets. These are cards that show an app's information at a glance in a small, medium, or large size. They can be added to the Today view or the home screen. There are also Smart Stacks that surface the widget that is most relevant at that moment. These widgets would be very useful in an AR interface too: you could pin the information you want to keep track of in an overlay while walking around.
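
For reference, this is roughly what the widget model looks like in code. A minimal WidgetKit sketch in which the entry type, values, and refresh interval are all illustrative:

```swift
import WidgetKit
import SwiftUI

// A minimal WidgetKit sketch; the entry type, step counts, and refresh
// interval are illustrative, not from any real app.
struct StepsEntry: TimelineEntry {
    let date: Date
    let steps: Int
}

struct StepsProvider: TimelineProvider {
    func placeholder(in context: Context) -> StepsEntry {
        StepsEntry(date: Date(), steps: 0)
    }

    func getSnapshot(in context: Context, completion: @escaping (StepsEntry) -> Void) {
        completion(StepsEntry(date: Date(), steps: 4_200))
    }

    func getTimeline(in context: Context, completion: @escaping (Timeline<StepsEntry>) -> Void) {
        let entry = StepsEntry(date: Date(), steps: 4_200)
        // Ask the system to refresh the widget after roughly 15 minutes.
        let timeline = Timeline(entries: [entry], policy: .after(Date().addingTimeInterval(15 * 60)))
        completion(timeline)
    }
}

// Mark with @main inside a widget extension target.
struct StepsWidget: Widget {
    var body: some WidgetConfiguration {
        StaticConfiguration(kind: "StepsWidget", provider: StepsProvider()) { entry in
            Text("\(entry.steps) steps")
        }
        .configurationDisplayName("Steps")
        .supportedFamilies([.systemSmall, .systemMedium])
    }
}
```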

Interact with real-world objects with App clips and HomeKit 🏠

iOS 14 also introduces App Clips: mini-apps that can be triggered by a new kind of QR code, NFC tags, app banners, and also by location markers in Maps. With a location marker, a restaurant could have a floating icon indicating it has an instant-start app to reserve a table or view the menu. A rideshare scooter could have a code that instantly opens a UI to book it. Or imagine a washing machine instantly showing a friendlier UI to interact with it. This will soon be possible on an iPhone, and with future glasses you would not even need to grab your phone.

HomeKit could also benefit a lot from an AR interface. Just imagine looking at a lamp and seeing its controls. Or opening and closing the garage by looking at the door. Or keeping an eye on a security or baby cam in the corner of your eye.

The environment can easily be enriched with instant UIs to interact with our home and devices.
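
The HomeKit side of such an instant UI already exists on iOS. A hedged sketch that finds a lightbulb service and toggles its power state, where the lookup logic is mine rather than anything glasses-specific:

```swift
import HomeKit

// A hedged sketch with today's HomeKit API: find the first lightbulb service
// in a home and toggle its power state. The service and characteristic type
// constants are HomeKit's own.
func toggleFirstLight(in home: HMHome) {
    for accessory in home.accessories {
        for service in accessory.services where service.serviceType == HMServiceTypeLightbulb {
            guard let power = service.characteristics.first(where: {
                $0.characteristicType == HMCharacteristicTypePowerState
            }) else { continue }

            let isOn = power.value as? Bool ?? false
            power.writeValue(!isOn) { error in
                if let error = error {
                    print("Could not toggle the light:", error)
                }
            }
            return
        }
    }
}
```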

Instant labeling 🏷

Imagine Ikea bringing out an AR app that can interpret all the pieces of a couch live. You just lay out all the screws and parts, and the app highlights which part you need next and how to combine them. With 3D object detection, this should be much easier to develop. This labeling of parts and pieces is useful in any construction or maintenance task. Imagine looking into the engine of a car and seeing all the parts labeled and highlighted.
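
The 3D object detection mentioned above is already part of ARKit: reference objects scanned ahead of time can be recognized at runtime. A hedged sketch, where the asset catalog group name "FurnitureParts" is made up:

```swift
import ARKit

// A hedged sketch of ARKit's 3D object detection: reference objects scanned
// ahead of time are bundled in an asset catalog group ("FurnitureParts" is a
// made-up name), and ARKit adds an ARObjectAnchor whenever it recognizes one.
func runPartDetection(on session: ARSession) {
    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionObjects = ARReferenceObject.referenceObjects(
        inGroupNamed: "FurnitureParts", bundle: nil
    ) ?? []
    session.run(configuration)
}

final class PartLabelDelegate: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let objectAnchor as ARObjectAnchor in anchors {
            // The anchor's transform is where a floating label would go.
            print("Recognized part:", objectAnchor.referenceObject.name ?? "unnamed")
        }
    }
}
```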

Instant labeling of objects could also be very useful in education. Imagine walking around in nature and being able to instantly identify any species of plant or animal. Apps like Seek could become a must-have for any curious hiker.

Need an interpreter? ↕️

When traveling, you encounter people who don't speak your language. It is very distracting to handle a phone while talking to someone, but a live translating app in front of your eyes would not interfere with the conversation. With on-device listening like offline Siri and a good translation library, a device could translate instantly. It should be no surprise that Apple introduces a Translate app in iOS 14.
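
The listening half of that is already possible with the Speech framework's on-device recognition; the translation step below is only a hypothetical placeholder, since iOS 14 does not expose a public translation API:

```swift
import Foundation
import Speech

// A hedged sketch of the listening half: transcribe speech on-device with the
// Speech framework. The translate(_:) call is a hypothetical placeholder.
// Requires speech recognition permission.
func transcribeForTranslation(audioFileURL: URL) {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "fr-FR")),
          recognizer.isAvailable else {
        print("Speech recognition is not available for this locale")
        return
    }
    let request = SFSpeechURLRecognitionRequest(url: audioFileURL)
    // Keep the audio on the device when the model supports it.
    request.requiresOnDeviceRecognition = recognizer.supportsOnDeviceRecognition

    _ = recognizer.recognitionTask(with: request) { result, _ in
        guard let result = result, result.isFinal else { return }
        let spokenText = result.bestTranscription.formattedString
        // Hypothetical: let english = translate(spokenText, from: "fr", to: "en")
        print("Heard:", spokenText)
    }
}
```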

Collaborative thinking 🧠

With multiple people in the same location wearing AR glasses, it suddenly becomes easy to work together on designing physical objects or buildings. Objects could be annotated and manipulated live. No more scale models: anybody with AR glasses could see the 3D building at the spot where it will be built.

Gaming 🕹

AR can be used to create rich games that interact with your surroundings, but AR is not yet a hit for games. Maybe peeking into this alternate reality through the small window of a device is too cumbersome; the device distracts from the game. What if you saw the enriched environment directly in front of your eyes while keeping your hands free?

Games that take place in our surroundings benefit from being shared with friends and family. Contrary to VR, AR lets you see the other players. A good AR game with the proper hardware could become the kind of fun shared experience that board games are today. Apple already enables collaborative AR sessions in its frameworks. I will not be surprised if augmented board games find a new audience.

The hardware 👓

For all this software to work well in an AR environment, we need great hardware. The glasses need a design that people are willing to wear and look good wearing. They should work with prescription lenses and have an acceptable price. They should be powerful enough to run ARKit, with enough cameras and sensors to interpret the real world. And they should be able to render a good visual overlay on the real world to make the experience believable.

How can you fit a chip powerful enough, and batteries large enough, to power an immersive AR experience into glasses that you want to keep as light as possible? With current technology, this is likely too tall an order.

But do you need a full AR interface for the entire day? Very likely there will be a light interface in which the glasses show widgets with the basic information you want to keep track of. And when you need to navigate, play a game, or do a collaborative design session, you start a more demanding full AR mode. This mode would likely run wirelessly on more powerful hardware like an iPhone. Delegating processing like this is something Apple already did with the original Apple Watch.

Privacy first? 🕵️‍♀️

For the glasses to work, they need sensors to interpret the real world. The most accurate sensors currently used in ARKit are the cameras, supported by lidar; the lidar sensor on its own is fairly coarse, so the cameras are essential. We learned from Google Glass that many people do not like cameras pointed at them. So how can Apple sell AR glasses that can be worn constantly without making other people feel they are being recorded? Will Apple handicap the cameras in the glasses so they can be used for interpreting but not for monitoring? Could there be a physical mechanism that uncovers the sensors only when the user enters full AR mode? Or does Apple have another ace up its sleeve?

The AR future is in sight 🔮

Looking at the software frameworks, it seems that many of the pieces are already here to create rich experiences. Apple took years to reach this stage in software, but can it solve the hardware puzzle and make this AR device the next hit? Is it already possible to create good-looking AR glasses that people want to wear all day? Will Apple find an acceptable solution for the cameras? There are already rumors that they are working hard on the hardware. We will likely know within the next few years!


Jurriaan Mous is a Product Manager and Mobile Developer, enthusiastic about the forward progress of technology.