Vision Pro: Tech’s next revolution or just another metaverse gimmick?

Folutile writes Tech
14 min read · Jun 7, 2023

Apple made good on an unstated promise, one it has been making since as early as 2015 (well, never publicly, anyway).

The Vision Pro, which the Cupertino-based company is clearly betting on as its future in consumer-facing technology, was announced at WWDC on June 5.

Of course, the media ate it up. It was the biggest announcement at the biggest software event of the year, hosted by the most successful company in the history of humanity.

Tim Cook, Apple’s shepherd for the last 12 years, called the Vision Pro “revolutionary”. And his flock lapped it up. Also expected.

But is it true? Is Apple’s biggest ‘one more thing’ announcement of the last decade really revolutionary?

Well, Apple clearly thinks so.

And to be fair to them, no company has read, and dictated, the direction of personal computing as well as Apple has over its nearly 50-year existence.

Apple has placed bets and called the game so many times that it’s easy to believe the company’s decision-making seat might actually be occupied by a soothsayer holding a crystal ball.

In the 70s, Apple’s role in technology, especially for consumers, was revolutionary. In fact, what we now call PCs would probably not exist as they do now if not for the Jobs-Wozniak pair. They were not exactly the pioneers of the GUI or displays (or, arguably, of anything, for that matter), but that isn’t really what revolutionary means.

Revolutionary, according to this big, boring book of random words called a dictionary, means “involving or causing a complete or dramatic change”.

And if there ever was a product that caused a complete or dramatic change in the early PC days of the 80s, it was the Apple Macintosh, successor to the bestselling Apple II (which was inarguably the face of personal computing in the late 70s).

Apple co-founder Steve Jobs with the beloved Apple Macintosh in 1984. The Macintosh didn’t necessarily have the best software, but was clearly very much ahead of its time.

After a quiet 90s era (not for lack of trying; Apple released more products in the 17 years between 1984 and 2001 than at any other point in its history, most of them unsuccessful), the company roared back to become the cheerful face of computers and consumer technology with the iMac (1998). And from then on?

2001: the iPod. 2007: the iPhone. 2008: the MacBook Air. 2010: the iPad.

Bang. Bang. Bang. Bang.

For the remainder of the 2010s, they shifted their hardware focus to sustaining their lead by wading into the wearables market.

2015: Apple Watch. 2016: AirPods.

Bang. Bang.

Yes, there have been failures (HomePod, trashcan Mac, we see you, cylindrical disappointments), but now, barely three years into the 2020s, Apple clearly believes it can do it again.

First, some terminology explanations

Virtual Reality (VR): A simulated experience that uses 3D near-eye displays to give users an immersive feel of a virtual world.

Augmented Reality (AR): A technology that superimposes a computer-generated image on a user’s view of the real world, thus providing a composite view.

Mixed Reality (MR): A combination of both VR and AR, where virtual content and the real environment coexist and interact.

Extended Reality: The umbrella term that covers virtual reality (VR), augmented reality (AR), and mixed reality (MR).

So basically, if I were playing a video game where wearing a pair of goggles completely transported me into a virtual “world of zombies”, that would be VR. If all the goggles did was bring a zombie into my living room (while I remained spatially aware of all my furniture and appliances around me), that would be AR. And if the game combined both experiences (say, I could transform my living room into the zombie world while using all my tables and chairs as surfaces or props), that would be MR. And extended reality (XR) is the umbrella all of these terms fall under.

Like I said earlier, Apple is making a huge bet on extended reality featuring heavily in the next iteration of consumer technology. And to do that, they focused on creating a pair of MR-enabled goggles — the combination of VR and AR.

Why did they need to do that?

Apple’s 8-year-long wager

Apple’s Vision Pro was only just announced, but the journey started a long time ago. As early as 2016, the iPhone maker was already at work on one building block: ARKit.

ARKit is a software development kit, first introduced with iOS 11 at WWDC 2017. As the name implies, it helps developers build Augmented Reality features into their apps. Over time, advancements in computer vision, depth-sensing cameras and object recognition techniques were employed to bring the technology closer to reality.
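To make that concrete, here’s a minimal sketch of what plane detection, the capability behind apps like Measure and IKEA’s furniture previews, looks like with ARKit. (The class name and setup here are illustrative, not Apple’s sample code; error handling and rendering are omitted.)

```swift
import UIKit
import ARKit

// A minimal ARKit plane-detection sketch (illustrative only).
class PlaneFinderViewController: UIViewController, ARSCNViewDelegate {
    let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.delegate = self
        view.addSubview(sceneView)

        // Ask ARKit to track the device in space and detect flat surfaces.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal, .vertical]
        sceneView.session.run(configuration)
    }

    // ARKit calls this when it anchors a newly detected surface.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let plane = anchor as? ARPlaneAnchor else { return }
        print("Found a plane roughly \(plane.extent.x) × \(plane.extent.z) metres")
    }
}
```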

But the limits of ARKit on hand-held devices became apparent very quickly.

Anybody who used ARKit at the time quickly noticed the technology’s severe limitations. In some cases it was impractical, and gimmicky at best. For instance, the Measure app, which Apple bundled into iOS shortly after, didn’t work particularly well. Measure was supposed to give you accurate dimensions of real-world objects just by pointing your camera at them. In the end, some of the most consistent, practical use cases developers built with ARKit were for previewing products in your space (for instance, using IKEA’s app to see how 3D versions of furniture would look in your living room), which is cool, but really not that groundbreaking.

Pictured above is one of the most advanced updates to the IKEA AR-enabled experience for checking out furniture in your space. Great, but not particularly groundbreaking.

If you ever tried out the early AR experiences of Snapchat’s AR Bitmoji features or any other apps that utilised ARKit, you would recognise how much the devices were clearly struggling with spatial awareness. ARKit’s early rollout made two things apparent.

1. AR technology just wasn’t where it needed to be yet — and not necessarily on the software side.

2. A wearable device was the only way forward for creating a truly immersive experience using AR.

These two things were dependent on one important, niggling problem: hardware.

From ARKit to Vision Pro

Noah was stuck in his Ark for what, a year?

Well, by 2022 Apple had stayed in the damp, mouldy, underwhelming vessel of public disregard called ARKit for five long, impatient years.

Of course, they were working on their cruise liner the whole time. But that comes later.

One important thing to note about AR and VR technology is that it takes a lot of computation to convince a human that something that isn’t there actually is. And it takes a lot more than software to pull it off. Especially considering the times: remember, this was 2016–2018. Many of the innovative sensors and processors that make the Vision Pro possible now didn’t even exist back then. The potential for extended reality in everyday consumer products has always been notably high, but the sheer number of technological miracles it required made a practical product incredibly hard to build.

For instance, let’s say you want to create a VR/AR solution that can essentially replace Microsoft Word, but in a virtual environment. So, no physical desktop or laptop; just wear a pair of goggles and voila, MS Word is right in front of you.

This means many things.

Firstly, there’s no actual “computer” to run this Microsoft Word, since it is purely a virtual application in a virtual environment. This means the goggles either have to be connected to a computer (a full-circle moment, like creating wired AirPods) or have to be the computer themselves.

Next, input: to type into this “virtual Microsoft Word” application, you would normally need an input device of some sort, like a keyboard or a touchscreen.

But remember, you are not in front of anything. Ideally, your product’s selling point is that nobody needs a phone or laptop to type a document. Just a pair of goggles and, at most, some control gear (like the Meta Quest’s handheld remote). And since the less gear, the more practical, not needing a handheld remote at all would be even nicer.

So now, you need a virtual input device, and you also need a virtual “screen” for the Microsoft Word window to be displayed on. These two virtual components — the screen and the keyboard — need to be placed on something, ideally a flat surface like a table or desk. Either that or they just float in front of you, but in a realistic way.

To achieve that, you need more intelligent spatial awareness. Basically, depth sensors. And not the regular depth-sensing cameras that flagship phones have shipped with since 2018. You need what is called a Light Detection and Ranging scanner, LiDAR for short. This very complicated, very expensive technology is predominantly found in surveillance drones and satellites.

Yeah. Satellites.

To keep things short, LiDAR works by using a laser to emit light onto an object or surface and then measuring the time it takes the reflected light to return to the receiver. This helps create very accurate, high-resolution 3D representations of the environment, capturing real-world dimensions like distance, length, breadth, depth, height and even volume. Keep in mind, this kind of technology, at the time, was not only worth tens of thousands of dollars, but was also not miniature enough to fit into a wearable or holdable consumer product like a pair of simple goggles or a phone (well, to be fair, Samsung could already pull it off at the time. Just saying).
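The arithmetic behind that is simple time-of-flight: distance equals the speed of light multiplied by the round-trip time, divided by two. A toy sketch of the principle (the numbers are illustrative):

```swift
import Foundation

// Time-of-flight ranging, the principle behind LiDAR: a laser pulse
// travels to a surface and back, so distance is half the round trip.
let speedOfLight = 299_792_458.0 // metres per second

func estimatedDistance(roundTripSeconds: Double) -> Double {
    speedOfLight * roundTripSeconds / 2.0
}

// A pulse returning after ~13.3 nanoseconds hit a surface ~2 metres away.
print(estimatedDistance(roundTripSeconds: 13.34e-9)) // ≈ 2.0 metres
```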

Now let’s say you’ve gotten all the software and hardware you need to create this virtual Microsoft Word application. Here comes the real kicker: how do you make an application that is not actually in front of you look believable and real while it’s being used?

Well, to do that, you need to create a pair of goggles with viewing quality so stellar it could pass for a set of eyeballs. It also needs to be bright enough to trick your eyes into believing that there is an actual floating Microsoft Word window in front of you in a realistic environment, and that an entirely virtual keyboard is actually there for you to type on. For context, the kind of display technology required to pull this off could only be found in 4K, gaming-monitor-quality screens, which at the time cost thousands of dollars; it had not yet been successfully deployed to PC or phone screens. But let’s say you managed to pull this off regardless.

If you did, then you would need a power source to actually run all this heavy, complex hardware without being uncomfortable or a health hazard for the user (remember, you’re creating a product that people are wearing on their faces, meaning both the product and the battery cannot run too hot, seeing as they are probably going to be on the device itself or at least very close by).

And then finally, if you managed to do all that perfectly well, we need to talk about the chip that has to process hundreds of billions of mathematical computations and streams of sensor data per second in a very energy-efficient way. Because you have to run all this complicated software on a device that should be able to pass as a pair of skiing goggles.

When you finally do all that, and find a way to get all this expensive hardware and software into a pair of glasses, well, you now have to make it affordable enough to actually be bought by somebody who is NOT operating with the financial backing of a trillion-dollar company.

So, let’s say an individual or a company were to pull this off, would that be considered revolutionary?

Well, you tell me.

Staggering performance, staggering hardware

Apple’s Vision Pro

It seems Apple pulled it off. We won’t know for sure till next year (2025 for some), but by all early indications, they managed it.

They built a device that fits on your head fairly comfortably. And that device uses ARM processors more capable than most of the processors in personal computers on the market today (very few people were even considering these chips for such powerful devices back in 2019). For context, the Vision Pro runs a chip from the same M2 family found in the most powerful 16-inch laptop currently on the market. And it sports another proprietary chip, the R1, which handles all the real-time processing from the sensors.
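To appreciate why a dedicated sensor chip matters, consider the frame budget. Apple says the R1 streams new images to the displays within 12 milliseconds; at the reported 90 Hz refresh rate (treat that figure as an assumption here), the whole sense-fuse-render loop gets only about 11 ms per frame. A back-of-the-envelope sketch:

```swift
import Foundation

// Back-of-the-envelope frame budget for a head-mounted display.
// (90 Hz is the Vision Pro's reported refresh rate; assumed, not official.)
let refreshRateHz = 90.0
let frameBudgetMs = 1_000.0 / refreshRateHz
print(String(format: "≈ %.1f ms to sense, fuse and render each frame", frameBudgetMs))
// ≈ 11.1 ms per frame: hence a dedicated chip just for sensor processing.
```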

What sensors?

Yeah, remember those sensors that probably wouldn’t have scaled down so well a few years ago? Well, Apple (and many other companies) have evolved sensors like LiDAR and infrared receivers to be small and lightweight enough for devices like phones and tablets. Apple itself has shipped LiDAR on iPads and iPhones for some time now.

This Mixed Reality device that Apple built also has a 4K Micro OLED display for each eye, with a pixel density of 4,000 ppi. For context, the iPhone 14 has 460 ppi. The iPhone 14 Pro’s display also peaks at 2,000 nits of brightness. Apple’s Vision Pro? Well, according to reports, it can go up to 5,000 nits.
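For a rough sense of that gap, using the figures quoted above:

```swift
import Foundation

// Rough pixel-density comparison using the figures quoted above.
let visionProPPI = 4_000.0
let iPhone14PPI = 460.0

let linear = visionProPPI / iPhone14PPI // ≈ 8.7× denser along each axis
let perArea = linear * linear           // ≈ 76× more pixels per unit area
print(String(format: "%.1f× linear, ~%.0f× per unit area", linear, perArea))
```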

Oh and remember when we were talking about input devices?

Just like in our ideal use case, Apple didn’t opt for a handheld remote. Instead, the input devices are quite literally your hands and your eyes: the “selection indicator” tracks your eye movement, and you use swiping and pinching hand gestures to control and click through the UI.
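By most accounts, this gaze-plus-pinch combination is expected to surface to developers as ordinary selection events rather than raw eye data. A speculative SwiftUI sketch of what that could look like (the visionOS SDK wasn’t broadly available at the time of writing, so treat this as an assumption about the interaction model, not confirmed API):

```swift
import SwiftUI

// Speculative sketch: on Vision Pro, looking at a view and pinching is
// expected to arrive as a plain tap gesture; raw eye-tracking data never
// reaches the app. Everything below is an assumption, not confirmed API.
struct SelectableCard: View {
    @State private var isSelected = false

    var body: some View {
        Text(isSelected ? "Selected!" : "Look here, then pinch")
            .padding()
            .onTapGesture { // fired by gaze + pinch instead of a finger
                isSelected.toggle()
            }
    }
}
```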

This is absurd.

Speech-to-text is also available, and there’s a virtual keyboard feature, though I doubt typing on it will be much fun without tactile feedback. But according to early reviewers, the hand gestures and eye tracking are light years ahead of the competition.

Here, the user is using pinch and swipe gestures to control her Vision Pro goggles. The movement is subtle, but the powerful sensors and cameras (there are no fewer than six cameras on this thing) pick up her movements accurately.

What about a power source? Well, they did pull it off, though not in particularly stellar fashion, which is forgivable, seeing as the impracticality of anything more ambitious would be apparent to anybody who understands elementary physics.

The Vision Pro can be connected directly to a power source. But for mobility, it also has an external battery pack that lasts about 2 hours (not a lot, but you probably shouldn’t be using these for longer sessions right now anyway). The pack slips into your pocket and connects to the headset via a cable with a magnetic attachment.

In true Apple fashion, this device plays nice with all your other ecosystem devices. So even though the Vision Pro is a computer all on its own, capable of running apps and connecting to the internet, you can still connect it to your Mac for that full-circle moment when you need to use heavier software. Say you want a huge 32-inch 4K display to show up out of nowhere while you’re editing a video: yeah, the Vision Pro would (theoretically) do that too, serving as a virtual external display for your Mac.

What using the Vision Pro for work would look like. Remember, all these windows are running directly from the Vision Pro, not an external computer. However, you can also use the Vision Pro as a virtual external display for your Mac, because, well, ecosystem flex.

It also has a few neat, expensive tricks, like a vivid passthrough mode that renders your surroundings in such great detail that you might actually believe the goggles are transparent, see-through lenses (they are not). Another neat little feature lets people see your eyes from the outside, which, again, is just an illusion: an outward-facing display projects a realistic render of what your eyes look like based on sensor data.

“The eyes, chico. They never lie.” Well, these ones do. Because those aren’t “real eyes”.

So, sure, it isn’t perfect (early testers are saying it’s a bit heavy on the head), but it definitely seems like a miracle.

How did Apple pull it off?

Well, the short answer is: with their $20-billion annual R&D budget (which is larger than the yearly national budgets of no fewer than 152 countries).

Though the short answer might not be the most impressive one, it is fairly sufficient. Let’s just say that Apple needed to find a way to scale down industry-grade technology, both in cost and in size, until it was small (and “affordable”) enough to be sold as a consumer product to its existing base.

I put affordable in quotes because the Vision Pro comes in at a whopping $3,500, which is more expensive than a good 90% of the consumer products that Apple currently sells. But, well, now you know why.

Concerning the price: this product may well alienate the majority of the company’s base market, but then Apple has never quite had a reputation for egalitarian pricing in wide markets.

But Apple kicked off efforts to penetrate developing markets in the late 2010s, so we can expect that trend to continue in a few years with a “budget-friendly” version of the technology the wider market can more easily adopt. To be fair, though, that budget version might still cost a couple thousand dollars, more than the average person would be willing to pay for yet another hardware product.

What the penniless can be happy about

There are some things to consider that could make us average tech users excited.

One: this is WWDC, where some of the absolute best and brightest developers of our generation are currently familiarising themselves with the product and its use cases in very practical terms. These folks will spend the next few months creating and perfecting software that, for all we know, will provide the vertical use cases for this sort of technology, ones that will probably scale down well in price thanks to their narrow focus. Basically, expect a Vision Pro “knockoff” built specifically for work and enterprise (Microsoft is working on one too), or one specifically for education purposes, or gaming, and so on.

And something else to be happy about? Well, this is the first one.

The first iPhone had a max storage of 16GB, a 320 × 480-pixel screen, a 1,400 mAh battery, a 2 MP back camera that couldn’t shoot video, and no selfie camera. That device was $600.

The Vision Pro is a first-generation product. Nothing quite like it has been made yet, so this is base zero.

Looking at the price makes it seem counterproductive to state that as something to be happy about, but look at it this way: Sure, it’s expensive, but this is the most primitive version of the flagship, industry-defining, mixed reality tech.

The costs of development and components will only go down over time, and although that may never make the flagship models of companies like Apple cheaper, it definitely makes these devices more affordable for other companies to create in the medium-to-long term. So, one day, we will look back with a different perspective, just as we do now when we consider that the first iPhone, sold for $600, cannot stand side by side (software and hardware combined) with a bottom-tier knockoff smartphone that costs $80 today.

Do I think the Vision Pros will ever be so cheap that a college student can afford them without breaking the bank? Not really.

But if it, and subsequent iterations of its product line, can improve on the experience of mixed reality so much so that it becomes a realistic, pound-for-pound, all-in-one replacement for a 4K display, a $400 game console, a $900 smartphone and a $1200 personal computer, then perhaps people just might one day invest in a pair of goggles as opposed to a whole slew of consumer products that are always going to be somewhat expensive either way.

So, is this the revolutionary iPhone moment? Or is this just an expensive, albeit premature attempt at revolution?

We’ll just have to wait and see on that one.

One thing’s for sure: there is something deeply practical in this technology for the average person.

And that can dramatically change the world.
