Inside Sumerian—Amazon’s Big Bet on Augmented and Virtual Reality

Published in PC Magazine, Apr 20, 2018

PCMag got an exclusive look at Amazon’s new 3D development platform for building AR/VR apps.

By Rob Marvin

Amazon is making a grand entrance into the augmented and virtual reality space with Sumerian, an all-in-one development platform that can build AR and VR apps for smartphones and headsets, and — soon enough — AR/VR apps that’ll run right in your browser.

Within these experiences, Sumerian can create immersive virtual worlds populated by “hosts” — 3D characters brought to life by the same artificial intelligence tech that powers Alexa.

Sumerian is platform-agnostic. Rather than developing its own branded device or headset, Amazon opted for integration with existing offerings. Sumerian is built on open web standards and supports both Apple’s ARKit and Google’s ARCore, meaning app creators can build one Sumerian app that runs on Android, iOS, Oculus Rift, HTC Vive, and beyond.

As a new addition to Amazon Web Services, Sumerian is priced using that service’s usage-based model instead of a subscription and connects to other AWS services.

Amazon released a Sumerian preview in November when it was first announced, but ahead of its expected May launch, PCMag got an exclusive look inside Sumerian and a few early customer apps.

Kyle Roche, General Manager of Amazon Sumerian, took me through a demo of the 3D app-creation platform. I toured Sumerian’s drag-and-drop app editor, its 3D object library, and its Visual State Machine for scripting complex automated scenes, then went inside the process of creating artificially intelligent hosts, which you can have full conversations with inside these virtual experiences.

I also spoke to Marco Argenti, a VP who oversees not only AR/VR, but also the AWS Mobile, Serverless Computing, and IoT divisions. Amazon has ambitious plans for Sumerian, and an even grander vision for the role augmented and virtual reality combined with AI will play in our connected future.

Why Amazon Is Betting on AR/VR

Wading into an entirely new industry or field has never stopped Amazon before. Just look at Whole Foods, Amazon Video, and its efforts in the healthcare and pharmaceutical industries.

Roche joked that the name Sumerian came from Neal Stephenson’s book Snow Crash (which Amazon is adapting as a series), and the idea for “hosts” came from HBO’s Westworld. But according to Argenti, Amazon’s decision to enter the AR/VR space actually came down to three key factors: the emergence of smartphone-based augmented reality; untapped VR opportunities in the business-to-business (B2B) market; and helping AWS customers solve pain points with things they were already trying to do.

“These signals were strong enough for us to actually start getting into the process of designing Sumerian. In the classic Amazon way, we started working backward from customer use cases and then eventually funding a development team to build the product,” Argenti explained.

The B2B applications include scenarios like interactive digital signage (think the giant talking hologram ads from Blade Runner 2049), virtual training, and a host of industrial Internet of Things use cases, such as using sensors to create digital twins and complex simulations. Argenti also underscored the importance of smartphone-based AR reaching an inflection point through ARKit and ARCore.

“The camera is becoming a very powerful tool to interact with reality and explore the world around you,” he said. “Fast graphics processors can overlay information in real time, and sensors can help construct a 3D reality. The idea is that you have a high-quality, high-definition, context-aware sensor in the hands of billions of people.”

How Amazon Built Sumerian

Amazon started thinking about what AR and VR would look like for AWS customers in late 2016, and a preview debuted at AWS re:Invent about a year later. In between, a few things happened.

First, Amazon bought a bankrupt Swedish startup called Goo Technologies; its 3D creation environment, Goo Create, became the foundation of Sumerian’s integrated development environment (IDE).

Goo Create’s visual 3D modeling was also a web-based cloud service, but Amazon took it a step further by moving the back-end to AWS. There are plenty of benefits to building on top of the scalable cloud infrastructure you already own, but a big one is dramatically reducing latency. Roche said one of Sumerian’s biggest selling points is that despite how powerful the editor is, there’s nothing to download or install. During the demo, Sumerian loaded from a browser URL in seconds. Even doing real-time natural language processing (NLP) and rendering elaborate animations didn’t slow it down much.

As with low-code development platforms, Sumerian can be used on a basic level with almost no developer experience. However, coders and data scientists can go a lot deeper with programmable APIs and Sumerian’s command-line interface to customize scenes and write complex app logic.

“We want an experience where you click and you’re immediately in the scene,” Argenti said. “Then you have 3D graphics tools where you can drag and drop objects. Sumerian is a visual tool that can associate what happens when actions or events take place, potentially without writing a line of code.”

Creating 3D Sumerian Apps

The broader design philosophy Amazon followed with Sumerian is to consolidate the creation experience as much as possible. Roche said the idea was to mask a lot of the repetitive development tasks, so the basic process for building a Sumerian app is the same regardless of the AR and VR platforms on which you ultimately publish it.

It starts with either choosing a template or jumping straight into creating a new scene. Some of Sumerian’s default templates include scenes like office spaces, training rooms and warehouses, a cargo ship, and an outdoor campfire. The main editor supports WebGL and WebVR, and is laid out in the same way as many of the low-code tools we’ve tested.

On the left is an entities panel. An entity is essentially a table in a database that helps you manage the data getting pulled into your app. Below that is the asset window, which is where you can search for the objects you want to pull into a scene or open the full asset library of all Sumerian’s 3D models. Roche said Sumerian pulls in a number of open-source object libraries and integrates with the Sketchfab API. Amazon is also interested in integrating with platforms like TurboSquid and Google’s Poly AR/VR object library, he said. You can import your own assets into Sumerian as well and drop them into a scene.

“The asset panel can serve as a drop zone for an adjustment pipeline,” Roche explained. “You can drag most common 3D file formats; we’ll convert them, optimize them, and store them for you. One of the things we do on the back-end is if you’re using the same asset in multiple scenes, we’ll actually create a reference link for you.”
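
The reference link Roche describes resembles content-addressed storage, where identical files are stored once and each scene keeps only a pointer to the shared copy. The Python sketch below illustrates that idea under that assumption; it is not Sumerian’s actual back-end, and the AssetStore class and its method names are hypothetical.

```python
import hashlib


class AssetStore:
    """Hypothetical store that keeps one copy of each distinct asset."""

    def __init__(self):
        self._blobs = {}    # content hash -> asset bytes, stored once
        self._scenes = {}   # scene name -> list of content hashes (references)

    def add_to_scene(self, scene, asset_bytes):
        """Store the asset if it is new, then link it to the scene by hash."""
        key = hashlib.sha256(asset_bytes).hexdigest()
        self._blobs.setdefault(key, asset_bytes)
        self._scenes.setdefault(scene, []).append(key)
        return key

    def stored_copies(self):
        """Number of distinct assets actually held in storage."""
        return len(self._blobs)

    def scene_assets(self, scene):
        """Resolve a scene's references back to asset bytes."""
        return [self._blobs[k] for k in self._scenes.get(scene, [])]


store = AssetStore()
drone = b"fake drone model bytes"  # stand-in for a converted 3D file
ref_a = store.add_to_scene("warehouse", drone)
ref_b = store.add_to_scene("cargo_ship", drone)
print(ref_a == ref_b, store.stored_copies())
```

Both scenes end up holding the same reference, so the model is stored once no matter how many scenes use it.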

In the middle of the screen is the main canvas, where you can drag and drop assets and 3D models into a scene. In the corner of the canvas is a button to launch a WebVR preview of your scene.

Below that is the timeline editor, which works similarly to video-editing tools. As you pull animations and sounds into frames and use the Visual State Machine to create actions, host behaviors, and event progressions, it will all show up in the timeline, where you can adjust how one state transitions into another.

The right-hand column is the inspector panel, which shows the details on any components you’re looking at and how you can customize them. For a model that has maybe a hundred different variations, you can adjust things like attributes and textures without actually touching the scene.

Amazon’s Strategy: Integrate With Everyone

Sumerian plays not only in the 3D development space with platforms like Unity, Unreal Engine, and Vuforia, but also in the broader AR/VR ecosystem along with ARKit, ARCore, and Windows Mixed Reality. Roche said Sumerian applies the “build once, run anywhere” philosophy to AR/VR apps, especially for enterprise developers.

“Pro 3D developers or pro animators have a studio that’s working with them. But most [AWS customers] are web or mobile developers learning something like Unity on the job,” said Roche. “Unity is great, but to be really good at it is significantly more difficult than it is to take the skills they have — like if they’re good at JavaScript — and ease them into 3D that way. So we decided to focus on that part of the market.”

Sumerian supports several core open standards: WebGL, WebAR, WebVR, and the coming WebXR framework that will bring AR/VR apps to all devices and browsers across platforms. The World Wide Web Consortium (W3C) will vote to ratify WebXR in the coming months. At that point, Sumerian apps will be able to run directly in browsers.

Between WebGL, WebVR, and WebXR, Sumerian is completely platform-agnostic, and Sumerian published native wrappers to integrate directly with ARKit and ARCore for smartphone-based AR apps. Roche said Sumerian can build apps for any platform that supports WebVR, meaning not only Oculus Rift and HTC Vive but also Samsung Gear VR, Google Daydream View, and others. Sumerian is also working closely with the Google Chrome team on WebXR for browser-based apps.

The other major player in the room is Microsoft. While Amazon didn’t go as far as to say Sumerian would integrate with the Windows Mixed Reality ecosystem, Roche did say that the latest RS4 release of Microsoft HoloLens includes WebAR support, which means Sumerian can run HoloLens scenes. Amazon is also watching other headsets from companies like Magic Leap and Meta, but its approach gives Sumerian the benefit of flexibility.

“We made a choice. We could have gone down a path of making our own proprietary thing and pushing developers toward that,” said Argenti. “What we decided instead is to be as broad as possible in supporting what we think will be a massive market. Once everything moves to WebXR, the whole device ecosystem comes with it. We’re going after the underlying foundation.”

Sumerian’s AI Hosts Are a Game-Changer

Hosts are one of Sumerian’s most distinctive selling points. A host is a 3D-animated character you can place into an AR or VR scene. Users can ask hosts questions, and developers can script a complex set of actions, behaviors, gestures, and movements a host can perform as they have conversations and walk around scenes. Amazon drew inspiration for hosts from all sorts of places, including online games like Second Life and The Sims, Roche said.

Sumerian currently has two default hosts — Cristine and Preston — but will launch a whole series of hosts over the course of this year. Amazon built a lot of nuance into these AI characters. Roche showed me a demo of Cristine where he dragged the host into the scene, and pulled open the inspector panel to customize her emotions, facial expressions, and gestures. Amazon will auto-generate gestures as the host talks based on natural language processing of the conversation. So if Cristine says “Hi,” it might trigger a waving gesture.

With something called a point of interest system, you can check a box in the editor so the host’s eyes always pay attention to the camera. So if you’re wearing an HTC Vive Pro walking around a 360-degree space, the host can follow you. If it’s an AR app connected to your smartphone camera, Roche explained that Amazon’s Rekognition deep-learning system can run facial analysis of both where you are and where your face is in the frame to make it look like the host is looking back through your screen directly at you. It gives you the illusion of eye contact.

Customers can also create their own custom hosts from scratch using Amazon’s Maya SDK, but Amazon provides the basic skeleton from which you can adjust a host’s appearance, dialect and inflections, language, and more. In the long-term, Amazon is thinking about ways to make it easier to create hosts. Argenti talked about the idea of a host generator for first-person avatars, or using facial recognition to match rendered characters to real people.

“In conjunction with Rekognition, if we procedurally generate as many of these characters as possible, we can try to match you to the closest avatar. We’ll take your photo and run reverse facial recognition and match it to a randomized character to give you a host that looks like a version of you.”

Argenti explained how integrating other AWS services like the Amazon Comprehend natural language processing service could make hosts even more lifelike. Comprehend analyzes text to extract metadata on things like mood and sentiment. So a host could have a different facial expression or manner of speaking based on the mood of the person they’re interacting with.

“If they’re angry, maybe the host calms them down,” Argenti said. “There’s an evolution not only in the way we convey information, but how we present it through deep sentiment analysis.”
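
To make Argenti’s idea concrete, here is a minimal Python sketch of how a sentiment result might steer a host’s reaction. It assumes the input is shaped like a Comprehend sentiment response (a "Sentiment" label of POSITIVE, NEGATIVE, NEUTRAL, or MIXED, plus per-label "SentimentScore" values). The expression and tone names, the confidence threshold, and the choose_host_reaction function are invented for illustration and are not Sumerian settings.

```python
# Expression and tone names are invented for illustration; they are not
# Sumerian settings.
EXPRESSION_BY_SENTIMENT = {
    "POSITIVE": ("smile", "upbeat"),
    "NEGATIVE": ("concerned", "calm"),
    "NEUTRAL": ("neutral", "even"),
    "MIXED": ("attentive", "even"),
}


def choose_host_reaction(sentiment_result, min_confidence=0.6):
    """Pick a (facial expression, speaking tone) pair for a host.

    sentiment_result is assumed to look like a Comprehend sentiment
    response: a "Sentiment" label plus per-label "SentimentScore" values.
    """
    label = sentiment_result["Sentiment"]
    # Score keys are capitalized ("Negative"); the label is upper case.
    score = sentiment_result["SentimentScore"].get(label.capitalize(), 0.0)
    if score < min_confidence:
        label = "NEUTRAL"  # fall back when the model is unsure
    return EXPRESSION_BY_SENTIMENT.get(label, EXPRESSION_BY_SENTIMENT["NEUTRAL"])


angry_user = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {"Positive": 0.02, "Negative": 0.91, "Neutral": 0.05, "Mixed": 0.02},
}
print(choose_host_reaction(angry_user))
```

The low-confidence fallback is a design choice in this sketch: a host that misreads a neutral remark as anger would feel less lifelike than one that stays even-keeled.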

Pulling in the Voice Services Behind Alexa

Hosts aren’t much good if they can’t speak. You can’t say “Hey Alexa” in a Sumerian app the same way you can activate Cortana within Windows Mixed Reality. Instead, Amazon uses the automatic speech recognition and natural language understanding APIs behind Alexa to let hosts have conversations.

Sumerian is integrated with Amazon Lex and Amazon Polly. Polly is a text-to-speech service that turns written text into speech a host can deliver. Lex is an NLP engine for building conversational interfaces, which is how hosts can understand and respond to what users are saying within an AR or VR app. Sumerian currently supports more than two dozen languages through Polly, and there’s a lip sync feature that will match the host’s mouth movements to the cadence of the language or speech.

“Voice is a medium that really makes sense while you’re immersed in AR or VR,” Argenti said. “I want to talk to a character if I can see it standing there. So we picked up two of these tools from the AI group and tried to really personify them. We want the scene to be able to listen and respond to us. So you can take an entire Lex flow like you would for a chatbot and just drag it onto the character. In a lot of ways it’s actually easier than building an Alexa skill.”

Scripting Logic in Immersive Worlds

Sumerian’s Visual State Machine is where you can lay out complex sequences and virtual simulations. Using either the visual timeline editor or the full JavaScript interface, app creators and developers can script logic controlling how hosts or other objects in your scenes respond to different actions. For example, Sumerian includes a flying drone object that you could script to fly around.
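
For readers unfamiliar with the concept, a state machine is a set of named states connected by events. The Python sketch below models the drone example that way. It is a generic illustration rather than Sumerian’s own scripting interface (which the article describes as JavaScript), and the state names, event names, and StateMachine class are all hypothetical.

```python
class StateMachine:
    """Minimal finite-state machine: states joined by named events."""

    def __init__(self, initial):
        self.state = initial
        self._transitions = {}  # (state, event) -> next state
        self._on_enter = {}     # state -> callback run on entry

    def add_transition(self, source, event, target):
        self._transitions[(source, event)] = target

    def on_enter(self, state, callback):
        self._on_enter[state] = callback

    def fire(self, event):
        """Move to the next state if the event is valid here; else stay put."""
        target = self._transitions.get((self.state, event))
        if target is None:
            return self.state
        self.state = target
        callback = self._on_enter.get(target)
        if callback:
            callback()
        return self.state


log = []
drone = StateMachine("landed")
drone.add_transition("landed", "user_clicks", "taking_off")
drone.add_transition("taking_off", "altitude_reached", "patrolling")
drone.add_transition("patrolling", "battery_low", "landed")
drone.on_enter("patrolling", lambda: log.append("play patrol animation"))

# The first event is ignored because a landed drone has no such transition.
for event in ["altitude_reached", "user_clicks", "altitude_reached"]:
    drone.fire(event)
print(drone.state, log)
```

Ignoring events that are not valid in the current state keeps a scene from jumping into an inconsistent sequence, which is the main appeal of scripting behavior this way.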

This all gets more complicated when you introduce real-world objects into the equation. Since Argenti also oversees the serverless computing and IoT divisions at AWS, he talked about how connecting Sumerian to AWS services like Lambda and Greengrass can open up more possibilities for complex simulations. Greengrass is a way for machine-learning models to run locally on IoT devices themselves. Think about an ML model training on the data it’s getting from a machine on a factory floor, and then pulling that algorithm into Sumerian to simulate that same machine using AI.

“There could really be a simulated world in AR and VR where each character or object is intelligent from machine-learning training in the real world,” Argenti said. “Ultimately, you want to try to re-create reality in the most realistic way possible. Today we can get close, but it’s not quite there yet from a behavioral standpoint to simulating how things actually work.”

WeatherBug’s Simulated Meteorologist

When Amazon took me through a few customer demos of Sumerian, I was initially surprised when the first one was a weather app.

But as WeatherBug General Manager Olivier Vincent explained, virtual reality makes a lot more sense for weather data than you’d think. As people have started checking their weather in apps as opposed to watching forecasts on TV, Vincent said weather reports have lost one of their best touches: your local weatherman in front of a green screen.

“Weather is about telling you what’s going on at a given time in a given place. You can do it in a nice 2D way in your app for a quick look, but we knew how popular weathermen and women have been over the years,” said Vincent. “So the idea is to reintroduce that weather person within a more immersive experience in the app.”

WeatherBug built a Sumerian scene with a virtual news studio with an anchor desk and green screens, and plopped Amazon’s default Cristine host in as the meteorologist. The app pulls current weather data for your location that the host will then read back to you as part of a personalized weather forecast. From the main WeatherBug app, Vincent launched the VR viewer that zoomed through a 3D model of Manhattan as Cristine gave the forecast, complete with high and low temperatures, and falling snowflakes.

Addison, the Virtual Caregiver

New Mexico-based health management tech company Electronic Caregiver had a much different Sumerian experience.

The company offers tech for the elderly like a wearable with a medical help button, but it also built a solution called Addison Care, which cuts home care costs and uses conversational AI to assess elderly patients’ risks of falls. The company is releasing a kiosk to pharmacies, hospitals, and clinicians that analyzes a patient’s gait using machine learning. The software also uses Addison, a custom Sumerian host, to walk users through a verbal questionnaire about their fall history.

“Getting seniors to adopt technology is not so easy,” said Electronic Caregiver CTO Bryan Chasko. “As it gets better, voice technology is going to engage that market. You’re never going to get them to sit down in front of a keyboard and a mouse, but with Addison they can just have a conversation.”

Electronic Caregiver has been working on Addison for years, developing the 3D character using Amazon Lex and Polly. The company is one of the AWS customers that helped Amazon conceptualize the pain points it could solve with Sumerian and how to automate the AR/VR app-creation pipeline.

Judah Tveito, a Virtual Developer at Electronic Caregiver, said Sumerian took processes they had been working on for months and turned them into a few clicks. The company is also working on an Addison mobile app.

Electronic Caregiver ultimately envisions Addison as an in-home virtual caregiver, Chasko said. For elderly users living on their own, the AI could do things like remind them to take medication or automatically call 911 if there’s a fall or other medical emergency.

“One of the biggest issues we feel this can tackle is the isolation that seniors feel when they’re living alone,” said Chasko. “We really want to have a permanent 24/7 home environment where when you’re not talking to Addison [and] you’re in the bathroom and you fall, just being able to yell out, ‘I need help’ could save a lot of lives.”

The Moneymakers: B2B and IoT

Amazon talked a lot about how Sumerian apps could live in interactive digital signage. Imagine someone walking through a hotel lobby, a mall, or a stadium and seeing a host walking alongside them on a screen. It may sound a tad creepy, but hosts could also process contextual information like location to turn the ad into a personalized conversation. Argenti sees hosts as virtual extensions of a company’s brand that could change according to what a business needs.

“If you put these apps out somewhere with a camera, you can imagine a 3D character engaging with you personally: knowing who you are, maybe the last time you were in the venue, or even the last thing you ordered,” Argenti said.

“A cruise ship is a good example. The host might say ‘Hey, based on where you are on the ship right now, [with] your reservation in 20 minutes, you’re not going to make it. Do you want me to push it back 15 minutes?’ There’s also a visual aspect to the experiences you’re re-creating. A host talking to you about traveling is going to look different from one talking to you about your financial strategies. Adding location information could have characters change depending on where they find themselves: the way they dress, the way they move, the inflections they have.”

Amazon is exploring a range of B2B and IoT applications for Sumerian customers. On the industrial front, Argenti said Nokia put together a system with sensors and visualizations to monitor the environment inside shipping containers to measure things like interior temperature and shock absorption without actually opening them up.

“You can imagine a world of AR where we’re connecting millions of devices to AWS and getting a lot of sensory data coming from the real world,” he said. “Then we can use AR with triangulation to identify an object and display relevant information on top of it. There’s a huge applicability there on anything that ranges from service and repair to monitoring, safety, and so forth.”

There are also broad e-commerce possibilities when you’re working with virtual assets. Any 3D model in Sumerian that comes from Amazon’s own shopping catalog has the potential to earn a developer referral fees if you add it to an AR/VR scene. It’s a way for both enterprises and the smaller businesses and independent developers on AWS to monetize their Sumerian apps.

Argenti envisions a lot of crossover opportunities between Sumerian and Amazon’s retail division.

“We can bridge some of the work that our retail team is doing in home furnishings, home electronics, and other high-value items to create 3D models,” he said. “Then you can use them for a photo-realistic pass-through of a space. How do I set up a modern loft? What kind of furniture do we organize?”

It All Comes Back to AWS

Building bridges to all the AR/VR devices and platforms out there is a smart way to lower the barrier to entry to AR/VR development, not just for B2B companies but developers at large. At the same time, the most compelling business incentive behind Sumerian is as a tool to drive broader AWS usage. That’s true both for existing customers trying out the new service, and for new Sumerian users who then begin to use Amazon’s storage, compute, processing, AI, and other services because they’re all integrated into the experience. The pricing model is attractive too, because Amazon imposes no upfront fees. AWS only charges for what you use.

It’s all part of what Amazon sees as a larger loop where everything flows through AWS. Data comes from an IoT device, gets processed by an AWS Lambda function, deployed on AWS Greengrass to get to AWS IoT where it trains a machine learning model, and eventually gets pulled as a 3D model into an AR visualization in Sumerian.

Amazon’s Vision For Our AR/VR Future

Amazon hopes Sumerian can play a part in spurring the industry to make 3D mass-market products and drive down the cost. On the AR side, Argenti said the basic enablers are in place thanks to ARKit and ARCore. He said the tipping point will come when there are enough apps and video content from developers. On the VR side, the big changes Amazon hopes to see are hardware coming down in price, becoming less clunky to wear, and going wireless.

“When that happens and you can wear a VR experience like a pair of glasses, VR will really take off,” Argenti said. “I think it needs to be as natural as watching a video on a tablet or turning on a TV before it’s ready for mainstream consumption at the same level as the other screens we have today. Developing a whole ecosystem around it of content creators, advertisers, end-users, and companies catering to those users is how you do it.”

Argenti also underscored the importance of immersion in virtual and mixed reality experiences. Another one of Amazon’s target use cases for Sumerian apps is education and training. Whether you’re learning how to use a medical device, service a vehicle, or speak a new language, he said it’s about dropping you in an environment that feels as viscerally as possible like the real world.

“You could sit down in a French bistro and learn the language without actually being there,” he said. “Host avatars are speaking French to you. The menus are in French. And then within that reality, you could automatically maybe touch a menu and see it translate, passing your finger on top of an item to see the words change into a different language. So much of education is contextual, and so as a learning tool, having an experience engage all your senses is powerful.”

Originally published at www.pcmag.com.