Brains Over Beauty, or Building AI Avatars Inside Out

Nahua Kang
Published in twentybn
Feb 24, 2020
Illustrated by Anny (Luniapilot)

There have been a lot of exciting events and developments in the realm of AI and virtual beings since fall 2019:

Credit: Neon (top), @NicoleLazzaro (bottom left), TNW (bottom right)

Now that it’s 2020, I think it is important to discuss the two major approaches to creating intelligent virtual beings: outside-in and inside-out.

Virtual Beings from the Outside In

Companies that take the outside-in approach start building virtual humans by focusing on their appearance before moving on to their intelligence. Their focus gravitates toward highly realistic static or animated CGI characters.

At present, outside-in is the most popular way companies are creating virtual beings. Lil Miquela, Shudu, Magic Leap’s Mica, and Soul Machines’ Virtual Humans are some of the best-known examples. From rendering the subtle contraction of a facial muscle in a smile to designing the clear, bright eyes that make virtual humans lifelike, the makers of these characters pay close attention to every detail of the avatar’s realistic look.

Because of their realistic appearance, these virtual humans have captured the world’s attention. They are blurring the line between the physical and the digital, and in 2020 they will continue setting the trend for the emergence of a new culture that includes and embraces virtual characters.

Credit: Lil Miquela (top left), Shudu (top right), Mica (bottom)

These virtual beings are even challenging the status quo of virtual assistants like Alexa. Magic Leap’s Mica stated in her manifesto: “Please don’t ask me to switch on your lights, turn up your music or give you directions…I am an educator, agitator, companion, artist and guide.” Pranav Mistry, CEO of Samsung STAR Labs, echoes that sentiment about Samsung’s Neon virtual humans: “Neons are more like us, an independent but virtual living being, who can show emotions and learn from experiences.” But if the Micas and Neons of the world want to become our digital companions and surpass Alexa, then there’s got to be more to it than simply appearances.

From industrial machines to robotics, from stationary speakers to virtual beings, the outside-in approach to designing virtual humans and avatars has made a profound impact on our social norms. Yet increasingly, the outside-in approach alone is unable to realize our dream of building intelligent, interactive, and humanlike virtual beings. This is where the inside-out approach comes into the picture.

Virtual Beings from the Inside Out

Most of us want virtual beings to be more humanlike and less machinelike, offering more companionship and less servitude. However, focusing exclusively on the outside-in approach, i.e., on appearance, will not lead us to any “intelligent” beings.

Credit: Nikkei Asian Review

Let me ask you a question: Would you rather be friends with Star Wars’ BB-8 or a Pepper robot? I’d choose BB-8. Maybe you would, too. Despite not being as human-looking as a Pepper, BB-8 (if it actually existed) would be several orders of magnitude more intelligent.

Here’s the rub. We aren’t necessarily connected to things that look like us; rather, we are drawn to beings that resemble and resonate with us emotionally and intellectually. Such a being could take the form of a cartoon character like Pixar’s WALL-E, appear as the rude yet cute alien child in Her (2013), or walk around as a physical humanoid like the naggy C-3PO. No matter the appearance or form factor, we would like them, sympathize with them, and befriend them, as long as they respond intelligently and sympathetically in real time.

Credit: LucasFilm

If appearance is not the fundamental reason we connect with virtual beings, then the only way to create a new form of “species”, whether digital or physical, is to breathe intelligence into it. That is why the inside-out approach is essential to virtual beings. Once their intelligence emerges, we would not mind what shapes virtual beings come in; we could customize them to our individual preferences.

To give you an example, we are building intelligent avatars with the inside-out approach at TwentyBN. We started off training neural networks to see and understand humans and their behaviors. Now, we are combining visual inputs and audio dialogues to teach AI to simultaneously understand both what it sees and hears.

Roland, our Co-Founder and CEO, believes that “audio-visual dialogue is not only an approach to more intelligent and sociable AI, but holds the very key to thinking and common-sense reasoning in the world.” So far, the inside-out approach is the only path we know toward the intelligent virtual beings humans have dreamed of.

Artificial General Intelligence: Moving From System 1 to System 2

At NeurIPS 2019, Turing Award winner and TwentyBN founding advisor Yoshua Bengio outlined a roadmap toward artificial general intelligence inspired by Nobel laureate Daniel Kahneman’s thesis on System 1 and System 2 in the human brain.

In essence, Bengio argues that current black-box deep learning systems resemble the intuitive, fast, unconscious, non-linguistic, and habitual part of the brain (System 1), while the future of deep learning should be slower, more logical, and more conscious, involving more planning and reasoning. In layman’s terms, a System 1 deep learning model can tell us right now whether an image contains a hotdog, but it can’t reason about why it makes that judgment (nor, for lack of common sense, would it understand the italicized it’s in this sentence).

Credit: Yoshua Bengio, NeurIPS 2019

A System 2 deep learning model, however, will be able to explain that it sees a hotdog in the image because there is a filling between two pieces of what is known as a hotdog bun, and the filling is a sausage. The AI recognizes that the visual input complies with the definition of a sandwich, and understands that it is a specific type of sandwich known as a hotdog.
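To make the contrast concrete, here is a toy sketch in Python. The function names and hand-written rules are purely hypothetical illustrations, not an actual TwentyBN system or any real model: the System 1 stand-in emits a bare label with no rationale, while the System 2 stand-in composes detected parts into a label plus an explanation.

```python
# Illustrative sketch only: toy stand-ins for System 1 vs. System 2 behavior.

def system1_classify(image_features):
    """System 1: an opaque pattern matcher that emits a label, no rationale."""
    # In a real model this would be a neural network forward pass.
    return "hotdog" if image_features.get("hotdog_score", 0.0) > 0.5 else "not hotdog"

def system2_classify(detected_parts):
    """System 2: composes detected parts into a label plus an explanation."""
    has_bun = "bun" in detected_parts
    has_sausage = "sausage" in detected_parts
    if has_bun and has_sausage:
        return ("hotdog",
                "A sausage filling sits between two pieces of bun, which matches "
                "the definition of a sandwich, specifically the type known as a hotdog.")
    return ("not hotdog", "The required parts (bun and sausage) were not all found.")

label1 = system1_classify({"hotdog_score": 0.93})
label2, reason = system2_classify(["bun", "sausage"])
print(label1)               # a label alone, with nothing behind it
print(label2, "-", reason)  # a label plus the reasoning that produced it
```

The point of the sketch is the shape of the output, not the logic inside: System 1 gives us an answer we must take on faith, while System 2 gives us an answer we can interrogate.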

Current System 1 deep learning is capable of powering applications across a wide range of industries but must be constrained to narrow tasks. It is extremely good at finding patterns that human brains are incapable of perceiving, but it cannot perform fundamental cognitive functions that humans can, such as reasoning about time and causality, focusing with attention, acquiring skills from a small dataset and learning how to learn.

Therefore, developing a System 2 that complements the existing System 1 is essential for AI to attain humanlike intelligence. Virtual beings that interact with you through audio-visual dialogue provide a hotbed for common-sense temporal reasoning and humanlike intelligence, in line with Bengio’s roadmap toward general AI. Stay tuned; we may soon write a piece that digs deeper into Kahneman’s and Bengio’s ideas.

Focus on brains, not looks

I hope this article has inspired you to think more deeply about how virtual beings are created. Both the inside-out and outside-in approaches matter to our industry as a whole. Outside-in is the glamorous one that generates buzz and paves the way for society to embrace new art forms and a culture that includes virtual beings. This is what we are already great at. Inside-out is the laborious one that blazes the trail toward humanlike intelligence for AI. This is where we still have a long journey ahead.

The fantastic and magical vision for intelligent, personal avatars is realized where the inside-out meets the outside-in. So let’s focus on the brains now 🧠

Nahua

Edited by David and Moritz. Illustrated by Anny.

Do you enjoy reading this blog? Sign up for our newsletter below 👇 to receive monthly digests on the latest trends in virtual beings and embodied AI:
