Intelligent AI Avatars: Artificial Intelligence’s Adjacent Possible

Have you ever wondered why the backpropagation algorithm was invented in the 1980s but the AI boom of deep learning only took off over 25 years later? Or how come Charles Babbage designed the world’s first computer in 1837 but an actual computer was not built until a century later? How do we ascertain if an idea will take off soon or lie dormant for decades or centuries? And where will the next big idea in AI come from? To answer these questions, we turn to the adjacent possible.

The adjacent possible is a concept introduced by theoretical biologist Stuart Kauffman to describe the origin of life on Earth. The American popular science writer Steven Johnson later expanded the concept to broadly describe human innovation as a slow and iterative process that repurposes previously existing ideas and tools, a.k.a. spare parts. These spare parts can be either conceptual, like neural networks, or mechanical, such as large-scale GPUs. Innovation, according to Johnson, is “a story of one door leading to another door, exploring the palace one room at a time.”

This implies that only certain innovations can come to fruition at any given moment in time. Discovering new ideas that lead to concrete innovations depends on successfully combining existing ideas. The futuristic, fantastic society we envision cannot be realized until the gap between now and that future is bridged by the spare parts that exist today.

At TwentyBN, we believe that the spare parts for building autonomous humanoid robots are not yet ready, because there remain countless physical issues to solve when it comes to robotics. But the spare parts for a robot’s vision system are readily available. That’s why we are building a vision system for robots and bootstrapping its body by building a stationary, digital version of a robot, in the form of an AI avatar. Most robotics projects are merely bodies without brains. Our avatar is a brain without a physical body.

Millie, the world’s first intelligent avatar

The Realm of AI’s Adjacent Possible

On our journey towards humanlike AI, we’ve discovered that to build successful products, startups must clearly distinguish which innovations they are capable of advancing given the spare parts available in the present. Clarifying what lies inside and outside of AI’s adjacent possible therefore helps us push the cutting edge to develop concrete, practical advancements and avoid over-inflated expectations, which historically have caused major setbacks.

Beyond the Adjacent Possible Lies the Killzone

There are piles of visionary AI projects that ended up in the innovation graveyard, awaiting resurrection. The Summer Vision Project led by Seymour Papert and the Blocks World project led by Marvin Minsky, both in the late 1960s, are two notable examples. The 1966 Summer Vision Project aimed to construct a significant part of a robot’s visual system, one that could perform object identification, within a single summer. The Blocks World project, in 1968, ambitiously aimed to build AI that could recognize blocks of varying shapes and sizes and assemble them into structures without human guidance. It would take almost another 50 years before the problem behind either project was solved.

Marvin Minsky and Seymour Papert, two AI pioneers who made great contributions to the field, proposed the Summer Vision Project and the Blocks World project, both of which failed.

These two endeavors are the epitome of AI research that, at their respective moments of inception, existed outside of the adjacent possible. In hindsight, the materials to build computer vision systems with human-level performance on image classification include, among other things, mechanical spare parts like GPUs and image datasets (e.g., ImageNet), as well as conceptual parts like deep neural networks, none of which existed in the 1960s.

Developing autonomous humanoid robots today is like working on the Summer Vision or Blocks World project of the late ’60s. Building an artificial brain for robots is difficult enough without the added challenge of building the robot’s body. Additionally, the spare parts for creating fully autonomous, self-aware robots have yet to emerge. The lesson for businesses: Companies that mistake the long-term vision for mid-term commercialization risk ending up as R&D labs.

The Adjacent Possible for AI

In a previous blog post, we proposed that the next frontier for true machine intelligence is visual common sense. Following the mass adoption of deep learning algorithms and the use of GPUs to train AI models, researchers have achieved human-level performance for object identification. But the potential of deep learning does not stop at image classification. Extrapolating from the success of image classification and ImageNet, we can train computers using video data to perceive what continuously goes on in the real world in real time!

With a deep visual understanding of human actions, behaviors and the physical world, machines can now understand social cues, make eye contact with humans, and perform various simple, telling gestures such as a shoulder shrug. In addition, we are working on fusing conversational AI with visual understanding, thereby grounding dialogue and NLP visually. For TwentyBN, vision will not only be the window to a robot’s intelligence, but will also serve to make AI more humanlike and natural in its communication with us. Indeed, we are ready to develop such vision systems. Our launches of SuperModel in 2017 and Millie in 2018 illustrate that grounding language in visual common sense and understanding dynamic scenes is challenging for machines, yet adjacently possible.

TwentyBN’s Spare Parts for the Adjacent Possible

TwentyBN carefully chose its spare parts when building Millie, our AI brain with a digital body. Mechanically, we have built the world’s largest deep learning data platform with millions of videos for machines to see and understand human actions. The platform, powered by Crowd-Acting, serves as a flywheel for TwentyBN to quickly develop new features and skill sets for our AI avatars.

Meanwhile, GPUs with parallel architectures and larger memory offer faster computation and can be deployed en masse, which makes training AI models more efficient and running AI brains in deployment faster.

Additionally, character rendering is a mature technique that game engines can easily perform in real time, giving the avatar a humanlike personality that enriches human-machine interaction.

Conceptually, we leverage deep learning algorithms, which are trained with backpropagation. Specifically, we develop AI brains via a combination of supervised and transfer learning. Supervised learning, we believe, is currently the only approach that successfully teaches machines common sense knowledge of our world. On top of that, we leverage transfer learning so that our avatars can quickly acquire similar skills across different domains.
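To make the combination of supervised and transfer learning concrete, here is a minimal toy sketch in plain Python. It is not TwentyBN's implementation: the function names, the tiny dataset, and the hand-written `frozen_features` function (a stand-in for the frozen layers of a network pretrained on some other task) are all assumptions made for illustration. Only a small new "head" is trained with labeled examples, which is the basic idea behind reusing learned features in a new domain.

```python
import math

def frozen_features(x):
    # Hypothetical stand-in for a pretrained network's frozen layers.
    # These "weights" are never updated while the new head is trained.
    return [x[0] + x[1], x[0] - x[1]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=200, lr=0.5):
    """Supervised training of a logistic-regression head on frozen features."""
    feats = [frozen_features(x) for x in samples]
    weights = [0.0] * len(feats[0])
    bias = 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            for i, v in enumerate(f):
                grad_w[i] += err * v / n
            grad_b += err / n
        # Full-batch gradient descent step on the head only.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

def predict(x, weights, bias):
    f = frozen_features(x)
    z = sum(w * v for w, v in zip(weights, f)) + bias
    return 1 if z >= 0 else 0

# Toy labeled data for the new task: label 1 when the first value is larger.
samples = [(2, 1), (3, 0), (1, 0), (1, 2), (0, 3), (0, 1)]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_head(samples, labels)
```

Because the feature extractor already exposes a useful signal (the difference between the two inputs), six labeled examples are enough for the head to separate the classes. In a real system the frozen part would be a deep network and the head would be trained with a deep learning framework, but the division of labor is the same.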

By exploring first-order combinations of these proven methods and tools, we are probing and expanding the space of the adjacent possible by creating AI avatars that can see, understand, and interact with humans.

Towards Intelligent AI Avatars

Just as YouTube could not exist before the Internet, nor Uber before the mass adoption of smartphones, autonomous humanoid robots will not become commercially or technically viable until the right spare parts emerge. However, with millions of annotated videos, powerful GPU machines, and deep learning, we are ready for AI to fundamentally change the way machines interact with and serve humans. It is therefore our conviction that intelligent avatars will be the dominant form of AI that serves us in every corner of the world, either in life-sized kiosks or on onscreen displays. This is the adjacent possible in AI today.
