What is the technology behind Viv, the next generation of Siri?

Brian Roemmele
9 min read · May 14, 2016

The secret to Viv is that the system actually writes its own code. Compared with any other similar system, it is a profound and monumental leap forward.

Dynamically Evolving Cognitive Architecture

The structure of the Voice First world is held together by Intelligent Agents. Intelligent Agents use AI (Artificial Intelligence) and ML (Machine Learning) to decode volition and intent from an analyzed phrase or sentence. The AI in most current-generation systems like Siri, Echo and Cortana focuses on speaker-independent word recognition and, to some extent, on the intent of predefined words or phrases that have a hard-coded connection to a domain expertise.

Viv uses a patented [1] exponential self-learning system, as opposed to the linearly programmed approach currently used by systems like Siri, Echo and Cortana. What this means is that the technology in use by Viv is orders of magnitude more powerful, because Viv’s operational software requires just a few lines of seed code to establish the domain [2], ontology [3] and taxonomy [4] needed to operate on a word or phrase.

In the old paradigm, each task or skill in Siri, Echo and Cortana needed to be hard coded by the developer and siloed unto itself, with little connection to the rest of the custom-programmed lexicon of domains. This means that these systems are limited in how fast and how large they can scale. Ultimately each silo can connect through related ontologies and taxonomies, but it is highly inefficient. At some point the lexicon of words and phrases becomes a very large task to maintain and update. Viv solves this rather large problem with simplicity for both the system and the developer.

Specimen of the developer console identifying domain intent programming.

Viv’s team calls this new paradigm the “Dynamically evolving cognitive architecture system”. There is limited public information on the system, and I cannot address any private information I may have access to. However, the patent “Dynamically evolving cognitive architecture system based on third-party developers” [5], published on December 24th, 2014, offers an incredible insight into the future.

Dynamically evolving cognitive architecture system based on third-party developers

US 20140380263 A1

ABSTRACT

A dynamically evolving cognitive architecture system based on third-party developers is described. A system forms an intent based on a user input, and creates a plan based on the intent. The plan includes a first action object that transforms a first concept object associated with the intent into a second concept object and also includes a second action object that transforms the second concept object into a third concept object associated with a goal of the intent. The first action object and the second action object are selected from multiple action objects. The system executes the plan, and outputs a value associated with the third concept object.

Some consumers and enterprises may desire functionality that is the result of combinations of services available on the World Wide Web or “in the cloud.” Some applications on mobile devices and/or web sites offer combinations of third-party services to end users so that an end user’s needs may be met by a combination of many services, thereby providing a unified experience that offers ease of use and highly variable functionality. Most of these software services are built with a specific purpose in mind. For example, an enterprise’s product manager studies a target audience, formulates a set of use cases, and then works with a software engineering group to code logic and implement a service for the specified use cases. The enterprise pushes the resulting code package to a server where it remains unchanged until the next software release, serving up the designed functionality to its end user population.

Viv Has Built An Easy Way For Developers To Build

This Viv patent is a landmark advance for Intelligent Agents and for the Voice First devices and use cases that will be developed on the platform. Adding new domain experience is a simple process in the developer app.

To define a new intent, the domain is established by programming a horizontal flow chart that helps to define ontology and taxonomy within the entire system. The results are lines of code that will forever be dynamically changing and connecting as more domains of intent are established. Viv literally programs itself. This process is related to self-modifying code, which has been around since the 1960s, from assembly language to COBOL. However, the process that Viv uses is radically more advanced.

Specimen of the developer console identifying domain intent programming.
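The idea that new domains keep connecting to existing ones can be illustrated with a minimal sketch. This is not Viv's code: the `ActionRegistry` class, its methods, and the concept names (borrowed from the patent's wine example) are assumptions made for illustration only. It shows just one property, that registering a single new transformation extends what the whole system can reach without touching anything registered earlier.

```python
from collections import defaultdict


class ActionRegistry:
    """Hypothetical registry of transformations between named concepts."""

    def __init__(self):
        # Maps a source concept to the set of concepts it can be turned into.
        self._edges = defaultdict(set)

    def register(self, source, target):
        """Declare that some action can turn `source` into `target`."""
        self._edges[source].add(target)

    def reachable(self, start):
        """Return every concept that can be derived from `start`."""
        seen, stack = set(), [start]
        while stack:
            concept = stack.pop()
            for nxt in self._edges.get(concept, ()):
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


registry = ActionRegistry()
registry.register("MenuItem", "Ingredients")
registry.register("Ingredients", "FoodCategory")
before = registry.reachable("MenuItem")

# A third-party developer adds one new action; nothing else is edited.
registry.register("FoodCategory", "WineRecommendation")
after = registry.reachable("MenuItem")
```

Here `after` contains one more concept than `before`, even though none of the earlier registrations changed. A siloed design would instead need an explicit, hand-written link from menu items to wine.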

The limitations we have all come to know with Siri, Echo and Cortana and the Chat Bots released with Facebook M are tied to the limitations of extending new intent domains and connecting new ontologies and taxonomies. Not only does each intent domain need to be programmed, from decoding a word or phrase, but these silos of intents need to somehow connect when more complex sentences are created. For example:

“(Siri-Alexa) I want to pick up a pizza on the way to my girlfriend’s house and I would like to find a perfect wine to pick up along the way. Also would like to bring her flowers.”

Currently Siri and Alexa could not understand the intent of this paragraph, nor could they easily connect to the six domains and many ontologies needed to produce a useful result. Viv could learn this in a few minutes and constantly connect with new intent domains by expanding the ontological references each domain represents.

Another feature of Viv will be the user profile that defines:

Conversational intent - Understands What You Say:

– Location context

– Time context

– Task context

– Dialog context

Understands You — Learns and acts on personal information:

– Who are your friends

– Where do you live

– What is your age

– What do you like

You will set privacy fences around any information that Viv learns, and you will be able to choose whether to allow the system to share this data with any intent domain. Of course, security and privacy will always be an issue with Intelligent Agents, and Viv is working on a new model that will quickly define what is logically private and what is potentially shareable with permissions.
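One way to picture a privacy fence is as a per-fact allow list of intent domains. The sketch below is only that, a picture: Viv has not published its model, so the `UserProfile` class, its method names, the domain names and the sample values are all hypothetical. The one design choice it commits to is that every learned fact is private until the user explicitly shares it.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical profile: learned facts plus a fence around each one."""

    facts: dict = field(default_factory=dict)
    # Maps a fact name to the set of intent domains allowed to read it.
    fences: dict = field(default_factory=dict)

    def learn(self, key, value):
        """Store a fact; it stays private until the user shares it."""
        self.facts[key] = value
        self.fences.setdefault(key, set())

    def allow(self, key, domain):
        """Let one intent domain read one fact."""
        if key not in self.facts:
            raise KeyError(key)
        self.fences[key].add(domain)

    def revoke(self, key, domain):
        """Withdraw a previously granted permission, if any."""
        self.fences.get(key, set()).discard(domain)

    def view_for(self, domain):
        """Return only the facts this domain has been allowed to see."""
        return {
            key: value
            for key, value in self.facts.items()
            if domain in self.fences.get(key, set())
        }


profile = UserProfile()
profile.learn("home_city", "Austin")  # toy value
profile.learn("age", 34)              # toy value
profile.allow("home_city", "pizza_delivery")

shared = profile.view_for("pizza_delivery")  # only the home city
hidden = profile.view_for("wine_shop")       # nothing has been shared
```

Because sharing is granted per fact and per domain, revoking one permission does not affect what any other domain can see.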

Specimen of the current domain cloud. Note the appearance of Payments and Money.

The “Dynamically evolving cognitive architecture system based on third-party developers” explains the complexity involved this way:

Specimen flow chart from “Dynamically evolving cognitive architecture system based on third-party developers” patent.

FIG. 1 illustrates a block diagram of an example plan 100 created by a dynamically evolving cognitive architecture system based on third-party developers, in which action objects are represented by rectangles and concept objects are represented by ovals, under an embodiment. User input 102 indicates that a user inputs “I want to buy a good bottle wine that goes well with chicken parmesan” to the system. The system forms the intent of the user as seeking a wine recommendation based on a concept object 104 for a menu item, chicken parmesan. Since no single service provider offers such a use case, the system creates a plan based on the user’s intent by selecting multiple action objects that may be executed sequentially to provide such a specific recommendation service. Action object 106 transforms the concept object 104 for a specific menu item, such as chicken parmesan, into a concept object 108 list of ingredients, such as chicken, cheese, and tomato sauce. Action object 110 transforms the list of ingredients concept object 108 into a concept object 112 for a food category, such as chicken-based pasta dishes. Action object 114 transforms the food category concept object 112 into a concept object 116 for a wine recommendation, such as a specific red wine, which the system outputs as a recommendation for pairing with chicken parmesan. Even though the system has not been intentionally designed to create wine recommendations based on the name of a menu item, the system is able to intelligently synthesize a way of creating such a recommendation based on the system’s concept objects and action objects. Although FIG. 1 illustrates an example of a system creating a single plan with a linear sequence that includes three action objects and four concept objects, the system creates multiple plans each of which may include any combination of linear sequences, splits, joins, and iterative sorting loops, and any number of action objects and concept objects. 
Descriptions below of FIGS. 4, 5, and 6 offer examples of multiple non-linear plans with splits, joins, and other numbers of action objects and concept objects.
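The linear plan in FIG. 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the `Action` class, the `make_plan` and `execute` functions, the breadth-first search used to pick the chain, and the toy lookup data are all hypothetical. The patent says only that action objects are selected from multiple candidates, not how. The sketch covers just the linear case and omits splits, joins and iterative loops.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Action:
    """An action object: turns a value of one concept into another concept."""
    name: str
    source: str  # name of the input concept
    target: str  # name of the output concept
    run: Callable[[object], object]


# Hypothetical third-party actions; the lookups are toy data.
ACTIONS = [
    Action("menu_item_to_ingredients", "MenuItem", "Ingredients",
           lambda item: {"chicken parmesan":
                         ["chicken", "cheese", "tomato sauce"]}[item]),
    Action("ingredients_to_category", "Ingredients", "FoodCategory",
           lambda ings: "chicken-based pasta dishes"
           if "chicken" in ings else "other"),
    Action("category_to_wine", "FoodCategory", "WineRecommendation",
           lambda cat: "a specific red wine"
           if cat == "chicken-based pasta dishes" else "a house white"),
]


def make_plan(actions, start, goal) -> Optional[list]:
    """Breadth-first search over concepts; returns the shortest action chain."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        concept, path = queue.popleft()
        if concept == goal:
            return path
        for action in actions:
            if action.source == concept and action.target not in seen:
                seen.add(action.target)
                queue.append((action.target, path + [action]))
    return None  # no chain of actions connects start to goal


def execute(plan, value):
    """Run each action in order, feeding its output to the next one."""
    for action in plan:
        value = action.run(value)
    return value


plan = make_plan(ACTIONS, "MenuItem", "WineRecommendation")
result = execute(plan, "chicken parmesan")
```

No action here was written to map a menu item directly to a wine. The three-step chain is found at request time from the declared input and output concepts, which is the point the patent's example makes.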

Just in this simple sentence, “I want to buy a good bottle wine that goes well with chicken parmesan”, there are dozens of intent domains that would be connected. Viv can build a result dynamically even if this question has never been asked before. Viv operates on the intent domains from the extracted words in the sentence and constructs an answer in real time.

Viv Labs will be opening the system up to developers, and I am predicting a land rush similar to when Apple opened up the App Store for the iPhone.

Payments Are The Foundation And Replace Advertising

Central to every Voice First system is Voice Commerce and Voice Payments. The patent speaks to this in a unique way:

Content application program interface providers desire branding, to sell advertising, and/or to sell access to restricted content. Data providers and data curators want recognition, payment for all content, and/or payment for enhanced or premium content. Transaction providers desire branding and transactions via selling of some good or service. Advertisers desire traffic from qualified end users. A single person or organization may play more than one of these roles.

The shift from the push mechanisms of the current advertising paradigm to the pull mechanisms of Voice Commerce will define the rise of Voice Payments. Currently even the most technically defined payments companies are not in a position to adapt to the new paradigms.

Viv Learns The Way Humans Learn

This type of system may sound very familiar to most of us: it is very close to the manner in which humans learn. We assemble domains and form ontologies that connect intent.

On December 24th, 2014, when I first viewed the “Dynamically evolving cognitive architecture system based on third-party developers” patent from Six Five Labs, “goose bumps” ran through my entire body, for in that moment I saw what I have been studying since 1989 in my Voice manifesto. I spoke to this in some detail recently here on Quora [6]. In this Quora Knowledge Prize posting I detail how Voice First systems will completely change advertising, commerce and payments. I go into more detail in the industry publication Tech.pinions [7].

Viv is the first system to pull together the right elements of speech recognition, speech synthesis, AI, ML, self-modifying programs, commerce and payments in such a way that I assert that in 10 years 50% of computer interactions will be via Voice, primarily on Voice First devices. The Viv we see today (May 9th, 2016) is one small step in this direction, but a giant leap for the future of computers.
