
Semantic Understanding (Part 2/2)

Building machines that can process real-world information like a human

If you’ve not read Part 1 of this post, be warned — significant spoilers ahead! :)

In his book ‘Sapiens,’ Yuval Noah Harari posits that humans became the dominant species we are today because of our ability to tell stories. Tribes, nations, cults, communities, companies — they were all built on our ability to weave compelling myths, visions, narratives, and stories. Deeply rooted in our ability to tell stories is the complex interwoven architecture of ‘Language.’ Language is how we communicate with each other — the understanding of common constructs in the real-world is derived through etymologies hidden in language and its origin.

Language is so complex that training a machine to understand it is one of the hardest computer science problems for scientists to crack today. There are accents, dialects, meanings, context, lexical features (stemmed words, lemmatized words, etc), shape features (word cases), parts of speech — the list goes on. Frankly, it’s not something to worry about unless you’re planning on getting a Ph.D. in Computational Linguistics.

When we began developing our vision of building Semantic Understanding within our organization two years ago, we were extremely lucky to have started at a time when incredible Natural Language Understanding (NLU) frameworks began emerging that we could leverage right out of the box. Product teams around the world have the opportunity to leverage these APIs today to augment their products with very little dev effort and investment. While there are many good frameworks available in the industry, including free open source ones, my personal favorite is the one Google provides as part of their GCP suite. Google’s scientists and researchers have really excelled in this area given their scale as a company. I’ll let the man, the myth, the legend Jeff Dean give you a glimpse of their latest breakthrough in NLU.

What platforms and frameworks like Google’s bring to the table is decades’ worth of computational linguistics research in its most powerful yet generic form. Out of the box, the machine has enough vocabulary and grammar to parse and construct sentences and engage a human in conversation. We still have to teach the machine to perform ‘Skills,’ or things we want it to do, while also ‘training’ it to answer these questions robustly.

Building machines that can process information like a human would, or Semantic Understanding, consists of two main operations.

Natural Language Understanding

Operation A involves teaching the machine to understand what a human is saying. This includes processing an unstructured (and sometimes incoherent) sentence, then extracting critical information from what the human is saying. This is where the NLU framework comes in handy. By pointing it to the things it should pay attention to, we’re simply directing an already intelligent machine to recognize what’s important.

Operation B involves taking that extracted information, finding a way to break down the task into achievable chunks, and looking into the knowledge base (graph) the machine has access to in order to serve up accurate results. For a machine to provide valuable information to the human, it must first and foremost have access to rich knowledge. This includes everything it has access to, has learned through ongoing interactions with humans, and what it believes to be the most relevant information in any given context.

Let’s put this whole process through an example.

Operation A — Interacting with a Human

Conversation Design

What is the human saying and is it similar to something the machine has done in the past?

Humans have radically different ways of asking for the same thing. E.g. What’s the weather like? Give me the weather. How cold is it outside? Do I need an umbrella? Similarly, the example of finding an expert could be phrased in many different permutations and combinations. These variations are called ‘Utterances’ — the different ways humans can request information. We need to demonstrate to the machine the different ways humans can ask for things by providing it a few examples.

What is the main goal of the request from the human?

The machine needs to have the ability to decipher from these utterances what the intention of the user is. We can do this by creating a specific goal in the system and giving the machine examples of a few utterances the user may use to achieve that goal. In the example above, the goal is ‘Finding People based on key attributes’.
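The two steps above — collecting utterance variations and tying them to a goal — can be sketched in a few lines. This is a toy illustration, loosely modeled on how NLU frameworks accept training phrases; the intent name and the bag-of-words matcher are my own assumptions, not any framework’s real API (production systems use learned embeddings, not word overlap).

```python
# Hypothetical training data: one intent, several utterance variations.
INTENTS = {
    "find_people_by_attributes": [
        "find me an AI expert",
        "who knows artificial intelligence here",
        "I need someone experienced in consumer banking AI",
        "which experts are free next Wednesday",
    ],
}

def match_intent(utterance):
    """Naive bag-of-words overlap; real NLU uses statistical models."""
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for intent, examples in INTENTS.items():
        for example in examples:
            score = len(words & set(example.lower().split()))
            if score > best_score:
                best, best_score = intent, score
    return best
```

The point is the shape of the data, not the matcher: a handful of examples per goal is enough for a modern framework to generalize to phrasings it has never seen.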

How many smaller tasks can the machine break down the overall request into?

For a machine to fulfill a goal, it may have to break down the primary intent into several smaller sub-intents. In the example above, there are several sub-intents making up the primary intent. E.g. Finding an AI Expert. Making sure that list includes experts in Consumer Banking. Recognizing who Aparna Sethi is. Understanding the context of Mastercard as a client. Identifying people who may have worked with her before. Finally, making sure everyone is available next Wednesday. This allows the machine to process a task in the exact way a human would go about accomplishing it.
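The decomposition above can be pictured as a pipeline of filters, each sub-intent narrowing the candidate set. The people records and filter order below are invented stand-ins for illustration, not real data or a real framework’s decomposition logic.

```python
# Invented sample data for the running example.
PEOPLE = [
    {"name": "Emily", "skills": {"AI"}, "industries": {"Consumer Banking"},
     "worked_with": {"Aparna Sethi"}, "free_wednesday": True},
    {"name": "Raj", "skills": {"AI"}, "industries": {"Retail"},
     "worked_with": set(), "free_wednesday": True},
]

# Each sub-intent is one filter over the remaining candidates.
SUB_INTENTS = [
    lambda p: "AI" in p["skills"],                    # find AI experts
    lambda p: "Consumer Banking" in p["industries"],  # with banking experience
    lambda p: "Aparna Sethi" in p["worked_with"],     # who know the client contact
    lambda p: p["free_wednesday"],                    # available next Wednesday
]

def fulfill(candidates):
    for check in SUB_INTENTS:
        candidates = [p for p in candidates if check(p)]
    return [p["name"] for p in candidates]
```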

What information is the human providing the machine so it can fulfill the task?

Every utterance and intent is sprinkled with information by the user that can be used to grasp the context in order to fulfill the task. In the example above, the critical parameters that can be extracted are ‘Skill Type: AI,’ ‘Skill Level: Expert,’ ‘Skill Experience: Consumer Banking,’ ‘Client Contact: Aparna Sethi,’ ‘Client Brand: Mastercard,’ ‘Availability: Next Wednesday.’
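As a rough sketch of this slot-filling step, the parameters named above can be pulled out of a raw utterance with simple pattern rules. Real NLU entity extraction is statistical; the regex patterns and slot names here are toy assumptions for illustration only.

```python
import re

# Hypothetical slot patterns mirroring the parameters in the example.
SLOT_PATTERNS = {
    "skill_type": r"\b(AI|machine learning)\b",
    "skill_level": r"\b(expert|senior|junior)\b",
    "client_brand": r"\b(Mastercard)\b",
    "availability": r"\b(next \w+day)\b",
}

def extract_slots(utterance):
    """Return whichever parameters appear in the utterance."""
    slots = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = re.search(pattern, utterance, flags=re.IGNORECASE)
        if match:
            slots[slot] = match.group(1)
    return slots
```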

How does the machine respond when it has what it needs to fulfill the request of the human?

We also have to train the machine by giving it examples of how to respond. Conversation is a two-way street. Out of the box, the machine can respond with simple sentences like, “Here’s what I found”, but we can train it further by providing a few examples that demonstrate more personality. E.g. “Here are the people available to meet with Aparna next week. Let me know if you’d like me to keep looking.” These responses can also be trained to become much smarter: “I’d recommend speaking to Emily. She’s worked with Aparna before, and is available next Wednesday.” Again, this depends on how well the machine is trained to handle the varying nuances by understanding different contexts and tailoring responses to those contexts.
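The layering of responses described above — a plain fallback, a personable variant, and a context-aware recommendation — might look something like this. The selection logic and context keys are assumptions; a real assistant would pick responses from trained templates rather than hand-written branches.

```python
def respond(results, context=None):
    """Pick a response, from plain to personalized, based on context."""
    context = context or {}
    if not results:
        return "I couldn't find anyone matching that. Want me to broaden the search?"
    # Smarter response: recommend a candidate with prior client history.
    if context.get("prior_collaborator") in results:
        person = context["prior_collaborator"]
        client = context.get("client", "the client")
        return (f"I'd recommend speaking to {person}. She's worked with "
                f"{client} before, and is available next Wednesday.")
    # Default response with a little personality.
    return ("Here are the people available: " + ", ".join(results) +
            ". Let me know if you'd like me to keep looking.")
```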

The function of Operation A is to engage with the human, process what they’re saying, understand what is being asked, extract critical information to fulfill the request, and send that information to Operation B. The real Semantic Understanding to fulfill this request resides in the underlying architecture in place.

Operation B — Searching and responding with relevant information

Transforming an organization through a data-centric lens requires the entire organization’s infrastructure to evolve to support Semantic Understanding. This means rethinking the technology infrastructure and being mindful about what systems we put in place to enable agents to proactively learn, optimize, and report within the infrastructure. The architecture diagram below provides a comprehensive overview of what that Semantic Architecture looks like. The main goal of this architecture is to enable ‘Knowledge as a Service (KaaS)’ — making the data and knowledge of the organization visible to its people.

Semantic Architecture — Knowledge as a Service

While there’s a lot of intricate detail behind the role of each component within this architecture, let's simplify this and focus on how the Experience, Intelligence, Data, and Analytics Layers enable Semantic Understanding.

The Experience Layer is responsible for managing human interaction. Enterprises today have several channels of engagement for their employees. These channels can range from smartphone and desktop apps to large touch displays or even third-party IoT devices installed in meeting rooms or around the office. The Digital Assistant can manifest across any number of channels, and it’s up to the Experience Layer to ensure integrity and seamlessness across all of them. The Experience Layer leverages robust design systems like Material Design by Google or Fluent Design by Microsoft to ensure a consistent experience is delivered across all channels. The example below illustrates how the same question can be answered across different channels while delivering a single consistent experience.

Design System
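In code, the Experience Layer’s job of delivering one answer consistently across channels reduces to a rendering step per channel. The channel names and output formats below are assumptions for illustration; a real implementation would delegate to each channel’s SDK.

```python
def render(answer, channel):
    """Render the same answer for different engagement channels."""
    if channel == "voice":
        return answer                                # spoken as-is by a speaker
    if channel == "chat":
        return f"🤖 {answer}"                         # chat bubble with persona
    if channel == "dashboard":
        return f"<div class='card'>{answer}</div>"   # large-screen card
    raise ValueError(f"unknown channel: {channel}")
```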

The Intelligence Layer is responsible for extracting critical information from the Experience Layer, breaking down the human request, searching through Data, and providing an adequate response. There are three key functions of this layer. First, the Cognitive Processors are responsible for giving the machine cognitive abilities like parsing speech, engaging in conversations, and providing the ability to recognize objects. Second, the Knowledge Engines which are composed of Machine Learning and Deep Learning models and algorithms enable the machine to infer data and make decisions. These models are the research, learning, and development center of the machine. They’re constantly looking at what’s happening across the platform, learning from previous interactions, and inferring patterns from new data. Third, the Knowledge Graph is where the platform hosts relationships and context. This gives the machine a real semantic understanding by mapping data through relationships relative to the real world. These three functions enable the core Semantic Understanding we’ve been discussing all along.

Knowledge Graph that stores context about the real-world
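To make the Knowledge Graph concrete, here is a toy graph of typed relationships stored as an adjacency dict, with a two-hop query over it. The entities mirror the running example; the storage scheme is a deliberate simplification — real systems use graph databases or triple stores, not Python dicts.

```python
# (subject, relation) -> set of objects; an invented miniature graph.
GRAPH = {
    ("Emily", "has_skill"): {"AI"},
    ("Emily", "worked_with"): {"Aparna Sethi"},
    ("Aparna Sethi", "works_at"): {"Mastercard"},
    ("Mastercard", "is_a"): {"Client"},
}

def related(subject, relation):
    return GRAPH.get((subject, relation), set())

def worked_with_anyone_at(person, company):
    """Two-hop query: did `person` work with a contact who works at `company`?"""
    return any(company in related(contact, "works_at")
               for contact in related(person, "worked_with"))
```

Traversing relationships like this — rather than matching keywords — is what gives the machine context: it can connect “Aparna Sethi” to “Mastercard” to “Client” without either being stated in the user’s request.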

The Data Integrations Layer is responsible for aggregating data from the various sources available. Data is what feeds the brain of the machine. Robust access to clean, well-structured data in a single location allows the Intelligence Layer to map these relationships. Imagine the number of systems that run an enterprise — HRMS, Outlook, Teams, Salesforce, Oracle. Even today, these systems largely remain siloed. Any organization aiming to enable Semantic Understanding should be looking to leverage all this data through a single lens.
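A minimal sketch of that “single lens,” assuming two invented source payloads (an HRMS export and a calendar feed): normalize both into one record per employee so the layers above can query them together. Field names here are stand-ins, not any vendor’s schema.

```python
def normalize(hrms_rows, calendar_rows):
    """Merge siloed system exports into one employee-keyed store."""
    people = {}
    for row in hrms_rows:
        people[row["employee_id"]] = {"name": row["full_name"], "busy": []}
    for row in calendar_rows:
        # Ignore calendar entries with no matching HRMS record.
        if row["employee_id"] in people:
            people[row["employee_id"]]["busy"].append(row["slot"])
    return people
```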

The Analytics Layer, a vertical cutting across the three horizontals of our Semantic Architecture, continuously helps the platform improve over time. The Experience Layer needs insight into how humans are using the platform so it can constantly course-correct through better experience management. Analyzing the Intelligence Layer allows business-critical insights to be extracted and delivered directly to executives through dashboards. Finally, the Data Integrations Layer can be monitored for ongoing data gaps based on what people across the company are requesting.
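One concrete instance of that last monitoring idea: count the requests the assistant failed to answer, grouped by topic, to surface where the organization’s data has gaps. The event shape and threshold are assumptions for illustration.

```python
from collections import Counter

def data_gaps(events, threshold=2):
    """Topics the assistant repeatedly failed to answer — likely data gaps."""
    misses = Counter(e["query_topic"] for e in events if not e["answered"])
    return [topic for topic, count in misses.items() if count >= threshold]
```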

The combination of Operation A — dynamically training a machine with enough cognitive skills to engage and interact with a human — and Operation B — constructing an enterprise-wide Knowledge as a Service (KaaS) layer — is what enables Semantic Understanding within an enterprise.

This end-to-end architecture creates a critical foundation for enabling Semantic Understanding within an organization. The NLU and Cognitive Processing layers provide the key functions for a machine to interact, engage, perceive, and process what a human is saying. It can do this because it has been trained to perform skills by recognizing intents through a variation of utterances. Once these layers can decipher what a human is asking, it can leverage the underlying architecture to query, retrieve, and contextualize information for the human. The underlying architecture has been tailored to recognize, record, and report everything happening on the platform. The Enterprise Graph builds the intelligence behind which connections across the organization are important and what information is relevant.

When Sundar Pichai says, “We will move from a Mobile-first world to an AI-first world,” what he fundamentally means is that organizations around the world will become more data-centric. They will understand their data better by finding unique, creative, and differentiated ways to leverage Machine Learning to increase people value. Semantic Understanding, and the architecture we discussed above that enables Knowledge as a Service for enterprises, is the radical shift we will see in how organizations around the world evolve in the Automation Economy.
