One of the key components of an artificially intelligent agent is its knowledge base. Simply put, the knowledge base is the collection of facts that the agent knows about the world and draws from in formulating answers. At Ozlo, we model the world as a directed graph, with nodes representing concepts and edges representing the relations between them. By finding the correct nodes and following the correct edges, Ozlo’s systems make inferences about the world.
For example, we can answer the question: “Is ebi nigiri kosher?” We first recognize that “kosher” is a restricted diet and that “ebi nigiri” is a dish. Dishes contain ingredients and the ingredients of ebi nigiri are rice and shrimp. Shrimp is a type of shellfish and shellfish are prohibited by a kosher diet. Therefore, ebi nigiri is not kosher.
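The chain of reasoning above can be sketched as a traversal over a toy directed graph. This is a minimal illustration, not Ozlo’s actual schema: the relation names (`contains`, `is_a`, `prohibits`) and the edge set are assumptions made for the example.

```python
# Toy knowledge graph: (subject, relation, object) edges.
# Relation names and facts are illustrative assumptions.
EDGES = {
    ("ebi nigiri", "contains", "rice"),
    ("ebi nigiri", "contains", "shrimp"),
    ("onigiri", "contains", "rice"),
    ("shrimp", "is_a", "shellfish"),
    ("kosher", "prohibits", "shellfish"),
}

def objects(subject, relation):
    """All nodes reachable from `subject` via one `relation` edge."""
    return {o for (s, r, o) in EDGES if s == subject and r == relation}

def is_allowed(dish, diet):
    """A dish satisfies a diet if no ingredient, nor any type an
    ingredient belongs to, appears on the diet's prohibited list."""
    prohibited = objects(diet, "prohibits")
    frontier = list(objects(dish, "contains"))
    while frontier:
        node = frontier.pop()
        if node in prohibited:
            return False
        frontier.extend(objects(node, "is_a"))  # climb the type hierarchy
    return True

print(is_allowed("ebi nigiri", "kosher"))  # False: shrimp is a shellfish
print(is_allowed("onigiri", "kosher"))     # True: rice is not prohibited
```

The point is that the answer is never stored directly; it falls out of following `contains` and `is_a` edges until a prohibited concept is (or isn’t) reached.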
The Ozlo knowledge base combines structured, semistructured, and unstructured data sources. For example, food knowledge is compiled from semistructured sources like menus, cookbooks, and Wikipedia. Knowledge about individual businesses, by contrast, uses structured data from partners, along with analysis of unstructured data from reviews and user queries, to understand which attributes are noteworthy about particular businesses. However, if we simply built the knowledge base this way, we’d be left with a rather spotty graph. For example, the system may know that a concept called “turkey sandwich” exists, but not know that it contains turkey, or that it is a sandwich. In order to add this missing information, we make another pass over extracted concepts in order to make further deductions about them.
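One way to picture that second pass is a naive compositional heuristic: for a multi-word concept like “turkey sandwich”, guess that the last word names the type and the preceding words name ingredients. This is a deliberately simplistic sketch of the idea, not the deduction logic Ozlo actually uses.

```python
# Hedged sketch of a deduction pass over extracted concepts.
# Heuristic (assumed for illustration): in "turkey sandwich",
# the head noun is the type and the modifiers are ingredients.

def deduce_edges(concept, known_concepts):
    """Propose (subject, relation, object) edges for a compound concept,
    but only when the candidate parts are already known concepts."""
    words = concept.split()
    edges = []
    if len(words) > 1:
        head = words[-1]
        if head in known_concepts:
            edges.append((concept, "is_a", head))
        for modifier in words[:-1]:
            if modifier in known_concepts:
                edges.append((concept, "contains", modifier))
    return edges

known = {"turkey", "sandwich"}
print(deduce_edges("turkey sandwich", known))
# [('turkey sandwich', 'is_a', 'sandwich'), ('turkey sandwich', 'contains', 'turkey')]
```

A heuristic this crude would misfire on concepts like “hot dog”, which is one reason deduced edges still need the quality checks described later.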
The knowledge base is one of the key components that enable Ozlo to actually understand natural language, not just search for keywords. This is done by matching terms in a user’s utterance to concepts in the knowledge base. We do this by first normalizing the surface form of the utterance by singularizing nouns (e.g. “pizzas” becomes “pizza”) and lemmatizing verbs (e.g. “eating” becomes “eat”), and then looking for words and concepts the knowledge base recognizes. Importantly, the knowledge base is aware that certain words can have multiple meanings depending on the context. For example, depending on the context, “tendon” can be a contraction of “tempura donburi” or the connective tissue between muscles and bones.
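A rough sketch of that normalize-then-look-up step, assuming a tiny hand-built lexicon: real systems use proper lemmatizers and morphological analysis, so the suffix rules below are stand-ins, and the `category:concept` sense labels are invented for the example.

```python
# Sketch of surface-form normalization and lexicon lookup.
# Lexicon entries and suffix rules are illustrative assumptions.
LEXICON = {
    "pizza": ["dish:pizza"],
    "eat": ["action:eat"],
    # An ambiguous term maps to every sense it can carry:
    "tendon": ["dish:tempura donburi", "anatomy:tendon"],
}

def normalize(token):
    """Crude normalization standing in for real lemmatization."""
    if token.endswith("ing"):
        return token[:-3]                        # "eating" -> "eat"
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]                        # "pizzas" -> "pizza"
    return token

def lookup(utterance):
    """Map each surface token to the concept senses the lexicon knows."""
    return {tok: LEXICON.get(normalize(tok), [])
            for tok in utterance.lower().split()}

print(lookup("eating pizzas"))
# {'eating': ['action:eat'], 'pizzas': ['dish:pizza']}
print(lookup("tendon"))  # both senses surface; context must pick one
```

Returning every sense of an ambiguous word, rather than guessing early, is what lets the later category-matching step do the disambiguation.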
Every concept is associated with a category. Categories are special nodes in the knowledge base that help in task matching, information extraction, and question answering. Once the task is determined, irrelevant nodes are ignored, and the appropriate semantic edges can be followed to find the knowledge base nodes that answer the question. Even when an ambiguity cannot be resolved, or a user’s specific request can’t be satisfied, the system can use the knowledge base to provide useful alternatives.
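Category-driven disambiguation might look like the sketch below: once the task implies a category (say, a food search implies “dish”), senses outside that category are dropped, and if nothing matches, all senses are kept as alternatives to offer the user. The category names and fallback policy are assumptions for illustration.

```python
# Sketch of category-based sense resolution. Sense inventory and
# category labels are invented for this example.
SENSES = {
    "tendon": [("dish", "tempura donburi"), ("anatomy", "connective tissue")],
}

def resolve(term, task_category):
    """Keep only senses in the task's category; if none match,
    fall back to every sense as candidate alternatives."""
    candidates = SENSES.get(term, [])
    matches = [concept for (cat, concept) in candidates
               if cat == task_category]
    if matches:
        return matches
    return [concept for (_, concept) in candidates]  # unresolved: offer all

print(resolve("tendon", "dish"))     # ['tempura donburi']
print(resolve("tendon", "anatomy"))  # ['connective tissue']
```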
It’s important to note that Ozlo’s knowledge base is not simply a static database. As Ozlo’s functionality expands, so does the knowledge base. In order to bring on a new data source, we first construct a separate knowledge base containing key concepts and relations within that data source. We then construct a set of golden inferences that use only the facts found within that knowledge base.
Once we are satisfied with the quality of the new knowledge base, we promote it into the main knowledge base. In order to do this, we try to align the new knowledge base with the pre-existing one. Alignment is done by looking for common or equivalent terms assigned to the nodes in each knowledge base. When we have high confidence that two nodes are equivalent, we can import the edges from the new knowledge base into the pre-existing one. Novel nodes with no counterpart can simply be promoted as new concepts. For nodes we are unsure about, we crowdsource the decision, asking human judges whether the two nodes represent the same concept. Finally, we reinspect the combined knowledge base, checking for missing or incorrect inferences.
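The alignment step can be sketched as term-overlap scoring between nodes, with three outcomes: confidently aligned, sent to human judges, or promoted as novel. The Jaccard measure and the 0.5 threshold below are arbitrary illustrative choices, not Ozlo’s actual scoring.

```python
# Hedged sketch of knowledge-base alignment by surface-term overlap.
# Scoring function and threshold are assumptions for illustration.

def jaccard(a, b):
    """Overlap between two term sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def align(main_kb, new_kb, threshold=0.5):
    """main_kb / new_kb map node id -> set of surface terms.
    Returns (aligned pairs, pairs needing human judgment, novel nodes)."""
    aligned, uncertain, novel = [], [], []
    for new_id, new_terms in new_kb.items():
        best_id, best_score = None, 0.0
        for main_id, main_terms in main_kb.items():
            score = jaccard(new_terms, main_terms)
            if score > best_score:
                best_id, best_score = main_id, score
        if best_score >= threshold:
            aligned.append((new_id, best_id))      # import edges
        elif best_score > 0:
            uncertain.append((new_id, best_id))    # ask human judges
        else:
            novel.append(new_id)                   # promote as new concept
    return aligned, uncertain, novel

main = {"m1": {"shrimp", "prawn"}, "m2": {"rice"}}
new = {"n1": {"shrimp"}, "n2": {"ebi nigiri"}}
print(align(main, new))  # ([('n1', 'm1')], [], ['n2'])
```

The three-way split mirrors the workflow in the text: high-confidence matches merge automatically, zero-overlap nodes are promoted naively, and everything in between goes to crowdsourced judgment.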
It is important to realize that no knowledge base is perfect. At Ozlo, we measure the quality of the knowledge base in various ways. We encode domain-knowledge rules to help mitigate errors such as spurious edges or missing required information. We check the system’s inferences against a set of “golden queries”: queries that we expect to be able to answer correctly. We ask human judges to verify some assertions. Lastly, we measure user satisfaction: we look at actual user queries to holistically identify where the system is failing to understand, and then prioritize which other domains or knowledge sources we need to bring into the knowledge base to better meet expectations.
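A golden-query check is essentially a regression test over the knowledge base: known questions with expected answers, re-run after every change. The sketch below assumes a hypothetical `answer` function standing in for the real inference engine, and the queries and answers are invented for the example.

```python
# Sketch of a golden-query regression check. `answer` is a
# hypothetical stand-in for the real question-answering system.
GOLDEN_QUERIES = {
    "Is ebi nigiri kosher?": "no",
    "Is onigiri vegetarian?": "yes",
}

def answer(query):
    # In reality this would run inference over the knowledge graph;
    # here it is stubbed so the check itself can be demonstrated.
    return {"Is ebi nigiri kosher?": "no",
            "Is onigiri vegetarian?": "yes"}.get(query)

def run_golden_queries():
    """Return the golden queries the system currently gets wrong."""
    return [q for q, expected in GOLDEN_QUERIES.items()
            if answer(q) != expected]

print(run_golden_queries())  # [] when every golden query passes
```

A non-empty failure list after importing a new data source is a signal that alignment or deduction introduced a regression.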