
Challenges of Knowledge Graphs

From Strings to Things — An Introduction

#tldr Search and even semantic search are simply not enough these days. Users request condensed information that is easy to ingest in order to make sense of an ever more complicated world. This requires a new approach to uncovering and presenting information, one that relies on aggregated facts and knowledge. This post begins by exploring the typical workflow and pain points faced by teams diving into the highly challenging task of extracting and organizing what is currently known about the world.

Deep Learning is stealing the spotlight in Silicon Valley, no illusion. But it’s actually a whole different beast feeding and enriching the daily results proposed by your favorite search engine. From the moment you click search, multiple attempts are made at understanding what your query is all about. After all, Google’s core business relies on answering people’s questions as accurately and rapidly as possible. So how do you teach your system, and ultimately your users, that Machine Learning and Data Science are related? Or that Noam Chomsky is an American linguist? What about the height of Mount Makalu? And that word embedding is part of natural language processing? Easy, you figure out the answers and tell them about it. Or at least that’s the idea behind Knowledge Graphs (KG).

In 2012, Google officially announced its own version, creatively titled Knowledge Graph, as a first step towards searching not only for pages that match your query terms but also for the “entities” that those words describe. The goal is to give you the known facts about various things in one neat little card, taking you from zero to hero in a single read-through. Computers don’t need to be symbolic communicators but humans apparently do.

Beyond the obvious searching and displaying of information about entities, these interconnected units of knowledge actually power and enhance multiple backend features:

  • Disambiguating and recognizing entities in context
  • Data expansion to enrich semantic search
  • Connecting entities to content and data sources
  • Recommendation engine for related information
  • Entity-centric user interfaces
  • Inferential reasoning
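To make a couple of these features concrete, here is a minimal sketch of query expansion and a very naive form of inferential reasoning. It assumes each fact is stored as a plain (subject, relation, object) tuple; the toy facts, the relation names, and the function names are all made up for illustration and are not taken from any real system.

```python
from collections import defaultdict

# Hypothetical toy facts, each written as (subject, relation, object).
FACTS = [
    ("word embedding", "part_of", "natural language processing"),
    ("natural language processing", "part_of", "artificial intelligence"),
    ("machine learning", "related_to", "data science"),
]


def build_index(facts):
    """Map each entity to the set of entities it shares a fact with."""
    neighbors = defaultdict(set)
    for subject, _relation, obj in facts:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    return neighbors


def expand_query(term, neighbors):
    """Data expansion: the query term plus its directly connected entities."""
    return [term] + sorted(neighbors.get(term, set()))


def infer_part_of(entity, facts):
    """Inference: follow part_of edges to collect every enclosing concept.

    Assumes each entity has at most one part_of parent (a simplification).
    """
    parents = {s: o for s, r, o in facts if r == "part_of"}
    chain, seen = [], {entity}
    while entity in parents and parents[entity] not in seen:
        entity = parents[entity]
        seen.add(entity)
        chain.append(entity)
    return chain


index = build_index(FACTS)
print(expand_query("machine learning", index))
print(infer_part_of("word embedding", FACTS))
```

The second call returns a concept ("artificial intelligence") that was never stated directly for word embedding, which is the whole point of reasoning over linked facts. A real system would need typed relations, confidence scores, and much more careful handling of cycles and conflicting facts.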

How does it work?

Simple enough, isn’t it?

From strings to things, knowledge graphs aim to structure what is known about the world. From powering search to providing quick summaries of known entities, they make information that much easier to discover and enable world-aware inferences.

Creating a large knowledge base is actually quite a challenge, in part due to the difficulties and subtleties of language and the ethereal, transient nature of knowing something (e.g. facts and knowledge are continuously evolving). For all the language understanding thrown around in conferences and recent developments in deep learning (e.g. dynamic memory networks), there is still no universal algorithm for parsing and distilling a thorough, unambiguous understanding of text. We still have a long way to go and, until then, current algorithms have definite difficulties deciphering the meaning and intent behind words. Take for example:

Difficulties of language you say?

This is even more obvious when you consider the wealth of cultural and historical information each of your users brings to the search box. For example, you may consider “to be or not to be” an obvious quote from Shakespeare, but someone not familiar with his work is likely to think it’s just a quirky amalgam of words. Your system needs to be aware of these scenarios! And if it could recommend similar works of literature that’d be great, thank you very much.

There are of course various techniques to help you on your journey, but you soon realize that the difficulties are not purely algorithmic and span multiple domains of engineering. This post will outline some of the major pain points, along with possible solutions for you to consider.

At its most basic, the end game of any knowledge mining effort is a list of entities (i.e. recognizable sequences of characters with a specific meaning) and triplets, often described simply as a subject, a predicate, and an object (or entity-attribute-value, among many other variants).

The basic interpretation of a triplet is a subject, object, and a predicate linking the two.
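As a rough illustration, a triplet can be held in something as small as a named tuple. The sketch below is just one possible representation, assuming string-valued subjects, predicates, and objects; the predicate names (`occupation`, `nationality`, `part_of`) and the `facts_about` helper are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: a subject, an object, and the predicate linking the two."""
    subject: str
    predicate: str
    obj: str


# Facts borrowed from the questions asked earlier in this post.
triples = [
    Triple("Noam Chomsky", "occupation", "linguist"),
    Triple("Noam Chomsky", "nationality", "American"),
    Triple("word embedding", "part_of", "natural language processing"),
]


def facts_about(entity, triples):
    """Collect everything known about an entity as a predicate -> object map."""
    return {t.predicate: t.obj for t in triples if t.subject == entity}


print(facts_about("Noam Chomsky", triples))
# {'occupation': 'linguist', 'nationality': 'American'}
```

The dictionary returned by `facts_about` is more or less the raw material for one of those neat little entity cards. Note that a plain dictionary keeps a single value per predicate, which would not be enough for attributes that can hold several values.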

All we need to do is extract entities, uniquely resolve them, and link them together. Is it as easy as it sounds? … Does it sound easy?
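To give a feel for those three steps, here is a deliberately naive sketch: a regular expression extracts the two mentions from an "X is part of Y" sentence, a hand-written alias table resolves each mention to a canonical identifier, and the result is linked into a triplet. The alias table, the identifiers, and the single pattern are all assumptions made up for illustration; real pipelines rely on far more robust extraction and disambiguation.

```python
import re

# Hypothetical alias table: lowercase surface form -> canonical entity ID.
ALIASES = {
    "nlp": "natural_language_processing",
    "natural language processing": "natural_language_processing",
    "word embedding": "word_embedding",
    "word embeddings": "word_embedding",
}

# Step 1 (extract): a single hand-written pattern for "X is/are part of Y".
PART_OF = re.compile(
    r"(?P<child>.+?)\s+(?:is|are) part of\s+(?P<parent>.+?)[.!?]?$",
    re.IGNORECASE,
)


def resolve(mention):
    """Step 2 (resolve): map a surface form to a canonical ID, or None."""
    return ALIASES.get(mention.strip().lower())


def extract_triple(sentence):
    """Step 3 (link): return (child_id, 'part_of', parent_id), or None."""
    match = PART_OF.match(sentence.strip())
    if not match:
        return None
    child = resolve(match.group("child"))
    parent = resolve(match.group("parent"))
    if child is None or parent is None:
        return None  # at least one mention is unknown to the alias table
    return (child, "part_of", parent)


print(extract_triple("Word embeddings are part of NLP."))
# ('word_embedding', 'part_of', 'natural_language_processing')
print(extract_triple("Makalu is part of the Himalayas."))
# None
```

Both "NLP" and "natural language processing" resolve to the same identifier, which is the "uniquely resolve" part. The second sentence returns None because neither mention is in the alias table, a small reminder of how much coverage matters.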

For readability I’ll be splitting this blog into a multi-part series diving into various aspects of the challenges involved with building and maintaining knowledge graphs (i.e. from algorithms to storage) while sharing some code and tips to get you started.

Projects you should know about

Knowledge graph of Knowledge graphs
  • Never-Ending Language Learning (NELL): Research project from Carnegie Mellon University attempting to create a computer system that learns over time to read the web (over 50 million candidate beliefs).
  • Freebase / Probase: Deprecated since Aug 31 2016. Large collaborative knowledge base consisting of data composed mainly by its community members. Downloadable data dumps are still available.
  • Metaweb: The company that developed Freebase, described as an “open, shared database of the world’s knowledge”. It was acquired by Google in 2010, and most of the Freebase data was subsequently made available to Wikidata.
  • Cyc: Common-sense knowledge base holding vast quantities of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of everyday life. Started in 1984 by Douglas Lenat. Partial open-source version available through OpenCyc.
  • GDELT: Monitors news outlets from nearly every corner of every country and identifies the people, locations, organizations, events, etc., thus creating a free, open platform for computing on the entire world. Supported by Google Jigsaw.
  • DBpedia: Open, free and comprehensive knowledge base constantly improved through a crowd-sourced community effort to extract structured information from Wikipedia.
  • YAGO: Semantic knowledge base from the Max-Planck Institute, derived from Wikipedia, WordNet, and GeoNames.
  • Wikidata: Project of the Wikimedia Foundation: a free, collaborative, multilingual, secondary database, collecting structured data to provide support for all other Wikimedia projects, and beyond.
  • LinkedIn’s Knowledge Graph: Built upon “entities” on LinkedIn, such as members, jobs, titles, skills, companies, geographical locations, schools, etc. forming an ontology of the professional world. Not available.
  • OpenIE: Quality information extraction at web scale; toolkit originating from the University of Washington.
  • PROSPERA: Hadoop-based scalable knowledge-harvesting engine which combines pattern-based gathering of relational fact candidates.
  • Google Knowledge Vault: Knowledge base created by Google.
  • ConceptNet: Originated from the crowdsourcing project Open Mind Common Sense, launched in 1999 at the MIT Media Lab, it is a freely-available semantic network.
  • WordNet: Nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept.

What’s the big difference between them? Time, money, domain, approach and supporting organization. At the end of the day, extracting, disambiguating and linking the entities of the world remain open problems. There is no one-size-fits-all solution that truly and automatically makes sense of the knowledge embedded within natural language with all of its subtleties. This has led to a considerable amount of exploration and specialization. After all, the struggle of making sense of information is one that every individual shares.

Part 2–? coming soon…

