The legacy of knowledge graphs

Jan Tschada
Geospatial Intelligence
9 min read · Apr 18, 2023

A knowledge graph is a powerful way of organizing and representing complex information in a structured and interconnected form. The concept of knowledge graphs dates back several decades, to earlier work on semantic networks and ontology modeling in the fields of artificial intelligence and knowledge representation.

The development of the Semantic Web and Linked Open Data initiatives spurred renewed interest in knowledge graph technologies, leading to the creation of large-scale knowledge graphs like Wikidata, DBpedia and Freebase. We use knowledge graphs in a wide range of applications, from search engines and recommender systems to natural language processing and machine learning. The continued growth and development of knowledge graphs promise to unlock new insights and capabilities across a range of industries and domains.

The Königsberg bridge problem

Leonhard Euler was a Swiss mathematician who made important contributions to a wide range of fields, including calculus, number theory, and graph theory. One of his most famous contributions to graph theory was his solution to the Königsberg bridge problem.

The Königsberg bridge problem was a puzzle posed by the citizens of Königsberg (now Kaliningrad, Russia) in the 18th century. The Pregel river divided the city into two main sections and also flowed around two large islands, which were connected to each other and to the two mainland sections by seven bridges.

The challenge was to find a route through the city that would cross each bridge exactly once and return to the starting point. Many people had tried to solve the puzzle, but none had succeeded.

Euler realized that the problem could be abstracted and represented as a mathematical graph, with the land areas as vertices and the bridges as edges. Using this approach, he was able to prove that it was impossible to find a route that crossed each bridge exactly once and returned to the starting point. The reason: any closed walk enters and leaves each vertex along distinct edges, so every vertex it passes through must have an even degree, yet all four land areas of Königsberg had an odd number of bridges touching them.
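
To make the argument concrete, here is a minimal Python sketch of the Königsberg multigraph. The A–D labels for the four land areas are arbitrary; the edge list reproduces the classic seven-bridge configuration.

```python
from collections import Counter

# The Königsberg multigraph: four land areas (A, B = river banks,
# C, D = the two islands) joined by seven bridges, modeled as edges.
bridges = [
    ("A", "C"), ("A", "C"),  # two bridges between one bank and island C
    ("A", "D"),
    ("B", "C"), ("B", "C"),  # two bridges between the other bank and island C
    ("B", "D"),
    ("C", "D"),
]

# Count how many bridge ends touch each land area (the vertex degree).
degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

# Euler's criterion: a closed walk crossing every edge exactly once
# exists only if every vertex has an even degree.
odd_vertices = [v for v, d in degree.items() if d % 2 == 1]
print(odd_vertices)  # degrees are A=3, B=3, C=5, D=3 -> all four are odd
```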

Euler’s solution to the Königsberg bridge problem was a significant contribution to the field of graph theory and helped to establish it as a field of mathematics in its own right.

Knowledge Graphs a.k.a. Kennis Grafieken in the Netherlands

Modern Antique Map of the Netherlands

Knowledge graphs, as we know them today, were not invented in the 1980s in the Netherlands. However, the development of knowledge representation and reasoning technologies during that time did play an important role in laying the foundation for the current generation of knowledge graphs.

In the 1980s, there was a growing interest in the use of symbolic AI techniques for representing and manipulating knowledge in computer systems. The Netherlands played a significant role in this development, with researchers at the University of Amsterdam and other institutions exploring the use of semantic networks and other knowledge representation formalisms.

One of the key figures in this work was Bob Wielinga, who developed the Knowledge Acquisition and Documentation Structuring (KADS) technique for knowledge engineering. KADS emphasized the importance of domain modeling and the use of structured knowledge representation to support reasoning and decision-making in expert systems.

Researchers such as Frank van Harmelen and Guus Schreiber in the Netherlands, together with collaborators such as Ian Horrocks, continued to build on this work, developing new knowledge representation formalisms and inference algorithms that would later form the basis for the Semantic Web and the development of modern knowledge graphs.

We need a standard description framework

The Resource Description Framework (RDF) is a standard for modeling and representing metadata about resources on the web. RDF provides a way to describe the characteristics and relationships of resources, such as web pages, digital images, and online documents, using a set of standardized syntax and semantics.

At its core, RDF is a graph-based data model, in which resources are represented as nodes in a graph and their relationships are represented as edges or arcs between the nodes. RDF uses a simple triple format to express these relationships, consisting of a subject, a predicate, and an object, which together form a statement or assertion about a resource.

For example, the statement “The sky is blue” could be represented in RDF as a triple consisting of the subject “sky”, the predicate “has color”, and the object “blue”. This triple could be further expanded to include additional information, such as the context or provenance of the assertion.
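
As a minimal sketch of what such a triple looks like in practice, the following Python snippet uses the rdflib library; the example.org namespace and the hasColor predicate are made up for illustration, not taken from any standard vocabulary.

```python
from rdflib import Graph, Literal, Namespace

# A made-up namespace for this example; any IRI base would do.
EX = Namespace("http://example.org/")

g = Graph()

# The statement "The sky is blue" as a subject-predicate-object triple.
g.add((EX.sky, EX.hasColor, Literal("blue")))

# Serialize the one-triple graph in Turtle, a common RDF syntax.
print(g.serialize(format="turtle"))
```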

RDF is an important technology for the Semantic Web, a vision for a web of linked and interoperable data that can be easily understood and processed by machines. It provides a standardized way to represent and exchange data on the web, enabling applications to reason about and integrate data from multiple sources.

The Semantic Web vision

The Semantic Web, as originally proposed in 2001 by Sir Tim Berners-Lee, is a vision for a web of linked data that machines can easily understand and process. The Semantic Web aims to extend the current web, which is primarily designed for human consumption, by adding a layer of semantic meaning to the content on the web.

The heart of the Semantic Web is the idea of using standardized data formats and ontologies to describe the relationships between resources on the web. By adding metadata and structured data to web pages, databases, and other online resources, the Semantic Web enables applications to understand and reason about the content of the web, creating a more intelligent and interconnected web of data.

The Semantic Web was seen as a way to address the limitations of the legacy web, such as the inability of search engines to understand the meaning of web pages and the difficulty of integrating data from different sources. The vision of the Semantic Web has inspired the development of a range of technologies and standards which are used to model, store, and query data on the web.

While we have not fully realized the original vision of the Semantic Web, we have applied many of the technologies and principles developed for it in a range of fields, such as knowledge representation, data integration, and artificial intelligence. Today, the Semantic Web continues to be an important area of research and development, with the potential to enable new forms of intelligent and data-driven applications.

The Linked Open Data revolution

The DBpedia and Wikidata projects were groundbreaking efforts in the early development of the Linked Open Data movement. These projects aimed to create large, structured knowledge graphs that applications and researchers could access and reuse across the web.

DBpedia is a community-driven effort to extract structured data from Wikipedia and make it available as RDF on the web. By using automated methods to extract information from Wikipedia articles, DBpedia created a large-scale knowledge graph that covered a wide range of topics and domains.

Wikidata, on the other hand, is a project initiated by the Wikimedia Foundation to create a centralized, collaboratively edited knowledge graph that all Wikimedia projects could use, including Wikipedia. Wikidata provided a structured database of statements about entities, such as people, places, and concepts, that could be accessed and queried using a standardized query language (SPARQL).
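
To give a feel for SPARQL, here is a small Python sketch that queries the public Wikidata endpoint with the requests library. Q42 (Douglas Adams) and P19 (place of birth) are the introductory identifiers commonly used in Wikidata’s own documentation.

```python
import requests

# Ask Wikidata for the place of birth (P19) of Douglas Adams (Q42).
query = """
SELECT ?placeLabel WHERE {
  wd:Q42 wdt:P19 ?place .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "knowledge-graph-article-demo/0.1"},  # polite etiquette
)
for row in response.json()["results"]["bindings"]:
    print(row["placeLabel"]["value"])  # e.g. "Cambridge"
```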

Together, these projects played a crucial role in demonstrating the potential of LOD for creating large-scale, interconnected datasets on the web. By providing a wealth of structured data that could be accessed and queried by machines, DBpedia and Wikidata helped to inspire the development of new LOD technologies and applications.

Furthermore, these projects also showed the power of collaborative efforts in creating and maintaining large knowledge graphs. The success of DBpedia and Wikidata has led to the development of many other community-driven projects that aim to create and share structured data on the web, further advancing the vision of the Semantic Web and Linked Open Data.

The Graph Query Language Manifesto

The Graph Query Language (GQL) Manifesto is a document that outlines the principles and requirements for a standardized query language for graph databases. The manifesto was created by a group of experts in the field of graph databases and was first published in 2018.

The manifesto defines the key features and capabilities that a graph query language should have, including support for graph traversal, pattern matching, filtering, aggregation, and graph analytics. It also emphasizes the importance of interoperability, extensibility, and ease of use in the design of a graph query language.
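
GQL itself was still being standardized at the time of writing, so there is no reference implementation to show. As an illustration of the kind of traversal, pattern matching, filtering, and aggregation the manifesto calls for, here is a Cypher query (Cypher being one of the main inputs to GQL) run through the Neo4j Python driver; the connection details and the Person/Organization schema are hypothetical.

```python
from neo4j import GraphDatabase

# Hypothetical connection details for a local graph database.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

# Traversal and pattern matching (MATCH), filtering (WHERE), and
# aggregation (count) in a single declarative query.
query = """
MATCH (p:Person)-[:WORKS_FOR]->(o:Organization)-[:LOCATED_IN]->(c:City)
WHERE c.name = 'Amsterdam'
RETURN p.name AS person, count(o) AS organizations
ORDER BY organizations DESC
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["person"], record["organizations"])

driver.close()
```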

The GQL Manifesto argues that a standardized graph query language is essential for enabling the widespread adoption and use of graph databases in various applications, including data management, analytics, and machine learning. By providing a common language for interacting with graph data, GQL can facilitate the exchange and integration of data across different graph database systems and applications.

The GQL Manifesto also highlights the need for collaboration and community involvement in the development of a standardized graph query language. It calls for open standards development processes, as well as community feedback and participation, to ensure that the language meets the needs of a broad range of users and use cases.

Geospatial knowledge graphs as Know-Where

Know-Where graphs fuse a knowledge graph with geo-enrichment to create a linked data platform that can be directly accessed and used within Geographic Information Systems (GIS) and other environments. This approach enables users to easily combine location-based data with semantic knowledge, allowing them to gain new insights and make more informed decisions based on spatial and semantic relationships.

ArcGIS provides a powerful suite of tools for analyzing and visualizing spatial data, but it took some time before it natively supported semantic data and knowledge graphs. With ArcGIS Knowledge, users can leverage the power of a knowledge graph to enhance their spatial analysis and decision-making workflows.

It enables users to query and visualize data from the knowledge graph directly within the GIS environment. This allows users to easily explore and analyze the spatial and semantic relationships between discrete entities, such as locations, events, and organizations. For example, a user could query the knowledge graph to find all the historical sites associated with a particular region, and then use ArcGIS to visualize these sites on a map and perform further spatial analysis.
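
A hedged sketch of what such a query might look like with the ArcGIS API for Python follows; the portal URL, service URL, and the HistoricalSite/Region entity types are hypothetical, so treat this as an outline under those assumptions rather than a ready-made recipe.

```python
from arcgis.gis import GIS
from arcgis.graph import KnowledgeGraph

# Hypothetical portal credentials and knowledge graph service URL.
gis = GIS("https://example.maps.arcgis.com", "user", "password")
kg = KnowledgeGraph(
    url="https://example.maps.arcgis.com/server/rest/services/Hosted/HistoricalSites/KnowledgeGraphServer",
    gis=gis,
)

# An openCypher query against hypothetical HistoricalSite/Region entity
# types: find all historical sites located in a given region.
results = kg.query(
    "MATCH (s:HistoricalSite)-[:LOCATED_IN]->(r:Region {name: 'Saxony-Anhalt'}) "
    "RETURN s.name, s.shape"
)
for row in results:
    print(row)
```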

This semantic concept represents an important development in the field of geospatial data analysis, by enabling users to seamlessly integrate knowledge graphs and spatial data within a single environment. The integration with ArcGIS is an example of how this concept can be applied in practice.

The geospatial representation of Dessau-Roßlau in OSM
The semantic entity representation of Dessau-Roßlau in Wikidata

OpenStreetMap (OSM) and Wikidata are two open data projects that are closely related and have been used together in various applications. Both projects have developed methods for referencing each other’s data to create more comprehensive datasets.

One way that OSM and Wikidata reference each other is through the use of shared identifiers, such as OpenStreetMap IDs (OSM IDs) and Wikidata IDs (QIDs). These identifiers are used to link data between the two projects, enabling users to query and visualize data from both sources.

For example, OSM includes geographic data such as street maps and building footprints, while Wikidata includes structured data about entities such as landmarks, museums, and historical sites. By linking these datasets through shared identifiers, users can create more detailed and informative maps and visualizations that incorporate both spatial and semantic data.
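
As a small illustration of this linking, the following Python sketch asks the Overpass API for OSM museum nodes that carry a wikidata tag; the bounding box roughly covers the Dessau-Roßlau area and is only approximate.

```python
import requests

# Overpass QL: find OSM museum nodes in the Dessau-Roßlau area that carry
# a wikidata=* tag linking them to a Wikidata entity. The bounding box
# (south, west, north, east) is approximate.
overpass_query = """
[out:json][timeout:25];
node["wikidata"]["tourism"="museum"](51.78,12.15,51.90,12.30);
out body;
"""

response = requests.post(
    "https://overpass-api.de/api/interpreter",
    data={"data": overpass_query},
)
for element in response.json()["elements"]:
    tags = element.get("tags", {})
    # Each hit carries both an OSM ID and the linked Wikidata QID.
    print(element["id"], tags.get("name"), "->", tags.get("wikidata"))
```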

The combination of OSM and Wikidata enables rich and interconnected datasets that can be used for a wide range of applications, including mapping, geospatial analysis, and semantic search. The ability to reference each other’s data through shared identifiers and tools creates a more comprehensive and accurate representation of the world, and facilitates collaboration and knowledge-sharing among users.

Summary

Spatially enabled knowledge graphs are worth everyone’s attention because they blend tabular data and documents, as well as the non-spatial analysis of the networks a knowledge graph describes, with spatial and temporal data and analysis.

Users can gain a deeper understanding of the relationships between entities, events, and locations, and can make more informed decisions based on this knowledge. This approach allows for a more comprehensive analysis of complex systems, and can lead to new insights and opportunities for innovation. Ultimately, by leveraging spatially enabled knowledge graphs, users can unlock the full potential of their data and gain a competitive advantage in their field.

Since the story started with a geospatial use case in Königsberg, we should show more love for geospatial knowledge graph use cases in the near future.
