Understanding Graph Types and Ontological-Driven Data Structures

Joe Hoeller
7 min readJun 26, 2024

--

Hello wonderful people, I’ve noticed some common misconceptions about graphs and AI circulating on LinkedIn, and I thought it might be helpful to share some insights to clarify these topics. I hope you find this information enlightening and that it assists you in effectively building complex AI systems with enhanced reasoning capabilities. Looking forward to an engaging discussion with you all — my LinkedIn.

Graphs are essential in various domains, ranging from computer science to bioinformatics. However, distinguishing between different types of graphs and understanding their unique properties and applications is crucial. This article aims to clarify these distinctions by focusing on Directed Acyclic Graphs (DAGs), Label Property Graphs (LPGs), Bipartate Graphs, and Semantic Knowledge Graphs, which are ontologically-driven data structures, particularly in the context of spatial reasoning.

Directed Acyclic Graphs (DAGs) are Flowcharts (essentially)

A flowchart is a type of diagram that represents a process or workflow, often visualized as a Directed Acyclic Graph (DAG). A DAG is a graph that is directed and contains no cycles, meaning there is a one-way path from one node to another without any loops, hence “acyclical”. This makes DAGs suitable for representing processes with clear linear progression, such as tasks in a workflow or steps in a computational process.

Flowcharts are a common way to visualize DAGs, but it’s important to note that DAGs themselves are not knowledge graphs. DAGs are utilized in various algorithms, such as topological sorting and combinatorial enumeration, to solve problems in scheduling, steps in data processing, and other similar tasks. The primary focus of DAGs is on the order and dependency of tasks, not on semantic relationships or high-level reasoning.

Semantic Knowledge Graphs

Semantic knowledge graphs, on the other hand, are designed to represent knowledge in a way that enables reasoning and inference. These graphs contain nodes representing entities and edges representing relationships, enriched with semantic information. This allows for high-level queries and reasoning that go beyond simple data retrieval.

A key feature of semantic knowledge graphs is the use of data predicates, which are elements that describe the relationships between entities. These predicates enable the graph to connect detailed information to broader concepts, allowing for sophisticated queries and inference. Unlike DAGs, semantic knowledge graphs employ ontologies, which are structured frameworks for organizing information. Ontologies define the types of entities, relationships, and rules for reasoning, and connect data points across domains and inter-connect within sub-domains, facilitating a deeper understanding of the data.

Label Property Graphs (LPGs)

Are a data structure that represents data and it’s relationships. In LPGs, both nodes and edges can hold various properties, enabling a detailed depiction of entities and their interconnections within the network. This attribute-rich structure not only allows for straightforward queries but also supports complex analyses, think SQL on steroids, making LPGs ideal for applications requiring an in-depth exploration of data relationships. Despite their robustness, the representation capabilities of LPGs are bound by their structure, which can pose challenges in scenarios that require representing beyond simple binary relationships, which does not form the basis for contextual reasoning.

Bipartite Graphs

Bipartite graphs are a special type of graph where the set of vertices can be divided into two disjoint subsets such that no two vertices within the same subset are adjacent. This means that every edge in the graph connects a vertex in one subset to a vertex in the other subset. Bipartite graphs are particularly useful in modeling relationships between two different classes of objects, such as in job assignments, where one subset represents workers and the other subset represents tasks, with edges indicating which worker is assigned to which task. These graphs are crucial in solving matching problems, optimizing network flows, and are foundational in the design of algorithms for bipartite matching, such as the Hopcroft-Karp algorithm. The structure of bipartite graphs simplifies many computational problems, making them a powerful tool in many practical applications.

RDF (Resource Description Framework) Triples

RDF Triples, although not a graph, are crucial elements of the Resource Description Framework, play a key role in structuring data for the Semantic Web. Structured as a subject, predicate, and object, each RDF Triple articulates a specific statement about a resource, linking data in a meaningful way. The subject represents the resource in question, the predicate denotes the type of relationship that connects the subject to the object, and the object itself can be either another resource or a literal value. This structured approach enables the creation of ontologies — formal representations of knowledge within a domain. RDF Triples thus provide the foundation for constructing ontologies by defining and interlinking concepts through precise, semantic relationships, facilitating sophisticated data retrieval and reasoning across diverse information systems or domains — think of it as being able to connect information across several Subject Matter Experts (SME’s) at once.

  1. Semantic Relationships: RDF triples express relationships using URIs (Uniform Resource Identifiers) that are globally unique, which helps in defining clear and unambiguous connections between different data items. This global identification system ensures that the same term used in different datasets refers back to the same concept, making it easier to merge and query data across various sources.
  2. Flexible Schema: RDF does not require a fixed schema before data is created or integrated. This flexibility allows RDF to adapt to various domains and integrate new types of data without restructuring existing databases. This adaptability is crucial for connecting data points that originate in different systems and structures.
  3. Standardized Query Language (SPARQL): RDF data can be queried using SPARQL, a powerful and standardized query language specifically designed to retrieve and manipulate data stored in RDF format. SPARQL can perform sophisticated queries that span multiple data sources, even if those sources are spread across the internet, thus facilitating complex reasoning and analysis across domains.
  4. Ontologies Provide Inference: RDF is often used in conjunction with ontologies (formalized vocabularies of domain knowledge). Ontologies define structured relationships and hierarchies between concepts which can be used for logical inference. Using RDF in ontologies allows systems to infer additional information from the explicit data, enabling more profound contextual understanding and reasoning. For instance, if an ontology states that “all doctors are healthcare professionals,” and an RDF triple states that “Jane is a doctor,” a system can infer that “Jane is a healthcare professional” even if this is not explicitly stated.
  5. Interoperability and Integration: RDF’s framework is designed to support interoperability, making it easier to connect and integrate data from diverse domains. The RDF format ensures that data can be understood and utilized in the same way across different systems, reducing the barriers typically encountered in cross-domain data usage.

If you want to dive into the very basics of ontology creation, check my other article here.

Unlocking Graph Structures: The Mathematical Formulas That Map Graph Networks to it’s Purpose

Linear Processes for DAGs:Vertices and Edges with no cycles: G=(V,E) Topological Sort: V1 ,V2 ,…,Vn where for each (Vi ,Vj )∈E,i<j Linear relationship: This denotes a linear, directed path suitable for representing processes and workflows without cycles. Ontologies in Semantic Knowledge Graphs: Triples representing relationships: T = \{(s, p, o) mid s, p, o \in URI \text{ or } Literal } T={(s,p,o)∣s,p,o∈URI or Literal} Ontologies connecting data points across domains: O={C,R} C are concepts and

Key Formulas for Contextual Reasoning (or not)

Linear Processes for DAGs

  1. Vertices and Edges with no cycles:

2. Topological Sort:

3. Linear relationship:

Ontology driven Semantic Knowledge Graphs

  1. Triples representing relationships:

2. Ontologies connecting data points across domains:

Where C are concepts and R are relationships. These ontologies enable complex, interconnected data representations, supporting advanced semantic queries and cross-domain reasoning.

Label Property Graphs (LPG’s) are bonded by binary relationships

  1. Vertices and edges with properties:

2. Edges with properties:

This represents linear, multiple binary relationships, suitable for applications like Named Entity Recognition (NER) models, where each entity and relationship has specific properties but does not support complex, nor connect disparate data points across domains for contextual semantic reasoning.

Maximum Matching in Bipartite Graphs

  1. Maximum matching:

This represents the largest possible set of edges where no two edges share a common vertex, which is critical for optimizing assignments for “best fit” and/or resource allocation.

Takeaway: It’s crucial to approach these concepts with an understanding of their distinctions. Graph algorithms, such as those used for DAGs, bipartite graphs, and others, are well-established in graph theory. Each graph type has its specific mathematical properties and formulas, which include combinatorial enumeration and topological ordering. In the case of a semantic knowledge graph, there are additional elements like data predicates that enable high-level reasoning and connect to detailed solutions or end states.

While graph representations of workflows and Named Entity Recognition (NER) models are valuable and have specific use cases, it is essential to recognize their correct implementation and application. Misunderstanding these concepts can lead to incorrect conclusions about their functions and purposes.

I”I hope this helps you on your journey in AI leveraging graphs, thanks for reading!”

--

--

Joe Hoeller

Computer Vision Engineer and Accelerated GPU Computing Expert