Labeled vs Typed Property Graphs — All Graph Databases are not the same

Bryant Avey
Published in Geek Culture
13 min read · Jun 14, 2021
Neo4j, JanusGraph, and TigerGraph logos

Neo4j, JanusGraph, and TigerGraph are the top 3 native property graph database management systems according to the DB-Engines ranking, but their capabilities differ based on each vendor’s design decisions about how to implement graph data typing and labels. There are two distinct kinds of property graph databases: the Typed Property Graph and the Labeled Property Graph. The purpose of this article is to clarify the underlying architecture of the Labeled Property Graph (LPG) and contrast some key differences with Typed Property Graphs. This article does not discuss Resource Description Framework (RDF) graphs, which are typed graphs but not property graphs.

It helps to start with some simple definitions of Typed and Labeled:

  • Typed: A type is a specific kind of data or object. In the graph world, a type identifies a specific kind of node or relationship (vertex or edge).
    - An object in a graph that is typed is unique and very similar in behavior to an object class in strongly typed programming languages.
    - Types are usually mandatory or required properties of an object.
  • Labeled: A label is a text string that identifies a kind, set, or category of data or object. In the graph world, a label identifies a node or relationship (vertex or edge).
    - A label on a graph object is optional and does not have to be unique.
    - A labeled object is similar to a tag indicating some type of grouping or category.
    - Multiple labels can be applied to a graph object.

With the definitions out of the way, let’s get into the details. Of the top 3 native graph DBMSs, Neo4j is the only true Labeled Property Graph, while JanusGraph and TigerGraph are Typed Property Graphs. The following table outlines how each of the 3 graph DBMSs handles typed vs labeled nodes and relationships.

Table showing typed vs labeled Nodes and relationships by DBMS

Neo4j has typed relationships and labeled nodes. Node labels in Neo4j are optional, and a node can have multiple labels or no labels. Labels in Neo4j do not have to be unique between kinds or sets of nodes. Relationship types in Neo4j are required, but of the three DBMSs only Neo4j allows any number of relationships to share the same type, even between completely different kinds or sets of nodes.

JanusGraph has optional labels on nodes, but if a label is not specified, the database implicitly assigns one to the node. Node labels in JanusGraph must be unique between different kinds of nodes. JanusGraph relationships are labeled and require unique names between sets of nodes.

TigerGraph has required and unique types for both nodes and for relationships. Every node in TigerGraph must be a unique named type. Relationships in TigerGraph must have unique relationship name types between distinct sets of nodes.

Both JanusGraph and TigerGraph take an object-oriented class structure approach to the architecture of nodes and relationships. Just as in a strongly typed object-oriented class library, all class and object names have to be unique. This design forces a schema onto every named class. For example, if a node is named “person” then no other kind of node can be named “person”. This forces all entities named “person” to have the same data schema. In object-oriented programming this is known as “strong typing”. Strongly typed languages have an advantage in programming because they ensure that objects with the same name are all the same type of object with the same properties. This prevents bugs and coding errors.

JanusGraph and TigerGraph are essentially “strongly typed” graph database management systems because they both force unique names for relationships and nodes and ensure that a named relationship can only exist between two distinct node types. If a relationship exists between different types of nodes in TigerGraph or JanusGraph, a unique relationship name must be provided. Similarly, if a new structure for a node name is needed, it must be a new node type or “object class”.

Relationships (Edges)

Neo4j has a fundamentally different architecture. Although it has strongly typed relationships, it maintains untyped nodes that can have multiple labels. Because nodes in Neo4j are not strongly typed, there is no restriction on creating relationships between node sets with different names. In fact, you could give every single relationship in a graph the same type and Neo4j would handle it with no problems. For example, this graph shows the same relationship type between three different node types: Outcomes -[EXPRESSES]-> Concepts and Policies -[EXPRESSES]-> Concepts.

Arrows.app illustration of a graph using duplicate relationship types

Of the three, Neo4j is the only property graph database management system that allows this type of graph model. Both JanusGraph and TigerGraph would require a unique relationship name for each of the two EXPRESSES relationships. If the EXPRESSES relationship has multiple properties, those schema details would need to be duplicated for each of the unique relationship names in TigerGraph and JanusGraph.

The ability to duplicate relationship names within a graph schema provides a major advantage when using graph data science (GDS), machine learning (ML), or artificial intelligence (AI) algorithms. A query of “MATCH (n1)-[r:EXPRESSES]-(n2) RETURN n1, r, n2” returns all objects that have an EXPRESSES relationship type. In a typed graph database, this would require a separate query for each relationship name, combined with a UNION to get a single result set. While this may not seem like a huge issue, it dramatically impacts a data science workflow, especially when multiple data sources are involved in the data ingestion pipeline. Tuning GDS, ML, and AI algorithms often involves dozens or hundreds of manual adjustments by a data scientist. Getting the data needed for the algorithms is already tedious and meticulous work, and having to ensure that every unique relationship name is included in a query is very error prone. By allowing duplicate relationship names, Neo4j’s architecture enables data architects to ingest similar entities from multiple systems and relate them in the model using the same relationship types.
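The difference between one shared relationship type and several unique ones can be sketched with plain Python lists standing in for a graph. This is an illustration only, not how any of the three databases store or query data; the edge names `OUTCOME_EXPRESSES`, `POLICY_EXPRESSES`, and `AFFECTS` are hypothetical.

```python
# Each edge: (source node kind, relationship type, target node kind).
# Labeled-graph style: one relationship type shared across node kinds.
lpg_edges = [
    ("Outcome", "EXPRESSES", "Concept"),
    ("Policy", "EXPRESSES", "Concept"),
    ("Policy", "AFFECTS", "Customer"),
]

# Typed-graph style: a hypothetical unique edge type per pair of node kinds.
typed_edges = [
    ("Outcome", "OUTCOME_EXPRESSES", "Concept"),
    ("Policy", "POLICY_EXPRESSES", "Concept"),
    ("Policy", "AFFECTS", "Customer"),
]


def match_type(edges, rel_type):
    """Return every edge whose relationship type equals rel_type."""
    return [edge for edge in edges if edge[1] == rel_type]


# One pattern covers every node kind that uses EXPRESSES.
lpg_result = match_type(lpg_edges, "EXPRESSES")

# The typed variant must name each edge type and union the results;
# forgetting one name silently drops rows.
typed_result = (match_type(typed_edges, "OUTCOME_EXPRESSES")
                + match_type(typed_edges, "POLICY_EXPRESSES"))
```

Both results contain the same two source/target pairs, but the second form has to be revisited every time a new node kind starts using the relationship.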

A common example where this design feature is particularly useful is in geographical graphs. Many types of nodes have an address. A single relationship type called LOCATED_AT or LOCATED_IN, used between all nodes that have some type of address or geolocation, allows data scientists to weight and assign significance to those relationships very easily. This is important when combining several types of graph analytics such as page ranking, similarity, nearness, and others. Neo4j is designed to do this type of discovery work, whereas both TigerGraph and JanusGraph require several workarounds and a much longer series of queries to achieve comparable results.

Nodes (vertices)

Neo4j also has distinct architectural advantages when it comes to the use of labels for nodes. Both JanusGraph and TigerGraph require unique type names for nodes. Although JanusGraph uses the term “labels” for both nodes and relationships, they are not true labels but simply node names or relationship names. In other words, JanusGraph labels are actually type names. The restriction of unique node types in both JanusGraph and TigerGraph requires all objects of the same name to maintain the same properties. In contrast, Neo4j allows a node to have multiple non-unique labels. This simple capability provides an enormous amount of flexibility when modeling graph schemas and ingesting billions of nodes from hundreds or thousands of sources in a data science environment. One of the advantages Labeled Property Graph (LPG) databases have over traditional relational databases is the ability to create relationships between non-conformed data sets. In a classic star schema used by nearly all business intelligence tools, data has to be conformed and cleansed to be used. LPG databases, by contrast, can form relationships between anything, regardless of the level of granularity or whether the data is in conformance.

Let’s say, for example, that we need to analyze customer data from three systems: a transactional ordering system, a sales system, and a marketing system. All three systems have similar but distinctly different customer schemas. Customers in all three systems have the customer name, but each system stores other related details in different schema structures. The transactional system may store addresses in an address book table, while the sales system may store addresses in a location branch. Likewise, the marketing system may store customer addresses directly in the customer table. Because both JanusGraph and TigerGraph require node types to have unique names, both graph DBMSs force you to decide on the schema before you create the graph and before you ingest the data into it. Customer and address data from all three systems would have to be shaped and wrangled prior to ingesting them into either graph.

One major advantage LPGs provide is the ability to discover the unknown. However, with both JanusGraph and TigerGraph, the simple act of ingesting data into a graph requires several schema and data architecture decisions to be made before even starting the discovery or experimentation process in a data science workflow. This is where Neo4j provides a significant benefit. Because nodes in Neo4j are not strongly typed, and require no pre-defined schema before ingestion, both the customer and address information from all three systems can be pulled into Neo4j with no external data conformance, wrangling or data shaping. The original structures from each of the three systems stay intact and the different node types from each system can all have the same node label of “Customer”.

Arrows.app illustration of a graph using multiple labels, multiple schemas, and duplicate relationship types

This graph schema shows how multiple labels can be applied in Neo4j to capture the “Customer” from Sales, Marketing, and the TransactionalSystem by applying additional labels to nodes. Note also that it’s possible to differentiate schema models (PolicyModel vs CustomerModel) using labels in the graph as well. The blue nodes are the CustomerModel schema, and the orange nodes are the PolicyModel schema. Using labels, Neo4j allows multiple schemas to exist in a single graph simultaneously. Not only can multiple schemas exist inside a Neo4j graph, but those schemas can interact with each other through shared relationship types.
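The multi-label idea can be sketched in plain Python, with each node holding a set of labels and whatever properties its source system supplied. This is an in-memory illustration, not Neo4j’s storage model or API; the `with_labels` helper and the sample property values are hypothetical.

```python
# Each node carries a set of labels and the properties its source system had.
nodes = [
    {"labels": {"Customer", "Sales", "CustomerModel"},
     "props": {"name": "Acme", "branch": "North"}},
    {"labels": {"Customer", "Marketing", "CustomerModel"},
     "props": {"name": "Acme", "address": "1 Main St"}},
    {"labels": {"Customer", "TransactionalSystem", "CustomerModel"},
     "props": {"name": "Acme"}},
    {"labels": {"Policy", "PolicyModel"},
     "props": {"title": "Returns"}},
]


def with_labels(nodes, *required):
    """Return the nodes that carry every label in required."""
    wanted = set(required)
    return [node for node in nodes if wanted <= node["labels"]]


all_customers = with_labels(nodes, "Customer")             # all three systems
sales_customers = with_labels(nodes, "Customer", "Sales")  # one source system
policy_model = with_labels(nodes, "PolicyModel")           # one schema model
```

Note that the three “Customer” nodes keep different property sets, so nothing had to be conformed before loading; the labels alone do the grouping.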

Using multiple labels and duplicate relationship names in Neo4j is natural and easy with the Cypher query language.

To select the customers located at a location along with the policies affecting them, specify the singular “Customer” label and the desired match pattern.
MATCH p=(:Customer)-[]-() RETURN p

Graphileon illustration showing Cypher query results graph

The result of the Cypher query includes nodes from both schema models and “Customer” nodes from all three systems (TransactionalSystem, Sales, and Marketing), as well as the three LOCATED_AT relationships pointing to the customer “Address” node.

To select only data that’s related to the Sales System, specify the “:Sales” label.
MATCH p=(:Sales)-[]-() RETURN p

Graphileon illustration showing Cypher query results graph

Matching only the “Sales” node label yields the policy relationships from the PolicyModel schema as well as the related “Address” nodes.

To access the LOCATED_AT relationships between all the various nodes, just select the single relationship type and all related node types will be returned.
MATCH p=()-[:LOCATED_AT]-() RETURN p

Graphileon illustration showing Cypher query results graph

The query results for the “LOCATED_AT” relationship type include all customers from all systems along with their relationships to the nodes with the single “Address” label.

To access only the nodes in the Policy Model, you would write a Cypher query specifying the schema label.
MATCH (n1:PolicyModel)-[]-(n2:PolicyModel) RETURN n1, n2

Graphileon illustration showing Cypher query results graph

Isolating a single schema model in any query is as simple as just specifying the desired schema model. Selecting only the “PolicyModel” schema yields the full schema of just the Policy Model.

Likewise, to access only data in the Customer Model schema you would filter your matches to the corresponding CustomerModel label.
MATCH (n1:CustomerModel)-[]-(n2:CustomerModel) RETURN n1, n2

Graphileon illustration showing Cypher query results graph

When only the “CustomerModel” label is selected, the result is the full schema of the Customer Model nodes and relationships.

To see where the Customer and Policy model schemas interact, you filter the relationships you’re looking for to get those interactions.
MATCH (n1:CustomerModel)-[]-(n2:PolicyModel) RETURN n1,n2

Graphileon illustration showing Cypher query results graph

You can also select specific groups of labels using basic AND/OR logic, depending on what’s needed. Parentheses are required here because AND binds more tightly than OR in Cypher.
MATCH (n1)-[]-(n2) WHERE ("CustomerModel" IN labels(n1) OR "PolicyModel" IN labels(n1)) AND ("CustomerModel" IN labels(n2) OR "PolicyModel" IN labels(n2)) RETURN n1, n2

Graphileon illustration showing Cypher query results graph

When there are many schema models in a graph, the desired schemas can be specifically isolated using WHERE clauses to return only patterns utilizing the specified schema models.
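When a WHERE clause mixes AND and OR over label checks, grouping matters, because AND binds more tightly than OR in Cypher. Python follows the same precedence rule, so the effect can be sketched with plain sets of labels. The function names and the “VendorModel” label below are hypothetical and exist only for this illustration.

```python
def in_models(labels):
    """True if the node carries either schema-model label."""
    return "CustomerModel" in labels or "PolicyModel" in labels


def grouped(n1, n2):
    # (A or B) and (C or D): both endpoints must belong to one of the models.
    return in_models(n1) and in_models(n2)


def ungrouped(n1, n2):
    # Without inner parentheses this parses as A or (B and C) or D,
    # so a match on either endpoint alone can be enough.
    return ("CustomerModel" in n1 or "PolicyModel" in n1
            and "CustomerModel" in n2 or "PolicyModel" in n2)


customer = {"Customer", "CustomerModel"}
other = {"Vendor", "VendorModel"}  # hypothetical third schema model

grouped_result = grouped(customer, other)      # other endpoint is excluded
ungrouped_result = ungrouped(customer, other)  # slips through via the first OR
```

With the grouped form, a pattern is returned only when both endpoints belong to the chosen schema models, which is what isolating schemas requires.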

Neo4j’s ability to use multiple labels, coupled with the ability to have the same relationship types exist between any node types, opens a huge world of analysis, data science experimentation, and discovery. This is something that neither TigerGraph nor JanusGraph can easily do. From a data science workflow perspective, this gives data scientists and data engineers the ability to perform the bulk of their schema and data shaping work right in the graph database. TigerGraph, for example, requires relationships and schemas to be modeled and predefined before you can even query the graph or import data. Further illustrating the differences: Neo4j users can even create indexes on non-existent properties before an initial data load. Then, as the nodes are loaded into the graph, they’re automatically indexed.
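The idea of declaring an index before any matching data exists can be sketched with a small in-memory model. This is not Neo4j’s index implementation or API; the `Graph` and `LabelPropertyIndex` classes are hypothetical and only illustrate the order of operations.

```python
from collections import defaultdict


class LabelPropertyIndex:
    """Hypothetical index on (label, property), declared before any data exists."""

    def __init__(self, label, prop):
        self.label = label
        self.prop = prop
        self.entries = defaultdict(list)  # property value -> list of nodes

    def on_node_created(self, node):
        # Only nodes that have both the label and the property are indexed.
        if self.label in node["labels"] and self.prop in node["props"]:
            self.entries[node["props"][self.prop]].append(node)


class Graph:
    def __init__(self):
        self.nodes = []
        self.indexes = []

    def create_index(self, label, prop):
        index = LabelPropertyIndex(label, prop)
        self.indexes.append(index)
        return index

    def create_node(self, labels, **props):
        node = {"labels": set(labels), "props": props}
        self.nodes.append(node)
        for index in self.indexes:  # indexing happens as part of the load
            index.on_node_created(node)
        return node


graph = Graph()
name_index = graph.create_index("Customer", "name")  # the graph is still empty
graph.create_node(["Customer", "Sales"], name="Acme", branch="North")
graph.create_node(["Customer", "Marketing"], email="info@example.com")
hits = name_index.entries["Acme"]
```

The second node has no `name` property, so it is stored but simply never appears in the index; no schema had to exist for either node beforehand.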

The fundamental difference in capability is that Neo4j provides an “Optional Schema” rather than a “Schema First” approach to property graphs, which is made possible by the use of labels and by allowing relationship types to exist between multiple node types. The primary question to ask yourself as a data scientist is: Where would you rather do the bulk of your work? Is time better spent shaping and conforming data outside of a graph in external tools so you can finally import the data into a graph? Or is time better spent directly importing the data into a graph and then using the graph tools and algorithms to shape the data, create the schema, and immediately begin analyzing it? For rapid discovery and insight, getting data into a graph first is the best approach. Once your data is in a graph format, discovery of known and unknown relationships and patterns can begin. Neo4j’s design is ideally suited for iterative data science workflows where figuring things out and discovering what’s possible is what’s needed.

Neo4j’s Graph Data Science Library as well as their new inclusion of Machine Learning algorithms allows the data scientist to create mutable (changeable) graphs and graph schemas in virtual, re-runnable machine learning catalogs. Much of this is possible because of their approach to designing a true Labeled Property Graph vs a Typed Property Graph.

Wrapping it up

While it’s possible in both JanusGraph and TigerGraph to add properties such as “SourceSystem” or “Model” that could be populated, properties don’t have the features and capabilities that labels have. Labels in Neo4j are automatically indexed. Labels can also be applied dynamically and automatically using graph algorithms such as Label Propagation in Neo4j, which makes labeling nodes extremely powerful and flexible. Using Label Propagation algorithms, for example, additional relationships or entirely new nodes could be created in Neo4j. TigerGraph prohibits this type of workflow due to its “schema first” approach: before doing anything, the node schema and its property structures must first be created in design mode, and only then can a new node be created using the pre-existing schema. Our experience is that Neo4j is at least 10 times faster than TigerGraph for this type of iterative workflow. Depending on the schema changes performed, an entire graph reload must be performed in TigerGraph, and all queries must be recompiled and rerun. This is very time consuming in an iterative data science workflow, taking between 10 and 30 minutes for every single schema change, or about 20 minutes on average. Imagine having an idea and then having to wait 30 minutes before trying it out, only to find it needs to be tweaked or doesn’t work how you wanted. Having to repeat those steps dozens or hundreds of times is very frustrating to the data scientist or engineer.

Typed graphs have their advantages in the raw speed of pre-compiled queries, but this speed comes at a huge price of inflexibility in the data science workflow and data ingestion pipelines. Typed graphs also have an advantage in standardizing APIs. TigerGraph has a great REST API. Since everything in TigerGraph is a strongly typed object, every node, relationship, and query in the database has a REST API endpoint, making it a fantastic choice for application development. The tradeoff is the heavy up-front work required to completely design and architect the database prior to use.

As a Labeled Property Graph, Neo4j has the algorithmic basis and the schema flexibility needed for an iterative and experimental data science environment. Graphs are tremendous tools for discovery and finding undiscovered patterns in data, and Neo4j empowers this capability by allowing duplicate relationship names and labels that can be applied to multiple nodes. Labels are yet another tool in the data science toolkit that can be used to visualize and explore data naturally. Labels can be used to separate out similar data from multiple systems, and even to designate entire schemas. Although not covered in this article, labels can also be used to more easily visualize the “hairball”, “snowstorm”, and “starburst” graph patterns, where relationships are so dense they’re impossible to visualize.

While this article only scratches the surface of what can be done in a Labeled Property Graph (LPG), it hopefully lifts the hood a bit to show how the design decisions in a graph DBMS architecture have a huge impact on the capability of the database.

Bryant is a Solution Architect at InterNuntius, a data integration and analytics consultancy. He is also the Chief Data Officer for Stratalytica, a nonprofit.