Into The Depths Of The Graph World Using Neo4j
What is a Graph?
A graph consists of nodes, relationships and properties. A simple example is shown below.
Matt, Sarah, and the others are all nodes with the label Friend; together they represent one type of entity. The posts they publish on social media are another type of entity. Similarly, “:Friends with” is a type of relationship that describes how these Friend entities are connected to each other. The third component of a graph is the property, which is generally stored as a key-value pair. Properties can be attached to both nodes and relationships.
For example, the entity labelled Friend can carry the key-value pair “name:Matt”, which is then one of the properties of the Matt node.
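To make the three building blocks concrete, here is a minimal Python sketch of a property graph. It is only an illustration of the model, not how Neo4j stores data internally; the class names `Node` and `Relationship` and the `since` property are my own assumptions and do not come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with a label and key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed link between two nodes, with its own properties."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Two Friend nodes, each with a "name" property.
matt = Node("Friend", {"name": "Matt"})
sarah = Node("Friend", {"name": "Sarah"})

# A relationship can carry properties too ("since" is a hypothetical example).
friendship = Relationship(matt, "FRIENDS_WITH", sarah, {"since": 2015})

print(friendship.start.properties["name"], friendship.rel_type,
      friendship.end.properties["name"])
```

Both nodes and the relationship hold a plain dictionary of properties, which mirrors the key-value pairs described above.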
What is a Graph Database?
A graph database is a kind of database that uses these graph structures to store data. It can be compared to an electric grid, where every electrical point is a node and the wiring from one point to another is a relationship.
How is it different from relational databases?
Relational databases are built around tables, and the data in them is structured. To follow a link from one table to another, an operation called a join is needed. Joins are computed at query time by matching the primary and foreign keys of rows in the connected tables. These operations are compute-heavy and memory-intensive, and their cost can grow exponentially as more tables are joined. As the number of tables and the variety of relationships (one-to-many, many-to-many, etc.) increase, the queries become messy and cumbersome.
The same example can be modelled as a graph using 4 nodes (3 Department nodes and 1 Customer node) and 3 relationships. There is also no need for primary keys or foreign keys.
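The difference can be sketched in plain Python. This is a rough illustration under my own assumptions, not a benchmark: the table layout (including the `customer_department` join table) is hypothetical, and the names Alice, 4FUTURE, P0815 and A42 are borrowed from the Cypher queries later in this post.

```python
# Relational style: rows in separate tables, linked only by key values.
customers = [{"id": 815, "name": "Alice"}]
departments = [
    {"id": 111, "name": "4FUTURE"},
    {"id": 119, "name": "P0815"},
    {"id": 181, "name": "A42"},
]
# Hypothetical join table of (customer_id, department_id) pairs.
customer_department = [(815, 111), (815, 119), (815, 181)]


def departments_via_join(customer_name):
    """Match keys across all three tables, as a nested-loop join would."""
    result = []
    for customer in customers:
        if customer["name"] != customer_name:
            continue
        for cust_id, dept_id in customer_department:
            if cust_id != customer["id"]:
                continue
            for department in departments:
                if department["id"] == dept_id:
                    result.append(department["name"])
    return result


# Graph style: the node already holds direct links to its neighbours.
belongs_to = {"Alice": ["4FUTURE", "P0815", "A42"]}


def departments_via_traversal(customer_name):
    """Follow the stored relationships; no key matching is needed."""
    return list(belongs_to.get(customer_name, []))


print(departments_via_join("Alice"))
print(departments_via_traversal("Alice"))
```

Both functions return the same departments, but the join version has to scan every table to match keys, while the traversal version only touches the links stored on the Alice node.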
What is Neo4j?
Neo4j is a highly scalable native graph database, built to leverage not only data but also data relationships.
The query language that Neo4j uses is called Cypher. To showcase a few Cypher queries, I have taken the above graph as my point of reference.
Queries to Create Department nodes:-
MERGE (d:Department {name:"4FUTURE", id:111})
MERGE (d1:Department {name:"P0815", id:119})
MERGE (d2:Department {name:"A42", id:181})
Queries to Create Customer nodes:-
MERGE (p:Customer {name:"Alice", id:815})
Queries to Create Relationships between Department and Customer:-
// Look up the existing Customer and Department nodes first
MATCH (p:Customer {name:"Alice", id:815})
MATCH (d:Department {id:111})
MATCH (d1:Department {id:181})
MATCH (d2:Department {id:119})
// Then create each relationship only if it does not already exist
WITH p,d,d1,d2
MERGE (p)-[:BELONGS_TO]->(d)
MERGE (p)-[:BELONGS_TO]->(d1)
MERGE (p)-[:BELONGS_TO]->(d2)
Neo4j has many use cases and is used worldwide. Top retailers like eBay and Walmart rely on Neo4j to drive their recommendations and promotions and to streamline logistics. Top insurers like Optum Healthcare and Allianz rely on Neo4j to fight fraud and manage information. It has become very popular among the many databases available in the market.
Graph databases have their own pros and cons.
They are the best fit where many domain entities are interconnected and we want to explore their relationships. They are not optimized for the large-volume analytics queries typical of data warehousing. It is important to note that query latency in a graph is proportional to how much of the graph we choose to explore in a query, and not to the amount of data stored. So knowing when to choose a graph database is an important skill to have before starting a project.
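The point about latency can be illustrated with a small Python sketch. It is a simplified in-memory model under my own assumptions, not a measurement of Neo4j: the `explore` function and the `other...` filler nodes are hypothetical, and the Alice and department names are reused from the queries above.

```python
from collections import deque


def explore(graph, start, max_hops):
    """Breadth-first traversal that stops after max_hops relationships."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return visited


# The small neighbourhood we care about.
graph = {
    "Alice": ["4FUTURE", "P0815", "A42"],
    "4FUTURE": ["Alice"],
    "P0815": ["Alice"],
    "A42": ["Alice"],
}

# A large, unrelated part of the graph (hypothetical filler data).
for i in range(10000):
    graph[f"other{i}"] = [f"other{i + 1}"]

touched = explore(graph, "Alice", max_hops=1)
print(len(touched), "nodes touched out of", len(graph), "stored")
```

The traversal only visits Alice and her three departments, no matter how many unrelated nodes are stored, because the work is driven by the neighbourhood explored rather than by the total size of the graph.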
References