Graph Databases: Talking about your Data Relationships with Python

Nicolle Cysneiros
Apr 17, 2017 · 11 min read
Table with users and relationship between users (Friends_with)
Table with pages and relationship between user and pages (Likes)
Social Network scenario represented as graph

Graph Database

A graph database is a system that stores data in a graph structure and supports more semantic queries that retrieve related data directly. This type of database not only improves the representation of relationships, but also lets us apply more elaborate data analysis techniques, such as community detection, pattern recognition and centrality measures. Another advantage of graph databases is the flexibility of the data schema. While relational databases require new tables, or alterations to existing ones, in order to add new types of data, in a graph database we can add new types of vertices and edges without altering previously stored data. For instance, the image below shows what the graph would look like if we added the concept of “Discussion Groups” to our social network app.

Social Network scenario including the concept of “Discussion Groups”
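
To make the schema flexibility concrete, here is a minimal in-memory sketch in plain Python. It is not how Neo4J stores data; the `PropertyGraph` class, the `Group` label, the `Member_of` edge type and the sample names other than John and The Beatles are assumptions made for illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """A tiny in-memory property graph: labeled vertices and typed edges."""

    def __init__(self):
        self.vertices = {}               # vertex id -> (label, properties)
        self.edges = defaultdict(list)   # source id -> [(edge type, target id)]

    def add_vertex(self, vertex_id, label, **properties):
        self.vertices[vertex_id] = (label, properties)

    def add_edge(self, source, edge_type, target):
        self.edges[source].append((edge_type, target))

    def neighbours(self, source, edge_type):
        """Return the ids reached from `source` through edges of `edge_type`."""
        return [target for kind, target in self.edges[source] if kind == edge_type]


graph = PropertyGraph()
graph.add_vertex("john", "User", name="John")
graph.add_vertex("mary", "User", name="Mary")            # hypothetical user
graph.add_vertex("beatles", "Page", name="The Beatles")
graph.add_edge("john", "Friends_with", "mary")
graph.add_edge("john", "Likes", "beatles")

# A new kind of vertex and edge: nothing stored earlier has to change.
graph.add_vertex("python_devs", "Group", name="Python Developers")  # hypothetical group
graph.add_edge("john", "Member_of", "python_devs")

print(graph.neighbours("john", "Likes"))       # ['beatles']
print(graph.neighbours("john", "Member_of"))   # ['python_devs']
```

Adding the group required no migration step: the existing users, pages and edges were left untouched.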

Neo4J

Neo4J is the first and most popular native graph database, according to the DB-Engines website. It is an open source project implemented in Java and has its own query language, called Cypher. The community is quite active on GitHub and Stack Overflow, and blog posts and ebooks are also available. The database can be accessed through the Java API or the RESTful API.

REST API Example

The REST API can be accessed through the Neo4J browser by submitting Cypher queries. The following sections show examples of how to create a new vertex (node), create a new relationship and submit a query.

Cypher command to create a new node
Graphic result for node creation command
JSON result for node creation command
Cypher command to create a new relationship
Graphic result for relationship creation command
JSON result for relationship creation command
Graphic representation of the network stored in the database
Cypher query to retrieve pages the user John likes
Graphic result of the query
JSON result of the query
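
Outside the browser, a Cypher statement can also be posted to the server over HTTP as JSON. The sketch below only builds such a request with the standard library and does not send it. The endpoint path is the transactional endpoint of Neo4J 3.x servers (newer releases use a different path), and the host, credentials, `User` label and `$name` parameter style are assumptions; older servers may expect `{name}` instead.

```python
import base64
import json
import urllib.request

# Assumed endpoint of a local Neo4J 3.x server; newer releases use a different path.
ENDPOINT = "http://localhost:7474/db/data/transaction/commit"


def build_request(statement, parameters, user="neo4j", password="secret"):
    """Build (but do not send) an HTTP request carrying one Cypher statement."""
    payload = {"statements": [{"statement": statement, "parameters": parameters}]}
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


request = build_request(
    "MATCH (:User {name: $name})-[:Likes]->(p:Page) RETURN p.name AS page",
    {"name": "John"},
)
print(request.full_url)
print(request.data.decode("utf-8"))

# With a server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Passing the user name as a parameter, rather than pasting it into the query text, keeps the statement reusable and avoids quoting problems.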

Python Example

Besides the REST API and the Java API, Neo4J can also be integrated with a Python application through the Py2Neo module. This module supports Python 2 and 3 and allows Cypher queries to be submitted to the database.

Python code to create nodes and edges
Python code to run a query
>>>
(e0f611c:Page {category:"Músico/Banda",name:"The Beatles"})
(ac6964f:Page {category:"Comida/Bebida",name:"Coca-Cola"})
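
As a rough sketch of what such code can look like (not the original listing), the two helpers below take any object with a `run(cypher, **parameters)` method, which is what Py2Neo's `Graph` provides. The `User` label, the `Likes` relationship type and the `$`-style parameters are assumptions; the `Page` label and its `name` and `category` properties follow the output above.

```python
def add_like(graph, user_name, page_name, category):
    """Create (or reuse) a user, a page and a Likes edge between them."""
    graph.run(
        "MERGE (u:User {name: $user}) "
        "MERGE (p:Page {name: $page, category: $category}) "
        "MERGE (u)-[:Likes]->(p)",
        user=user_name, page=page_name, category=category,
    )


def pages_liked_by(graph, user_name):
    """Return the names of the pages liked by the given user."""
    cursor = graph.run(
        "MATCH (:User {name: $user})-[:Likes]->(p:Page) RETURN p.name AS page",
        user=user_name,
    )
    return [record["page"] for record in cursor]


# Usage with Py2Neo and a running server (the password is a placeholder, and
# the Graph constructor arguments vary between Py2Neo versions):
#
#   from py2neo import Graph
#   graph = Graph(password="secret")
#   add_like(graph, "John", "The Beatles", "Músico/Banda")
#   add_like(graph, "John", "Coca-Cola", "Comida/Bebida")
#   print(pages_liked_by(graph, "John"))
```

Using `MERGE` instead of `CREATE` is a design choice for the sketch: running it twice does not duplicate nodes or edges.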

Comparing Graph DBs

Besides Neo4J, the DB-Engines site lists other popular graph databases. OrientDB and TitanDB are the ones compared here.

Comparing Queries

For this experiment, the same query (which pages John likes) was written in Cypher (Neo4J), extended SQL (OrientDB) and Gremlin (TitanDB).

Cypher query example (Neo4J)
SQL Extended query example (OrientDB)
Gremlin query example (TitanDB)
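
For a side-by-side reading, the snippet below holds one plausible formulation of the question in each language as plain Python strings. These are illustrative only and not the queries used in the experiment: the `User` and `Page` classes, the `Likes` edge and the `name` property are assumptions, and the Gremlin line assumes a TinkerPop 3 style traversal.

```python
# Illustrative queries only; labels, edge names and property names are assumptions.
QUERIES = {
    "Cypher (Neo4J)": "MATCH (:User {name: 'John'})-[:Likes]->(p:Page) RETURN p",
    "Extended SQL (OrientDB)": "SELECT expand(out('Likes')) FROM User WHERE name = 'John'",
    "Gremlin (TitanDB)": "g.V().has('User', 'name', 'John').out('Likes')",
}

for language, query in QUERIES.items():
    print("{:<24} {}".format(language, query))
```

Cypher describes the pattern to match, OrientDB's SQL extension adds traversal functions such as `out()` to a familiar `SELECT`, and Gremlin chains traversal steps starting from a vertex.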

Comparing Performance

One of the basic tasks when manipulating a database is retrieving an object given its id. The following experiment records the average time to retrieve a node by its id on a graph with 500,000 nodes, with 4 clients executing this action 200 times. The process was run on Neo4J, OrientDB and TitanDB (the latter using Cassandra as its backend).
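
A measurement of this kind can be sketched with the standard library alone. The harness below is not the one used in the experiment: it assumes each of the 4 clients performs 200 lookups, and it times a plain dictionary as a stand-in for the database, so the absolute numbers mean nothing. `fetch` is a hypothetical callable that would wrap the real driver call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def time_lookups(fetch, node_ids):
    """Call fetch(node_id) for every id and return the elapsed seconds per call."""
    timings = []
    for node_id in node_ids:
        start = time.perf_counter()
        fetch(node_id)
        timings.append(time.perf_counter() - start)
    return timings


def benchmark(fetch, node_ids, clients=4):
    """Run the same lookups from `clients` concurrent workers; return all timings."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(time_lookups, fetch, node_ids) for _ in range(clients)]
        return [t for future in futures for t in future.result()]


# Stand-in for a database: a dict with 500,000 "nodes" keyed by id.
nodes = {node_id: {"id": node_id} for node_id in range(500000)}
timings = benchmark(nodes.get, node_ids=range(200), clients=4)
print("lookups:", len(timings))
print("mean seconds per lookup:", mean(timings))
```

To compare real systems, `nodes.get` would be replaced by a function that asks each database for a node by id, keeping the rest of the harness identical across systems.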

Chart with mean time that each DB took to retrieve a node by its id
Chart with the amount of memory required for each DB

Comparing Neo4J with Relational Databases

The main difference between relational and graph databases regarding information retrieval becomes apparent when we compare how the same query is written for each system. Consider a query that returns all users who like the page “The Beatles”.

SQL query example
Cypher query example
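
The contrast can be tried out with SQLite from the Python standard library. This is a sketch under assumptions: the `users`, `pages` and `likes` tables, the sample rows and the Cypher labels are made up for illustration, and the Cypher string is shown only for comparison, not executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE likes (user_id INTEGER REFERENCES users(id),
                        page_id INTEGER REFERENCES pages(id));
    INSERT INTO users VALUES (1, 'John'), (2, 'Mary');
    INSERT INTO pages VALUES (1, 'The Beatles'), (2, 'Coca-Cola');
    INSERT INTO likes VALUES (1, 1), (1, 2), (2, 2);
""")

# Relational version: the relationship lives in a join table, so two joins are needed.
SQL_QUERY = """
    SELECT users.name
    FROM users
    JOIN likes ON likes.user_id = users.id
    JOIN pages ON pages.id = likes.page_id
    WHERE pages.name = ?
    ORDER BY users.name
"""
fans = [row[0] for row in conn.execute(SQL_QUERY, ("The Beatles",))]
print(fans)  # ['John']

# Graph version of the same question, shown for comparison (not executed here).
CYPHER_QUERY = "MATCH (u:User)-[:Likes]->(:Page {name: 'The Beatles'}) RETURN u.name"
```

In the relational version the relationship has to be reassembled through the join table, while the Cypher pattern names the relationship directly.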

Comparing Performance

Relational and graph databases suit different purposes, depending on the application. If the application needs to represent and access relationships between data, a graph database is the way to go. However, if the application is only interested in the information stored inside an entity, a relational database can perform better. An experiment was carried out to measure how long Neo4J and MySQL (an RDBMS) take to execute two types of query:

  • Data Query: count the number of nodes/instances that have a certain attribute with a value below a given threshold.
Chart comparing Neo4J and MySQL performance to execute structural and data queries

Applications

There are several areas where graph databases are relevant due to the nature of the data. One of them is social networks, the running example of this post. Another important area is bioinformatics and genetic analysis, where the interactions between particles and molecules are better represented as a graph. The telecommunications sector also uses graph databases to store information about connections between devices and to identify areas of the network that may need infrastructure reinforcement.

Resulting graph after analysis process
Cypher command that calculates betweenness centrality for each character
Python code that executes cluster detection method available in iGraph
Cluster detection algorithm result
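
To give an idea of what a betweenness centrality computation does, here is a pure-Python sketch of Brandes' algorithm for unweighted, undirected graphs. It is a stand-in written for illustration, not the Cypher command or the iGraph code referred to above, and the three-person friendship graph is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adjacency` maps each vertex to the list of its neighbours.
    """
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each undirected pair was counted from both endpoints.
    return {v: score / 2 for v, score in centrality.items()}


# Hypothetical friendship graph: Mary is the only link between John and Paul.
friends = {
    "John": ["Mary"],
    "Mary": ["John", "Paul"],
    "Paul": ["Mary"],
}
print(betweenness_centrality(friends))
```

Mary scores 1.0 because the only shortest path between John and Paul passes through her, while John and Paul score 0.0.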

Labcodes Software Studio
