Integrate Neo4j with Jupyter notebook

Learn how to connect a Neo4j database to your Jupyter notebook for data analysis

Data Technology
5 min read · Mar 26, 2020

Hello! You are here because you want to know how to connect Neo4j to your Jupyter notebook so that you can analyze the graph directly. Don't worry, by the end of this tutorial you will be familiar with it.
So, let's begin the journey.

What we will cover:

  1. How to connect a Jupyter notebook to a running graph on the Neo4j server.
  2. How to access nodes and relationships and perform operations on them.
  3. How to run simple Cypher queries and retrieve the results as a pandas DataFrame.
  4. How to add new nodes and relationships from the Jupyter notebook to the graph in Neo4j.

What is Neo4j?

Neo4j is a next-generation, open-source, NoSQL, native graph database that is built to leverage not only data but also the relationships between data.

Characteristics of Neo4j

Neo4j also provides full database characteristics, including ACID transaction compliance, cluster support, and runtime failover, which makes it suitable for using graphs in production scenarios.

Building Blocks of Neo4j

Setting up the Environment

  1. Neo4j: Make sure you have installed the Neo4j database. Open it and start the Movie Graph database, which comes built in with Neo4j.
    To set up Neo4j, follow this link: https://neo4j.com/
  2. py2neo: We will use the py2neo library to connect the Jupyter notebook to the Neo4j server. It is a simple library for connecting a Python application to a Neo4j database. Run "pip install py2neo" from your shell to install it.
    For more information, see: https://py2neo.org/v4/index.html

Now our environment is ready. Let's work through the tasks:

1. Connect the Jupyter notebook to the Movie graph on the Neo4j server.

First, import the py2neo classes in your notebook:

from py2neo import Graph, Node, Relationship

Then, connect your program to the graph.

You need three things here:
a) Database URL: the URL, including the port number, on which your Neo4j server is running.
b) Username: the username for your database.
c) Password: the password for your database.

graph = Graph("<Database URL>", auth=("<Username>", "<Password>"))

Here, Graph is a class that represents a particular Neo4j graph database. You will use the Graph instance to interact with the Neo4j database for various operations.
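As a concrete illustration of the three inputs, here is a minimal sketch that reads them from environment variables. The variable names NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD and both helper names are assumptions made for this example, and bolt://localhost:7687 with the user neo4j is only the usual default for a local install, so adjust them to your server.

```python
import os


def neo4j_settings(env=None):
    """Collect the database URL, username and password in one place.

    The environment variable names are illustrative assumptions, not
    something py2neo or Neo4j requires.
    """
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "auth": (
            env.get("NEO4J_USER", "neo4j"),
            env.get("NEO4J_PASSWORD", "password"),
        ),
    }


def connect(settings):
    """Open a py2neo Graph from the settings dictionary."""
    # Imported here so the settings helper works even without py2neo installed.
    from py2neo import Graph

    return Graph(settings["uri"], auth=settings["auth"])
```

In a notebook cell you would then write graph = connect(neo4j_settings()). Keeping the password out of the notebook itself makes the notebook safer to share.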

Take a look at the schema of the Movie Graph that we will use in our example.

There are two entities (node labels) here:
a) Person and b) Movie

There are many relationships here, such as DIRECTED, ACTED_IN, REVIEWED, PRODUCED, etc.

2. Access nodes and relationships and perform operations over them

Suppose we want to know how many Person nodes and Movie nodes there are in our graph:

# Cypher queries
number_of_person_nodes = "MATCH (p:Person) RETURN count(p)"
number_of_movie_nodes = "MATCH (m:Movie) RETURN count(m)"
# Evaluate each Cypher query; evaluate() returns the first value of the first record
result_persons = graph.evaluate(number_of_person_nodes)
result_movies = graph.evaluate(number_of_movie_nodes)
# Print the result
print(f"No of person node is {result_persons} & No of movie node is {result_movies}")

Result: No of person node is 133 & No of movie node is 38

We simply passed the Cypher queries to our Graph object, and the evaluate() method returned the count for each node label.
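The same pattern extends to any list of labels. The helper below is a small sketch, not part of py2neo; its name and the label check are assumptions for this example. It only relies on the evaluate() method used above.

```python
def count_nodes(graph, labels):
    """Return {label: node count} for each label, using graph.evaluate()."""
    counts = {}
    for label in labels:
        # Labels cannot be passed as Cypher parameters, so validate the
        # name before placing it in the query text.
        if not label.isidentifier():
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] = graph.evaluate(f"MATCH (n:{label}) RETURN count(n)")
    return counts
```

Calling count_nodes(graph, ["Person", "Movie"]) gives a dictionary that can be passed straight to pd.Series for plotting.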

Let's visualize the above data:

import pandas as pd
import matplotlib.pyplot as plt
# Bar chart of the node count per label
df_result_count = pd.Series({'Person': result_persons, 'Movie': result_movies})
df_result_count.plot(kind='bar', color=['blue', 'darkorange'])
plt.xlabel('Node Label')
plt.ylabel('No Of Nodes')

3. How to run simple Cypher queries and retrieve the results as a DataFrame

  • Get all the relationship types in our graph and convert the result into a DataFrame
cypher_all_relationship = "MATCH (n)-[r]-(m) RETURN DISTINCT type(r) AS RelationType"
graph.run(cypher_all_relationship).to_data_frame()

The run() method executes a Cypher statement and returns a cursor for navigating the result. We then convert this cursor into a pandas DataFrame using the to_data_frame() method.
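If you do not need a DataFrame, the cursor returned by run() can also be turned into a list of dictionaries with its data() method. The sketch below uses that to collect the relationship types into a sorted Python list. The function name is an assumption for this example, and the pattern follows each relationship in one direction only so that it is not matched twice.

```python
def relationship_types(graph):
    """Return the distinct relationship types in the graph as a sorted list."""
    query = "MATCH ()-[r]->() RETURN DISTINCT type(r) AS RelationType"
    # Cursor.data() yields one dictionary per record, keyed by column name.
    records = graph.run(query).data()
    return sorted(record["RelationType"] for record in records)
```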

  • Get all the actors along with the names of the movies they acted in, and convert the result into a DataFrame
# Cypher to fetch the actors along with their movies
all_actors_name = "MATCH (p:Person)-[rel:ACTED_IN]->(m:Movie) RETURN DISTINCT p.name AS PersonName, m.title AS MovieName ORDER BY PersonName ASC"
# Run the above Cypher
graph.run(all_actors_name).to_data_frame()
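To narrow such a query to a single actor, it is safer to pass the name as a Cypher parameter than to paste it into the query string. This is a minimal sketch, assuming run() accepts query parameters as keyword arguments, as the py2neo v4 documentation describes. The function name is made up for the example.

```python
def movies_for_actor(graph, actor_name):
    """Return the titles of the movies a given person acted in, sorted by title."""
    query = (
        "MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie) "
        "RETURN m.title AS MovieName ORDER BY MovieName ASC"
    )
    # $name is filled in by the driver, so quotes in the name cannot break the query.
    cursor = graph.run(query, name=actor_name)
    return [record["MovieName"] for record in cursor]
```

For example, movies_for_actor(graph, "Tom Hanks") would list the titles for one of the actors in the Movie graph.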

4. Add new nodes and relationships from our Jupyter notebook to the graph in Neo4j

Create new Person and Movie nodes, then connect them with an ACTED_IN relationship.

# Create a new node for the actor (born is an integer, as in the existing Person nodes)
new_person = Node("Person", name="Aamir Khan", born=1965)
new_person
# Create a new node for the movie (released is an integer, as in the existing Movie nodes)
new_movie = Node("Movie", title="3 Idiots", released=2009)
new_movie
# Now insert both nodes into the graph
tx = graph.begin()  # Start a transaction that will apply these changes to the graph
tx.create(new_person)
tx.create(new_movie)
# Create the relationship between the two nodes
tx.create(Relationship(new_person, "ACTED_IN", new_movie))
tx.commit()  # Commit the transaction to the graph
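Running the cell above twice creates the same actor and movie twice, because create() always adds new nodes. One possible alternative is to send a Cypher MERGE statement, which only creates what is missing. This sketch assumes that name identifies a Person and title identifies a Movie, and the function name is made up for the example.

```python
def add_acted_in(graph, name, born, title, released):
    """Create the person, the movie and their ACTED_IN link only if missing."""
    query = (
        "MERGE (p:Person {name: $name}) "
        "ON CREATE SET p.born = $born "
        "MERGE (m:Movie {title: $title}) "
        "ON CREATE SET m.released = $released "
        "MERGE (p)-[:ACTED_IN]->(m)"
    )
    # Values travel as parameters; MERGE matches existing nodes before creating.
    graph.run(query, name=name, born=born, title=title, released=released)
```

A call such as add_acted_in(graph, "Aamir Khan", 1965, "3 Idiots", 2009) can then be repeated without producing duplicates, under the uniqueness assumption stated above.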

Let's check our result by fetching all the actors along with the movies they acted in. We will use the same query as before.

# Cypher to fetch the actors along with their movies
all_actors_name = "MATCH (p:Person)-[rel:ACTED_IN]->(m:Movie) RETURN DISTINCT p.name AS PersonName, m.title AS MovieName ORDER BY PersonName ASC"
# Run the Cypher
graph.run(all_actors_name).to_data_frame()

Look at the first row: we have a new actor along with the new movie in which he acted.
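Scanning the table by eye works for a small graph. For a targeted check, the following sketch asks Neo4j directly whether the link exists. It assumes evaluate() accepts keyword parameters in the same way as run(), and the function name is hypothetical.

```python
def acted_in_exists(graph, name, title):
    """Return True if the given person has an ACTED_IN link to the given movie."""
    query = (
        "MATCH (:Person {name: $name})-[r:ACTED_IN]->(:Movie {title: $title}) "
        "RETURN count(r)"
    )
    # evaluate() returns the single count value from the first record.
    return graph.evaluate(query, name=name, title=title) > 0
```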

I hope this article helped you learn how to connect a Jupyter notebook to Neo4j.
If you liked this article, share it with your friends. Leave your thoughts, opinions, and feedback in the comments below. I would love to hear from you.

Don’t forget to give a clap.

Happy Learning!!!
