nxneo4j: NetworkX-API for Neo4j — A new chapter

Yusuf Baktir, Ph.D.
Neo4j Developer Blog
Sep 2, 2020
Photo: Clint Adair

Recently, I have had the opportunity to work with nxneo4j and I am excited to share it with the world!

What is nxneo4j?

nxneo4j is a Python library that lets you use NetworkX-style commands to interact with Neo4j.

Neo4j is the most common graph database. NetworkX is the most commonly used graph library. Combining the two gives us nxneo4j!

Neo4j has the following advantages:

  • Neo4j is a database, which means it persists the data. You can create the data once and run graph algorithms on it as many times as you want. In NetworkX, you need to construct the graph every time you want to run an algorithm on it.
  • Neo4j graph algorithms are scalable and production-ready. They are written in Java and performance tested. NetworkX is a single-node graph implementation written in Python.
  • The response time is much faster in Neo4j.
  • Neo4j supports graph embeddings in the form of Node Embeddings, Random Projections, and GraphSAGE. These are not available in nxneo4j yet, but they will be available in future versions.

So, why not use Neo4j?

nxneo4j is designed to help users interact with Neo4j quickly and easily. It uses the famous NetworkX API, so you can keep using the scripts you are accustomed to, but they will run against Neo4j. Cool!

Just to be clear, Mark Needham had already created nxneo4j, and you might have used it in the past. This version updates the entire library for Neo4j 4.x and the new Graph Data Science library, since the older Graph Algorithms library is not supported on Neo4j 4.x. More importantly, it significantly improves the core functionality with property support, node and edge views, node removal, and more. So, this is more like “A new chapter” or “Welcome back” for the library, and it will have continuous support.

If you are like me and prefer Jupyter Notebooks instead, here is the link:

https://github.com/ybaktir/networkx-neo4j/blob/master/examples/nxneo4j_tutorial_latest.ipynb

Prerequisite 1: Neo4j itself

You need a running Neo4j instance, version 4.x or above. You have four options here:

  1. Neo4j Desktop: It is a free desktop application that runs locally on your computer. It is super easy to install. Follow the instructions on this page: https://neo4j.com/download/
  2. Neo4j Sandbox: This is a free temporary Neo4j instance and the fastest way to get started with Neo4j. It is so fast that you can have your Neo4j instance running in under 60 seconds. Just use the following link: https://neo4j.com/sandbox/
  3. Neo4j Aura: This is a pay-as-you-go cloud service. You can start as low as $0.09/hour. It is most suitable for long-term projects where you don’t want to worry about the infrastructure. In case you want to give it a try: https://neo4j.com/aura/
  4. Your enterprise Neo4j instance. You know it when you have it. Otherwise, you can ask your architect to run an instance for you. Be careful during the experimentation.

Prerequisite 2: APOC and GDS plugins

In Neo4j Desktop, you can easily install them like the following:

Image by the Author

The libraries come pre-installed in the Sandbox and Aura.

Connect to Neo4j

No matter which option you choose, you need to connect to Neo4j. The library to use is “neo4j”:

pip install neo4j

Then, connect to the Neo4j instance.

from neo4j import GraphDatabase

uri = "bolt://localhost"  # in Neo4j Desktop
                          # custom URL for Sandbox or Aura
user = "neo4j"            # your user name
                          # default is always "neo4j"
                          # unless you have changed it.
password = "your_neo4j_password"  # replace with your own password
driver = GraphDatabase.driver(uri=uri, auth=(user, password))
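
If you would rather not hard-code the password in a script or notebook, one option is to read the connection settings from environment variables. The sketch below is only an illustration: the variable names NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD and both helper functions are hypothetical, not something the neo4j driver or nxneo4j defines.

```python
import os


def load_connection_settings(env=None):
    """Collect Neo4j connection settings from environment variables.

    NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD are hypothetical names
    chosen for this sketch; use whatever naming your setup prefers.
    """
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise ValueError("NEO4J_PASSWORD is not set")
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost"),
        "auth": (env.get("NEO4J_USER", "neo4j"), password),
    }


def connect(settings):
    """Create a driver from the settings dictionary built above."""
    # Imported here so the settings helper also works without the driver installed.
    from neo4j import GraphDatabase

    return GraphDatabase.driver(settings["uri"], auth=settings["auth"])
```

With the variables exported in your shell, `driver = connect(load_connection_settings())` would give you the same kind of driver object as the snippet above.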

If everything went smoothly so far, you are ready to use nxneo4j!

nxneo4j

To get the most up-to-date version, install it directly from the GitHub page.

pip install git+https://github.com/ybaktir/networkx-neo4j

This will install nxneo4j 0.0.3. Version 0.0.2 is available on PyPI, but it is not stable. Version 0.0.3 will be published on PyPI soon. Until then, please use the above link.

Then create the Graph instance:

import nxneo4j as nx

G = nx.Graph(driver)   # undirected graph
G = nx.DiGraph(driver) # directed graph

Let’s add some data:

G.add_node(1)                   # single node
G.add_nodes_from([2,3,4])       # multiple nodes
G.add_edge(1,2)                 # single edge
G.add_edges_from([(2,3),(3,4)]) # multiple edges

Check nodes and edges:

>>> list(G.nodes())
[1, 2, 3, 4]
>>> list(G.edges())
[(1, 2), (2, 3), (3, 4)]

To add nodes and edges with properties:

G.add_node('Mike',gender='M',age=17)
G.add_edge('Mike','Jenny',type='friends',weight=3)

Check individual nodes data:

>>> G.nodes['Mike']
{'gender': 'M', 'age': 17}
>>> G.nodes['Mike']['gender']
'M'

Check all nodes and edges data:

>>> list(G.nodes(data=True))
[(1, {}),
(2, {}),
(3, {}),
(4, {}),
('Mike', {'gender': 'M', 'age': 17}),
('Jenny', {})]
>>> list(G.edges(data=True))
[(1, 2, {}),
(2, 3, {}),
(3, 4, {}),
('Mike', 'Jenny', {'type': 'friends', 'weight': 3})]
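
Because the data views come back as plain tuples and dictionaries, ordinary Python is enough to filter them. The sketch below copies the output shown above into literal lists so it runs without a database; `nodes_where` and `edges_heavier_than` are hypothetical helpers written for this illustration, not nxneo4j functions.

```python
# Literal copies of the list(G.nodes(data=True)) and list(G.edges(data=True))
# results shown above, so the example runs without a Neo4j connection.
nodes_with_data = [
    (1, {}),
    (2, {}),
    (3, {}),
    (4, {}),
    ("Mike", {"gender": "M", "age": 17}),
    ("Jenny", {}),
]
edges_with_data = [
    (1, 2, {}),
    (2, 3, {}),
    (3, 4, {}),
    ("Mike", "Jenny", {"type": "friends", "weight": 3}),
]


def nodes_where(nodes, **conditions):
    """Return the ids of nodes whose properties match every condition."""
    return [
        node
        for node, props in nodes
        if all(props.get(key) == value for key, value in conditions.items())
    ]


def edges_heavier_than(edges, threshold):
    """Return (source, target) pairs whose 'weight' exceeds the threshold."""
    return [
        (source, target)
        for source, target, props in edges
        if props.get("weight", 0) > threshold
    ]
```

For example, `nodes_where(nodes_with_data, gender="M")` keeps only the node whose `gender` property is `'M'`, and nodes without that property are skipped.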

Visualize the graph like this:

>>> nx.draw(G)
Image by the Author

To delete all the data:

G.delete_all()

Config file

Neo4j has some additional requirements for data storage. In Neo4j, every relationship must have a relationship type, and node labels are highly recommended. Since the NetworkX syntax has no room for specifying labels or types, we store this information in the “config” file.

The config file is a Python dictionary, and the default config has the following entries:

{
'node_label': 'Node',
'relationship_type': 'CONNECTED',
'identifier_property': 'id'
}

You can easily change this dictionary and create an instance with new modifications. For example:

config = {
'node_label': 'Person',
'relationship_type': 'LOVES',
'identifier_property': 'name'
}
G = nx.Graph(driver, config=config)
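
To make the role of the three config entries more concrete, here is a minimal sketch of how such a dictionary could be merged with the defaults and turned into a Cypher statement for adding an edge. This is an assumption-laden illustration: `merge_config` and `add_edge_query` are hypothetical helpers, and the generated Cypher is not necessarily the query nxneo4j runs internally.

```python
# Default values as listed in the article.
DEFAULT_CONFIG = {
    "node_label": "Node",
    "relationship_type": "CONNECTED",
    "identifier_property": "id",
}


def merge_config(overrides=None):
    """Return the defaults with any user-supplied entries applied on top."""
    config = dict(DEFAULT_CONFIG)
    config.update(overrides or {})
    return config


def add_edge_query(config):
    """Build an illustrative Cypher statement that creates one edge.

    $source and $target are Cypher parameters that would be supplied
    when the statement is executed.
    """
    return (
        "MERGE (a:{label} {{{key}: $source}}) "
        "MERGE (b:{label} {{{key}: $target}}) "
        "MERGE (a)-[:{rel}]->(b)"
    ).format(
        label=config["node_label"],
        key=config["identifier_property"],
        rel=config["relationship_type"],
    )
```

With the 'Person' / 'LOVES' / 'name' config above, the node label, the relationship type and the identifying property in the generated statement all change accordingly, which is the kind of information the NetworkX call signature itself cannot carry.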

You can also change the default values after the instance creation:

G.direction = 'UNDIRECTED' # for Undirected Graph
G.direction = 'NATURAL'    # for Directed Graph
G.identifier_property = 'name'
G.relationship_type = 'LOVES'
G.node_label = 'Person'

To check the config file:

>>> G.base_params()
{'direction': 'NATURAL',
'node_label': 'Person',
'relationship_type': 'LOVES',
'identifier_property': 'name'}

Built-in Data Sets

nxneo4j has 3 built-in datasets:

  • Game of Thrones
  • Europe Roads
  • Twitter

1. Game of Thrones data

Created by Andrew Beveridge, the data set contains the interactions between the characters across the first 7 seasons of the popular TV show.

There are 796 nodes and 3,796 relationships.

All nodes are the TV characters labeled “Character”. The relationship types are “INTERACTS1”, “INTERACTS2”, “INTERACTS3” and “INTERACTS45”.

The only node property is “name”.

The relationship properties are “book” and “weight”.

You can load it with the following command:

G.load_got()

2. Europe Roads

Created by Lasse Westh-Nielsen, the data set contains the European cities and the distances between them.

There are 894 nodes and 2,499 relationships.

All nodes are labeled “Place” and the relationship types are all “EROAD”.

Node properties are “name” and “countryCode”.

Relationship properties are “distance”, “road_number” and “watercrossing”.

You can load the data with the following code:

G.load_euroads()

3. Twitter

Created by Mark Needham, the data contains Twitter followers of the graph community.

There are 6,526 nodes and 177,708 relationships.

All node labels are “User” and all relationship types are “FOLLOWS”.

Node properties are “name”, “followers”, “bio”, “id”, “username”, and “following”. Relationships don’t have any properties. To get the data, run:

G.load_twitter() 
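
After loading a data set, you may want to confirm that the node and relationship counts match the numbers quoted above. A minimal sketch, assuming the `driver` object created earlier and a database that contains only the data set you just loaded; `count_graph` is a hypothetical helper, not part of nxneo4j.

```python
def count_graph(driver):
    """Return (node_count, relationship_count) for the whole database.

    Both queries count everything in the database, regardless of label
    or relationship type, so run them on an otherwise empty database.
    """
    with driver.session() as session:
        nodes = session.run("MATCH (n) RETURN count(n) AS c").single()["c"]
        relationships = session.run(
            "MATCH ()-[r]->() RETURN count(r) AS c"
        ).single()["c"]
    return nodes, relationships
```

Calling `count_graph(driver)` right after `G.load_got()` should then return the Game of Thrones counts, provided nothing else is stored in the database.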

Graph Data Science

It is algorithm time!

There are at least 47 built-in graph algorithms in Neo4j. nxneo4j will expand to cover all of them in future versions. For now, the following NetworkX algorithms are supported:

  • pagerank
  • betweenness_centrality
  • closeness_centrality
  • label_propagation
  • connected_components
  • clustering
  • triangles
  • shortest_path
  • shortest_weighted_path

Let’s clear the data and load Game of Thrones data set:

G.delete_all()
G.load_got()

Visual inspection:

nx.draw(G) # You can zoom in and interact with the nodes
# when running on Jupyter Notebook
Image by the Author

1. Centrality Algorithms

Centrality algorithms help us understand the individual importance of each node.

>>> nx.pagerank(G)
{'Addam-Marbrand': 0.3433842763728652,
'Aegon-Frey-(son-of-Stevron)': 0.15000000000000002,
'Aegon-I-Targaryen': 0.3708563211936468,
'Aegon-Targaryen-(son-of-Rhaegar)': 0.15000000000000002,
'Aegon-V-Targaryen': 0.15000000000000002,
'Aemon-Targaryen-(Dragonknight)': 0.15000000000000002,
'Aemon-Targaryen-(Maester-Aemon)': 1.1486743815905878,
... }
>>> nx.betweenness_centrality(G)
{'Addam-Marbrand': 0.0,
'Aegon-Frey-(son-of-Stevron)': 0.0,
'Aegon-I-Targaryen': 0.0,
'Aegon-Targaryen-(son-of-Rhaegar)': 0.0,
'Aegon-V-Targaryen': 0.0,
'Aemon-Targaryen-(Dragonknight)': 0.0,
'Aemon-Targaryen-(Maester-Aemon)': 186.58333333333334,
... }
>>> nx.closeness_centrality(G)
{'Addam-Marbrand': 0.3234782608695652,
'Aegon-Frey-(son-of-Stevron)': 0.0,
'Aegon-I-Targaryen': 0.3765182186234818,
'Aegon-Targaryen-(son-of-Rhaegar)': 0.0,
'Aegon-V-Targaryen': 0.0,
'Aemon-Targaryen-(Dragonknight)': 0.0,
'Aemon-Targaryen-(Maester-Aemon)': 0.33695652173913043,
... }
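
Several characters in the PageRank output above share the score 0.15000000000000002. A plausible explanation, assuming the underlying Neo4j algorithm uses the non-normalized PageRank formula with a damping factor of 0.85, is that a node with no incoming relationships keeps only the base score of 1 - 0.85. The pure-Python sketch below illustrates that formula on a toy graph; `pagerank_unnormalized` and the toy edges are hypothetical and are not taken from nxneo4j.

```python
def pagerank_unnormalized(edges, damping=0.85, iterations=20):
    """Iterate PR(n) = (1 - d) + d * sum(PR(s) / out_degree(s)) over in-neighbours s.

    `edges` is a list of (source, target) pairs of a directed graph.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = {node: 0 for node in nodes}
    incoming = {node: [] for node in nodes}
    for source, target in edges:
        out_degree[source] += 1
        incoming[target].append(source)

    # Every node starts from the base score.
    scores = {node: 1.0 - damping for node in nodes}
    for _ in range(iterations):
        scores = {
            node: (1.0 - damping)
            + damping * sum(scores[s] / out_degree[s] for s in incoming[node])
            for node in nodes
        }
    return scores


# Toy chain a -> b -> c: "a" has no incoming edges, so it stays at the base score.
toy_scores = pagerank_unnormalized([("a", "b"), ("b", "c")])
```

In this toy chain, `a` never receives any score from other nodes, while `b` and `c` each add a damped share of their predecessor's score on top of the base value.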

2. Community Detection Algorithms

Community Detection algorithms show how nodes are clustered or partitioned.

>>> list(nx.label_propagation_communities(G))
[{'Addam-Marbrand',
'Aegon-I-Targaryen',
'Aerys-II-Targaryen',
'Alyn',
'Arthur-Dayne',
... ]
>>> list(nx.connected_components(G))
[{'Raymun-Redbeard'},
{'Hugh-Hungerford'},
{'Lucifer-Long'},
{'Torghen-Flint'},
{'Godric-Borrell'},
... ]
>>> nx.number_connected_components(G)
610
>>> nx.clustering(G)
{'Colemon': 1.0,
'Desmond': 1.0,
'High-Septon-(fat_one)': 1.0,
'Hodor': 1.0,
'Hosteen-Frey': 1.0,
... }
>>> nx.triangles(G)
{'Addam-Marbrand': 0,
'Aegon-Frey-(son-of-Stevron)': 0,
'Aegon-I-Targaryen': 0,
'Aegon-Targaryen-(son-of-Rhaegar)': 0,
'Aegon-V-Targaryen': 0,
... }
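
To see what these measures compute, here is a small pure-Python sketch of connected components, triangle counts and the local clustering coefficient on a toy undirected graph. The functions and the toy adjacency dictionary are hypothetical illustrations of the textbook definitions, not the Neo4j implementations that nxneo4j calls.

```python
from itertools import combinations


def connected_components(adjacency):
    """Group nodes into sets that are reachable from one another."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def triangles(adjacency, node):
    """Count pairs of neighbours of `node` that are also connected to each other."""
    return sum(1 for a, b in combinations(adjacency[node], 2) if b in adjacency[a])


def clustering(adjacency, node):
    """Local clustering coefficient: triangles divided by possible neighbour pairs."""
    degree = len(adjacency[node])
    if degree < 2:
        return 0.0
    return 2 * triangles(adjacency, node) / (degree * (degree - 1))


# Toy graph: a triangle (a, b, c), a single edge (d, e) and an isolated node f.
toy_adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"e"},
    "e": {"d"},
    "f": set(),
}
```

An isolated node such as `f` forms a component of its own, which is the same pattern as the single-character sets in the `connected_components` output above.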

3. Path Finding Algorithms

Path Finding algorithms show the shortest path between two or more nodes.

>>> nx.shortest_path(G, source="Tyrion-Lannister", target="Hodor")
['Tyrion-Lannister', 'Luwin', 'Hodor']
>>> nx.shortest_weighted_path(G, source="Tyrion-Lannister", target="Hodor",weight='weight')
['Tyrion-Lannister', 'Theon-Greyjoy', 'Wyman-Manderly', 'Hodor']
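
The two calls above return different routes because they optimize different things: the first minimizes the number of hops, while the second, assuming the `weight` property is treated as a cost to be minimized, minimizes the sum of weights along the path. The pure-Python sketch below shows the same effect on a toy graph using breadth-first search and Dijkstra's algorithm; all function names and the toy edges are hypothetical and independent of nxneo4j.

```python
import heapq
from collections import deque


def build_adjacency(weighted_edges):
    """Turn (u, v, weight) triples into an undirected adjacency dictionary."""
    adjacency = {}
    for u, v, weight in weighted_edges:
        adjacency.setdefault(u, {})[v] = weight
        adjacency.setdefault(v, {})[u] = weight
    return adjacency


def walk_back(previous, target):
    """Rebuild the path to `target` from the predecessor map, or None if unreachable."""
    if target not in previous:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def fewest_hops_path(adjacency, source, target):
    """Breadth-first search: ignores weights and minimizes the number of edges."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return walk_back(previous, target)


def lowest_weight_path(adjacency, source, target):
    """Dijkstra's algorithm: minimizes the total weight along the path."""
    best = {source: 0}
    previous = {source: None}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best[node]:
            continue  # stale queue entry
        for neighbor, weight in adjacency[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return walk_back(previous, target)


# Toy graph: the two-hop route A-C-D is expensive, the three-hop route A-B-E-D is cheap.
toy_graph = build_adjacency(
    [("A", "C", 5), ("C", "D", 5), ("A", "B", 1), ("B", "E", 1), ("E", "D", 1)]
)
```

On this toy graph, `fewest_hops_path(toy_graph, "A", "D")` goes through `C`, while `lowest_weight_path(toy_graph, "A", "D")` takes the longer but cheaper route through `B` and `E`.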

Resources:

Project GitHub page:

https://github.com/ybaktir/networkx-neo4j

Jupyter Notebooks:

https://github.com/ybaktir/networkx-neo4j/blob/master/examples/nxneo4j_tutorial_latest.ipynb

Changelog:

https://github.com/ybaktir/networkx-neo4j/blob/master/CHANGELOG.md

Credits:

Mark Needham for creating the library.

David Jablonski for adding new features while improving the core functionality.
