Exploring Neo4j

Sze Zhong LIM
Published in Data And Beyond
8 min read · Jun 16, 2024

This post follows the course Graph Analytics for Big Data by the University of California San Diego.

The course uses the Neo4j Community edition, which is downloaded to the desktop. In my own version, I will be trying out Neo4j Aura, which is a cloud-based graph database system.

When you first open the Neo4j website, click on Get Started Free in the top right corner.

After that, you will be on the landing page for Neo4j AuraDB. Once again, click the Start Free button in the middle of the page.

Register an account using your email. Once registered, you will be able to create an instance. If you are not familiar with cloud concepts, you can think of an instance as a virtual machine or an online computer. You will be given a password when you create the instance. REMEMBER TO SAVE THAT PASSWORD!!!

Click on the “Open” button, and you will be led to another page where you will have to fill in your instance password.

If you're keen, you can go through the guides.

Hands On Coding with Cypher

We will be coding in Cypher, which is a declarative query language rather than a general-purpose programming language. The intro from the official site can be found here.

Cypher is Neo4j’s graph query language that lets you retrieve data from the graph. It is like SQL for graphs, and was inspired by SQL so it lets you focus on what data you want out of the graph (not how to go get it). It is the easiest graph language to learn by far because of its similarity to other languages and intuitiveness.

Creating Nodes and Relationships

Go to the “Query” tab on the top, and paste the code below.

create (N1:ToyNode {name: 'Tom'}) - [:ToyRelation {relationship: 'knows'}] -> (N2:ToyNode {name: 'Harry'}), 

(N2) - [:ToyRelation {relationship: 'co-worker'}] -> (N3:ToyNode {name: 'Julian', job: 'plumber'}),

(N2) - [:ToyRelation {relationship: 'wife'}] -> (N4:ToyNode {name: 'Michele', job: 'accountant'}),

(N1) - [:ToyRelation {relationship: 'wife'}] -> (N5:ToyNode {name: 'Josephine', job: 'manager'}),

(N4) - [:ToyRelation {relationship: 'friend'}] -> (N5)

;

Feel free to save the Cypher snippet. I saved mine in a folder called 20240427 under “Original”.

The code creates 5 nodes and 5 edges. Click the “Play” button at the top right corner, and you should see something like the below in your database tab.

Click on the purple colored ToyNode tab, and it will automatically run a query that returns the nodes.

Click on the grey colored ToyRelation tab and you should see the relations being queried out.

Now, what happens if you accidentally run the previously saved Cypher snippet that generates these 5 nodes again? Below, I run it two more times and then query the nodes.

You can see that the node count has increased to 15. When I query the graph, 3 identical copies are shown, because create always adds new nodes and never checks whether they already exist. How then can we remove all the data and restart the database?

You may use the simple code below to reset your database. This is definitely not the best way to manage your database, but we will be learning about that much later.

MATCH (n)
DETACH DELETE n
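
To confirm that the reset worked, a quick count can be run afterwards. This is only a small sketch; the alias remaining_nodes is my own name and not something from the course.

```cypher
// Count whatever is left after the reset; an empty database returns 0.
MATCH (n)
RETURN count(n) AS remaining_nodes
```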

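An alternative way to clean the slate is to use the code below. The first statement deletes every relationship together with its start node, and the second deletes the nodes that are left over.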

//One way to "clean the slate" in Neo4j before importing (run both lines):

match (a)-[r]->() delete a,r

match (a) delete a
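
The duplicate graphs seen earlier come from create, which always adds new data. A minimal sketch of a re-runnable alternative uses merge, which only creates what it cannot find. It assumes that the name property identifies a person uniquely (my assumption, not something the course states), and it only covers Tom and Harry to keep it short.

```cypher
// Assumption: name is unique per person, so it is safe to merge on it.
// Each node is merged on its own first, because merge on a whole pattern
// creates the entire pattern again if any part of it is missing.
MERGE (tom:ToyNode {name: 'Tom'})
MERGE (harry:ToyNode {name: 'Harry'})
// The relationship is only created if it does not exist between these two nodes.
MERGE (tom)-[:ToyRelation {relationship: 'knows'}]->(harry)
```

Running this snippet several times should leave a single Tom, a single Harry and a single relationship between them.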

Adding and Modifying a Graph

Create the graph again using the previously saved Cypher code, then add a node incorrectly on purpose using the code below. Please leave out the comment lines that start with // and run the statements one at a time.

// To start creating a new network
create (N1:ToyNode {name: 'Tom'}) - [:ToyRelation {relationship: 'knows'}] -> (N2:ToyNode {name: 'Harry'}),

(N2) - [:ToyRelation {relationship: 'co-worker'}] -> (N3:ToyNode {name: 'Julian', job: 'plumber'}),

(N2) - [:ToyRelation {relationship: 'wife'}] -> (N4:ToyNode {name: 'Michele', job: 'accountant'}),

(N1) - [:ToyRelation {relationship: 'wife'}] -> (N5:ToyNode {name: 'Josephine', job: 'manager'}),

(N4) - [:ToyRelation {relationship: 'friend'}] -> (N5)

;

// To add a node incorrectly
create (n:ToyNode {name:'Julian'})-[:ToyRelation {relationship: 'fiancee'}]->(m:ToyNode {name:'Joyce', job:'store clerk'})

// To delete the incorrectly added nodes and their relationship
match (n:ToyNode {name:'Joyce'})-[r]-(m) delete n, r, m

It should look as below after adding the node incorrectly.

And after you delete the node, it should appear as below.
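
For anyone wondering why the first attempt counts as incorrect: create does not look for an existing Julian, so it adds a second one. A minimal sketch of a check, meant to be run right after the incorrect create and before the delete, groups the ToyNode nodes by name and lists any name that appears more than once. The aliases name and copies are my own choice.

```cypher
// List every name that is carried by more than one ToyNode.
MATCH (n:ToyNode)
WITH n.name AS name, count(*) AS copies
WHERE copies > 1
RETURN name, copies
```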

You may also modify a node’s information by using the code:

//Modify a Node’s Information. Input one by one.

match (n:ToyNode) where n.name = 'Harry' set n.job = 'drummer'

match (n:ToyNode) where n.name = 'Harry' set n.job = n.job + ['lead guitarist']

After setting the job, you can see it in the node details when you click on the node. The second statement adds lead guitarist as a further job, which turns the job property into a list.
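
Instead of clicking on the node, the property can also be read back with a query. This is just a sketch; the aliases name and job are my own.

```cypher
// Read back Harry's job property after the two set statements.
MATCH (n:ToyNode {name: 'Harry'})
RETURN n.name AS name, n.job AS job
```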

In the next code, we first match the existing Julian node and bind it to n, and then merge a new relationship to a new ToyNode called Joyce, which has a name and a job. Because n is the existing node, no second Julian is created.

//Adding a Node Correctly

match (n:ToyNode {name:'Julian'})

merge (n)-[:ToyRelation {relationship: 'fiancee'}]->(m:ToyNode {name:'Joyce', job:'store clerk'})
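
To check the result, a small sketch like the one below lists Julian's outgoing relationships. The aliases source, relationship, target and job are my own naming.

```cypher
// Show who Julian points to, and through which kind of relationship.
MATCH (n:ToyNode {name: 'Julian'})-[r:ToyRelation]->(m:ToyNode)
RETURN n.name AS source, r.relationship AS relationship, m.name AS target, m.job AS job
```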

Uploading a CSV File into Neo4j

Unlike Neo4j Desktop on your local system, Neo4j Aura does not let LOAD CSV read a file directly from your local disk. It can, however, load a file that is hosted on a web server. You may find the Neo4j Aura documentation on uploading CSV files here.

I was using Google Drive as my main storage.

When sharing the Google Drive link, it will appear as below:

# When clicking on copy link
https://drive.google.com/file/d/1HlD-GFGZ4ekWUaT6uXXiRgMjvI3NEN4i/view?usp=sharing
https://drive.google.com/file/d/1HpSRUg0TKWEijkNFKUbBctloPSEQj9q6/view?usp=sharing
https://drive.google.com/file/d/1HkvFXvRg3yHa2CewX7SbQqpkiRZ8wd9A/view?usp=sharing

# The id of the file is: 1HlD-GFGZ4ekWUaT6uXXiRgMjvI3NEN4i
# The id of the file is: 1HpSRUg0TKWEijkNFKUbBctloPSEQj9q6
# The id of the file is: 1HkvFXvRg3yHa2CewX7SbQqpkiRZ8wd9A

# Modifying it to downloadable format
https://drive.usercontent.google.com/u/0/uc?id=1HlD-GFGZ4ekWUaT6uXXiRgMjvI3NEN4i&export=download
https://drive.usercontent.google.com/u/0/uc?id=1HpSRUg0TKWEijkNFKUbBctloPSEQj9q6&export=download
https://drive.usercontent.google.com/u/0/uc?id=1HkvFXvRg3yHa2CewX7SbQqpkiRZ8wd9A&export=download
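
The same rewrite can be sketched in Cypher itself, using only string functions. It assumes the share link has exactly the shape shown above, so that the file id is the sixth piece when the link is split on '/'. The names shareLink, fileId and downloadUrl are my own.

```cypher
WITH 'https://drive.google.com/file/d/1HlD-GFGZ4ekWUaT6uXXiRgMjvI3NEN4i/view?usp=sharing' AS shareLink
// Splitting on '/' puts the file id at index 5 for links of this shape.
WITH split(shareLink, '/')[5] AS fileId
// Rebuild the link in the downloadable format.
RETURN 'https://drive.usercontent.google.com/u/0/uc?id=' + fileId + '&export=download' AS downloadUrl
```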

In the Neo4j Query tab, you can run the code below:

LOAD CSV WITH HEADERS FROM "https://drive.usercontent.google.com/u/0/uc?id=1HlD-GFGZ4ekWUaT6uXXiRgMjvI3NEN4i&export=download" AS line
MERGE (n:NodeA {Name:line.Source})
MERGE (m:NodeA {Name:line.Target})
MERGE (n) -[:TO {dist:line.distance}]-> (m)
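
Two optional refinements, sketched under assumptions. LOAD CSV reads every field as a string, so dist ends up as text in the query above. The first statement below previews a few rows to confirm the header names. The second converts distance with toInteger, which assumes the column holds whole numbers. It is meant for a clean database: if the string version was already loaded, merge would add a second set of relationships.

```cypher
// Preview the first rows to confirm the header names before merging anything.
LOAD CSV WITH HEADERS FROM "https://drive.usercontent.google.com/u/0/uc?id=1HlD-GFGZ4ekWUaT6uXXiRgMjvI3NEN4i&export=download" AS line
RETURN line
LIMIT 5;

// Same import as above, but storing dist as a number.
// Assumption: the distance column contains whole numbers.
LOAD CSV WITH HEADERS FROM "https://drive.usercontent.google.com/u/0/uc?id=1HlD-GFGZ4ekWUaT6uXXiRgMjvI3NEN4i&export=download" AS line
MERGE (n:NodeA {Name: line.Source})
MERGE (m:NodeA {Name: line.Target})
MERGE (n)-[:TO {dist: toInteger(line.distance)}]->(m);
```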

The text file that I uploaded is a CSV file as below:

Result of the CSV file uploaded

Basic Cypher Queries

We will use the sample of the text file above to do some basic queries. We can refer to the graph network below to validate whether the returned answers are valid or not.

Dataset 1 Sample
Dataset 2 Sample
Dataset 3 Sample

The queries done are compiled below:

============================Dataset 1==================================
//Counting the number of nodes
match (n:NodeA)
return count(n)


//Counting the number of edges
// We need to declare the nodes associated with the edges
match (n:NodeA)-[r]->()
return count(r)


//Finding leaf nodes:
// Leaf nodes are nodes which have no outgoing edges
// Return m returns the actual nodes.
match (n:NodeA)-[r:TO]->(m)
where not ((m)-->())
return m


//Finding root nodes:
// Root Nodes are nodes which have no incoming edges
match (m)-[r:TO]->(n:NodeA)
where not (()-->(m))
return m


//Finding triangles:
// Pattern matching query
// Consist of three nodes and three edges.
match (a)-[:TO]->(b)-[:TO]->(c)-[:TO]->(a)
return distinct a, b, c
// To show the relationships / edges as well.
match (a)-[r1:TO]->(b)-[r2:TO]->(c)-[r3:TO]->(a)
return distinct a, b, c, r1, r2, r3


//Finding 2nd neighbors of D:
// Means nodes that are at most 2 hops away from D (1st and 2nd neighbors), ignoring edge direction.
match (a)-[:TO*..2]-(b)
where a.Name='D'
return distinct a, b
// To show the relationship / edges as well.
match (a)-[r:TO*..2]-(b)
where a.Name='D'
return distinct a, b, r

//Finding the induced subgraph given a set of nodes:
match (n)-[r:TO]-(m)
where n.Name in ['A', 'B', 'C', 'D', 'E'] and m.Name in ['A', 'B', 'C', 'D', 'E']
return n, r, m

============================Dataset 2==================================
//Finding the types of a node:
match (n)
where n.Name = 'Afghanistan'
return labels(n)


//Finding the label of an edge:
match (n {Name: 'Afghanistan'})<-[r]-()
return distinct type(r)


//Finding all properties of a node:
match (n:Actor)
return * limit 20


============================Dataset 3==================================
//Finding loops:
match (n)-[r]->(n)
return n, r limit 10


//Finding multigraphs:
// Find nodes where there are different edges for the same pair of nodes.
match (n)-[r1]->(m), (n)-[r2]-(m)
where r1 <> r2
return n, r1, r2, m limit 10

We will go through the queries individually to see their response on Neo4j.

Count the number of nodes
Counting the Edges associated with the node types
Finding Leaf Nodes (Nodes with no Outgoing Edges)
Finding Root Nodes (Nodes with no Incoming Edges)
Finding triangle pattern with 5 distinct nodes returned.
The triangle pattern consists of 5 nodes with D as the common shared point.

One issue with returning only the nodes is that the relationships do not come with them. To display the nodes AND the relationships, we have to change the query: we give each relationship a variable and return it. Each edge / relationship needs its own variable. If we use the same variable for all the edges, an error will occur.
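
Another option, sketched here as an alternative and not taken from the course, is to name the whole pattern as a path and return that. A path carries both its nodes and its relationships, so no variable per edge is needed. The variable p is my own name.

```cypher
// Bind the whole triangle to a path variable and return it.
MATCH p = (a)-[:TO]->(b)-[:TO]->(c)-[:TO]->(a)
RETURN p
```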

To show triangle with edges and nodes
Finding 2nd Neighbors of D
Finding the induced subgraph given a set of nodes
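
The 2nd neighbors query uses *..2, which matches paths of 1 or 2 hops, so the direct neighbors of D are returned as well. If only the nodes that are exactly 2 hops away are wanted, a variant could look like the sketch below. It assumes the same NodeA label and Name property as Dataset 1.

```cypher
// Exactly two hops from D, in either direction.
MATCH (a:NodeA {Name: 'D'})-[:TO*2]-(b)
// Leave out D itself and anything that is also a direct neighbor of D.
WHERE b <> a AND NOT (a)-[:TO]-(b)
RETURN DISTINCT b
```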

Using another dataset, we will do the other queries.

Finding the Node Type
Finding the Edge Type
Finding Everything about Actor Nodes
Finding Loops
Finding Multigraphs.
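
The Actor query returns whole nodes. If only the properties are of interest, the built-in keys() and properties() functions can be used instead. This is a small sketch; the aliases property_names and property_map are my own.

```cypher
// keys(n) lists the property names, properties(n) returns them as a map.
MATCH (n:Actor)
RETURN keys(n) AS property_names, properties(n) AS property_map
LIMIT 20
```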
