Data Science: Using Neo4j and Gephi for Graph Analysis of Data

Yash Alpeshbhai Patel
6 min read · Sep 24, 2021


Neo4j Tool

Neo4j provides tools that developers and data scientists can use to build graph-backed applications and machine learning workflows. It’s available as either a fully managed cloud service or as a self-hosted solution.
It’s a non-relational (NoSQL) graph database. It is schema-optional and scalable, and it is one of the most widely used graph database management systems. It is written in Java, and applications in other languages can query it with the Cypher query language, either through the official drivers (over the Bolt protocol) or through a transactional HTTP endpoint.

The following features contribute to Neo4j’s popularity:

  • Cypher, a declarative query language for graphs
  • Constant-time traversals in big graphs for both depth and breadth
  • A flexible property graph schema that can change over time
  • Drivers for popular programming languages, including Java, JavaScript, .NET, Python, and many more

For more details, see the official Neo4j documentation.

Features

  1. It is based on the Property Graph Data Model.
  2. It supports indexes by using Apache Lucene.
  3. It supports full ACID (Atomicity, Consistency, Isolation, and Durability) transactions.
  4. It allows you to export query results in CSV and JSON formats.
  5. Many more features and advantages.

Graph Database

A graph database is a specialised platform for creating and manipulating graphs. Graphs are made up of nodes, edges, and attributes. Because relationships are stored directly as edges rather than reconstructed through joins, connected data can be represented and queried in a way that relational databases are not designed for.

Main Components:

  1. Nodes
  2. Edges (relationships)
  3. Attributes (properties)
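To make the three components concrete, here is a minimal sketch of a property graph in plain Python. The `Node` and `Relationship` classes are hypothetical names made up for illustration; they are not part of Neo4j or its drivers and only mirror the idea that both nodes and edges can carry attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with optional labels and free-form attributes (properties)."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge between two nodes; it can carry attributes too."""
    start: Node
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)


# Two nodes and one edge, mirroring the person/subject example used in this article.
person = Node(labels={"yash"}, properties={"name": "Yash", "age": 21, "department": "IT"})
subject = Node(labels={"subject"}, properties={"name": "java"})
studies = Relationship(start=person, end=subject, rel_type="study")

# Traversal follows object references directly instead of joining tables.
print(studies.start.properties["name"], "-[study]->", studies.end.properties["name"])
```

A real graph database adds storage, indexing, and a query language on top of this model, but the building blocks are the same.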

Let’s now walk through some Cypher queries to get a better feel for how a graph database works.

Creating a new node and adding a label to an existing node

CREATE (person {name: "Yash", age: 21, department: "IT"}) RETURN person
MATCH (person {name: "Yash"}) SET person:yash RETURN person

Adding more properties to an existing node

MATCH (person {name: "Yash"}) SET person.date = "5/09/2021" RETURN person

Creating a relationship between two nodes

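MATCH (person:yash {name: "Yash"}) MERGE (sub:subject {name: "java"})
CREATE (person)-[r:study]->(sub) RETURN person, sub, r

Matching the person first attaches the relationship to the node created earlier. A single CREATE over the whole pattern would add a second Yash node instead.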

Deleting a relationship between two nodes

MATCH (person:TestPerson)-[r:study]->(sub:subject) DELETE r RETURN person, sub

The above operation deletes the study relationships between all nodes with the TestPerson label and all nodes with the subject label.
To delete the relationship between two specific nodes only, narrow the match first, for example by filtering on node properties.
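One way to narrow the match is to filter both end nodes by a property and pass the values as query parameters. The sketch below assumes that the nodes have a `name` property and that `session` behaves like a session object from the official Neo4j Python driver, whose `run()` method accepts a query string and a dictionary of parameters. The helper name `delete_study_relationship` is hypothetical.

```python
# Cypher text with parameters ($person_name, $subject_name) instead of
# hard-coded values. The `name` property on both nodes is an assumption.
DELETE_ONE_STUDY_REL = (
    "MATCH (person:TestPerson {name: $person_name})"
    "-[r:study]->(sub:subject {name: $subject_name}) "
    "DELETE r"
)


def delete_study_relationship(session, person_name, subject_name):
    """Delete only the study relationship between one person and one subject.

    `session` is assumed to expose run(query, parameters), as sessions of the
    official neo4j Python driver do.
    """
    params = {"person_name": person_name, "subject_name": subject_name}
    return session.run(DELETE_ONE_STUDY_REL, params)
```

Because the pattern names both end nodes, relationships between other people and subjects are left untouched.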

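Deleting a whole node

MATCH (n:yash) DETACH DELETE n

A plain DELETE fails while the node still has relationships; DETACH DELETE removes those relationships along with the node.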

For the example below, I created a simple Neo4j project with the Movies dataset provided by Neo4j and ran several queries to explore the data. The queries are as follows:

  1. Let’s see if we can find any films that were released after the year 2000.

Query:

MATCH (m:Movie) WHERE m.released > 2000 RETURN m

We can cap the number of results with LIMIT n. The query below returns at most five movies released after 2000.

MATCH (m:Movie) WHERE m.released > 2000 RETURN m LIMIT 5

2. The following query returns the directors (Person nodes), their DIRECTED relationships, and the movies released after 2005, up to a maximum of five results. Neo4j Browser draws the result as a graph, with the relationships shown as edges between the nodes.

MATCH (p:Person)-[d:DIRECTED]->(m:Movie) WHERE m.released > 2005 RETURN p, d, m LIMIT 5

From the left panel of the result, we can also switch to the Table view to inspect the returned data in more detail.

3. To list all the movies or all the people in the database, we can run one of the following queries.

MATCH (m:Movie) RETURN m

OR

MATCH (p:Person) RETURN p

4. We can also find movies released within a range of years. The example below lists movies released between 2000 and 2010, inclusive.

MATCH (m:Movie) WHERE m.released >= 2000 AND m.released <= 2010 RETURN m

Gephi Tool

Gephi is a free, open-source software package for network analysis and visualisation. It helps data analysts and scientists explore and understand graphs: the user interacts with the representation, manipulating structures, shapes, and colours to reveal hidden patterns in graph data.

Features:

  • Real-time visualisation
  • Layout algorithms for shaping the graph
  • Metrics for analysis
  • Networks over time
  • Cartography creation
  • Dynamic filtering
  • Data Laboratory

For this demo, I have chosen the simple Les Miserables.gexf dataset, which is available as one of Gephi’s sample datasets.

  1. Open Gephi and click New Project. Then choose File->Open and load a dataset of your choice or one of the sample datasets provided with the tool. When the dataset loads, Gephi shows the number of nodes and edges it contains.
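The node and edge counts in that import step can also be checked outside Gephi, because a GEXF file is plain XML. The sketch below uses only the Python standard library and counts `node` and `edge` elements regardless of the GEXF namespace version. The tiny document in `GEXF_TEXT` is made-up sample data, not the real Les Miserables file, and `count_nodes_and_edges` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written GEXF document (hypothetical data for illustration).
GEXF_TEXT = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="0" label="A"/>
      <node id="1" label="B"/>
      <node id="2" label="C"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
      <edge id="1" source="1" target="2"/>
    </edges>
  </graph>
</gexf>
"""


def count_nodes_and_edges(gexf_text):
    """Return (node_count, edge_count) for a GEXF document given as a string."""
    root = ET.fromstring(gexf_text)
    counts = {"node": 0, "edge": 0}
    for element in root.iter():
        # Tags look like "{namespace}node"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in counts:
            counts[local_name] += 1
    return counts["node"], counts["edge"]


node_count, edge_count = count_nodes_and_edges(GEXF_TEXT)
print(f"{node_count} nodes, {edge_count} edges")
```

To check a file on disk, read its contents into a string first and pass that to the function.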

2. After you click OK, Gephi displays all the nodes and edges present in the dataset.

3. To arrange the graph in different ways, use the Layout panel on the left. Choose one of the available layout algorithms and click Run; the graph is then rearranged according to the chosen layout. In my case, I rearranged the graph with the Fruchterman Reingold layout.
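For intuition about what a force-directed layout such as Fruchterman Reingold does, here is a simplified sketch in plain Python. It is not Gephi's implementation, which has its own parameters and differs in detail. The function name, the default frame size, the iteration count, and the cooling factor are all assumptions chosen for illustration. Every pair of nodes repels each other, connected nodes attract each other, and a shrinking "temperature" limits how far a node may move per step.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Return {node: (x, y)} after a simplified Fruchterman-Reingold layout.

    `nodes` is a non-empty list; `edges` is a list of (u, v) pairs.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10  # maximum movement per step, cooled over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes: force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attraction along every edge: force d^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node, limited by the temperature, and keep it in the frame.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + disp[n][0] / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + disp[n][1] / length * step))

        temperature *= 0.95  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Usage: a small path graph a-b-c-d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
layout = fruchterman_reingold(nodes, edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

The fixed seed makes the starting positions, and therefore the result, reproducible from run to run.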

4. The nodes can then be coloured according to a ranking, such as In-Degree, Out-Degree, or Degree. In the pane at the top left, choose Nodes->Ranking, then select Degree from the drop-down menu.
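The three rankings mentioned in this step are simple counts over the edge list. As a rough sketch, with a hypothetical helper name and a made-up edge list, they could be computed for a directed graph like this:

```python
from collections import Counter


def degree_rankings(edges):
    """Return {node: {"in": ..., "out": ..., "degree": ...}} for directed edges.

    `edges` is a list of (source, target) pairs.
    """
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1  # edge leaves the source
        in_degree[target] += 1   # edge arrives at the target

    nodes = set(in_degree) | set(out_degree)
    return {
        n: {
            "in": in_degree[n],
            "out": out_degree[n],
            "degree": in_degree[n] + out_degree[n],
        }
        for n in nodes
    }


# Usage with a small made-up directed graph.
example_edges = [("a", "b"), ("a", "c"), ("b", "c")]
rankings = degree_rankings(example_edges)
for node in sorted(rankings):
    print(node, rankings[node])
```

For an undirected graph only the combined degree is meaningful, since an edge has no source or target.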

5. To display nodes in different sizes, select the Size option in the Appearance section of the left pane. Then specify the minimum and maximum node size. I’ve set the minimum size to 20 and the maximum size to 50.
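Conceptually, a size ranking maps each node's ranking value onto the chosen size range. The sketch below assumes plain linear interpolation between the minimum and maximum size (Gephi also offers other interpolation curves) and uses the 20 and 50 from this step as defaults. The function name and the sample degrees are hypothetical.

```python
def scale_sizes(values, min_size=20.0, max_size=50.0):
    """Map each node's value linearly onto the range [min_size, max_size].

    `values` is a non-empty dict such as {node: degree}.
    """
    lowest = min(values.values())
    highest = max(values.values())
    if highest == lowest:
        # All nodes rank the same, so give them all the minimum size.
        return {node: min_size for node in values}
    span = highest - lowest
    return {
        node: min_size + (value - lowest) / span * (max_size - min_size)
        for node, value in values.items()
    }


# Usage with made-up degrees.
sizes = scale_sizes({"a": 1, "b": 3, "c": 5})
print(sizes)
```

The node with the lowest value gets the minimum size, the node with the highest value gets the maximum size, and the rest fall proportionally in between.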

6. To run the Average Degree report, go to the right panel, select the Statistics tab, and run Average Degree. The report shows the average degree and the degree distribution, together with the in-degree and out-degree distributions.
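The numbers behind such a report are straightforward to reproduce. The sketch below treats the graph as undirected, so each edge adds one to the degree of both of its endpoints, and returns the average degree together with the degree distribution (how many nodes have each degree). The function name and the sample graph are made up for illustration.

```python
from collections import Counter


def average_degree_report(nodes, edges):
    """Return (average_degree, distribution) for an undirected graph.

    `distribution` maps a degree to the number of nodes with that degree.
    """
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    average = sum(degree.values()) / len(degree)
    distribution = dict(Counter(degree.values()))
    return average, distribution


# Usage with a small made-up graph: degrees are a=3, b=2, c=2, d=1.
report_nodes = ["a", "b", "c", "d"]
report_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
average, distribution = average_degree_report(report_nodes, report_edges)
print("average degree:", average)
print("distribution:", distribution)
```

Since every undirected edge contributes to two nodes, the average degree equals twice the number of edges divided by the number of nodes.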

7. We can also display the node labels by clicking the Show Labels icon in the bar below the graph.

Thank you for reading.
