#6 Data Science 👩💻 | Getting started with Neo4j and Gephi Tool
Neo4j Tool
Neo4j stores and manages data in its more natural, connected state, maintaining data relationships that deliver lightning-fast queries, deeper context for analytics, and a pain-free modifiable data model.
Put simply, Neo4j is to graph databases what MySQL is to relational databases. It provides a graph database management system, a query language called Cypher, and a visual interface, the Neo4j Browser.
Let’s start the demo.
- Download Neo4j Desktop and install it.
- After the installation, run a first query. For this example I am running a hello world query that creates 2 nodes, called Neo4j and Hello World, and 1 relationship between them called says.
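A hello world query of this kind might look like the sketch below. The labels `Database` and `Message`, the `name` property, and the uppercase relationship type `SAYS` are assumptions made for illustration, so the exact query in the demo may differ.

```cypher
// Create two nodes and connect them with one relationship in a single statement.
// The labels, the property name, and the SAYS type are illustrative assumptions.
CREATE (db:Database {name: "Neo4j"})-[r:SAYS]->(msg:Message {name: "Hello World!"})
RETURN db, r, msg
```

Returning the nodes together with the relationship lets the Neo4j Browser draw all three in its graph view.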
You can see that the query created 2 nodes and one relationship called says.
The image below shows the table view of the nodes and relationships.
I am using the example project for the demo. Start the Movies database and open it in the Neo4j Browser.
After that, load the Movies database into Neo4j, and it will show the data as a graph.
In this database, there are 9 Person nodes, 8 Movie nodes, and a total of 18 relationships between them.
Count the total number of nodes using a query.
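A node count can be sketched as follows. The alias `totalNodes` is only an illustrative name.

```cypher
// Match every node regardless of label and count the matches.
MATCH (n)
RETURN count(n) AS totalNodes
```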
With different queries we can find other information, such as how many labels there are.
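One way to look at the labels, shown here as a sketch rather than the exact query from the demo, is to group the nodes by their labels and count each group. The aliases `nodeLabels` and `nodeCount` are illustrative names.

```cypher
// Group nodes by their label list and count the nodes in each group.
MATCH (n)
RETURN labels(n) AS nodeLabels, count(*) AS nodeCount
```

If only the label names are needed, the built-in procedure `CALL db.labels()` lists them without the counts.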
Using this query, we can find out how many types of relationships there are in the database.
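A query along these lines could look like the sketch below, which counts the relationships of each type. The aliases `relationshipType` and `total` are illustrative names.

```cypher
// Match every relationship, then group by its type and count each group.
MATCH ()-[r]->()
RETURN type(r) AS relationshipType, count(*) AS total
```

The built-in procedure `CALL db.relationshipTypes()` returns just the type names.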
Using this query, we can see how each person is connected to a movie: who produced the movie and which role a person played in it.
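As a rough sketch, such a query can return each person, the type of relationship, and the movie. It assumes the Movies example dataset, where `Person` nodes have a `name`, `Movie` nodes have a `title`, and `ACTED_IN` relationships carry a `roles` property. For other relationship types, `roles` comes back as null.

```cypher
// List how every person is connected to every movie.
// Assumes the Movies example dataset (name, title, and roles properties).
MATCH (p:Person)-[r]->(m:Movie)
RETURN p.name AS person, type(r) AS relationship, m.title AS movie, r.roles AS roles
```

To look only at producers, the pattern can be narrowed to `(p:Person)-[:PRODUCED]->(m:Movie)`.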
Find movies released in the 1990s…
// Query for the movies released in the 1990s
MATCH (nineties:Movie)
WHERE nineties.released >= 1990 AND nineties.released < 2000
RETURN nineties.title
Here is the list of movies released in the 1990s.
List all Tom Hanks movies…
// Query to list all Tom Hanks movies
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies)
RETURN tom, tomHanksMovies
Who directed “Cloud Atlas”?
// Query to find who directed "Cloud Atlas"
MATCH (cloudAtlas {title: "Cloud Atlas"})<-[:DIRECTED]-(directors)
RETURN directors.name
Gephi Tool
Gephi is an open-source network analysis and visualization software package. It is mainly used for visualizing, manipulating, and exploring networks and graphs from raw edge and node graph data. It is an excellent tool for data analysts and data science enthusiasts to explore and understand graphs.
In this demo I have chosen the simple karate.gml dataset and performed some basic Gephi operations on it. So let's get started.
1. Open Gephi and click on New Project. Then choose File->Open and load the dataset of your choice as shown below. When the dataset loads, Gephi shows the number of nodes and edges in the dataset as well as the type of the graph.
2. Below is how all the nodes and edges are displayed when the data is first loaded.
3. Now we can represent the data in various layouts. In the left pane, open the Layout option, choose the layout you want, and click on Run. In the image below I have chosen the ForceAtlas layout, which displays the data in the following form.
4. Next, we can differentiate the nodes by rankings such as their In-Degree, Out-Degree, or Degree, and show them in different colors. To do this, at the top of the left pane choose Nodes->Ranking and pick a ranking. In the image below, In-Degree is chosen: red nodes have a lower in-degree than white nodes, and dark grey nodes have the highest in-degree.
5. Clearer visualizations can also be made by displaying the nodes in various sizes. For instance, in the image below, nodes with a higher degree are larger than nodes with a lower degree, i.e. the dark grey nodes have a higher degree than the white and red nodes.
To display nodes in various sizes, select the Size option in the Appearance section of the left pane, then set the minimum and maximum node size you want. I have set the Min size to 10 and the Max size to 30.
6. Next, we generate a degree distribution graph for Degree, In-Degree, and Out-Degree, and also get the Average Degree value across all nodes. To generate the graph, choose the Statistics tab in the right pane and run Average Degree in the Network Overview section.
A report will be generated, and a column for degree will be added to the dataset table.
7. To see the Data Table, select Window->Data Table in the top menu bar. You will see a table like the one in the image above, where, after running the Average Degree function, columns for In-Degree, Out-Degree, and Degree are added for each node.
8. Now we can try different functionalities as well as various layouts in the Gephi tool. In the image below I have used the Noverlap layout.
That’s all for this introduction to the Neo4j and Gephi tools. You can easily visualize all the information in these tools. I hope you found what you were looking for.
Thank You!!