Philosograph

Thaynara Santos
BTG Pactual Developers
4 min read · Jun 29, 2020

Building relationships with neo4j

The cover image was taken from the official website of neo4j =)

What is it?

Neo4j is the world’s leading Graph Database. It is a high performance graph store with all the features expected of a mature and robust database, like a friendly query language and ACID transactions. The programmer works with a flexible network structure of nodes and relationships rather than static tables. (GitHub)

Why do we use it?

The main reason to use a graph-oriented database is the need to map relationships between entities.

In a graph model, nodes represent the entities and edges represent the relationships. Edges can be directed or undirected, and weighted or unweighted, which helps breadth-first and depth-first searches run more efficiently.
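
To make the two kinds of search concrete, here is a minimal sketch in plain Python, without neo4j. The graph, its node names and the bfs and dfs helpers are all made up for illustration; a graph database performs this kind of traversal for you.

```python
from collections import deque

# Hypothetical sample graph: each key is a node and each value lists the
# nodes reachable through an outgoing (directed) edge.
graph = {
    "entity_a": ["account_1", "address_1"],
    "account_1": ["entity_b"],
    "address_1": [],
    "entity_b": ["company_1"],
    "company_1": [],
}


def bfs(graph, start):
    """Visit nodes level by level (breadth-first) starting at `start`."""
    order, seen, queue = [start], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def dfs(graph, start, seen=None):
    """Follow each path to its end before backtracking (depth-first)."""
    seen = set() if seen is None else seen
    seen.add(start)
    order = [start]
    for neighbor in graph.get(start, []):
        if neighbor not in seen:
            order.extend(dfs(graph, neighbor, seen))
    return order


print(bfs(graph, "entity_a"))
print(dfs(graph, "entity_a"))
```

The breadth-first version reaches everything directly linked to entity_a before moving further away, while the depth-first version follows the account all the way to the company before coming back to the address.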

Some examples of relationships between entities that we use here at BTG Pactual are:

(entity) -> has -> (address)
(entity) -> is a holder -> (account)
(entity) -> parent -> (entity)
(entity) -> mother -> (entity)
(entity) -> works at -> (company)
(entity) -> is in -> (list)
(entity) -> involved in -> (negative news)

In this way, analysts get a more objective view of the relationships between entities and of what those entities are involved in, and they can uncover insights into possible connections.

How do we use it?

About

Now for the explanation of the title. Since we cannot show our real data, I remembered a philosophy book I had read, in which each chapter covers one philosopher and mentions which other philosophers share the same line of thought. At the time, I thought it would make a lot of sense to implement this in a graph database one day, so that will be today's example.

For those interested, this is the book:

Environment

neo4j can be installed through Docker.

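Running a test on your local machine is very simple; you only need to execute the following command (port 7474 serves the browser interface and port 7687 serves the Bolt protocol that the browser uses to connect to the database):

docker run --name testneo4j -p7474:7474 -p7687:7687 -d --env NEO4J_AUTH=neo4j/test neo4j:latest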

Then open: http://localhost:7474/browser

Home

The username and password set in NEO4J_AUTH are also used to log in to the graphical interface. Note that recent neo4j releases reject passwords shorter than 8 characters, so pick a longer one if the container fails to start.

From this interface you can already run any query written in Cypher, the query language of neo4j.

The documentation has examples of each type of clause, such as CREATE, DELETE, MATCH, RETURN and others.

Development

For this example I chose the Python language, the pandas library to read the CSV file, and the neomodel library to connect to neo4j and execute queries.
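
As a rough idea of what the transformation involves, here is a minimal sketch. It is not the code used in this post: to stay runnable without a database it uses the standard-library csv module instead of pandas and only prepares parameterized Cypher instead of calling neomodel. The column names (philosopher, related_to), the sample rows and the SAME_LINE_OF_THOUGHT relationship type are assumptions made up for the example.

```python
import csv
import io

# Hypothetical CSV layout: one row per "same line of thought" mention.
SAMPLE_CSV = """philosopher,related_to
Philosopher A,Philosopher B
Philosopher A,Philosopher C
Philosopher B,Philosopher C
"""

# MERGE creates a node or relationship only if it does not exist yet, so
# loading the same row twice does not duplicate anything. The relationship
# type SAME_LINE_OF_THOUGHT is an assumed name.
MERGE_QUERY = (
    "MERGE (a:Philosopher {name: $source}) "
    "MERGE (b:Philosopher {name: $target}) "
    "MERGE (a)-[:SAME_LINE_OF_THOUGHT]->(b)"
)


def rows_to_params(csv_text):
    """Yield one parameter dict per CSV row, ready to pair with MERGE_QUERY."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {"source": row["philosopher"], "target": row["related_to"]}


for params in rows_to_params(SAMPLE_CSV):
    # With a running database, each pair could be sent through a driver,
    # for example neomodel's db.cypher_query(MERGE_QUERY, params).
    print(MERGE_QUERY, params)
```

Passing the names as parameters rather than formatting them into the query string avoids quoting problems with names that contain apostrophes.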

The code to transform the csv file into a graph database can be found here:

By executing the following query, we can visualize all philosophers and their relationships with other philosophers who share the same line of thought.

MATCH (n) RETURN n

And we can find all the relationships of a specific philosopher with this one:

MATCH (a:Philosopher {name: 'Sam Harris'})-[r]-(b) RETURN r, a, b
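
The pattern -[r]- in this query has no arrow, so it matches relationships pointing in either direction. A minimal plain-Python sketch of that behaviour, with made-up philosopher names and an assumed relationship type, could look like this:

```python
# Hypothetical directed relationships stored as (start, type, end).
relationships = [
    ("Philosopher A", "SAME_LINE_OF_THOUGHT", "Philosopher B"),
    ("Philosopher C", "SAME_LINE_OF_THOUGHT", "Philosopher A"),
    ("Philosopher B", "SAME_LINE_OF_THOUGHT", "Philosopher C"),
]


def match_any_direction(relationships, name):
    """Mimic MATCH (a {name: name})-[r]-(b) RETURN r, a, b in memory.

    Returns one (r, a, b) tuple for every relationship that touches
    `name`, whether it starts or ends there.
    """
    matches = []
    for rel in relationships:
        start, _rel_type, end = rel
        if start == name:
            matches.append((rel, start, end))
        elif end == name:
            matches.append((rel, end, start))
    return matches


for r, a, b in match_any_direction(relationships, "Philosopher A"):
    print(a, "is linked to", b, "through", r)
```

Writing the pattern as -[r]-> or <-[r]- in Cypher would restrict the match to outgoing or incoming relationships only.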

Each node can also be expanded in the visualization as needed.

One way to check whether the relationships in the book are coherent would be to analyze the similarity between the texts of each philosopher. One way to improve the visualization would be to remove cyclical relationships by registering each relationship based on who came first.
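
The "who came first" idea can be sketched as follows. This is only an illustration: the names, birth years and mentions are hypothetical, and the birth year is an assumed stand-in for "came first".

```python
# Hypothetical birth years, used only to decide who came first.
birth_year = {
    "Philosopher A": 1700,
    "Philosopher B": 1750,
    "Philosopher C": 1800,
}

# Hypothetical mentions as found chapter by chapter: A mentions B,
# B mentions A back, and C mentions B.
mentions = [
    ("Philosopher A", "Philosopher B"),
    ("Philosopher B", "Philosopher A"),
    ("Philosopher C", "Philosopher B"),
]


def orient_by_birth(pairs, birth_year):
    """Return unique (earlier, later) edges, dropping mutual duplicates."""
    edges = set()
    for a, b in pairs:
        if a == b:
            continue  # a self-reference would be a cycle of length one
        # Ties on birth year are broken by name so the order stays total.
        earlier, later = sorted((a, b), key=lambda n: (birth_year[n], n))
        edges.add((earlier, later))
    return sorted(edges)


print(orient_by_birth(mentions, birth_year))
```

Because every edge points from the earlier philosopher to the later one, following the edges can never lead back to the starting node, so the resulting graph has no cycles.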

Final Considerations

A graph-oriented database such as neo4j can provide a good overview of the connections between the entities in your systems, avoiding the long, join-heavy queries that a relational database would require, while still supporting transactions and clustering.

It's also possible to use a hybrid approach: leave your data in a relational or document-oriented database and store only the relationships in a graph-oriented database. This can help when fully migrating the database is impractical but you still need to optimize your searches.

If you have any questions, suggestions or advice, feel free to contact us :)
