Working with Graph Databases at Geoblink

The world is a graph

Nowadays many companies choose graph databases to store large amounts of information, but what kind of information?

Graph databases are great for storing relationships, and they are very fast at working out how the elements they contain are related. Good examples are social networks or a family tree. In these cases people are the “nodes”, and the ways people are related to each other are the “relationships”. So storing this in a graph database is easy, right?
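To make the idea concrete, here is a minimal in-memory sketch in plain Python (not Neo4j) of a tiny social network. The people, the hypothetical `friends_of_friends` helper and the idea of a FRIEND_OF relationship are illustrative assumptions. The point is that a question like "who are my friends' friends?" is answered by walking relationships out from a node, rather than by joining tables.

```python
from collections import defaultdict

# People are nodes; each friendship is stored in both directions
# so it can be followed from either person.
friends: dict[str, set[str]] = defaultdict(set)

def add_friendship(a: str, b: str) -> None:
    """Record an undirected friendship between two people."""
    friends[a].add(b)
    friends[b].add(a)

def friends_of_friends(person: str) -> set[str]:
    """People two hops away who are not already direct friends."""
    direct = friends[person]
    result: set[str] = set()
    for friend in direct:
        result |= friends[friend]
    # Exclude the person themselves and anyone who is already a direct friend.
    return result - direct - {person}

add_friendship("Ana", "Luis")
add_friendship("Luis", "Marta")
add_friendship("Marta", "Pedro")
add_friendship("Ana", "Sofia")
add_friendship("Sofia", "Marta")

print(sorted(friends_of_friends("Ana")))  # ['Marta']
```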

When we work with large volumes of information, the first step is to decide which graph database to use. There are many options, but one of the most popular is Neo4j. With Neo4j we can build a big data system, because it lets us run clusters that hold all our information together with the structure of its relationships.

The skeleton of a graph database is made of nodes and relationships, so the most important thing is to be very clear about how the information has to be stored. We can store many types of nodes and many types of relationships: node types are expressed as “labels” and, in Neo4j, each relationship has a “type”.

For each element of the graph (nodes and relationships) we can store attributes, and elements of the same type can have different attributes: there is no fixed schema forcing them to share the same fields, so each element is independent.
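The property graph model described above can be sketched with two small Python classes. This is a hypothetical illustration, not Neo4j's internal representation: the `Node` and `Relationship` classes, the family data and the PARENT_OF type are assumptions. It shows labels on nodes, a type on each relationship, and two nodes with the same label carrying different attributes.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Node:
    id: int
    labels: set[str]                      # node types, e.g. {"Person"}
    properties: dict[str, Any] = field(default_factory=dict)

@dataclass
class Relationship:
    start: int                            # id of the start node
    end: int                              # id of the end node
    type: str                             # relationship type, e.g. "PARENT_OF"
    properties: dict[str, Any] = field(default_factory=dict)

# Same label, different attributes: nothing forces them to share a schema.
alice = Node(1, {"Person"}, {"name": "Alice", "born": 1960})
bob = Node(2, {"Person"}, {"name": "Bob", "nickname": "Bobby"})

# Relationships can carry their own attributes too.
parent = Relationship(alice.id, bob.id, "PARENT_OF", {"adopted": False})

# Attributes that only one of the two nodes has.
print(sorted(set(alice.properties) ^ set(bob.properties)))  # ['born', 'nickname']
```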

How can we get information out of a graph database? There are many methods, such as APIs, plugins and queries, but Neo4j in particular has its own query language, Cypher.
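Cypher describes graph patterns declaratively. As a rough illustration of what a simple pattern asks for, the sketch below shows a Cypher query in a comment and then evaluates the same pattern by hand over a tiny in-memory graph in plain Python. The labels, relationship types, data and the `match_friends` helper are hypothetical, and this is not how Neo4j actually executes queries.

```python
# The Cypher query being mimicked (labels and properties are hypothetical):
#   MATCH (p:Person {name: 'Ana'})-[:FRIEND_OF]->(friend:Person)
#   RETURN friend.name

nodes = {
    1: {"labels": {"Person"}, "name": "Ana"},
    2: {"labels": {"Person"}, "name": "Luis"},
    3: {"labels": {"Company"}, "name": "ExampleCorp"},
    4: {"labels": {"Person"}, "name": "Sofia"},
}
# Each relationship: (start node id, relationship type, end node id)
relationships = [
    (1, "FRIEND_OF", 2),
    (1, "WORKS_AT", 3),
    (1, "FRIEND_OF", 4),
    (2, "FRIEND_OF", 4),
]

def match_friends(name: str) -> list[str]:
    """Evaluate the MATCH pattern above by hand and return the friends' names."""
    # 1. Find the start nodes: label Person with the given name.
    starts = {nid for nid, n in nodes.items()
              if "Person" in n["labels"] and n["name"] == name}
    # 2. Follow outgoing FRIEND_OF relationships that end at Person nodes.
    return [nodes[end]["name"] for start, rel_type, end in relationships
            if start in starts and rel_type == "FRIEND_OF"
            and "Person" in nodes[end]["labels"]]

print(match_friends("Ana"))  # ['Luis', 'Sofia']
```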

To get the best performance out of a graph database, it is very important to create indexes on the labels and attributes we filter by. Indexes can be the key to getting results quickly, because the database can go straight to the matching elements instead of scanning all of them.
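Why an index helps can be sketched in a few lines of plain Python. This toy model compares a full scan with a dictionary that maps a (label, attribute) pair's values to node ids. The data and helper names are hypothetical, and a real database index (for example a B-tree) is more elaborate. The Cypher line in the comment is the index-creation syntax used in recent Neo4j versions.

```python
from collections import defaultdict

# Toy node store: id -> (label, properties). Data is hypothetical.
nodes: dict[int, tuple[str, dict[str, str]]] = {
    1: ("Person", {"name": "Ana", "city": "Madrid"}),
    2: ("Person", {"name": "Luis", "city": "Sevilla"}),
    3: ("Person", {"name": "Sofia", "city": "Madrid"}),
    4: ("Store", {"name": "Centro", "city": "Madrid"}),
}

def scan(label: str, key: str, value: str) -> list[int]:
    """No index: inspect every node, so the cost grows with the whole graph."""
    return [nid for nid, (lbl, props) in nodes.items()
            if lbl == label and props.get(key) == value]

def build_index(label: str, key: str) -> dict[str, list[int]]:
    """Map each value of `key` on nodes with `label` to the matching node ids."""
    index: dict[str, list[int]] = defaultdict(list)
    for nid, (lbl, props) in nodes.items():
        if lbl == label and key in props:
            index[props[key]].append(nid)
    return dict(index)  # plain dict, so lookups never insert missing keys

def lookup(index: dict[str, list[int]], value: str) -> list[int]:
    """With an index: a single (average constant-time) dictionary access."""
    return index.get(value, [])

# Rough Neo4j analogue (recent versions):
#   CREATE INDEX person_city FOR (p:Person) ON (p.city)
person_city = build_index("Person", "city")
print(lookup(person_city, "Madrid"))     # [1, 3]
print(scan("Person", "city", "Madrid"))  # [1, 3]
```

Both calls return the same nodes; the difference is that the scan touches every node on each query, while the index pays that cost once, when it is built.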

One of the most important operations is calculating the shortest path from one node to another. Graph databases normally offer implementations of algorithms such as Dijkstra's, either built in or through plugins.
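Dijkstra's algorithm finds the cheapest path when every relationship has a non-negative weight (a distance, a travel time, etc.). Here is a compact textbook version in plain Python using a priority queue. It is a generic sketch with made-up data, not the implementation used by Neo4j or any of its plugins.

```python
import heapq

def dijkstra(graph: dict[str, dict[str, float]], source: str, target: str):
    """Return (total_cost, path) of the cheapest route, or (inf, []) if unreachable.

    `graph` maps each node to {neighbour: non-negative edge weight}.
    """
    dist = {source: 0.0}
    previous: dict[str, str] = {}
    queue = [(0.0, source)]
    visited: set[str] = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry: a cheaper route was already settled
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in visited:
        return float("inf"), []
    # Walk back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return dist[target], path[::-1]

graph = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 7},
    "B": {"D": 1},
}
print(dijkstra(graph, "A", "D"))  # (4.0, ['A', 'C', 'B', 'D'])
```

The direct edge A→B costs 4, but going through C costs only 3, so the algorithm settles B via C before reaching D.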

We often use two approaches to get data. The first is querying the database with a filter (on attributes or labels); the second is running an algorithm such as “shortest path”, implemented in a plugin or something similar. In both cases the results have the same structure: nodes and the relationships between them, so we know how our nodes are connected to each other and which kinds of relationships connect them.

A good example to illustrate the paragraph above is a graph of the roads in a city: the nodes are the intersections between streets, and the relationships are the street segments that connect them, with properties such as speed, type of road, whether it is paved, etc. With the first approach we can find out how many intersections a street has; with the second, we can find the shortest path from a start point to an end point, along with which streets we traveled, their attributes, and any other parameters we can derive from them.
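Both approaches can be sketched on a toy road network in plain Python. Everything here is hypothetical (the intersections, street names, the `length_m`, `speed_kmh` and `paved` attributes, and the helper functions). The first function filters relationships by an attribute; the second runs Dijkstra's algorithm on travel time and returns the segments it used, so the streets and their attributes come back with the result.

```python
import heapq
from itertools import count

# Hypothetical street segments between intersections, treated as two-way.
segments = [
    ("i1", "i2", {"street": "Main St", "length_m": 400, "speed_kmh": 50, "paved": True}),
    ("i2", "i3", {"street": "Main St", "length_m": 300, "speed_kmh": 50, "paved": True}),
    ("i3", "i4", {"street": "Main St", "length_m": 500, "speed_kmh": 50, "paved": True}),
    ("i1", "i5", {"street": "Ring Rd", "length_m": 900, "speed_kmh": 90, "paved": True}),
    ("i5", "i4", {"street": "Ring Rd", "length_m": 900, "speed_kmh": 90, "paved": True}),
    ("i2", "i4", {"street": "Dirt Track", "length_m": 200, "speed_kmh": 30, "paved": False}),
]

# Adjacency list; each entry keeps the segment's properties.
graph: dict[str, list[tuple[str, dict]]] = {}
for a, b, props in segments:
    graph.setdefault(a, []).append((b, props))
    graph.setdefault(b, []).append((a, props))

def intersections_on(street: str) -> set[str]:
    """Approach 1: filter relationships by an attribute."""
    return {node for a, b, p in segments if p["street"] == street for node in (a, b)}

def travel_time_s(props: dict) -> float:
    """Seconds to drive a segment at its speed limit."""
    return props["length_m"] * 3600 / (props["speed_kmh"] * 1000)

def fastest_route(start: str, end: str, paved_only: bool = False):
    """Approach 2: Dijkstra on travel time; returns (seconds, segments used)."""
    tie = count()  # tie-breaker so the heap never compares the path lists
    queue = [(0.0, next(tie), start, [])]
    settled: set[str] = set()
    while queue:
        time_s, _, node, used = heapq.heappop(queue)
        if node in settled:
            continue
        settled.add(node)
        if node == end:
            return time_s, used
        for neighbour, props in graph.get(node, []):
            if paved_only and not props["paved"]:
                continue  # attributes can constrain the search
            if neighbour not in settled:
                heapq.heappush(queue, (time_s + travel_time_s(props), next(tie),
                                       neighbour, used + [props]))
    return float("inf"), []

print(len(intersections_on("Main St")))  # 4
seconds, route = fastest_route("i1", "i4")
print(round(seconds, 1), [p["street"] for p in route])  # 52.8 ['Main St', 'Dirt Track']
seconds, route = fastest_route("i1", "i4", paved_only=True)
print(round(seconds, 1), [p["street"] for p in route])  # 72.0 ['Ring Rd', 'Ring Rd']
```

Because the route is returned as the list of traversed relationships, the caller gets the street names, and the speed and paving attributes along the way, without a second query.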

Finally, in some cases we need to apply a particular method to get data from our graph and build a particular structure to return the results, and we can achieve this with plugins. Neo4j can be extended with plugins (user-defined procedures and functions) written in Java or other JVM languages, and it can also be used from languages such as JavaScript (Node.js) through its official drivers. This is a very useful feature. In fact, at Geoblink we have a big graph database, and we have implemented our algorithms in a home-made plugin to pull the information we show to our clients, such as the list of roads in an influence area.
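The plugin itself is not described here, so the following is only a hypothetical illustration, in plain Python rather than as a Neo4j plugin, of the kind of logic such a plugin could wrap. It treats an “influence area” as everything reachable from an origin within a travel-time budget, runs a Dijkstra search bounded by that budget, and collects every street with at least one segment that can be driven completely in time. The definition of influence area, the data and the `roads_in_influence_area` function are all assumptions.

```python
import heapq

def roads_in_influence_area(segments, origin: str, budget_s: float) -> list[str]:
    """Sorted street names reachable from `origin` within `budget_s` seconds.

    A street counts if at least one of its segments can be driven completely
    without exceeding the budget (a hypothetical definition of influence area).
    """
    graph: dict[str, list[tuple[str, dict]]] = {}
    for a, b, props in segments:  # two-way street segments
        graph.setdefault(a, []).append((b, props))
        graph.setdefault(b, []).append((a, props))
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    streets: set[str] = set()
    while queue:
        time_s, node = heapq.heappop(queue)
        if time_s > best[node]:
            continue  # stale entry: node was already reached faster
        for neighbour, props in graph.get(node, []):
            arrival = time_s + props["length_m"] * 3600 / (props["speed_kmh"] * 1000)
            if arrival > budget_s:
                continue  # this segment cannot be completed within the budget
            streets.add(props["street"])
            if arrival < best.get(neighbour, float("inf")):
                best[neighbour] = arrival
                heapq.heappush(queue, (arrival, neighbour))
    return sorted(streets)

segments = [
    ("a", "b", {"street": "High St", "length_m": 500, "speed_kmh": 50}),    # 36 s
    ("b", "c", {"street": "High St", "length_m": 500, "speed_kmh": 50}),    # 36 s
    ("b", "d", {"street": "Park Ave", "length_m": 250, "speed_kmh": 30}),   # 30 s
    ("c", "e", {"street": "Bypass", "length_m": 2000, "speed_kmh": 100}),   # 72 s
]
print(roads_in_influence_area(segments, "a", 70))  # ['High St', 'Park Ave']
```

Packaging a query like this inside the database means the client receives a ready-made list instead of raw nodes and relationships to post-process.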

By Carlos Asuero