Have you ever wondered what a graph of all the roads in the world would look like? What kind of insights could one uncover if such a graph were put in a Neo4j database? Extracting the most connected cities in Europe, for example, would definitely come in handy for a vacation planning platform.
Atlas was built to do all of the above, and more. The library can selectively import entities from Open Street Map and either write them to a Neo4j database as they are, or summarize the data before committing. The resulting graph can be simplified in a number of ways, such as contracting roads, smart linking and removing entities based on a variety of criteria.
Neo4j’s capabilities in terms of speed and ease of development are quite outstanding. The elegance and flexibility of Cypher, along with the new capabilities of APOC, begged for a system that answers high-level questions about the world.
Atlas: Answering Questions about the World
Atlas was born mainly out of curiosity, but a lot of business use cases exist. The goal was both to answer simple questions and to perform broader, more complex analytics on data about the world.
How many green spaces does London have, compared to Berlin?
How many cities in Germany are close to the sea? How many mountain towns in Switzerland are directly connected to the capital by a road no longer than 200 km?
For questions such as these, data could be supplied by Open Street Map.
Given other data sources, such as World Bank, the system would be able to answer other classes of queries.
What are all the seaside cities that are hosting an outdoor rock concert next week (events data)?
What are some mountain towns with precipitation levels above X that have a GDP per capita over Y (economic indexes and weather data)?
Where do most Germans spend their holidays, and what type of resort do they prefer?
What are the most connected cities in Western Europe, by car, train, or airplane?
What regions of a country have people who are least likely to have met?
The target was set for some network analytics questions as well.
Given just the topology of the roads, how likely is it that a street will be crowded (betweenness centrality)?
What 4, 6 or 10 towns enable one to move from any town to any other town?
How many cars can pass between two points (max flow)? What is the minimum number of roads whose removal disconnects 2 cities (min cut)?
The list goes on, as the number of questions one can come up with is almost infinite.
After a couple of experiments with existing geo systems, it became apparent that both importing and representing geo data could be improved. No existing project implemented selective import of Open Street Map data into Neo4j. Queries on existing solutions using Open Street Map would be too slow for the types of questions outlined above because of the sheer size of the data.
With this in mind, I built Atlas, aiming to answer the questions above. You can read more about the specific technical details in the Readme of the repository on GitHub. Some top-level categories of what it can do:
- GIS data abstractions and operations
- Import Open Street Map data
- Import from other data sources (World Bank, NASA, etc.)
- Data transformers (explained below)
At its core, Atlas is a smart GIS system, so central to the data model is data about everything that one would usually find on a map. Open Street Map knows about 3 types of entities: Node, Way and Relation, and they build on each other in that order.
A Node is the basic data structure: a geo point with latitude and longitude, which can carry tags that describe what it represents. Ways connect Nodes together to describe a bigger entity, such as the perimeter of a school or a highway. Relations group together very large entities and are composed of Nodes and Ways. You can read more here.
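To make the three entity types concrete, here is a minimal sketch of how they could be modelled in Java. The class and field names (`OsmModelSketch`, `nodeRefs`, `Member` and so on) are hypothetical and chosen for illustration; they are not Atlas's actual classes.

```java
import java.util.List;
import java.util.Map;

// Illustrative model of the three OSM entity types; all names are assumptions.
final class OsmModelSketch {

    enum MemberType { NODE, WAY, RELATION }

    /** A geo point with tags that describe what it represents. */
    static final class Node {
        final long id;
        final double lat;
        final double lon;
        final Map<String, String> tags;

        Node(long id, double lat, double lon, Map<String, String> tags) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
            this.tags = tags;
        }
    }

    /** An ordered list of node ids, e.g. a highway or the perimeter of a school. */
    static final class Way {
        final long id;
        final List<Long> nodeRefs;
        final Map<String, String> tags;

        Way(long id, List<Long> nodeRefs, Map<String, String> tags) {
            this.id = id;
            this.nodeRefs = nodeRefs;
            this.tags = tags;
        }

        /** A perimeter is a way whose first and last node are the same. */
        boolean isClosed() {
            return nodeRefs.size() > 1
                    && nodeRefs.get(0).equals(nodeRefs.get(nodeRefs.size() - 1));
        }
    }

    /** One entry of a relation: a reference to another entity plus its role. */
    static final class Member {
        final MemberType type;
        final long ref;
        final String role;

        Member(MemberType type, long ref, String role) {
            this.type = type;
            this.ref = ref;
            this.role = role;
        }
    }

    /** Groups nodes and ways into a larger entity. */
    static final class Relation {
        final long id;
        final List<Member> members;
        final Map<String, String> tags;

        Relation(long id, List<Member> members, Map<String, String> tags) {
            this.id = id;
            this.members = members;
            this.tags = tags;
        }
    }
}
```

Ways hold node ids rather than node objects, which mirrors how the entities refer to each other in the order described above.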
Atlas organizes all of this in a graph, bringing to this wealth of data all the querying and analytics capabilities that Neo4j has.
Putting it all in a graph
Atlas can import Open Street Map data selectively. You can filter the data based on the type of entity (node, way, relation) or the tags associated with it. All data output is a Java 8 Stream, for efficient memory use. You decide what happens with it. Do you write it to the database on the spot? Do you collect and aggregate, then write to Neo4j?
Note: if you need to do more complex filtering on your entities in memory, I recommend this project: https://github.com/osmlab/atlas . You will find another sister project from Osmlab that can even use Apache Spark for this filtering/preprocessing step.
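The two choices above can be sketched with plain Java 8 streams. This is an illustration under assumptions, not Atlas's API: `OsmEntity`, `ofType`, `hasTag` and the sample data are hypothetical stand-ins, and the "write" step is a print statement rather than a real Neo4j call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OsmFilterSketch {

    enum EntityType { NODE, WAY, RELATION }

    /** Hypothetical stand-in for an imported entity; not Atlas's real class. */
    static final class OsmEntity {
        final long id;
        final EntityType type;
        final Map<String, String> tags;

        OsmEntity(long id, EntityType type, Map<String, String> tags) {
            this.id = id;
            this.type = type;
            this.tags = tags;
        }
    }

    /** Filter on the type of entity. */
    static Predicate<OsmEntity> ofType(EntityType type) {
        return e -> e.type == type;
    }

    /** Filter on a tag key/value pair. */
    static Predicate<OsmEntity> hasTag(String key, String value) {
        return e -> value.equals(e.tags.get(key));
    }

    /** Builds a tag map from alternating keys and values. */
    static Map<String, String> tags(String... keyValues) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    /** Made-up sample data standing in for the import stream. */
    static Stream<OsmEntity> sample() {
        return Stream.of(
                new OsmEntity(1, EntityType.NODE, tags("amenity", "fuel")),
                new OsmEntity(2, EntityType.NODE, tags()),
                new OsmEntity(10, EntityType.WAY, tags("highway", "motorway")),
                new OsmEntity(11, EntityType.WAY, tags("highway", "motorway")),
                new OsmEntity(12, EntityType.WAY, tags("highway", "residential")),
                new OsmEntity(13, EntityType.WAY, tags("building", "school")),
                new OsmEntity(20, EntityType.RELATION, tags("type", "route")));
    }

    /** Collect-and-aggregate option: number of ways per value of the "highway" tag. */
    static Map<String, Long> countWaysByHighwayValue(Stream<OsmEntity> entities) {
        return entities
                .filter(ofType(EntityType.WAY))
                .filter(e -> e.tags.containsKey("highway"))
                .collect(Collectors.groupingBy(e -> e.tags.get("highway"), Collectors.counting()));
    }

    public static void main(String[] args) {
        // On-the-spot option: handle each matching entity as it streams by.
        sample()
                .filter(ofType(EntityType.NODE).and(hasTag("amenity", "fuel")))
                .forEach(e -> System.out.println("would write node " + e.id));

        // Aggregate option: summarize first, write the summary afterwards.
        System.out.println(countWaysByHighwayValue(sample()));
    }
}
```

Because the pipeline is lazy, entities that fail the filters are dropped as they stream by and are never held in memory together.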
Aggregating Information on a Path
The time a query takes is proportional to the number of nodes and relationships on the paths it matches. Sometimes, the user might not care about all points on a road, just about a summary of the data.
When doing network topology analysis on roads, all we care about is how the road looks (where the intersection points are) and what the length of each road segment is. Atlas can define such a contraction criterion (see below) for highways, national roads or all types of roads.
By intelligently manipulating the graph, we can make it easier for Cypher to find the answers we seek. This can make specific classes of queries blazing fast.
Here is a very simple example of what’s going on. Let’s assume we have a way with 10 nodes, no intersections. There is nothing interesting happening between start and end node, so we can contract (collapse) it.
Atlas can traverse roads and mark a node for contraction if no interesting entity (such as a gas station) has been smart-linked to it (more on this in a separate article) and the node brings no additional value of its own, such as being part of an intersection. Marked nodes are later collapsed. Here is the above graph after the contraction. Only the start and end nodes of the way are kept.
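The contraction step can be sketched as follows. This is a minimal illustration, not Atlas's implementation: the names `RoadNode`, `Segment` and `contract` are hypothetical, the set of nodes to keep is assumed to be computed beforehand, and segment length is measured with the haversine formula on a spherical Earth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class WayContractionSketch {

    static final double EARTH_RADIUS_METERS = 6_371_000.0; // mean Earth radius

    /** A point on a road; hypothetical type for this sketch. */
    static final class RoadNode {
        final long id;
        final double lat;
        final double lon;

        RoadNode(long id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** What is left of a run of nodes after contraction. */
    static final class Segment {
        final long from;
        final long to;
        final double lengthMeters;
        final int collapsedNodes;

        Segment(long from, long to, double lengthMeters, int collapsedNodes) {
            this.from = from;
            this.to = to;
            this.lengthMeters = lengthMeters;
            this.collapsedNodes = collapsedNodes;
        }
    }

    /** Great-circle distance between two nodes (haversine formula). */
    static double haversineMeters(RoadNode a, RoadNode b) {
        double phi1 = Math.toRadians(a.lat);
        double phi2 = Math.toRadians(b.lat);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(b.lon - a.lon);
        double h = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(h));
    }

    /**
     * Collapses every node that is not in keepIds. The first and last node of
     * the way are always kept; each segment carries the summed length of the
     * hops it replaces.
     */
    static List<Segment> contract(List<RoadNode> way, Set<Long> keepIds) {
        List<Segment> segments = new ArrayList<>();
        if (way.size() < 2) {
            return segments;
        }
        RoadNode segmentStart = way.get(0);
        double length = 0.0;
        int collapsed = 0;
        for (int i = 1; i < way.size(); i++) {
            RoadNode current = way.get(i);
            length += haversineMeters(way.get(i - 1), current);
            boolean isLast = i == way.size() - 1;
            if (isLast || keepIds.contains(current.id)) {
                segments.add(new Segment(segmentStart.id, current.id, length, collapsed));
                segmentStart = current;
                length = 0.0;
                collapsed = 0;
            } else {
                collapsed++;
            }
        }
        return segments;
    }

    /** Sample way: n nodes with ids 1..n along the equator, 0.01 degrees apart. */
    static List<RoadNode> straightWay(int n) {
        List<RoadNode> way = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            way.add(new RoadNode(i + 1, 0.0, i * 0.01));
        }
        return way;
    }
}
```

Calling `contract(straightWay(10), Collections.emptySet())` yields a single segment from node 1 to node 10 that replaces 8 intermediate nodes. Adding an intersection id to the keep set splits the way at that node instead.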
Atlas extracted some ways (roads) from the UK that are marked as highways in OpenStreetMap. This is a very simple example, where no two ways (roads) share a node. There are no intersections in this case.
Here is how the graph looks after contracting it.
Summarizing Points of Interest alongside Roads
Let’s talk about another example. An app might want to display the number of gas stations, museums or lakes (or all of them together) on alternate paths to a destination. In this case, all the data needed for this query is the road length and the number of interesting entities on each road segment.
Atlas can smart link those entities to road points. When contracting the road points, the data about the connected gas stations is encoded in the relationship.
Alternatively, the road point that is closest to the interesting entity can be swapped with the interesting entity point. This is a design decision, and each choice has trade-offs.
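Here is a rough sketch of the first option, where counts of interesting entities end up on the contracted relationship. It makes several assumptions that are not taken from Atlas: coordinates are already projected to a flat plane in meters, "smart linking" is reduced to picking the nearest road point, and all names (`Poi`, `SummarizedSegment`, `summarize`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class PoiSummarySketch {

    /** Planar coordinates in meters (an assumption made for brevity). */
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** A point of interest with a category such as "fuel" or "museum". */
    static final class Poi {
        final String category;
        final Point location;

        Poi(String category, Point location) {
            this.category = category;
            this.location = location;
        }
    }

    /** A contracted road segment with its length and per-category counts. */
    static final class SummarizedSegment {
        final int fromIndex;
        final int toIndex;
        final double lengthMeters;
        final Map<String, Integer> poiCounts;

        SummarizedSegment(int fromIndex, int toIndex, double lengthMeters, Map<String, Integer> poiCounts) {
            this.fromIndex = fromIndex;
            this.toIndex = toIndex;
            this.lengthMeters = lengthMeters;
            this.poiCounts = poiCounts;
        }
    }

    static double distance(Point a, Point b) {
        return Math.hypot(a.x - b.x, a.y - b.y);
    }

    /** Index of the road point closest to p, or -1 for an empty road. */
    static int nearestIndex(List<Point> road, Point p) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < road.size(); i++) {
            double d = distance(road.get(i), p);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    /**
     * Links every POI to its nearest road point, then contracts the road at
     * keepIndexes. A POI linked to a kept point is counted on the segment that
     * ends at that point (an arbitrary choice for this sketch).
     */
    static List<SummarizedSegment> summarize(List<Point> road, Set<Integer> keepIndexes, List<Poi> pois) {
        Map<Integer, List<String>> linked = new HashMap<>();
        for (Poi poi : pois) {
            linked.computeIfAbsent(nearestIndex(road, poi.location), k -> new ArrayList<>())
                    .add(poi.category);
        }
        List<SummarizedSegment> result = new ArrayList<>();
        int start = 0;
        double length = 0.0;
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < road.size(); i++) {
            if (i > 0) {
                length += distance(road.get(i - 1), road.get(i));
            }
            for (String category : linked.getOrDefault(i, Collections.<String>emptyList())) {
                counts.merge(category, 1, Integer::sum);
            }
            boolean isLast = i == road.size() - 1;
            if (i > 0 && (isLast || keepIndexes.contains(i))) {
                result.add(new SummarizedSegment(start, i, length, counts));
                start = i;
                length = 0.0;
                counts = new TreeMap<>();
            }
        }
        return result;
    }
}
```

With this shape, a query comparing alternate paths only has to read the length and the counts on each contracted relationship, without touching the individual entities. The swap alternative would instead keep the entity as a node on the path.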
Business use cases
Here are a couple of examples of how Atlas could be used in a live product.
1. You have a large leftover inventory of high-end, large-size professional winter mountain shoes.
Atlas can give you a list of all locations where:
- the GDP per capita is 2, 3 or 4 standard deviations above the mean.
- the average height of the population is high.
- the elevation is high OR the town is within reasonable driving distance of such a site.
2. You are selling party gear: hats, whistles and many others. Atlas knows where concerts with 200 to 2 million people are being held, and when.
3. You are selling luggage and travel accessories.
Atlas can give you a list of all locations where:
- a large number of people are going to leave by plane, train or bus for another destination in the next week.
- the GDP per capita is in sync with the pricing of your travel bag.
4. You are selling low-end water sports accessories (kayaks, paddle boards). You can query Atlas to obtain all cities close to water surfaces that are fit for your accessories and where the GDP per capita correlates with the price of the accessories.
What roads should not be worked on at the same time, because they would cause major disruption?
What would happen to the flow of the network if some of the streets were made one-way?
Which cities or towns have road networks that are topologically similar to the town where I run my business? Which town should I expand my coffee shop franchise to, and where should the next shop be located?
In this article, I have introduced Atlas, a new Open Source project built on top of Neo4j and OpenStreetMap, which makes querying and analyzing data about the world easy. We looked at some example queries, technical details and talked about some business use cases.
If this caught your attention, be sure to follow Neo4j or me! I plan a second article presenting some interesting queries on real-world data, and a third one in which I will apply some network analytics to roads.