Introduction to Network Analysis with Open Street Maps

Rakesh Devalapally
Gaussian Solutions
Published in
5 min readAug 4, 2020

I take up freelance projects from time to time and in the recent past, I was involved in projects related to OpenStreetMaps and geoJSON files. In one of the projects, I had a requirement to find out whether, given a set of coordinates (hereafter called nodes) representing line segments (hereafter called ways), if all the ways intersect with at least another way or not. Two ways are said to be intersecting if they have a common node between them.

So, if a way is denoted by [(x1,y1), (x2,y2), (x3,y3)] and another way is denoted by [(x4,y4), (x5,y5), (x3,y3)], they are said to be intersecting at node (x3,y3).

While the objective of the task is to find out whether ways are intersecting or not, this post is about the tools that can be used to solve such tasks. The actual “how” comes in a latter post.

Why am I doing this?

Let’s assume the following section of a city, where the area between blue lines is a street designated for use by vehicles, and the green lines represent pedestrian paths. The freestyle drawing between the paths represents a “crossing” where pedestrians are allowed to cross a street.

My own representation of some blocks of a city

Now, in the above scenario, if a pedestrian wishes to go from Block 1 to Block 4, that pedestrian should walk all the way to Block 3, cross the road to Block 6, and walk back till Block 4. However, if there was a crossing between Block 1 and Block 4, he or she could have taken it directly, which would have saved time.

All of this data (information about streets, ways, crossing, pedestrian paths, cycling paths, points of interest) is available in OpenStreetMaps. In the above example, all the blue lines, green lines, and crossings are considered “ways”. So, we would say that there is an intersection between

  1. the pedestrian path at Block 1 with the crossing between Block 1 and Block 2.
  2. the crossing between Block2 and Block 3 with Block 3

and so on.

Let us assume that there is no crossing between Block 3 and Block 6. This leaves us with two “islands” (enclosed in red) that are not connected to each other by a pedestrian path.

Blocks 1,2 and 3 make up one island while the rest make up another.

And it makes it difficult for a person to go from one island to the other.

This is a classic scenario in any city planning. How can the city planners ensure that all such “islands” are connected? How to connect the whole city such that there is a free flow of movement (be it automobiles or pedestrians) from one part of the city to any part of the city? Also, in case they have to plan maintenance work on some street, what are the ways that are affected by it?

The answer lies in treating this as a network/graph traversal problem. Almost all of the cities routing networks are treated as graphs and analysis is performed on top of that graph, algorithms written to find out the shortest path between two points as well as to check whether two nodes (places) are connected or not.

I figured out posing this problem as a graph problem would be one option and then doing network analysis on top of it. I also had my intuition confirmed from what I read on open street map forums.

“OpenStreetMap uses graph structures as its core elements of mapping. Pedestrian networks are graphical structures: they define not only the shape and properties of pedestrian spaces but also how they connect to one another. “

However, the last time I studied graphs was over 15 years ago and I needed a refresher on it. Luckily, just after a couple of clicks on google, I came across this post from Analytics Vidhya, which served as a good starting point to brush up my concepts.

Next up was to check out the available tools for performing network analysis and here is what I found:

  1. Gephi: This is open-source software that can be used to analyze large scale networks and also to visualize them. It claims to have all the functions that are discussed on the Analytics Vidhya blog post and their website has a list of applications where the software can be used. There is also a lot of training material available in the form of tutorials using which users can learn how to master the tool. However, I did not choose this tool because it looked like a stand-alone tool that cannot be integrated into a pipeline later.
  2. Graph-tool: This one boasts of “making extensive use of template metaprogramming, based heavily on the Boost Graph Library” as all the core algorithms and data structures are implemented in C++, even though this is available as a python module. The documentation looks great and I would have chosen it if not for the disclaimer in their windows installation instructions. Fully native windows installation is not yet supported. One has to use a docker or use the Ubuntu space for windows users to get this up and running. I ignored it for the time being as I failed miserably once trying to setup docker on my windows machine. Also, there were not many hits on google using this tool.
  3. NetworKit: According to their notes, “NetworKit is a growing open-source toolkit for large-scale network analysis. Its aim is to provide tools for the analysis of large networks in the size range from thousands to billions of edges”. It is a python module and algorithms are written in C++, just like in Graph-tool. The documentation is well structured and they also have a list of datasets and publications that used this package.
  4. NetWulf: Interactive network visualization tool in python. The homepage had some amazing visualizations and that caught my eye. However, after reading a little, I saw that this is built on top of NetworkX. So, I had to check out NetworkX.
  5. NetworkX: It is a python package for “the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.” The documentation is well written and the tutorials provided gave me what I was looking for and I chose to use NetworkX because it was easy to install and use. I prepared some random data manually and checked out whether the functionality provided by NetworkX helps me in solving the task, and boy it did. It was an easy task that took me only 10 minutes to prepare the data of about 10 nodes and 15 edges, build a graph from this information, and visualizing it.

Note: The visualization provided by Networkx is good for about 30 nodes. After this, the graph looks messy and better visualization can be achieved by using matplotlib package. Or, even better graph-tool?

In my next article, I will write more about the data I dealt with and how I used different packages to solve the issue of finding out “intersecting ways” and at which node they intersect. Please stay tuned…

--

--