How to … automate distance and travel time calculations using R and OSRM

Nils Meyer
6 min read · Apr 19, 2020


Photo taken by Nils Meyer

In my own Data Science work, I have observed massive interest in spatial (geographical) data, whether for visualizing a geographical shape in combination with certain (performance) indicators or simply for calculations based on spatial information. The reason is quite obvious, I guess: few things communicate information more intuitively than a map, spatial data is a very informative resource in many business cases, and besides that, any dashboard that visualizes data is beautified by a map.

However, as with many topics in data science, when you are a beginner you often cannot see the wood for the trees. Some guiding input is always helpful when you are confronted with a certain class of problems. That is why I wrote this article: to give some guidance on automating the calculation of distance and travel time, which is a relevant foundation for further analysis in many practical scenarios. Based on data from the OpenStreetMap project (OSM), the Open Source Routing Machine (OSRM) and the language R, I will show how to set up a basic, but nevertheless complete, technology infrastructure to automate this task.

Some background about OSM and OSRM

A Wiki all about the OSM-Project: https://wiki.openstreetmap.org/wiki/Main_Page

A place where you can download relevant data from the OSM-Project: https://download.geofabrik.de/

The GitHub wiki for the OSRM service, with some documentation: https://github.com/Project-OSRM/osrm-backend/wiki

Example Use Cases

Imagine you want to decide where to build your next store (e.g. a supermarket). Obviously, how potential customers can reach the place is a highly relevant factor. With the method I am going to introduce, you will be able to quickly run simulations for different investment options, whether you presume that your customers will reach your store by car, by bicycle or simply on foot, and eventually find the optimal option in terms of reachability.

Setting Up the OSRM-Server

The osrm-backend provides an HTTP server that the requests from our script will run against. First, we need to set it up. Since I am using Windows, there are two ways to do this: build the routing machine from source, or download ready-to-use binaries here: http://build.project-osrm.org/.

If you download the binaries, unzip the archive and copy the folder named "osrm_Release" to wherever you prefer to keep it. Next, create a new folder inside "osrm_Release" and call it "data". Now we need to download the data from the OSM project. For this example, I am using the full dataset for Lower Saxony, taken from the "geofabrik" page I mentioned at the beginning. Copy the file (it ends with ".osm.pbf") into the folder "data". Next, open your Windows command prompt and go to the path where you saved "osrm_Release". Now we can prepare the data for the routing machine by typing the following commands:

osrm-extract data/niedersachsen-latest.osm.pbf -p profiles/car.lua
osrm-contract data/niedersachsen-latest.osrm

The argument "-p profiles/car.lua" refers to the driving profile the engine uses for its calculations. This information is stored in a script written in the language Lua. For simplicity I used the given standard profile here, but that file can easily be changed, or a new one can be created. For example, the profile defines the maximum speed at which cars may drive on different kinds of roads. Here is an example from the "car.lua" file:

Snippet from the default car-profile in OSRM

Once this process is done, you will find several new files in the data folder, including the one we are now interested in: "niedersachsen-latest.osrm". OSRM basically offers two algorithms, both of which are common approaches to finding the shortest path between two points. One is called Multi-Level Dijkstra (MLD) and the other Contraction Hierarchies (CH). The osrm-contract command used above prepares the data for Contraction Hierarchies. If you are interested in finding out more about these, here are some links:

https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm

https://en.wikipedia.org/wiki/Contraction_hierarchies
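
To give a feeling for what "finding the shortest path" means, here is a minimal sketch of plain Dijkstra's algorithm in base R. It is not OSRM's implementation (Multi-Level Dijkstra and Contraction Hierarchies are far more elaborate speed-ups of the same idea). The toy network, its node names and its weights are made up for illustration, and roads are assumed to be two-way.

```r
# Toy road network as an edge list; weights are travel times in minutes (made-up values).
edges <- data.frame(
  from   = c("A", "A", "B", "B", "C", "D"),
  to     = c("B", "C", "C", "D", "D", "E"),
  weight = c(4, 2, 1, 5, 8, 3),
  stringsAsFactors = FALSE
)

# Plain Dijkstra: returns the shortest travel time from `source` to every node.
dijkstra <- function(edges, source) {
  nodes <- unique(c(edges$from, edges$to))
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  dist[source] <- 0
  visited <- setNames(rep(FALSE, length(nodes)), nodes)

  while (!all(visited)) {
    # Pick the closest node that has not been settled yet.
    candidates <- dist[!visited]
    current <- names(candidates)[which.min(candidates)]
    if (is.infinite(dist[current])) break  # remaining nodes are unreachable
    visited[current] <- TRUE

    # Relax every edge touching the current node (roads are treated as two-way).
    out <- edges[edges$from == current | edges$to == current, ]
    for (i in seq_len(nrow(out))) {
      neighbour <- if (out$from[i] == current) out$to[i] else out$from[i]
      new_dist <- dist[current] + out$weight[i]
      if (new_dist < dist[neighbour]) dist[neighbour] <- new_dist
    }
  }
  dist
}

shortest <- dijkstra(edges, "A")
print(shortest)
```

In this toy network the direct road from A to B takes 4 minutes, but the detour via C takes only 2 + 1 = 3, which is what the algorithm reports for B.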

Depending on the size of the data and your hardware, this process takes anywhere from a few minutes to half an hour.

As soon as you are done, you can start the engine by typing the following command:

osrm-routed data/niedersachsen-latest.osrm

In your prompt, you should get the message "[Info] running and waiting for requests". Now we are ready to go.

Let's switch to RStudio to write a script that will send the requests to this server.

R-Code

It takes fewer than 40 lines of code to solve the problem.

First we install and load the R package "httr", which enables us to send requests to an HTTP server.

Next, we import the data that contains the latitude and longitude of each place.

The following table contains the sample data that I used here. The function I am about to build relies on this table format.
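
Since the original table is not reproduced here, the following is only a sketch of what such an input could look like. The column names (`from_name`, `from_lat`, `from_lon`, `to_name`, `to_lat`, `to_lon`) are my assumption, and the coordinates are approximate city centres in Lower Saxony chosen for illustration. With a real file you would pass the file path to `read.csv()` instead of the `text` argument.

```r
# Illustrative input only: column names are assumptions,
# coordinates are approximate city centres in Lower Saxony.
csv_text <- "from_name,from_lat,from_lon,to_name,to_lat,to_lon
Hannover,52.3759,9.7320,Braunschweig,52.2689,10.5268
Oldenburg,53.1435,8.2146,Osnabrueck,52.2799,8.0472
Goettingen,51.5413,9.9158,Lueneburg,53.2464,10.4115"

# One row per origin/destination pair; latitude and longitude in decimal degrees.
places <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(places)
```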

The function "calculateGeo()" builds an HTTP request from the data in the table. The parts of the URL must be in a certain order, i.e. latitude and longitude must each be in the right place to get the right results (OSRM expects longitude first, then latitude). The default value "127.0.0.1:5000" is simply the address of the local host; it can optionally be changed. A further option controls whether the individual steps of the calculated route are returned by the server (the default is "false" here, because we don't need that information in this case). Another option is the unit for the distance and the travel time. The defaults I set are kilometers ("km") and minutes ("min").
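
The original function is not reproduced here, so the following is a reconstruction under assumptions rather than the exact code. The helpers `build_route_url()` and `convert_route()` and all argument names are hypothetical. What comes from OSRM's HTTP API is the URL layout of the route service and the response fields `distance` (in meters) and `duration` (in seconds) of the first route. The sketch assumes the "httr" package is installed.

```r
# Build the request URL for OSRM's route service (hypothetical helper).
# OSRM expects coordinates as longitude,latitude, pairs separated by ";".
build_route_url <- function(from_lat, from_lon, to_lat, to_lon,
                            host = "127.0.0.1:5000", steps = "false") {
  sprintf("http://%s/route/v1/driving/%s,%s;%s,%s?steps=%s",
          host, from_lon, from_lat, to_lon, to_lat, steps)
}

# Convert OSRM's meters and seconds into the requested units (hypothetical helper).
convert_route <- function(distance_m, duration_s,
                          dist_unit = "km", time_unit = "min") {
  dist_div <- c(m = 1, km = 1000)[[dist_unit]]
  time_div <- c(s = 1, min = 60, h = 3600)[[time_unit]]
  list(distance = distance_m / dist_div, duration = duration_s / time_div)
}

# Send the request and return distance and duration of the first route.
# Needs a running OSRM server; nothing is sent until the function is called.
calculateGeo <- function(from_lat, from_lon, to_lat, to_lon,
                         host = "127.0.0.1:5000", steps = "false",
                         dist_unit = "km", time_unit = "min") {
  url <- build_route_url(from_lat, from_lon, to_lat, to_lon, host, steps)
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # fail loudly on HTTP errors
  parsed <- httr::content(resp, as = "parsed", type = "application/json")
  route <- parsed$routes[[1]]
  convert_route(route$distance, route$duration, dist_unit, time_unit)
}

# The URL can be inspected without a server:
example_url <- build_route_url(52.3759, 9.732, 52.2689, 10.5268)
print(example_url)
```

Splitting URL building and unit conversion into small helpers keeps the parts that can go wrong (coordinate order, units) easy to check before any request is sent.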

We automate the calculations with a for loop that calls the function above for each row. The calculated distance and duration are written directly to the table.
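
A minimal sketch of such a loop, assuming the table has the columns `from_lat`, `from_lon`, `to_lat` and `to_lon` (my naming, not necessarily the original). The wrapper `add_routes()` is hypothetical; it accepts any function that returns a list with `distance` and `duration`, such as calculateGeo(). To keep the example runnable without a server, a stand-in function with fixed placeholder values takes the place of the real request.

```r
# Loop over all rows and write distance and duration back into the table.
add_routes <- function(places, route_fun) {
  places$distance <- NA_real_
  places$duration <- NA_real_
  for (i in seq_len(nrow(places))) {
    res <- route_fun(places$from_lat[i], places$from_lon[i],
                     places$to_lat[i], places$to_lon[i])
    places$distance[i] <- res$distance
    places$duration[i] <- res$duration
  }
  places
}

# Stand-in for the real request (placeholder values, not OSRM output).
fake_route <- function(from_lat, from_lon, to_lat, to_lon) {
  list(distance = 10, duration = 12)
}

places <- data.frame(
  from_lat = c(52.3759, 53.1435), from_lon = c(9.7320, 8.2146),
  to_lat   = c(52.2689, 52.2799), to_lon   = c(10.5268, 8.0472)
)
result <- add_routes(places, fake_route)
print(result)
```

With a running server, the stand-in would be replaced by the real request function, e.g. `add_routes(places, calculateGeo)`.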

Finally, I simplified the calculated values in the table and renamed the columns for readability.
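
One plausible reading of "simplified" is rounding, so here is a small sketch under that assumption. The column names and the numbers in the table are placeholders made up for illustration, not results from OSRM.

```r
# Placeholder table standing in for the loop's output (values are made up).
result <- data.frame(
  from_name = "Hannover",
  to_name   = "Braunschweig",
  distance  = 65.4321,
  duration  = 48.7654,
  stringsAsFactors = FALSE
)

# Round to sensible precision: one decimal for kilometers, whole minutes.
result$distance <- round(result$distance, 1)
result$duration <- round(result$duration, 0)

# Rename the columns so the units are visible in the header.
names(result)[names(result) == "distance"] <- "distance_km"
names(result)[names(result) == "duration"] <- "duration_min"
print(result)
```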

This is the result we get:

Conclusion

We are finished. We automated distance and travel time calculations based on the language R, the free and open-source tool OSRM and some data we downloaded from the OSM project. I only used a small sample of three records in this example, but obviously this can be expanded to millions of calculations without altering the code.

What excites me about this is its simplicity and its cost, namely none. It just takes some of your time and, of course, some curiosity if you are willing to expand the solution to different problems involving distance and travel time calculations.

Feedback

I am always excited to learn and get feedback from others. So if you like the article (or if you don't like it at all), I encourage you to e-mail me at meyer.nils90@gmail.com or leave a comment on the post if you have any questions.

