How good is Open Data?

Mykola Kozyr
Published in Aspectum
4 min read · Mar 27, 2019

Episode 1. OpenStreetMap Nominatim Geocoder

As we were developing Aspectum as a tool not only for GIS professionals, it was crucial to adopt services that transform geospatial data stored as text into geometry. The idea behind this is to keep users’ psyche safe by not forcing them to work with Shapefiles only. Supporting CSV, XLS, and XLSX files with coordinates given as latitude and longitude was not enough, since addresses are still a common way to store geospatial data. So we were ready to dive deep into geocoding.

There are plenty of geocoding services. As a first iteration, we decided to try Nominatim, the geocoding tool based on OpenStreetMap data. While writing the technical description of the task, the first question that came up was:

Could Nominatim be used for all the countries?

I could not find this information in the documentation or on GIS StackExchange (spoiler alert). Besides coverage, we also wanted to evaluate geocoding quality: accuracy, precision, preferred input data formats, etc.

In this article, I cover Nominatim data coverage, reverse vs. direct geocoding, and a few additional findings that came up during this small study.

No OpenStreetMap was harmed in the making of this research. Everything ran on an OSM replica.
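Running against your own instance also means you can query it as often as you need; the public nominatim.openstreetmap.org service has a usage policy that rules out heavy bulk geocoding. Below is a minimal sketch of building requests for the `/reverse` and `/search` endpoints of the Nominatim API with GeoJSON output. The base URL `http://localhost:8080` and the helper names are hypothetical, and the requests are only built here, not sent.

```python
from urllib.parse import urlencode

# Hypothetical address of a self-hosted Nominatim instance fed from an OSM replica.
NOMINATIM_BASE = "http://localhost:8080"


def reverse_url(lat: float, lon: float, base: str = NOMINATIM_BASE) -> str:
    """Reverse geocoding request: coordinates in, address out."""
    return f"{base}/reverse?" + urlencode({"lat": lat, "lon": lon, "format": "geojson"})


def search_url(address: str, base: str = NOMINATIM_BASE) -> str:
    """Direct (forward) geocoding request: free-form address in, best-matching point out."""
    return f"{base}/search?" + urlencode({"q": address, "format": "geojson", "limit": 1})


print(reverse_url(50.0005604, 30.0005652))
print(search_url("Palianychyntsi, Fastiv Raion, Kyiv Oblast, 08542, Ukraine"))
```

`urlencode` takes care of escaping spaces and commas in free-form addresses, which is easy to get wrong when concatenating strings by hand.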

Coverage

It seems to be a simple task: to check whether the geocoder works for all countries, find a sample of addresses for every country and run the geocoder on it. Okay, it seems easy. Let’s just find the data. Believe it or not, that is not something you can find by just googling. I was sure GIS geeks had faced similar tasks, so I went to Twitter, asked the community, and got a brilliant answer:

That’s easy! The next steps were simple: Natural Earth (I’m still mad at you, guys), QGIS (100 random points per country), and geopy. Now I was ready to answer the original question. Running direct geocoding on the resulting address dataset showed that Nominatim is available for all countries.
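The "random points per country" step was done in QGIS. For readers without QGIS at hand, here is a minimal sketch of the same idea using a different technique: rejection sampling, where points are drawn uniformly in a polygon's bounding box and kept only if a ray-casting test says they fall inside. It assumes a single outer ring of (lon, lat) pairs with no holes and no antimeridian crossing; the square "country" is hypothetical, and real rings would come from the Natural Earth boundaries.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat), i.e. (x, y)


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray from pt towards +x crosses a ring edge."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(
    ring: List[Point], n: int, rng: Optional[random.Random] = None
) -> List[Point]:
    """Rejection sampling: draw points in the ring's bounding box and keep those inside."""
    rng = rng if rng is not None else random.Random()
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    points: List[Point] = []
    while len(points) < n:
        candidate = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(candidate, ring):
            points.append(candidate)
    return points


# Hypothetical square "country"; a seeded generator makes the sample reproducible.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
samples = random_points_in_polygon(square, 100, random.Random(42))
```

Note that uniform sampling in degrees is not uniform in area (points bunch up towards the poles), which is acceptable for a coverage check but not for density statistics.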

Direct Geocoding & Countries Data. Screenshot from Aspectum workspace
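The conclusion that Nominatim covers every country boils down to checking that each country has at least one address that direct geocoding resolves. A minimal sketch of that tally, assuming results are collected as a hypothetical mapping from country name to a list of returned (lat, lon) points, with `None` for addresses that got no match:

```python
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]


def coverage_report(results: Dict[str, List[Optional[LatLon]]]) -> Dict[str, float]:
    """Share of each country's sample addresses that direct geocoding resolved (None = no match)."""
    report: Dict[str, float] = {}
    for country, hits in results.items():
        resolved = sum(1 for hit in hits if hit is not None)
        report[country] = resolved / len(hits) if hits else 0.0
    return report


def uncovered_countries(report: Dict[str, float]) -> List[str]:
    """Countries where not a single sample address could be geocoded."""
    return sorted(country for country, rate in report.items() if rate == 0.0)


# Made-up results for two hypothetical countries, only to show the shape of the data.
example = {"Country A": [(50.0, 30.0), None, (50.1, 30.2)], "Country B": [None, None]}
report = coverage_report(example)
print(report, uncovered_countries(report))
```

An empty `uncovered_countries` list is the "available for all countries" answer; the per-country rate is a first, rough hint at quality beyond mere availability.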

Precision

During the previous task, I spotted something interesting: reverse geocoding returns not only the address but also a latitude and longitude:

```json
{
  "type": "FeatureCollection",
  "licence": "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "place_id": "111094940",
        "osm_type": "way",
        "osm_id": "154201801",
        "place_rank": "26",
        "category": "highway",
        "type": "unclassified",
        "importance": "0.1",
        "addresstype": "road",
        "name": null,
        "display_name": "Palianychyntsi, Fastiv Raion, Kyiv Oblast, 08542, Ukraine",
        "address": {
          "village": "Palianychyntsi",
          "county": "Fastiv Raion",
          "state": "Kyiv Oblast",
          "postcode": "08542",
          "country": "Ukraine",
          "country_code": "ua"
        }
      },
      "bbox": [29.9982289, 49.99599, 30.004028, 50.0029167],
      "geometry": {
        "type": "Point",
        "coordinates": [30.000565205347, 50.0005604121134]
      }
    }
  ]
}
```
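A minimal sketch of pulling the address and the returned point out of a response like the one above. The one trap worth showing is that GeoJSON stores coordinates as [lon, lat], so the order has to be swapped to get the familiar (lat, lon). Treating any response without `features` (for example an error object) as "no address found" is an assumption of this sketch, not documented Nominatim behaviour.

```python
import json
from typing import Optional, Tuple


def parse_reverse(geojson_text: str) -> Optional[Tuple[str, float, float]]:
    """Extract (display_name, lat, lon) from a reverse geocoding response with format=geojson."""
    data = json.loads(geojson_text)
    features = data.get("features") or []
    if not features:
        # Assumption: anything without features (e.g. an error object) means "no address found".
        return None
    feature = features[0]
    lon, lat = feature["geometry"]["coordinates"]  # GeoJSON stores [lon, lat], not [lat, lon]
    return feature["properties"]["display_name"], lat, lon


# Trimmed copy of the response above.
sample = """{"type": "FeatureCollection", "features": [{"type": "Feature",
  "properties": {"display_name": "Palianychyntsi, Fastiv Raion, Kyiv Oblast, 08542, Ukraine"},
  "geometry": {"type": "Point", "coordinates": [30.000565205347, 50.0005604121134]}}]}"""
print(parse_reverse(sample))
```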

I assumed these are the coordinates of the returned address, but then I noticed something: the address “AAT, Antarctica” was returned by reverse geocoding but could not be recognized by direct geocoding.

To check how different the results are, I connected the coordinates from reverse geocoding with the coordinates from direct geocoding for the same addresses. The results varied widely:
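Connecting the two points for each address amounts to drawing a line between them, and the length of that line is a handy number to sort by. A minimal sketch, assuming points as (lat, lon) tuples and a spherical Earth with a mean radius of 6371 km (the haversine formula, good to well under a percent for this purpose); `connector` is a hypothetical helper that emits a GeoJSON LineString feature a map can display.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def connector(reverse_pt: LatLon, direct_pt: LatLon) -> Dict:
    """LineString from the reverse-geocoded point to the direct-geocoded point of one address."""
    (lat1, lon1), (lat2, lon2) = reverse_pt, direct_pt
    return {
        "type": "Feature",
        "properties": {"distance_km": round(haversine_km(lat1, lon1, lat2, lon2), 3)},
        # GeoJSON wants [lon, lat] pairs
        "geometry": {"type": "LineString", "coordinates": [[lon1, lat1], [lon2, lat2]]},
    }


print(connector((50.0, 30.0), (50.01, 30.0)))
```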

Okay, it is getting more and more intriguing. Does reverse or direct geocoding return inappropriate results? Hold on, what if we go further and run reverse geocoding on coordinates returned by direct geocoding of addresses returned by reverse geocoding of random coordinates! Sounds fun, doesn’t it?
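Unrolled, that chain is: random point, reverse geocoding, address, direct geocoding, point, reverse geocoding again. A minimal sketch of the loop for one start point, with the geocoders passed in as functions so that any client (for example geopy's Nominatim wrapper, or raw HTTP calls) can be plugged in. The `ReverseFn`/`DirectFn` signatures and the `round_trip` helper are hypothetical; each step stops early when a geocoder returns nothing, which is exactly the “AAT, Antarctica” case.

```python
from typing import Callable, Optional, Tuple

LatLon = Tuple[float, float]
# Hypothetical signatures: reverse(point) -> (address, point) or None; direct(address) -> point or None.
ReverseFn = Callable[[LatLon], Optional[Tuple[str, LatLon]]]
DirectFn = Callable[[str], Optional[LatLon]]


def round_trip(start: LatLon, reverse: ReverseFn, direct: DirectFn) -> dict:
    """Random point -> reverse -> address -> direct -> point -> reverse again."""
    trip: dict = {"start": start}
    first = reverse(start)
    if first is None:
        return trip  # nothing found near the random point
    trip["address"], trip["reverse_point"] = first
    direct_point = direct(trip["address"])
    if direct_point is None:
        return trip  # reverse produced an address that direct geocoding cannot resolve
    trip["direct_point"] = direct_point
    second = reverse(direct_point)
    if second is not None:
        trip["address_again"], trip["reverse_point_again"] = second
    return trip
```

Keeping every intermediate result in one record makes it easy to draw each leg of the trip as a separate line on the map afterwards.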

You can explore the map and see that direct geocoding results are usually more precise than those from reverse geocoding. Does it look like a scientific breakthrough? Not sure, but it looks nice, doesn’t it?

For me personally, the most interesting things on the map were the “spider webs”, which usually show the nearest address to the random coordinates. That was a little spoiler for the next episode of “How good is Open Data?”
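On our reading, a "spider web" is a hub where several random points were all snapped to the same returned location. A minimal sketch of finding such hubs, assuming hypothetical (random point, returned point) pairs as (lat, lon) tuples; returned points are rounded to 6 decimals (roughly 0.1 m) so that tiny floating-point differences do not split one hub into several.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LatLon = Tuple[float, float]


def spider_webs(
    pairs: Iterable[Tuple[LatLon, LatLon]], min_spokes: int = 2, ndigits: int = 6
) -> Dict[LatLon, List[LatLon]]:
    """Group random start points by the (rounded) point their geocoding result landed on.

    Hubs with at least `min_spokes` start points are the ones that look like spider webs."""
    hubs: Dict[LatLon, List[LatLon]] = defaultdict(list)
    for start, returned in pairs:
        hub = (round(returned[0], ndigits), round(returned[1], ndigits))
        hubs[hub].append(start)
    return {hub: spokes for hub, spokes in hubs.items() if len(spokes) >= min_spokes}


pairs = [((0.0, 0.0), (1.0, 1.0)), ((0.0, 1.0), (1.0, 1.0)), ((5.0, 5.0), (6.0, 6.0))]
print(spider_webs(pairs))
```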

Product Management and Geospatial Innovations.