OpenStreetMap Quality Measurement

Tianyi Ren
Published in Unearth · Nov 25, 2019

For the second straight year, Critigen’s Open Data & Development team was well represented at the SOTM US conference. In our OSM series, we’ll highlight each topic in detail, reveal our findings, and bring forward important questions to the OSM community.

Authors: Tianyi Ren, Monica Brandeis

As participation and coverage increase, data quality continues to be a key focus area within the OpenStreetMap (OSM) community. Both researchers and communities have adopted various approaches to monitor the quality of OSM’s road network data through aspects such as accuracy, consistency, and completeness (Ather 2009). Rule-based validation is another critical approach: it targets both accuracy and consistency, validating data by flagging well-defined mapping errors.

There are various existing quality assurance tools for rule-based validation, such as Keep Right, OSMOSE, and Atlas Checks. These tools capture mapping errors based on topology rules and the tagging schemas documented on the OSM Wiki. Atlas Checks in particular focuses heavily on flagging road network mapping errors, which fits our need to evaluate OSM road network quality.

Rule-Based Validation Using Atlas Checks

Atlas Checks is an open source quality assurance framework that flags various mapping errors. Leveraging the Atlas API, Atlas Checks can convert an OSM PBF file into a data format that efficiently represents OSM as a navigable network consisting of Nodes and Edges. As of today, there are over 40 Atlas Checks available that accurately detect attribute-based and geometry-based errors. Each individual check targets a specific issue with OSM data. See the following examples:

  • AddressStreetNameCheck is an attribute-based check that targets Nodes whose street names do not match the nearby road network.
  • BuildingRoadIntersectionCheck is a geometry-based check that flags OSM road features that overlap with buildings.

Atlas Checks can optionally output both GeoJSON and MapRoulette Challenges, so users can fix each problematic feature in their favorite editor.

Map Quality Measurement (MQM) Tool

While current rule-based validation tools generate feature-based results that precisely flag mapping errors, fixing all of these issues can be time-consuming and inefficient. To prioritize initial mapping efforts while preserving feature-level detail, there’s a need for a tool that identifies priority mapping areas based on the density of mapping errors.

In 2018, we developed the MQM (Map Quality Measurement) tool to fill this need. MQM uses Atlas Checks results to generate a vector grid layer that clusters mapping errors in a user-defined area. The process works as follows. First, it creates a bounding box from the boundary file of the user-defined area. It then uses the K-D tree algorithm to repeatedly split this bounding box in half while counting the mapping errors within each grid. This continues until the majority of grids contain a relatively small number of errors. Users can set the termination threshold so that splitting stops when a certain percentage of grids contain fewer than a certain number of errors. For instance, in our project the grid splitting ends when 90% of the grids contain fewer than 10 mapping errors, leaving the remaining 10% of the grids as error hot-spots. Finally, the hot-spot grids are rendered based on the counts of mapping errors within them, resulting in what we call the ‘MQM layer’. Figure 1, below, illustrates the grid generation process for the city of Minneapolis in the United States, and Figure 2 shows the MQM layer created using the final grids.
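The splitting process described above can be sketched in a few lines of Python. This is only an illustration, not the MQM source code: the names `Cell`, `split_cell`, `build_grid`, and `hot_spots` are hypothetical, and several details are assumptions that the description leaves open, namely that each split cuts the longer side of a grid at its midpoint, that the grid holding the most errors is split next, and that a `max_cells` guard stops the loop if errors share identical coordinates.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One grid: its bounding box plus the (x, y) error points inside it."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    points: list


def split_cell(cell):
    """Cut a cell in half across its longer side (assumed: midpoint split)."""
    if cell.max_x - cell.min_x >= cell.max_y - cell.min_y:
        mid = (cell.min_x + cell.max_x) / 2
        low = [p for p in cell.points if p[0] < mid]
        high = [p for p in cell.points if p[0] >= mid]
        return [Cell(cell.min_x, cell.min_y, mid, cell.max_y, low),
                Cell(mid, cell.min_y, cell.max_x, cell.max_y, high)]
    mid = (cell.min_y + cell.max_y) / 2
    low = [p for p in cell.points if p[1] < mid]
    high = [p for p in cell.points if p[1] >= mid]
    return [Cell(cell.min_x, cell.min_y, cell.max_x, mid, low),
            Cell(cell.min_x, mid, cell.max_x, cell.max_y, high)]


def build_grid(bbox, errors, max_errors=10, target_share=0.9, max_cells=10000):
    """Split until `target_share` of cells hold fewer than `max_errors` errors."""
    cells = [Cell(*bbox, points=list(errors))]
    while len(cells) < max_cells:  # safety guard, an assumption of this sketch
        sparse = sum(1 for c in cells if len(c.points) < max_errors)
        if sparse / len(cells) >= target_share:
            break
        # Assumed policy: always split the cell with the most errors.
        i = max(range(len(cells)), key=lambda k: len(cells[k].points))
        cells.extend(split_cell(cells.pop(i)))
    return cells


def hot_spots(cells, max_errors=10):
    """Cells that still hold `max_errors` or more errors after splitting."""
    return [c for c in cells if len(c.points) >= max_errors]
```

Calling `build_grid((min_x, min_y, max_x, max_y), error_points)` with the default thresholds mirrors the 90% / 10 errors setting mentioned above; the cells returned by `hot_spots` are the ones that would be rendered by error count.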

The MQM Web App

The MQM tool helps better assess map data quality, especially for large datasets. Using the tool, we conducted a case study to measure OSM road data quality in 51 major U.S. cities (the most populous city in each state), and created a web app to host the results. In this web app, users can view mapping error hot-spots in these cities, weight the results by usage level, and check the cities’ rankings based on their overall OSM road data quality. More details can be found in our State of the Map US 2019 conference talk.

Visualize Mapping Error Hot-Spots Using MQM

To retrieve OSM road data for these 51 cities, we pulled the full OSM historic PBF files and filtered the data using city boundary files. We ran the data through 11 Atlas Checks that focus on road network issues, and used the GeoJSON output to generate the MQM layer for each city. Finally, we rendered the MQM layers on the web app, overlaying the actual city boundaries for better visualization. The MQM tool also generates basic statistics, such as total OSM road feature counts, MQM grid size, and mapping errors by type (geometry vs. attribute). We included these in the ‘Quick Stats’ panel on the web app to assist in planning mapping activities. Figure 3, below, shows the user interface of the MQM app. On the left is the MQM layer for the city of Minneapolis, and on the right is the Quick Stats panel.

Weight MQM Results by Usage

Besides viewing the MQM results directly on the app, users can also explore weighting them by different usage factors. We chose population and car ownership as two factors to represent usage, and derived census-level data from the U.S. Census Bureau’s 2017 ACS 5-year reports. To generate results weighted by usage, we first converted the data into raster layers and overlaid them onto the MQM grids layer. We then used zonal statistics to calculate the mean value of the raster cells within each grid, and normalized the values to generate ‘census layers’. Finally, we combined these ‘census layers’ with the MQM layer and assigned the highest weight (0.7 out of 1.0) to the MQM layer. Figure 4, below, illustrates the complete workflow. This lets users select and view the MQM results weighted by population or car ownership within each city. Figure 5 provides an example of the MQM results weighted by population in the city of Detroit.
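As a rough illustration of the final combination step, the sketch below blends per-grid error counts with per-grid census means. Only the 0.7 weight on the MQM layer comes from the workflow above. The rest is assumed for the example: min-max normalization of both inputs, a simple linear blend, the remaining 0.3 going to a single census layer, and the hypothetical function names `min_max` and `weighted_scores`. The zonal statistics step is taken as already done, so `census_means` holds one mean value per grid.

```python
def min_max(values):
    """Rescale values to the 0..1 range (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no variation: treat every grid alike
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(error_counts, census_means, mqm_weight=0.7):
    """Blend normalized error counts with a normalized census layer.

    error_counts: mapping errors per grid (the MQM layer).
    census_means: zonal mean of the census raster per grid, same order.
    """
    if len(error_counts) != len(census_means):
        raise ValueError("both layers must cover the same grids")
    mqm = min_max(error_counts)
    census = min_max(census_means)
    # Assumed: the census layer receives whatever weight the MQM layer does not.
    return [mqm_weight * m + (1 - mqm_weight) * c
            for m, c in zip(mqm, census)]
```

Under these assumptions, a grid with many errors in a densely populated area scores close to 1.0, while a grid with the same error count in a sparsely populated area scores no lower than 0.7 times its normalized error count.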

Rank Cities Based on OSM Data Quality

Using MQM, we generated road mapping error distributions for 51 major U.S. cities, and ranked them based on their overall OSM road data quality. The overall data quality in each city is measured by the percentage of road features that have mapping errors. The lower the percentage, the higher the ranking. With OSM data constantly being updated, we plan to conduct this study annually to test the MQM tool and monitor digitization and editing efforts in the OSM community from year to year.
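The ranking metric is simple enough to show directly. The sketch below is a hypothetical helper, not part of the MQM tool: `rank_cities` and its input layout are assumptions, and it takes as given that the number of road features with at least one error is already known for each city.

```python
def rank_cities(stats):
    """Rank cities by the share of road features that have mapping errors.

    stats maps a city name to (features_with_errors, total_road_features).
    Returns (rank, city, error_percentage) tuples, best quality first.
    """
    rates = {city: 100.0 * flagged / total
             for city, (flagged, total) in stats.items()}
    ordered = sorted(rates.items(), key=lambda item: item[1])
    return [(rank, city, rate)
            for rank, (city, rate) in enumerate(ordered, start=1)]
```

For instance, with made-up inputs `{"City A": (50, 1000), "City B": (10, 1000)}`, City B (1% of features flagged) ranks ahead of City A (5%).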

MQM in 2020

We think MQM is a useful tool for campaign coordinators or project managers to strategically plan out mapping activities on OSM. With that in mind, we’re continuing to enhance and refine MQM so it’s applicable to a broader audience. One of our goals is to allow more types of input data so that other rule-based validation results can be processed by MQM. We’re also actively seeking opportunities to integrate MQM into other mapping tools, such as the HOT Tasking Manager, to assist mapping efforts across the OSM community, and we’re further improving our data quality evaluation process and metrics. Please reach out to us with any comments or questions, and stay tuned for MQM in 2020!

References

Ather, A. (2009). A Quality Analysis of OpenStreetMap Data. MEng Thesis, University College London, London.
