Can I Trust OpenStreetMap Data?

Validating OSM using regional survey data

Val Ismaili
Arup’s City Modelling Lab
6 min read · Jun 19, 2023

In Arup's City Modelling Lab, we're big proponents of open data. We build agent-based transport models of cities, regions and even countries. One of the key inputs to our models is points-of-interest data, which allows us to figure out where people might travel for their daily activities.

We use OpenStreetMap to help answer this "where" question. In a nutshell, it provides the location of facilities such as houses, shops, schools, hospitals, and social amenities (gyms, pubs, libraries, etc.).

Photo by GeoJango Maps on Unsplash

What's OpenStreetMap?

OpenStreetMap (OSM) is a free and editable map of the whole world to which anyone can contribute. Excluding Google Maps, many of the major online map platforms rely on OSM. If you've used Bing Maps or Apple Maps, you've used OSM. The scale of use and activity is massive — contributors make 2+ million daily changes.

“OpenStreetMap is built by a community of mappers that contribute and maintain data about roads, trails, cafés, railway stations, and much more, all over the world.”

How accurate is the data?

Because geospatial data is so vital to our transport models, our clients sometimes ask about the completeness and accuracy of OSM. As it's a community project, anyone can contribute and maintain data about pretty much anything you can put on a map.

Individuals decide what to add and with what level of detail. As a result, OSM data can contain inaccuracies, inconsistencies, or missing data.

You may or may not be able to find your house on OSM. And if you can find your house, it's unlikely to include detailed classification information such as whether it is terraced, semi-detached or detached (see here how we enriched OSM with other data sources to fill in this missing information).

Transport East Rural Mobility Survey Analysis

Over the last year, we've been working with Transport East (TE) to build BERTIE, an agent-based model (ABM) covering their region, which includes Norfolk, Suffolk, Essex, Thurrock and Southend-on-Sea.

At the end of 2022, TE reached out to over 1,200 parishes across their region to conduct a survey to understand access to facilities, transport services and infrastructure.

Survey respondents were asked whether their parish contained facilities of the following categories:

  • Education
  • Medical
  • Sports facilities
  • Library
  • Community Centre
  • Bank/Post Office
  • Tourism attractions (museums, nature sites, farm shops, etc.)
  • Everyday "essentials" food shop, i.e. convenience shop
  • Supermarkets
  • Large goods shopping, i.e. small appliances, furniture, retail park
  • Leisure shopping, i.e. clothing/shoes/non-food
  • Fuel station
  • Socialising, i.e. pub

TE published the results of this survey. We then analysed how well those results matched what exists in OSM (and, thus, what sits behind our models).

How does OSM stack up against the survey?

We extracted ~30,000 facilities in the TE region falling into the above categories. We aggregated these facilities into the following broader facility categories:

  • Shopping
  • Social Amenities
  • Bank & Fuel
  • Education
  • Medical

The table below shows the percentage of parishes that contain a facility of each category in the OSM data, compared to the survey results. We've also plotted the OSM facilities onto a map of the TE region, divided by survey parishes.

Comparison between OpenStreetMap and Transport East Rural Mobility for % of parishes containing a facility of the five main categories.

~30,000 facilities extracted from OpenStreetMap within the Transport East region that fall into categories of Shopping, Social Amenities, Bank & Fuel, Education or Medical.

The table highlights some interesting points:

  • OSM data closely follows the survey results for each facility category, generally within 5–10 percentage points.
  • Shopping facilities in OSM exceed the survey results. This could be a perception issue; while you expect people to know whether their parish contains an everyday food shop, they may be less aware of appliance shops.
  • Access to shopping facilities is low, suggesting that most people must travel out of their parish for basic essential shopping.
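The parish-level comparison above comes down to counting, for each category, how many parishes contain at least one facility. The following is a minimal sketch of that calculation, not the code used for the analysis. The function names, the `(parish, category)` input format and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def parish_coverage(facilities, all_parishes):
    """Return, per category, the percentage of parishes with at least one facility.

    facilities: iterable of (parish, category) pairs, one per extracted facility.
    all_parishes: collection of every parish name in the study area.
    """
    parishes = set(all_parishes)
    if not parishes:
        raise ValueError("all_parishes must not be empty")
    # Collect the distinct parishes in which each category appears.
    parishes_by_category = defaultdict(set)
    for parish, category in facilities:
        if parish in parishes:  # ignore facilities outside the study area
            parishes_by_category[category].add(parish)
    return {
        category: 100.0 * len(found) / len(parishes)
        for category, found in parishes_by_category.items()
    }


def gap_in_points(osm_pct, survey_pct):
    """OSM share minus survey share, in percentage points, for shared categories."""
    return {c: osm_pct[c] - survey_pct[c] for c in osm_pct if c in survey_pct}


# Made-up toy data (hypothetical, not the Transport East results).
toy_facilities = [
    ("Parish A", "Education"),
    ("Parish A", "Education"),  # a second school in the same parish counts once
    ("Parish A", "Medical"),
    ("Parish B", "Education"),
    ("Parish C", "Shopping"),
]
toy_parishes = ["Parish A", "Parish B", "Parish C", "Parish D"]

osm_share = parish_coverage(toy_facilities, toy_parishes)
print(osm_share)
```

Counting distinct parishes rather than facilities is the key design choice here. The survey asked whether a parish contains a facility at all, so a parish with ten shops and a parish with one should count the same.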

A more detailed look

If we look at the individual categories queried in the survey, there are some standout insights:

Comparison between OpenStreetMap and Transport East Rural Mobility for % of parishes containing a facility across a more detailed breakdown of categories.

  • OSM access to most facility types closely follows the survey results, typically falling behind by only ~2–8 percentage points.
  • We see that for unambiguous facility categories with distinct purposes (i.e. medical, education, banks and fuel stations), there are remarkably close matches.
  • Social and tourism facilities have particularly reliable coverage in OSM, with differences of only 1 and 5 percentage points from the survey results.
  • There's a large discrepancy for libraries. This may be because the survey considered mobile libraries, which would not be included in OSM. Alternatively, a library may have closed since the respondent last used it.
  • Sports facilities are a particular area of missing detail in OSM for the region.
  • Though supermarket data closely aligns, everyday "essentials" convenience shops are not as well covered in OSM.
  • Facilities that are less likely to be used for daily activities, such as leisure (e.g. jewellers) and large goods shops, are more widely present in OSM than the survey responses suggest. This could be a discrepancy in definition or survey respondents being less aware of these less essential facilities.

Extracted facilities from Norwich City Centre.

Interpretation

OSM data closely matches survey responses across the different categories, although it tends to lag by a few percentage points.

A couple of things to keep in mind with this analysis:

  1. Survey responses are not a direct representation of reality. Parishes are amongst the smallest geographical divisions in the UK, so residents should have a good idea of the facilities contained. However, as the above results suggest, respondents may be familiar with facilities used for daily activities but less familiar with other types of facilities, such as clothing or small appliance shops.
  2. Interpretation of OSM tags. We've mapped OSM tags to match the categories included in the survey results. This is not an exact science, so we should expect some differences. Unambiguous facility categories with distinct purposes (i.e. medical, education, banks and fuel stations) presented with very close matches.
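To make the tag-mapping step concrete, here is a minimal sketch of how OSM tags could be assigned to survey categories. The tags listed are common OSM key/value pairs, but the choice of which tag belongs to which category is an assumption for illustration and is not the mapping used in our analysis. `TAG_TO_CATEGORY` and `categorise` are hypothetical names.

```python
# Illustrative mapping from OSM (key, value) tags to survey categories.
# The assignment of tags to categories is an assumption, not the study's mapping.
TAG_TO_CATEGORY = {
    ("amenity", "school"): "Education",
    ("amenity", "doctors"): "Medical",
    ("amenity", "pharmacy"): "Medical",
    ("amenity", "library"): "Library",
    ("amenity", "bank"): "Bank/Post Office",
    ("amenity", "post_office"): "Bank/Post Office",
    ("amenity", "fuel"): "Fuel station",
    ("amenity", "pub"): "Socialising",
    ("shop", "convenience"): "Everyday essentials",
    ("shop", "supermarket"): "Supermarkets",
    ("leisure", "sports_centre"): "Sports facilities",
}


def categorise(tags):
    """Return the sorted survey categories matched by an OSM object's tags.

    tags: dict of OSM tag keys to values, e.g. {"amenity": "pub"}.
    An object can match several categories, or none at all.
    """
    matched = {
        TAG_TO_CATEGORY[(key, value)]
        for key, value in tags.items()
        if (key, value) in TAG_TO_CATEGORY
    }
    return sorted(matched)


print(categorise({"amenity": "fuel", "shop": "convenience"}))
print(categorise({"building": "yes"}))
```

The ambiguity shows up quickly in a table like this. A petrol station with a small shop matches two categories, while an object tagged only `building=yes` matches none, which is one reason the OSM and survey figures will never line up exactly.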

So what do we do about the missing data?

We built OSMOX to extract facilities from OSM data and understand how a population uses them. OSMOX has some valuable features to help with missing or incomplete data in OSM:

  1. Spatial inference for missing activities: when no specific OSM tags exist for an object, OSMOX can infer tags based on spatial operations using surrounding facilities.
  2. Filling in missing building objects: sometimes OSM does not provide individual facility points but does provide a land-use boundary, e.g. 'industrial'. OSMOX provides an ad-hoc solution to infill the area with a grid of facility points.
  3. Multiple activity labels: OSMOX retains information about multi-use facilities.

Norwich facilities extracted using OSMOX.
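The idea behind infilling a land-use boundary with a grid of facility points can be sketched in a few lines. This is an illustration of the general technique only, not OSMOX's implementation. The function names are hypothetical, and the sketch assumes a simple polygon (no holes) in projected coordinates such as metres.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon: list of (x, y) vertices in order, without repeating the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def infill_grid(polygon, spacing):
    """Return grid points, `spacing` apart, that fall inside the polygon."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    # Start half a cell in, so points sit at cell centres rather than on the edge.
    y = min_y + spacing / 2
    while y < max_y:
        x = min_x + spacing / 2
        while x < max_x:
            if point_in_polygon(x, y, polygon):
                points.append((x, y))
            x += spacing
        y += spacing
    return points


# Hypothetical 100 m x 100 m 'industrial' land-use area with 50 m spacing.
industrial_area = [(0, 0), (100, 0), (100, 100), (0, 100)]
grid_points = infill_grid(industrial_area, spacing=50)
print(grid_points)
```

The spacing controls how many synthetic facilities an area receives, so in practice it would need tuning against the typical building density for that land-use type.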

Sometimes OSM simply doesn't contain the information we need, so we use public datasets for specific facility categories to enrich the OSM data. A couple of examples:

  1. Retail Points: an open dataset tracking a comprehensive set of supermarket and convenience store locations across the UK.
  2. Education establishments, UK GOV: a list of all education establishments in the UK, including features such as the number of pupils.

Summary

As one of the most significant geospatial open-source projects, OSM is a uniquely valuable resource for location data. The Transport East Rural Mobility Survey allowed us to directly compare OSM data against the local knowledge of residents in the area.

Overall, the results are very positive. OSM data follows survey results closely across most facility categories. These results are especially promising given that the Transport East region is a semi-rural region of the UK, and rural areas in OSM typically have worse coverage than urban areas.

OSM and survey results match particularly closely for unambiguous facility categories with distinct purposes (i.e. medical, education, banks and fuel stations). However, OSM lags in some categories, such as small convenience food shops and libraries. These discrepancies highlight the value of our open-source project OSMOX for enriching OSM data with other public datasets.

If you have any questions, please get in touch — citymodelling@arup.com

If you would like to contribute to OpenStreetMap — have a look here

See also: OSMOX: Extracting Facility Locations from OpenStreetMap
