OSMOX: Extracting Facility Locations from OpenStreetMap

Introducing our new open-source tool for querying and filtering OSM

Anastasia Kopytina
Arup’s City Modelling Lab
7 min read · Jan 10, 2022


This article introduces OSMOX, a tool built and recently open-sourced by the City Modelling Lab. It extracts facility locations from OpenStreetMap data and labels these facilities with activity-based tags. It can also extract other useful facility features, such as floor areas or the distance to the nearest transit stop. The OSMOX package is available on GitHub.

The City Modelling Lab (CML) focuses on simulating urban processes, in this case transport, using agent-based models (ABMs). We do this to help make our cities healthier, cleaner and more equitable. Broadly speaking, building a transport ABM requires two main inputs: a transport network (currently built with our own software tool, Genet) and a population of agents (built with PAM, another tool developed by the CML team).

Each agent has a set of attributes and a 24-hour activity schedule, which specifies where and when the agent will travel during the simulation. (If you would like to read more about activity plans, see our earlier article on PAM.) The “where?” question in the activity schedules is the reason OSMOX was developed.

Activity locations

The activity plans are usually built from travel diaries; these diaries contain zone-level information on the areas where each activity takes place. For privacy reasons, this survey information aggregates locations to zone level, ensuring that real individuals cannot be identified or tracked.

However, zones can be quite big, and since the goal of ABMs is to realistically model transport choices and behaviour, the model requires precise departure and destination locations (i.e. exact coordinates) for agent travel plans. We therefore need a way to assign reasonable locations within a given zone for these activities.

We love open data here at the City Modelling Lab, and we especially love OpenStreetMap (OSM). It offers free geospatial data with an incredible level of detail for (nearly) every corner of the world. We already use OSM for many other applications at the Lab, so it was an obvious starting point to explore how we could better specify activity locations in the agent plans.

We found the building tag in OSM extremely useful here, as it carries highly specific information about what each building is used for. This suggested that we could map between building types in OSM and the activity types in agent plans.

So we built OSMOX to extract useful features from the OSM building data, and use this information to realistically assign precise locations for the activities in agent plans. OSMOX does this by creating a record, in GeoJSON format, of the locations of all the facilities which are suitable for each activity type. We can then use the Facility Sampler class from PAM to assign a facility from that GeoJSON file for each activity in agent schedules.
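To make the idea of a "record of facility locations" concrete, here is a minimal sketch of turning facility records into a GeoJSON FeatureCollection using only the standard library. The record fields (`id`, `activities`, `x`, `y`) and the helper `to_feature_collection` are hypothetical and chosen for illustration; they are not OSMOX's actual output schema, which is best checked against a real output file.

```python
import json

# Hypothetical facility records; field names are illustrative only and are
# not OSMOX's real output schema. Coordinates are assumed to be in a
# projected CRS (metres), like the projection passed to the tool.
facilities = [
    {"id": "way/101", "activities": ["home"], "x": 383500.0, "y": 398200.0},
    {"id": "way/102", "activities": ["work", "shop"], "x": 384100.0, "y": 397900.0},
]


def to_feature_collection(records):
    """Wrap each facility record as a GeoJSON Point feature."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["x"], r["y"]]},
            "properties": {"id": r["id"], "activities": r["activities"]},
        }
        for r in records
    ]
    return {"type": "FeatureCollection", "features": features}


collection = to_feature_collection(facilities)
geojson_text = json.dumps(collection, indent=2)  # write to disk as needed
print(geojson_text)
```

Note that the GeoJSON specification (RFC 7946) assumes WGS 84 longitude/latitude; files in a projected CRS are common in practice, but tools reading them need to be told the projection.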

Visualisation of agent_52 activity schedule (above) and zonal distribution of these activities (below)

The diagram above shows a visualisation of an example activity plan created in PAM. The schedule of agent_52 has 4 types of activities in the 24-hour period: Home, Work, Education and Medical. Initially, the activities in this schedule will only have locations specified at a zone level, as shown above.

Using the record of the locations of the relevant facilities and their activity tags created by OSMOX, we can pick a precise home location from a residential area in Zone A, an educational facility in Zone F and so on. The schedule will therefore have precise location coordinates associated with every activity.
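The zone-to-facility step can be sketched as a uniform random draw from the facilities that carry the right activity tag within the activity's zone. This is an illustrative sketch only, not PAM's Facility Sampler: the index layout, the helpers `sample_location` and `locate_plan`, and the rule that home is sampled once and reused for the whole day are all assumptions made for the example.

```python
import random

# Hypothetical index: zone -> activity -> candidate facilities as
# (facility_id, (x, y)). In practice this would be built from the GeoJSON
# record by spatially joining facilities to zones.
facility_index = {
    "A": {"home": [("way/1", (383500.0, 398200.0)), ("way/2", (383650.0, 398010.0))]},
    "F": {"education": [("way/9", (384900.0, 397300.0))]},
}


def sample_location(index, zone, activity, rng):
    """Pick one facility uniformly at random for an activity in a zone."""
    candidates = index.get(zone, {}).get(activity, [])
    if not candidates:
        raise LookupError(f"no '{activity}' facility in zone {zone}")
    return rng.choice(candidates)


def locate_plan(plan, index, rng):
    """Attach a facility to each (activity, zone) pair in a plan.

    Home is sampled once and reused (assumed to lie in a single zone), so the
    agent returns to the same building; other activities are sampled
    independently.
    """
    home = None
    located = []
    for activity, zone in plan:
        if activity == "home":
            if home is None:
                home = sample_location(index, zone, activity, rng)
            located.append((activity, zone, home))
        else:
            located.append((activity, zone, sample_location(index, zone, activity, rng)))
    return located


plan = [("home", "A"), ("education", "F"), ("home", "A")]
result = locate_plan(plan, facility_index, random.Random(42))
for activity, zone, (fid, xy) in result:
    print(activity, zone, fid, xy)
```

Seeding the random generator makes the assignment reproducible between model runs, which is usually desirable when comparing scenarios.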

Step-by-step process

The process of generating this GeoJSON record of facility locations with OSMOX is as follows. We begin by downloading the OSM data for the relevant area. Next, we call osmox run, passing it the config file, the location of the OSM data file, the output directory and the projection we are working in.

The config file has 2 basic parts:

1. filter: a list of OSMObjects, based on OSM tags, to be labelled as facilities; the rest of the OSMObjects will be filtered out and removed.

2. activity_mapping: specifies how the OSM tags listed in the filter are converted into activities; for example, an object with the tag building=apartments will usually be labelled with the activity tag home.
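The two config parts can be illustrated with a small sketch: the filter decides which objects are kept, and the activity mapping turns their tags into activity labels. The dictionary below is only loosely shaped after the description above; its exact keys, the tag lists and the helpers `passes_filter` and `map_activities` are hypothetical, so the example configs in the repo remain the reference for the real schema.

```python
# Hypothetical config shaped after the two parts described above; the real
# OSMOX configs are far more detailed and may be structured differently.
config = {
    "filter": {
        "building": ["apartments", "house", "office", "school"],
        "amenity": ["hospital"],
    },
    "activity_mapping": {
        "building": {
            "apartments": ["home"],
            "house": ["home"],
            "office": ["work"],
            "school": ["education", "work"],
        },
        "amenity": {"hospital": ["medical", "work"]},
    },
}


def passes_filter(tags, cfg):
    """Keep an object if any of its key=value tags appears in the filter."""
    return any(tags.get(key) in values for key, values in cfg["filter"].items())


def map_activities(tags, cfg):
    """Collect the activities mapped from every tag, without duplicates."""
    activities = []
    for key, value in tags.items():
        for act in cfg["activity_mapping"].get(key, {}).get(value, []):
            if act not in activities:
                activities.append(act)
    return activities


objects = [
    {"id": "way/1", "tags": {"building": "apartments"}},
    {"id": "way/2", "tags": {"building": "shed"}},  # not in filter: dropped
    {"id": "way/3", "tags": {"building": "yes", "amenity": "hospital"}},
]

facilities = {
    obj["id"]: map_activities(obj["tags"], config)
    for obj in objects
    if passes_filter(obj["tags"], config)
}
print(facilities)  # {'way/1': ['home'], 'way/3': ['medical', 'work']}
```

The third object shows why both parts matter: its generic building=yes tag carries no activity, but the amenity tag both keeps it in the output and supplies its activities.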

Example configs are provided in the OSMOX repo. Because the configs are fairly detailed, it is worth using them as a template and editing them as needed rather than writing new ones from scratch. Once the command has finished, you will find the OSMOX output file, in GeoJSON format, in the specified output directory.

Terminal outputs after the ‘osmox run’ command has been executed

Example: Greater Manchester Area

If we imagine that agent_52 from the earlier example lives in the Greater Manchester Area, then we can generate the facilities record for this area, and explore the options for locations of the activities in the agent’s plan. After downloading the OSM data for the Greater Manchester Area from Geofabrik, and executing the osmox run command, we can have a look at the selection of facilities for our activities of interest below (we used kepler.gl to visualise the facilities from the GeoJSON file).

Home facilities in the Greater Manchester Area
Close-up of home facilities in the Greater Manchester Area, with information about example facility
Work facilities in the Greater Manchester Area
Close-up of some of the medical facilities that are part of the University of Manchester Hospital, with information about example facility
A selection of educational facilities that are part of the University of Manchester campus

The GeoJSON file produced by OSMOX can then be used to pick a random facility with the required tag in the specified area. However, we can often do better than a random choice: additional parameters can be specified in the config to produce more features, which enable better-informed facility assignment. These parameters are as follows:

  • distance_to_nearest: you can specify in the config that for each facility, you would like OSMOX to also extract the distance to the nearest facility with another activity tag (e.g. ‘transit stop’ or ‘education’)
  • object_features: we are constantly making our ABMs more realistic. Just as buses and trains in the simulation have a specified capacity, we want larger buildings, such as office buildings, to accommodate multiple agents in the simulation. To support this, OSMOX can extract information on building units, levels, area or floor_area from the OSM data.
Visualisation of the distance_to_nearest_medical values for facilities in the Greater Manchester Area
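Both kinds of feature can be sketched in a few lines. The example below computes a `distance_to_nearest_<activity>` value by brute force (straight-line distance in a projected CRS, excluding the facility itself) and estimates floor area as footprint area times levels, then uses that floor area as a sampling weight. All field names, the default of one level, the self-exclusion rule and the weighting scheme are assumptions for illustration, not a description of how OSMOX computes these features.

```python
import math
import random

# Hypothetical facilities with projected coordinates (metres), footprint
# area (square metres) and building levels (None when untagged in OSM).
facilities = [
    {"id": "f1", "activities": ["home"], "xy": (0.0, 0.0), "area": 120.0, "levels": 2},
    {"id": "f2", "activities": ["medical"], "xy": (300.0, 400.0), "area": 2000.0, "levels": 5},
    {"id": "f3", "activities": ["home"], "xy": (900.0, 0.0), "area": 80.0, "levels": None},
    {"id": "f4", "activities": ["medical"], "xy": (1200.0, 400.0), "area": 500.0, "levels": 1},
]


def distance_to_nearest(facilities, target):
    """Store the straight-line distance to the nearest *other* facility with
    the target activity (brute force, O(n*m); fine for a sketch)."""
    targets = [t for t in facilities if target in t["activities"]]
    key = f"distance_to_nearest_{target}"
    for f in facilities:
        f[key] = min(
            (math.dist(f["xy"], t["xy"]) for t in targets if t is not f),
            default=None,  # no other target facility exists
        )


def floor_area(f, default_levels=1):
    """Estimate floor area as footprint x levels, assuming 1 level if unknown."""
    levels = f["levels"] if f["levels"] else default_levels
    return f["area"] * levels


def sample_weighted(facilities, activity, rng):
    """Pick a facility with probability proportional to estimated floor area."""
    candidates = [f for f in facilities if activity in f["activities"]]
    weights = [floor_area(f) for f in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


distance_to_nearest(facilities, "medical")
for f in facilities:
    print(f["id"], f["distance_to_nearest_medical"], floor_area(f))
print(sample_weighted(facilities, "medical", random.Random(1))["id"])
```

For a region the size of Greater Manchester, a spatial index (for example a k-d tree) would replace the brute-force loop, but the resulting values are the same.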

Some more useful features

Furthermore, OSMOX has a number of features which allow it to handle more complexity, such as missing data and multiple activity tags for the same OSMObject tag:

  1. Spatial inference for filling in missing activities: when no useful tags are found for an object, OSMOX can infer its tags from spatial relationships with surrounding objects:
    I. contains: the object is assigned tags based on the tags of the objects it contains.
    II. within: where an OSMObject still has no useful OSM tag, it is assigned tags based on the object it lies within. The most common case is an untagged building being assigned activities from the land use area around it.
  2. Filling in missing building objects: sometimes small areas have no building objects at all, although they often carry an appropriate land use tag, e.g. residential. A fairly ad hoc solution for such areas is to infill them with a grid of objects. (This fill-in method only covers areas that do not already contain the required activities.)
  3. Multiple activity labels: the OSMOX config lets you specify how to handle multiple-use facilities. You can either split a multi-activity facility into several single-activity facilities (producing identical copies of the building, one per activity), or allow combined tagging (such as creating a new tag shop,work).
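The "within" inference, the grid infill and the two multi-activity options listed above can each be sketched with plain geometry. Below, an even-odd ray-casting test decides whether a building centroid lies inside a land use polygon, a regular grid of points fills an empty polygon, and a facility tagged with several activities is either split into copies or given a combined tag. The land use to activity table, the field names and the helpers are hypothetical, and OSMOX itself works on full OSM geometries rather than centroids and simple polygons.

```python
# Hypothetical land use -> activity table, for illustration only.
LANDUSE_TO_ACTIVITIES = {"residential": ["home"], "retail": ["shop", "work"]}


def point_in_polygon(pt, polygon):
    """Even-odd ray casting: count edge crossings of a ray cast to +x."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def infer_within(buildings, landuse_areas):
    """Give untagged buildings the activities of the land use they sit in."""
    for b in buildings:
        if b["activities"]:
            continue  # already has useful tags
        for area in landuse_areas:
            if point_in_polygon(b["centroid"], area["polygon"]):
                b["activities"] = list(LANDUSE_TO_ACTIVITIES.get(area["landuse"], []))
                break


def grid_infill(polygon, spacing):
    """Place points on a regular grid over the polygon's bounding box,
    keeping only those inside the polygon."""
    xs, ys = zip(*polygon)
    points = []
    y = min(ys) + spacing / 2
    while y < max(ys):
        x = min(xs) + spacing / 2
        while x < max(xs):
            if point_in_polygon((x, y), polygon):
                points.append((x, y))
            x += spacing
        y += spacing
    return points


def resolve_multi(facility, mode):
    """Split into one copy per activity, or combine into one tag like 'shop,work'."""
    acts = facility["activities"]
    if mode == "split":
        return [{**facility, "id": f"{facility['id']}-{a}", "activities": [a]} for a in acts]
    if mode == "combine":
        return [{**facility, "activities": [",".join(sorted(acts))]}]
    raise ValueError(f"unknown mode: {mode}")


residential = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
retail = [(200.0, 0.0), (300.0, 0.0), (300.0, 100.0), (200.0, 100.0)]
landuse = [{"landuse": "residential", "polygon": residential},
           {"landuse": "retail", "polygon": retail}]
buildings = [
    {"id": "b1", "centroid": (50.0, 50.0), "activities": []},
    {"id": "b2", "centroid": (250.0, 50.0), "activities": []},
    {"id": "b3", "centroid": (500.0, 500.0), "activities": []},  # outside all
    {"id": "b4", "centroid": (50.0, 60.0), "activities": ["education"]},
]
infer_within(buildings, landuse)
infill = grid_infill(residential, 50.0)
print([b["activities"] for b in buildings])
print(infill)
print(resolve_multi(buildings[1], "split"))
print(resolve_multi(buildings[1], "combine"))
```

Offsetting the grid by half a spacing keeps infill points away from polygon edges, where the inside/outside test is most sensitive to rounding.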

Conclusion

By open-sourcing OSMOX, we hope to support the creation of more realistic agent-based models beyond the City Modelling Lab. We also think OSMOX can be useful in many other areas of research and practice, such as 15-minute neighbourhoods, equity studies and placemaking.

If your work involves using OSM for geospatial analysis and you have found this post interesting, then please check out OSMOX on GitHub. We’re still busy adding functionality and documentation, and we would be more than happy to see people get involved and contribute their code or experience to the project.

If you would like to make use of OSMOX or have more questions then please get in touch — citymodelling@arup.com

Thanks to Fred Shone for his work on building OSMOX.
