Inferring housing types from Open Street Map data

How we filled in the blanks to understand access to EV charging at home.

Kasia Kozlowska
Arup’s City Modelling Lab
May 11, 2023

We are currently working on modelling the future charging demand of electric vehicles (EVs). We think people are most likely to charge their car at home — if they can — and we believe the ability to charge at home will depend on the type of accommodation they live in. People living in detached and semi-detached houses are more likely to have access to charging facilities at home than those living in terraced houses or apartments, for example.

We use Open Street Map (OSM) data to build our representations of transport networks and activity locations, but most buildings in OSM lack the tags needed to classify them by accommodation type. In this post, we use the wider data in OSM, together with the Irish census, to infer the missing information and fill in these gaps.

Photo by Zaptec on Unsplash

OSM is a rich source of information. The dataset covers the whole globe (to varying degrees of accuracy), and you can find fairly detailed information about administrative areas, transport networks and buildings. Buildings are the focus of this post.

Each agent in our models is assigned a home location during synthesis using the City Modelling Lab’s osmox tool. We plan to match each agent’s home against OSM building polygons and give the agent an accommodation_type attribute (such as detached, semi-detached, terraced or flat), which will affect its EV charging behaviour in our simulations.
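To make that matching step concrete, here is a minimal sketch using a GeoPandas spatial join. It assumes agent homes are points and building footprints are polygons in the same projected coordinate system; the data, the `agent_id` column and the `"unknown"` fallback are hypothetical, and this is an illustration rather than our actual pipeline code.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical building footprints, already labelled with an accommodation type.
buildings = gpd.GeoDataFrame(
    {"accommodation_type": ["detached", "flat"]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 40, 30)],
)

# Hypothetical agent home locations, in the same CRS as the buildings.
agents = gpd.GeoDataFrame(
    {"agent_id": ["a1", "a2", "a3"]},
    geometry=[Point(5, 5), Point(30, 15), Point(100, 100)],
)


def attach_accommodation_type(agents, buildings):
    """Give each agent the accommodation_type of the footprint its home lies within."""
    joined = gpd.sjoin(
        agents,
        buildings[["accommodation_type", "geometry"]],
        how="left",          # keep agents whose home matches no building
        predicate="within",  # the home point must lie inside the footprint
    )
    return (
        # A point inside overlapping footprints would match several rows; keep the first.
        joined.drop_duplicates(subset="agent_id")
        # Agents with no matching building get a placeholder label (assumption).
        .assign(accommodation_type=lambda df: df["accommodation_type"].fillna("unknown"))
        # Drop the join's bookkeeping column.
        .drop(columns="index_right", errors="ignore")
    )


result = attach_accommodation_type(agents, buildings)
print(result[["agent_id", "accommodation_type"]])
```

A left join keeps every agent, so a home that falls outside any footprint shows up explicitly rather than silently disappearing.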

Photo by Photoholgic on Unsplash

Open Street Map tags

Screenshot of https://overpass-turbo.eu/ query for `building=*` (any building object in this area)

OSM objects hold a lot of information in the form of tags. These tags, like the OSM objects themselves, are created by OSM’s community of volunteers, without whom this amazing dataset would not exist. Open collaboration at its finest! ❤

You can find a spatial polygon in OSM for almost anything, up to and including garden sheds! Often, however, the tags assigned to these polygons are not very descriptive, and without a human looking at the map, it’s hard to say what the building is. Sometimes the tags can be downright cryptic, like when a building is tagged with building=yes.

Screenshot of https://overpass-turbo.eu/ query for `building=yes`

Some houses are tagged with building=house or building=residential, which is better than yes but still lacks the information we need to judge whether charging facilities are likely to be available.

Screenshot of https://overpass-turbo.eu/ query for `building=house`

In an ideal world, buildings would be tagged with one of the following OSM tags: apartments, bungalow, detached, dormitory, semi_detached or terrace. Only a limited number of buildings are tagged in this descriptive way, however, so we’ve done some work to infer the missing tag where possible.
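A first pass can simply separate the buildings that already carry one of these descriptive values from those whose type has to be inferred. This is a minimal sketch: the input format (pairs of a building id and its OSM tag dictionary) and the ids are made up for illustration. Note that a non-residential building such as building=school would also land in the "to infer" bucket here, so a real pipeline would filter to residential candidates first.

```python
# Building tag values that already tell us the housing type (the list above).
DESCRIPTIVE_BUILDING_TAGS = {
    "apartments", "bungalow", "detached", "dormitory", "semi_detached", "terrace",
}


def split_by_tag(buildings):
    """Split buildings into those with a descriptive tag and those needing inference.

    `buildings` is a list of (building_id, tags) pairs, where `tags` is the
    OSM tag dictionary of a building object.
    """
    tagged, to_infer = {}, []
    for building_id, tags in buildings:
        value = tags.get("building")
        if value in DESCRIPTIVE_BUILDING_TAGS:
            tagged[building_id] = value
        else:
            # building=yes, house, residential, ...: the type must be inferred.
            to_infer.append(building_id)
    return tagged, to_infer


example = [
    ("w1", {"building": "terrace"}),
    ("w2", {"building": "yes"}),
    ("w3", {"building": "house", "addr:street": "Main Street"}),
    ("w4", {"building": "apartments"}),
]
tagged, to_infer = split_by_tag(example)
print(tagged)    # {'w1': 'terrace', 'w4': 'apartments'}
print(to_infer)  # ['w2', 'w3']
```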

Filling in the gaps

In theory, we can deduce the type of housing somebody lives in from the number of neighbouring properties their house touches. A house with no neighbouring properties is detached, a house with one is semi-detached, and a house with two or more is terraced. We decided to put this kind of deduction to work.
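Given a list of which buildings touch which, the counting rule takes only a few lines. The sketch below assumes the touching pairs have already been computed (for example by a spatial join) and may list each pair in both directions; the building ids are hypothetical.

```python
def neighbour_counts(building_ids, touching_pairs):
    """Count distinct touching neighbours per building.

    `touching_pairs` may list a pair in both directions, as a self-join does.
    """
    counts = {b: 0 for b in building_ids}
    # Normalise each pair so (A, B) and (B, A) count once; ignore self-pairs.
    unique_pairs = {tuple(sorted(pair)) for pair in touching_pairs if pair[0] != pair[1]}
    for a, b in unique_pairs:
        counts[a] += 1
        counts[b] += 1
    return counts


def type_from_neighbour_count(n):
    """First-pass housing type from the number of touching neighbours."""
    if n == 0:
        return "detached"
    if n == 1:
        return "semi-detached"
    return "terraced"


# A row of four houses (T1..T4), a semi-detached pair and a detached house.
ids = ["T1", "T2", "T3", "T4", "S1", "S2", "D1"]
pairs = [("T1", "T2"), ("T2", "T3"), ("T3", "T4"), ("S1", "S2"), ("S2", "S1")]
counts = neighbour_counts(ids, pairs)
labels = {b: type_from_neighbour_count(n) for b, n in counts.items()}
print(labels)
```

T1 and T4, the two ends of the row, come out as semi-detached here. That is the end-of-terrace problem we return to further down.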

The GeoPandas geopandas.GeoSeries.touches method can tell us which geometries touch each other, but we had to use it carefully. We could, for example, compare each untagged house with every other house in Ireland. Although this would give us the correct answers, we would spend a lot of time making pointless comparisons between houses in, say, Dublin and houses in Cork or Galway that could never be neighbours. We love Python, but it is not the zippiest programming language on the block. To keep processing times down, we needed a smarter approach.
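The sketch below contrasts the two approaches on three toy footprints. The naive version calls `GeoSeries.touches` once per building against the whole dataset; the second does a spatial self-join with `predicate="touches"`, which lets GeoPandas use its spatial index to skip pairs whose bounding boxes do not overlap. It assumes a GeoPandas version whose `sjoin` accepts `predicate`; the footprints and the `building_id` column are hypothetical, and this is an illustration rather than our production code.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical footprints: A and B share a wall, C stands alone.
buildings = gpd.GeoDataFrame(
    {"building_id": ["A", "B", "C"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)


def naive_neighbour_counts(gdf):
    """Test every building against every other one: O(n^2) comparisons."""
    return {
        bid: int(gdf.geometry.touches(geom).sum())
        for bid, geom in zip(gdf["building_id"], gdf.geometry)
    }


def indexed_neighbour_counts(gdf):
    """Self-join with predicate="touches"; the spatial index prunes far-apart pairs."""
    # A polygon never "touches" itself (its interior intersects itself),
    # so the self-join produces no self-pairs.
    pairs = gpd.sjoin(gdf, gdf, how="inner", predicate="touches")
    counts = pairs.groupby("building_id_left").size()
    # Buildings with no touching neighbour do not appear in the join at all.
    counts = counts.reindex(gdf["building_id"], fill_value=0)
    return {bid: int(n) for bid, n in counts.items()}


print(naive_neighbour_counts(buildings))
print(indexed_neighbour_counts(buildings))
```

One caveat worth knowing: `touches` only holds when footprints meet without overlapping, so polygons digitised with a slight overlap would not count as neighbours.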

Building polygons without enough OSM tag data to classify their housing type, coloured by the number of neighbours they touch

To support a divide-and-conquer approach, we split the dataset into spatial batches using S2 Geometry cells, ending up with approximately 100 batches covering the whole of Ireland. The buildings in each batch were compared against the subset of buildings falling within a slightly buffered copy of the same cell, so that neighbours lying just across the cell boundary were not missed.
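The batching idea can be sketched without S2 at all. To keep the example dependency-free, this version swaps S2 cells for a plain square grid over projected coordinates and works on bounding boxes only; the cell size, buffer and ids are illustrative assumptions. Each batch holds the buildings whose centre falls in a cell, plus a candidate set of everything whose bounding box overlaps the buffered cell, which is what the exact `touches` test would then run against.

```python
import math
from collections import defaultdict


def grid_cell(x, y, cell_size):
    """Index of the square grid cell containing point (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def make_batches(bboxes, cell_size, buffer):
    """Group buildings into grid-cell batches, each with a buffered candidate set.

    bboxes: dict of building_id -> (minx, miny, maxx, maxy) in projected units.
    Returns cell -> (ids whose bbox centre lies in the cell,
                     ids whose bbox overlaps the cell grown by `buffer`).
    """
    members = defaultdict(list)
    for bid, (minx, miny, maxx, maxy) in bboxes.items():
        centre_x, centre_y = (minx + maxx) / 2, (miny + maxy) / 2
        members[grid_cell(centre_x, centre_y, cell_size)].append(bid)

    batches = {}
    for (i, j), ids in members.items():
        # Extent of the cell grown by `buffer` on every side.
        x0, y0 = i * cell_size - buffer, j * cell_size - buffer
        x1, y1 = (i + 1) * cell_size + buffer, (j + 1) * cell_size + buffer
        # Linear scan for brevity; a real job would use a spatial index here.
        candidates = [
            bid for bid, (minx, miny, maxx, maxy) in bboxes.items()
            if minx <= x1 and maxx >= x0 and miny <= y1 and maxy >= y0
        ]
        batches[(i, j)] = (ids, candidates)
    return batches


bboxes = {
    "A": (8, 0, 10, 1),     # centre in cell (0, 0), sharing a wall with B at x = 10
    "B": (10, 0, 12, 1),    # centre in cell (1, 0)
    "C": (50, 50, 51, 51),  # far away from both
}
batches = make_batches(bboxes, cell_size=10, buffer=0.5)
for cell, (ids, candidates) in sorted(batches.items()):
    print(cell, ids, candidates)
```

A and B end up in different batches, but each appears among the other batch's candidates, so their shared wall is not lost at the cell boundary. Because the batches are independent, they can be spread across separate processes or machines.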

We ran our data augmentation job in parallel on multiple small/medium virtual machines in AWS for around a day. The running time is nothing to boast about, but it’s something we only have to do once, so we decided not to spend too much time squeezing out further marginal gains.

We ended up with an output that is definitely aesthetically pleasing when pictured on a map.

Building types inferred using their OSM tags or the number of neighbours

However, we still had a couple of smaller problems to tackle before we could move on.

A fun little problem was that end-of-terrace houses have only a single neighbour, just like semi-detached houses (you can see the before and after in the picture below). This means we cannot rely on a simple neighbour count to label an end-of-terrace house correctly; we also need to count the neighbours of its next-door neighbour. If the next-door neighbour has two neighbours, we can assume it is a terraced house, which makes this house an end-of-terrace. This matters because we think terraced houses are less likely to have off-street parking.

End-of-terrace houses have the same number of neighbours as semi-detached houses. We looked at the number of neighbours their neighbour has in order to label them as terraced houses.
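This refinement translates directly into code on the neighbour graph. Here is a minimal sketch, assuming an adjacency dictionary of touching neighbours has already been built; the ids are hypothetical, and treating "two or more" neighbours as terraced is our reading of the counting rule rather than a confirmed detail of the pipeline.

```python
def classify_houses(adjacency):
    """Label houses from the touching-neighbour graph, fixing end-of-terrace houses.

    adjacency: dict of building_id -> set of touching neighbour ids.
    """
    labels = {}
    for building, neighbours in adjacency.items():
        if len(neighbours) == 0:
            labels[building] = "detached"
        elif len(neighbours) >= 2:
            labels[building] = "terraced"
        else:
            (next_door,) = neighbours
            # A single neighbour that itself has two or more neighbours is a
            # mid-terrace house, so this one sits at the end of a terrace.
            if len(adjacency[next_door]) >= 2:
                labels[building] = "terraced"
            else:
                labels[building] = "semi-detached"
    return labels


adjacency = {
    "T1": {"T2"}, "T2": {"T1", "T3"}, "T3": {"T2", "T4"}, "T4": {"T3"},
    "S1": {"S2"}, "S2": {"S1"},
    "D1": set(),
}
print(classify_houses(adjacency))
```

Both ends of the row, T1 and T4, are now labelled terraced, while the genuine semi-detached pair keeps its label.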

Identifying blocks of flats presented another problem, because they cannot be inferred from the geometry of a building’s polygon alone. For this, we turned to Irish census data.

Proportion of flats around Dublin, based on census data

Within each administrative area, we tried to match the proportion of buildings labelled as flats to the proportion of flats in the census data, starting with polygons that had no neighbours and giving priority to those with the largest area. We classified the remaining buildings with no neighbours as detached houses.
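One way to implement this allocation is a greedy pass per administrative area. This is a sketch under several assumptions: the input holds only buildings without a descriptive OSM tag, the census share of flats is applied directly to building counts, the target number of flats is rounded to the nearest whole building, and the `Building` class and area codes are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Building:
    building_id: str
    area_code: str               # administrative area containing the building
    footprint_m2: float
    n_neighbours: int
    label: Optional[str] = None  # neighbour-based label; None while undecided


def assign_flats(buildings, flat_share):
    """Label flats per area to match census shares; isolated leftovers become detached.

    flat_share: dict of area_code -> proportion of flats in the census (0..1).
    """
    by_area = defaultdict(list)
    for b in buildings:
        by_area[b.area_code].append(b)

    for area_code, members in by_area.items():
        target = round(flat_share.get(area_code, 0.0) * len(members))
        # Buildings with no neighbours first, then the largest footprints first.
        ranked = sorted(members, key=lambda b: (b.n_neighbours > 0, -b.footprint_m2))
        for b in ranked[:target]:
            b.label = "flat"
        # Any isolated building not chosen as a flat is labelled detached.
        for b in members:
            if b.label is None and b.n_neighbours == 0:
                b.label = "detached"
    return buildings


buildings = [
    Building("b1", "A1", 300.0, 0),
    Building("b2", "A1", 120.0, 0),
    Building("b3", "A1", 500.0, 1, "semi-detached"),
    Building("b4", "A1", 80.0, 0),
]
assign_flats(buildings, {"A1": 0.25})
print({b.building_id: b.label for b in buildings})
```

With a census share of 0.25 and four buildings, one flat is needed, and it goes to b1, the largest isolated footprint; b3 keeps its neighbour-based label even though it has the biggest footprint overall.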

House types adjusted to flat proportions in the census

Summary

Using OSM tags and Irish census data, we managed to label OSM building polygons with house and accommodation types that are good enough for our use case. Agents living in flats or terraced houses will now be less likely to charge their electric vehicles at home than agents living in detached or semi-detached houses. In future, given access and opportunity, we could use more accurate data from a land registry or another survey to infer this information.

Read more about our adventures in the space of modelling electric vehicle charging in this blog post!

Software Engineer in the City Modelling Lab in Arup, London.