Improved catchment areas using human mobility data.

Carlos Asuero Salcedo
Published in Geoblink Tech blog, Apr 30, 2020

Catchment areas (i.e. the area from which a store attracts its customers) are widely recognized as a key concept in the retail industry. At Geoblink, as part of the development of our location management platform, we are always researching ways to improve them.

If we look at the “history” of catchment areas, we can see that two main approaches have been used. The first and simplest is the circular catchment area, which defines the attraction as a fixed radius around a store.

Figure 1: Circular catchment area with a 5 km radius
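To make the circular definition concrete, here is a minimal sketch of the membership test it implies: a point belongs to the catchment area if its great-circle distance to the store is at most the radius. The store coordinates and function names are illustrative assumptions, not part of our platform.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circular_catchment(store, point, radius_km=5.0):
    """True if `point` (lat, lon) lies within `radius_km` of `store` (lat, lon)."""
    return haversine_km(store[0], store[1], point[0], point[1]) <= radius_km


# Hypothetical store location, used only for illustration.
store = (40.4168, -3.7038)
nearby = (40.4268, -3.7038)    # roughly 1 km north of the store
far_away = (40.5168, -3.7038)  # roughly 11 km north of the store
print(in_circular_catchment(store, nearby), in_circular_catchment(store, far_away))
```

Note that the test ignores streets, rivers and any other barrier, which is exactly the limitation of this approach.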

Second, we have isochrone areas, which define the maximum reach of a store in terms of a transport mode (e.g. walking, cycling, driving…) and a time range. This second type of area relies on cartography that maps all the streets and roads available in a given geography, and so provides a much more accurate definition of a store's reach.

Figure 2: 5 km traveling by street network

Nevertheless, defining the reach of a store is only the first step of the process. Once we have the area, it is time to describe the people/consumer profile associated with it. To do so, we rely on spatial intersections between catchment geometries and data coming from public institutions, private providers, etc. And it is at this point that circular catchment areas and isochrones show their weaknesses.

Circular and isochrone areas give us information about a specific area around the location. However, they do not take into account the fact that people who were at our location at some point may live, work or spend time in places outside the catchment area. As a result, customer profiling based only on the surroundings of the point of sale can prove inaccurate.

Nowadays we have tools to know, at an aggregated and anonymous level, where people are at different moments, and we can use this data to define a more realistic catchment area. Among other alternatives, we have decided to focus on GPS data, which provides us with signals (i.e. timestamp + latitude + longitude) from anonymized users during a time frame.

We can use GPS data to estimate both the number of people spending time at specific locations (i.e. stops) and the number of people moving between locations (i.e. flows). But we only have specific coordinates at specific moments in time, so first of all we needed to define a methodology to detect when people are staying in one place, minimizing the error from the signals, and also to tag those stays in terms of purpose (e.g. Home, Work, Others…). We may give more details about our dwell detection and tagging methodology in a future blog post.
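As an illustration of what stop detection can look like, the sketch below uses a simple distance/time-threshold heuristic: consecutive signals that remain within a given distance of an anchor signal, for at least a minimum duration, are merged into one stop. This is not our production methodology; the thresholds (100 m, 10 minutes), the `Signal` and `Stop` types and the function names are all assumptions chosen for the example, and purpose tagging is left out.

```python
import math
from typing import NamedTuple


class Signal(NamedTuple):
    ts: float   # Unix timestamp, in seconds
    lat: float
    lon: float


class Stop(NamedTuple):
    lat: float    # centroid latitude of the merged signals
    lon: float    # centroid longitude of the merged signals
    start: float  # timestamp of the first signal in the stop
    end: float    # timestamp of the last signal in the stop


def approx_dist_m(a, b):
    """Equirectangular approximation, adequate for distances of a few hundred metres."""
    mean_lat = math.radians((a.lat + b.lat) / 2)
    dx = math.radians(b.lon - a.lon) * math.cos(mean_lat)
    dy = math.radians(b.lat - a.lat)
    return 6371008.8 * math.hypot(dx, dy)


def detect_stops(signals, max_dist_m=100.0, min_duration_s=600.0):
    """Detect stops in the signals of a single user."""
    signals = sorted(signals, key=lambda s: s.ts)
    stops = []
    i = 0
    while i < len(signals):
        # Extend the window while signals stay close to the anchor signal i.
        j = i + 1
        while j < len(signals) and approx_dist_m(signals[i], signals[j]) <= max_dist_m:
            j += 1
        window = signals[i:j]
        if window[-1].ts - window[0].ts >= min_duration_s:
            stops.append(Stop(
                lat=sum(s.lat for s in window) / len(window),
                lon=sum(s.lon for s in window) / len(window),
                start=window[0].ts,
                end=window[-1].ts,
            ))
            i = j   # continue after the detected stop
        else:
            i += 1  # not a stop: slide the anchor forward
    return stops
```

Averaging the coordinates of the merged signals is one cheap way to dampen GPS noise; a real pipeline would likely also discard signals with poor reported accuracy before this step.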

Once we know where users have been staying, for how long and for what purpose, it no longer makes sense to analyze only the immediate surroundings. We can instead filter the people who spent time in a target location during a specific time window and query all the data to estimate the distribution of their origins. This gives us points, but we need an area, so we have to define the polygon that best groups all the users’ positions. In our approach we use a grid of hexagons.
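The grouping of points into hexagons can be sketched as follows. This is a minimal, self-contained example that assumes the points are already projected to planar coordinates in metres and uses a pointy-top hexagonal grid addressed by axial coordinates `(q, r)`; the cell size and function names are illustrative assumptions, and a production system might rely on an existing hexagonal indexing library instead.

```python
import math
from collections import Counter


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so that
    # the constraint q + r + s == 0 still holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def hex_cell(x, y, size):
    """Axial (q, r) cell of a planar point; `size` is the hexagon circumradius."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def bin_points(points, size):
    """Count how many (x, y) points fall into each hexagonal cell."""
    return Counter(hex_cell(x, y, size) for x, y in points)


# Four hypothetical positions, in metres, binned into 100 m hexagons.
counts = bin_points([(0, 0), (10, 5), (173.2, 0), (86.6, 150)], size=100)
print(counts)
```

Each cell then carries a count of positions, which is what lets the resulting area be weighted rather than treated as a uniform polygon.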

Figure 3: GPS signals of March 2019
Figure 4: GPS signals of March 2019

With all this information, we can define actual catchment areas as the distribution of people staying at a specific location with a specific purpose (e.g. Home), given that they have also stayed at a different specific location with a different specific purpose (e.g. a store).
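This conditional definition can be sketched in a few lines. The example assumes that stops have already been detected, tagged with a purpose and assigned to a grid cell, and that any time-window filter has been applied upstream; the tuple layout, the `"Store"` tag and the function name are hypothetical choices made for the illustration.

```python
from collections import Counter


def actual_catchment(stops, target_cell, target_purpose="Store", origin_purpose="Home"):
    """Share of visitors to `target_cell` per origin cell.

    `stops` is an iterable of (user_id, purpose, cell) tuples.
    """
    stops = list(stops)
    # Users with at least one stop at the target location and purpose.
    visitors = {
        user for user, purpose, cell in stops
        if purpose == target_purpose and cell == target_cell
    }
    origins = Counter()
    seen = set()
    for user, purpose, cell in stops:
        # Count each visitor at most once per origin cell.
        if user in visitors and purpose == origin_purpose and (user, cell) not in seen:
            seen.add((user, cell))
            origins[cell] += 1
    total = sum(origins.values())
    return {cell: n / total for cell, n in origins.items()} if total else {}


# Hypothetical tagged stops: users a, b and c visited the store; d only works there.
example_stops = [
    ("a", "Home", (0, 0)), ("a", "Store", (5, 5)),
    ("b", "Home", (0, 0)), ("b", "Store", (5, 5)),
    ("c", "Home", (1, 0)), ("c", "Store", (5, 5)),
    ("d", "Home", (2, 2)), ("d", "Work", (5, 5)),
]
weights = actual_catchment(example_stops, target_cell=(5, 5))
print(weights)
```

The output is a set of weighted cells rather than a single outline, so cells with no visitors simply do not appear.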

As can be seen in the images above, actual catchment areas, unlike the other options, are not limited to the immediate surroundings of the store. They open up the possibility of more accurate studies using data from public and private institutions, allowing for much better people/consumer profiling. In addition, these actual catchment areas can be calculated at different points in time, so they are dynamic and can better capture the seasonal behaviour of people/consumers. Even better, they allow us to ignore zones where consumers are not present, and to weight zones by their number of consumers.

Figure 5: Comparing circular catchment area with real CA

Comparing the alternatives discussed, Figure 5 shows that the circular catchment area covers only half the city, and that some users who visited the location under study in March 2019 are not covered by it. If we considered GPS data for a longer period, for example a whole year, we would lose even more data. In addition, it can be inferred from the image that the circular area takes into account several populated areas that have nothing to do with the venue under study, as no one from those areas is actually visiting it.
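The two problems described above (visitors left out, and irrelevant zones included) can be expressed as two simple ratios. The sketch below assumes that both the observed visitor origins and the candidate catchment area (circular or isochrone) have been mapped onto the same grid of cells; the metric names and the sample numbers are made up for illustration and are not measurements from our data.

```python
def compare_catchments(visitor_counts, candidate_cells):
    """Compare a candidate catchment area against observed visitor origins.

    `visitor_counts` maps cell -> number of visitors originating there;
    `candidate_cells` is the set of cells covered by the candidate area.
    """
    total = sum(visitor_counts.values())
    covered = sum(n for cell, n in visitor_counts.items() if cell in candidate_cells)
    unused = [cell for cell in candidate_cells if visitor_counts.get(cell, 0) == 0]
    return {
        # Share of observed visitors whose origin lies inside the candidate area.
        "visitor_coverage": covered / total if total else 0.0,
        # Share of candidate cells from which no visitor actually comes.
        "unused_cell_share": len(unused) / len(candidate_cells) if candidate_cells else 0.0,
    }


# Made-up example: 10 visitors, 2 of them from a cell outside the candidate area.
observed = {(0, 0): 6, (1, 0): 2, (4, 4): 2}
circle = {(0, 0), (1, 0), (0, 1), (1, 1)}
print(compare_catchments(observed, circle))
```

A low visitor coverage means customers are being missed, while a high unused cell share means the profile is diluted by people who never visit.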

Below we show an image where we analyze the same location but this time comparing the real catchment area with an isochrone of 20 minutes walking. The situation is, more or less, the same: the area fails to capture the actual behaviour and location of consumers.

Figure 6: Comparing 20 min walking CA with real CA

The most difficult part of this kind of analysis is managing the large volume of data from GPS signals and defining solid criteria to filter errors and interpret where people are while doing the activities we need to study (home, work, leisure, etc.). The quality of the user location tagging determines the quality of the resulting catchment areas.

So, given the results shown above, we can conclude that when it comes to catchment areas we have to focus not only on the quality of the geometry, but also on identifying the user profile within that geometry. And it seems pretty clear that actual catchment areas, which take into account the actual behaviour and location of people/consumers, provide much more accurate results than other “more traditional” approaches.

Carlos Asuero Salcedo (GIS Engineer)
