The Geolytix Retailpoint Dataset

David Horgan

Part 2: Greater London

In the first part of this series, which can be found here, I looked at the store and population distribution at the UK national level.

In part 2, I concentrate the analysis on London, UK, principally because I live and work there. I have also enhanced the spatial distribution analysis by moving from a cumulative analysis to a density analysis, using either stores per square kilometre or population per square kilometre.

In part 3, I will analyse the Kingston and Wimbledon area of London, aiming in particular to account for the success of the Tesco Extra store at New Malden.

The theoretical expectation for these distributions, according to the Huff model and the law of retail gravity, is that they should have power-law forms.

I found QGIS fantastic for analysing and displaying GIS data, and for inspiring further exploration.

Greater London rendered with QGIS

By opening the dataset in Tableau, I can obtain lovely clear maps of both the store distribution and the population distribution around London.

Greater London stores via Tableau
Greater London population via Tableau

Using matplotlib, geopandas and basemap has the advantage that I can bring to bear the tools Python offers, including haversine, numpy, scipy and scikit-learn, to analyse the distributions.

Greater London stores via Matplotlib Basemap
Greater London population via Matplotlib Basemap
Greater London stores via GeoPandas

Using shapefiles for Greater London postcodes, I’m able to use sjoin to select only those stores in the Greater London area. I can then use the haversine function to measure distances from central London in one-kilometre increments and calculate the density of stores at increasing distances.
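The distance-and-density step can be sketched with numpy alone. This is a minimal illustration, not the code behind the figures: it assumes the stores have already been filtered to Greater London, the centre point (roughly Charing Cross) is an assumption because the article does not say which point was used, and `haversine_km` and `ring_density` are hypothetical helper names.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumed centre of London (roughly Charing Cross); not taken from the article.
CENTRE_LAT, CENTRE_LON = 51.5074, -0.1278


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def ring_density(lats, lons, max_km=30, step_km=1.0):
    """Count points in concentric rings around the centre and divide by ring area.

    Returns the outer radius of each ring (km) and the density (points per km^2).
    Simplification: each ring is treated as a full annulus, which overstates
    the area for rings that extend beyond the Greater London boundary.
    """
    d = haversine_km(CENTRE_LAT, CENTRE_LON, np.asarray(lats), np.asarray(lons))
    edges = np.arange(0.0, max_km + step_km, step_km)
    counts, _ = np.histogram(d, bins=edges)
    areas = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    return edges[1:], counts / areas
```

Calling `ring_density(store_lats, store_lons)` on arrays of store coordinates gives one density value per one-kilometre ring. For a population density, the counts would be replaced by a population-weighted sum (for example via the `weights` argument of `np.histogram`).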

Store density against distance for London

Using geopandas and shapefiles, I can plot a choropleth map of the population of Greater London. The haversine technique again enables me to measure how the population distribution depends on the distance from central London.

London population via GeoPandas
Spatial distribution of population in London

By plotting the choropleth map of population and the retail locations for Greater London in the same figure we get an idea of how the two distributions are related.

London Population with stores via GeoPandas

Using statsmodels and seaborn, I can calculate and plot the correlation between the retail and population distributions. The two distributions are strongly correlated, with an R-squared of 0.935.
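For readers who want to see what that R-squared measures, here is a minimal sketch of a straight-line fit and its coefficient of determination. It uses `np.polyfit` as a stand-in for the statsmodels regression, and it assumes the two inputs are paired values such as population density and store density per distance ring; the article does not spell out the exact variables. `linear_fit_r2` is a hypothetical helper name.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept, plus R-squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)   # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot
```

With a single predictor and an intercept, this R-squared equals the square of the Pearson correlation coefficient, so it should match what an ordinary least squares fit in statsmodels reports for the same data.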

The regression line for stores against the population
Correlation between retail and population distributions in Greater London

A power-law trend line fits both the population spatial distribution and the retail store spatial distribution well, supporting the usefulness of the Huff model and the retail gravity law for the measured store and population distributions.
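One common way to fit such a trend line is a straight-line fit in log-log space. The sketch below assumes the fitted form is density = A * r^(-n), with r the distance from the centre; the article does not state the exact form or the fitting method used, so treat both as assumptions. `fit_power_law` is a hypothetical helper name, and the data in the usage lines is synthetic.

```python
import numpy as np


def fit_power_law(r, density):
    """Fit density = A * r**(-n) by linear regression on log(r), log(density).

    Points with r <= 0 or density <= 0 are dropped, since their logs are undefined.
    Returns (A, n).
    """
    r = np.asarray(r, dtype=float)
    density = np.asarray(density, dtype=float)
    mask = (r > 0) & (density > 0)
    slope, log_a = np.polyfit(np.log(r[mask]), np.log(density[mask]), 1)
    return np.exp(log_a), -slope


# Synthetic example: noiseless data generated with illustrative values A=50, n=1.3.
radii = np.arange(1, 31, dtype=float)
A_fit, n_fit = fit_power_law(radii, 50.0 * radii ** -1.3)
```

A log-log fit weights relative rather than absolute errors, so it can give a slightly different exponent from a non-linear least squares fit on the raw densities (for example with `scipy.optimize.curve_fit`).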

Power Law fitted to the spatial distribution of London shops with n=1.3
Power Law fitted to the spatial distribution of London population with n=1.18

I am a theoretical physicist with a data science background. At present, I am developing a UK retail market using ABM, ML and computational econometrics.
