Command-Line Cartography for a UK Election - Python REPL Edition

How GeoPandas can be used instead of JavaScript tools.

Mike Bostock’s Command-Line Cartography tutorial shows how to use a series of command-line tools to make a thematic map of population density. In a previous post I followed those steps to make a map of UK election results. But I couldn’t help thinking the data munging part might be easier in Python using the GeoPandas library. In this post I re-create my map using Pandas and GeoPandas at the Python REPL.

Fetch the data

To make our map we need constituency boundary geometries and election results data. I downloaded these as follows:

# GB constituency geometry
wget http://parlvid.mysociety.org/os/bdline_gb-2019-10.zip
unzip -o bdline_gb-2019-10.zip Data/GB/westminster_const_region.prj Data/GB/westminster_const_region.shp Data/GB/westminster_const_region.dbf Data/GB/westminster_const_region.shx
mv Data/GB/* .
# Northern Ireland constituency geometry
wget http://osni-spatial-ni.opendata.arcgis.com/datasets/563dc2ec3d9943428e3fe68966d40deb_3.zip
unzip 563dc2ec3d9943428e3fe68966d40deb_3.zip
# Election results 1918-2017 (-O names the saved file; lowercase -o would write wget's log there instead)
wget http://researchbriefings.files.parliament.uk/documents/CBP-8647/1918-2017election_results.csv -O 1918_2017election_results.csv

Dive into Python

To stay closer to the ‘command line’ style, I’m going to do this at the Python REPL, but of course a Jupyter Notebook/Lab session would work just as well.

python

We are going to lean on the GeoPandas library as well as Pandas for some general data manipulation. Once installed you can import them as normal:

import pandas as pd
import geopandas as gpd

First let’s read in the constituency polygon boundaries for Great Britain:

gb_shape = gpd.read_file('westminster_const_region.shp')

If you want to check the geometry visually, import matplotlib and call .plot() on the GeoDataFrame:

import matplotlib.pyplot as plt
gb_shape.plot(); plt.show()

In later plots I noticed that the Shetlands were missing from the map. I think there’s an error in the CODE field in this data: S1400005 should be S14000051, so let’s fix that now.

gb_shape['CODE']=gb_shape.CODE.apply(lambda s: 'S14000051' if s == 'S1400005' else s)
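A slightly more systematic way to catch this kind of typo is to flag any code that doesn’t look like a nine-character GSS code (one capital letter followed by eight digits, like S14000051). The sketch below uses a small made-up DataFrame in place of gb_shape; apart from S1400005 and S14000051, the codes and names are illustrative placeholders, and the regex is an assumption based on the code format seen in this data. Series.replace with a dict is shown as an alternative to the lambda above.

```python
import pandas as pd

# Hypothetical stand-in for gb_shape: one truncated code and two placeholder codes
df = pd.DataFrame({
    'NAME': ['Orkney and Shetland', 'Example seat A', 'Example seat B'],
    'CODE': ['S1400005', 'S14000099', 'W07000099'],
})

# Assumed pattern: one capital letter followed by exactly eight digits
malformed = df[~df['CODE'].str.fullmatch(r'[A-Z]\d{8}')]
print(malformed)

# Fix the known bad code; Series.replace with a dict leaves other values untouched
df['CODE'] = df['CODE'].replace({'S1400005': 'S14000051'})
print(df)
```

Running the same check on the real gb_shape before and after the fix would show whether any other codes look suspicious.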

I guess that would’ve been an awk command on the Linux command line. Now load in the constituency boundaries for Northern Ireland:

ni_shape = gpd.read_file('OSNI_Open_Data_Largescale_Boundaries__Parliamentary_Constituencies_2008.shp')

We want to concatenate the two sets of boundaries together into one DataFrame. To do this, I first need to adjust the column names so that they match:

ni_shape.rename(columns={'PC_NAME':'NAME','PC_ID':'CODE'},inplace=True)
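The rename matters because pd.concat aligns frames by column label, not by position: a column that exists in only one frame is kept and filled with NaN for the other frame’s rows. A toy illustration with plain pandas DataFrames (the names and codes are made-up placeholders):

```python
import pandas as pd

# Made-up rows standing in for the GB and NI attribute tables
gb = pd.DataFrame({'NAME': ['Seat A'], 'CODE': ['E14000001']})
ni = pd.DataFrame({'PC_NAME': ['Seat B'], 'PC_ID': ['N06000001']})

# Without renaming, concat produces four half-empty columns
mismatched = pd.concat([gb, ni], sort=False)
print(mismatched.columns.tolist())

# After renaming, the rows stack into the same two columns
ni_renamed = ni.rename(columns={'PC_NAME': 'NAME', 'PC_ID': 'CODE'})
combined = pd.concat([gb, ni_renamed], ignore_index=True)
print(combined)
```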

Secondly, as we found in the previous post, the Great Britain (GB) and Northern Ireland (NI) boundary geometries are provided in different coordinate systems. The GB boundary data is already projected onto the British National Grid (EPSG:27700), whereas the NI boundaries are defined in EPSG:4326 (WGS 84 longitude and latitude). Instead of reaching for ogr2ogr, we can use GeoPandas to reproject the NI boundaries:

ni_shape = ni_shape.to_crs(epsg=27700)  # reproject from EPSG:4326 to British National Grid
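To convince yourself the reprojection behaves as expected, you can reproject a single point and then convert it back. This is a sketch with a made-up point near Belfast (approximate longitude/latitude); it assumes a reasonably recent GeoPandas that accepts a CRS string in the constructor and the epsg= keyword on to_crs.

```python
import geopandas as gpd
from shapely.geometry import Point

# A made-up point near Belfast, in longitude/latitude (EPSG:4326)
pts = gpd.GeoDataFrame({'name': ['near Belfast']},
                       geometry=[Point(-5.93, 54.60)], crs='EPSG:4326')

# Reproject to British National Grid: coordinates become eastings/northings in metres
bng = pts.to_crs(epsg=27700)
print(bng.crs.to_epsg(), bng.geometry.iloc[0])

# Converting back should recover the original coordinates to within rounding
back = bng.to_crs(epsg=4326)
print(back.geometry.iloc[0])
```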

And now we should be ready to concatenate the two sets of geometry using pd.concat, which should deliver another GeoDataFrame.

uk_boundaries = pd.concat([gb_shape[['NAME','CODE','geometry']],ni_shape[['NAME','CODE','geometry']]], sort=False)

Merge the election results with the geometry

Now read in our election results:

results = pd.read_csv('1918_2017election_results.csv')

I want to filter this DataFrame to include only results from the 2017 election. First, though, I renamed a couple of columns: one to remove a trailing space from its name, and one to match the CODE column in the geometry data, ready for the merge.

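# rename a few columns: remove the trailing space from 'turnout '
# and match the CODE column used in the geometry data
results.rename(columns={
    'turnout ': 'turnout',
    'constituency_id': 'CODE',
}, inplace=True)
# keep only the 2017 election results
results = results[results.election == '2017']
# keep only the columns we need
results = results[['CODE', 'constituency', 'country/region', 'con_share', 'lib_share',
                   'lab_share', 'natSW_share', 'oth_share', 'turnout']]
# merge is an inner join by default, so constituencies whose CODE has no match are dropped
uk_results = uk_boundaries.merge(results, on='CODE')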

The uk_results GeoDataFrame now contains the geometry and election results for each constituency.
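Because merge performs an inner join by default, any constituency whose CODE appears on only one side is silently dropped, which is how the Shetland typo surfaced as a gap in the map. One way to catch this up front is an outer merge with indicator=True, which adds a _merge column recording which side each row came from. A sketch with made-up frames standing in for uk_boundaries and results (the turnout figures and the Welsh code are placeholders):

```python
import pandas as pd

# Made-up stand-ins: the boundaries still carry the truncated code
boundaries = pd.DataFrame({'CODE': ['S1400005', 'W07000099'],
                           'NAME': ['Orkney and Shetland', 'Example seat']})
results = pd.DataFrame({'CODE': ['S14000051', 'W07000099'],
                        'turnout': [0.68, 0.70]})

check = boundaries.merge(results, on='CODE', how='outer', indicator=True)
# Rows found in only one table point to codes that need fixing
unmatched = check[check['_merge'] != 'both']
print(unmatched[['CODE', '_merge']])
```

An empty unmatched frame is a quick sign that every boundary found its result.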

In the Command-Line Cartography tutorial we simplified the geometry to make the resulting file smaller. GeoPandas can do this simplification too (through Shapely):

uk_results['geometry'] = uk_results.geometry.simplify(tolerance=500)  # tolerance is in CRS units: metres for British National Grid
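To get a feel for what the tolerance does, here is a sketch using Shapely directly on a made-up shape: a circle of radius 1,000 units (think metres in EPSG:27700) approximated by a polygon. Simplification drops vertices that lie within the tolerance of the simplified outline, so the vertex count falls while the shape stays roughly the same size. One caveat: simplifying each polygon independently can open small gaps or overlaps between neighbouring constituencies, because shared borders are not simplified jointly.

```python
from shapely.geometry import Point

# Made-up shape: buffering a point gives a polygon with many vertices
circle = Point(0, 0).buffer(1000)

# tolerance is in the same units as the coordinates (metres for British National Grid)
simplified = circle.simplify(tolerance=50)

print(len(circle.exterior.coords), len(simplified.exterior.coords))
print(round(circle.area), round(simplified.area))
```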

Preparing for Presentation

Although it is of course possible to produce an SVG using a Python stack, my first goal is to create a GeoJSON output from this merged data. This would allow me to drop back into the JavaScript world of Command-Line Cartography, or into ObservableHQ notebooks, to work on the final presentation.

For convenience (I still find data munging in JavaScript less intuitive), I’m going to add fill colours for the geometry to the DataFrame now, using Pandas. On one hand, I’m slightly uncomfortable that this mixes the presentation with the underlying data. On the other hand, it doesn’t prevent me, or anyone else, from ignoring the embedded fill colour and deriving a new presentation from the underlying data, which remains in the GeoJSON.

First I make a small dictionary defining the party colours, taken from here:

party_colours = {
    "con": "#0087dc",
    "lib": "#FDBB30",
    "lab": "#d50000",
    "snp": "#FFF95D",
    "pld": "#3F8428"
}

Then we work out who the winner was in each constituency. As before, the winner simply has the highest share of the vote.

uk_results['winner']= uk_results[['con_share','lab_share','lib_share','natSW_share','oth_share']].fillna(0.).astype(float).idxmax(axis=1).apply(lambda s: s[:3])
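The idxmax(axis=1) step returns, for each row, the label of the column holding the largest value; slicing the first three characters then turns a label like 'lab_share' into 'lab'. A toy example with made-up vote shares (NaN where a party didn’t stand). If two shares tied exactly, idxmax would return whichever column comes first in the list.

```python
import numpy as np
import pandas as pd

# Made-up shares for three constituencies
shares = pd.DataFrame({
    'con_share': [0.45, 0.20, np.nan],
    'lab_share': [0.40, 0.30, 0.10],
    'lib_share': [0.10, 0.05, 0.05],
    'natSW_share': [np.nan, 0.42, np.nan],
    'oth_share': [0.05, 0.03, 0.85],
})

# Fill missing shares with 0 so idxmax can compare every column,
# then keep the three-letter prefix of the winning column name
winner = shares.fillna(0.0).astype(float).idxmax(axis=1).str[:3]
print(winner.tolist())
```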

It’s slightly unsatisfactory that the election results data I obtained does not separate out the main nationalist parties in Scotland and Wales, and instead groups them into natSW_share. However, because the Scottish National Party does not run candidates in Wales and Plaid Cymru does not field candidates in Scotland, we can use the constituency’s region to determine which nationalist party won in each case and so allocate the appropriate colour.

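def sub_nat(winner, region):
    # 'nat' covers both nationalist parties: Plaid Cymru in Wales, the SNP in Scotland
    if winner == 'nat':
        if region == 'Wales':
            return 'pld'
        else:
            return 'snp'
    else:
        return winner

uk_results['winner'] = uk_results[['country/region', 'winner']].apply(
    lambda r: sub_nat(r['winner'], r['country/region']), axis=1)
# look up each winner's colour, defaulting to grey for parties not in party_colours
uk_results['winner_fill'] = uk_results.winner.apply(lambda s: party_colours.get(s, '#aaaaaa'))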
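The row-wise apply works, but the same substitution can also be expressed with vectorised pandas operations, which tend to be quicker on larger tables. This is an alternative sketch on made-up rows, not a reproduction of the code above; Series.mask and Series.map are standard pandas methods.

```python
import pandas as pd

party_colours = {"con": "#0087dc", "lib": "#FDBB30", "lab": "#d50000",
                 "snp": "#FFF95D", "pld": "#3F8428"}

# Made-up rows: two nationalist wins, one Labour win, one 'other'
df = pd.DataFrame({
    'country/region': ['Wales', 'Scotland', 'England', 'Northern Ireland'],
    'winner': ['nat', 'nat', 'lab', 'oth'],
})

# Compute the 'nat' mask once, before any values are replaced
is_nat = df['winner'] == 'nat'
in_wales = df['country/region'] == 'Wales'
# Nationalist winners become 'pld' in Wales and 'snp' elsewhere
df['winner'] = df['winner'].mask(is_nat & in_wales, 'pld')
df['winner'] = df['winner'].mask(is_nat & ~in_wales, 'snp')
# Look up colours; parties missing from the dictionary fall back to grey
df['winner_fill'] = df['winner'].map(party_colours).fillna('#aaaaaa')
print(df)
```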

This doesn’t resolve the colours for Northern Ireland parties: their winners fall into oth and get the default grey. I’d have to look for a more detailed election results data set that provides that breakdown.

Now we can write out this DataFrame as GeoJSON:

uk_results.to_file('gdp_uk_results.json', driver='GeoJSON')

At this point we could switch back to Command-Line Cartography and produce an SVG:

geoproject 'd3.geoIdentity().reflectY(true).fitSize([960, 960], d)' < gdp_uk_results.json \
| ndjson-split 'd.features' \
| ndjson-map -r d3 'd.properties.fill = d.properties.winner_fill, d' \
| geo2svg -n --stroke none -p 1 -w 960 -h 960 > gdp_uk_winner.svg
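For comparison, the ndjson-split and ndjson-map steps in that pipeline split the FeatureCollection into one feature per line and copy each feature’s winner_fill property into fill, which the pipeline then hands to geo2svg for colouring. A rough Python sketch of just those two steps, using the standard json module; the helper name to_ndjson_with_fill is hypothetical, and the tiny FeatureCollection uses Point geometries for brevity where the real file holds polygons.

```python
import json

# Made-up FeatureCollection in the general shape of a GeoJSON export
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"NAME": "Seat A", "winner_fill": "#d50000"},
         "geometry": {"type": "Point", "coordinates": [0, 0]}},
        {"type": "Feature",
         "properties": {"NAME": "Seat B", "winner_fill": "#0087dc"},
         "geometry": {"type": "Point", "coordinates": [1, 1]}},
    ],
}

def to_ndjson_with_fill(fc):
    """Hypothetical helper mimicking ndjson-split 'd.features' followed by
    ndjson-map 'd.properties.fill = d.properties.winner_fill, d'.
    Like the JavaScript expression, it modifies each feature in place."""
    lines = []
    for feature in fc["features"]:
        feature["properties"]["fill"] = feature["properties"]["winner_fill"]
        lines.append(json.dumps(feature))  # one feature per line
    return "\n".join(lines)

ndjson = to_ndjson_with_fill(collection)
print(ndjson)
```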

So there we have it. I’ve used Python and Pandas for data manipulation far more often than command-line tools such as awk, or JavaScript. Inevitably, then, I found it easier and more intuitive to manipulate the geometry and election results together in this tutorial than in Part 1. Perhaps that’s just familiarity bias; it also looks like there are more lines of code in total here.

I still prefer Observable notebooks and the data-driven documents philosophy for presenting results ready for the web. But I’m definitely interested in exploring Altair’s capabilities, especially its output of Vega-Lite specifications, which are simple to embed in web pages. Perhaps in a future post.
