Is Domesday the best model for a modern census?

Clifford McDowell
Published in Doorda
May 23, 2018

Every 10 years we receive a questionnaire asking us to detail what is happening in our household. The approach hasn’t changed very much since the Domesday Book of 1086, so we wondered whether, with the wealth of public data now being made available, we could offer a fresher version without the need for 30 million questionnaires. The last census was 6 years ago, when Obama was president, we had a coalition government and the UK was still in the EU. A lot has happened since, but our knowledge of local communities and the makeup of neighbourhoods has remained the same.

So we posed a few questions to see if at the very least we could offer a refreshed census of England.

Is it possible?

This is the most obvious starting point. The standard census relies on each household completing a questionnaire and returning it for processing by the Office for National Statistics (ONS). Once processed, the data is, in the main, split between household-related data (e.g. number of bedrooms) and person-related data (e.g. age). It is then grouped into comparable areas so the government can see if one area is doing better than another and allocate budgets accordingly.

So is there enough data available at the required level to update the current statistics?

With over 960 public data publishers to draw on, we were able to find plenty of data related to property in England and Wales (Scotland and Northern Ireland fell outside the scope of our project). These data sources provided us with property characteristics, prices and property types, so from a household perspective the answer was yes.

In terms of people, we were again able to uncover fresher population estimates and socio-economic data relating to employment and benefit claimants. Some of our key findings in terms of changes from the 2011 census were:

  • 1 million more residents now live in England
  • 200,000 postcode changes
  • Over 700,000 new homes have been completed

So, in answer to our earlier question, is it possible? At the very least, we are able to augment the census with additional data to provide a more accurate understanding of changes since 2011. As part of this investigation, we also uncovered a lot of new data sources we wanted to consider, which led to our next question.

Can we make it more relevant?

Whilst the government may want to compare areas for the allocation of budgets, education and healthcare services, the commercial space is more interested in postcodes. The ONS could release data at this level, but doing so would remove the anonymity we all expect from the census. However, we can still model data to a postcode level whilst maintaining anonymity, as other organisations already do to create geo-demographic profiles (Mosaic by Experian and Acorn by CACI being two great examples). So we knew it was possible at some level.

With a lot of trial and error, we were eventually successful in merging postcodes with the census output areas. As part of this, we were also able to incorporate more recent population estimates by applying ratios to the original data. We are heavily oversimplifying this stage, as some postcodes had been terminated, some replaced, and others had seen all their properties knocked down and rebuilt. We also needed to factor in multiple-occupancy postcodes such as student halls and retirement homes, which had the potential to skew an area quite badly.
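
To give a flavour of the ratio step, here is a minimal Python sketch. It is an illustration only, not the algorithm used in the project: the function name, the output-area codes, the postcodes and all the figures are invented. It also assumes that each postcode sits in exactly one output area and that terminated or replaced postcodes have already been dealt with, and it makes no special allowance for multiple-occupancy postcodes.

```python
from collections import defaultdict


def rescale_postcode_population(census_counts, postcode_to_oa, oa_estimates):
    """Scale each postcode's census count by the growth ratio of its output area.

    census_counts:  {postcode: residents counted at the census}
    postcode_to_oa: {postcode: output area code}
    oa_estimates:   {output area code: more recent population estimate}
    """
    # Census population per output area, summed over its postcodes.
    oa_census_totals = defaultdict(int)
    for postcode, count in census_counts.items():
        oa_census_totals[postcode_to_oa[postcode]] += count

    updated = {}
    for postcode, count in census_counts.items():
        oa = postcode_to_oa[postcode]
        base = oa_census_totals[oa]
        # ratio = recent estimate / census total; 0 if there is no census base
        ratio = oa_estimates[oa] / base if base else 0.0
        updated[postcode] = count * ratio
    return updated


# Invented example: two postcodes in one output area, one in another.
census_counts = {"LS1 4AP": 40, "LS1 4AQ": 60, "LS6 2QH": 80}
postcode_to_oa = {"LS1 4AP": "OA_A", "LS1 4AQ": "OA_A", "LS6 2QH": "OA_B"}
oa_estimates = {"OA_A": 125, "OA_B": 60}

for postcode, people in rescale_postcode_population(
    census_counts, postcode_to_oa, oa_estimates
).items():
    print(postcode, round(people, 1))
```

Because every postcode in an output area is scaled by the same ratio, the postcode figures still add up to the newer estimate for that area. The cost is that growth concentrated in one postcode, such as a new housing estate, gets spread evenly across its neighbours.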

With a lot of tweaking to our algorithms and investigation, we were eventually able to incorporate current postcodes with far more accurate population estimates. Good progress, but we also wanted to see what else we could add.


What else can we add?

Once we had the basics in place, our attention turned to what other data sources we could add to make the updated census even more meaningful. We know crime rates have a major impact on communities, along with employment levels and property prices. Bearing in mind the feedback from our end users, we also thought we needed a couple of layers above postcodes. Taking all this into account, we merged the following.

Emergency services: We added postcodes to the reported crime data and uploaded everything back to 2011. This allows users to plot reported crime against the refreshed census data. We also added London Fire Incidents. Ideally we’d like to add fire incidents nationally, but for now only London is available, which is better than nothing.

Benefit claimants: With so much of the census requesting answers to questions on benefits and social housing, we thought we’d layer actual data on top. This includes claimants of Housing Benefit, Jobseeker’s Allowance and disability benefits, to name a few.

Property: What’s being built and how much it costs are both of critical interest when trying to understand an area. To ensure we stayed on top of this, we included average property sale prices, property characteristics and the number of new builds by postcode. We don’t have content for every postcode, but we do for most, which adds an intriguing layer of additional detail.

Locations: Postcodes are great, but what if we wanted to extract data based on a town or village? With this in mind, we incorporated two additional layers next to each postcode. This allows users to extract data based on an area (e.g. Leeds) rather than having to select each individual postcode (roughly 26,000 in Leeds).
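
The location layers can be pictured as a simple lookup sitting next to each postcode. The Python sketch below is a hypothetical illustration: the layer names ("town" and "district"), the helper functions, the postcodes and the new-build figures are all assumptions made for the example, not the actual structure of the dataset.

```python
# Hypothetical lookup: each postcode carries two extra location layers.
# The layer names and every value here are invented for illustration.
POSTCODE_LAYERS = {
    "LS1 4AP": {"town": "Leeds", "district": "Leeds"},
    "LS6 2QH": {"town": "Leeds", "district": "Leeds"},
    "LS21 1AA": {"town": "Otley", "district": "Leeds"},
    "YO1 7HH": {"town": "York", "district": "York"},
}


def postcodes_in(area, layer, lookup=POSTCODE_LAYERS):
    """Return the sorted postcodes whose given layer matches the area name."""
    return sorted(pc for pc, layers in lookup.items() if layers[layer] == area)


def total_for_area(values, area, layer, lookup=POSTCODE_LAYERS):
    """Sum a postcode-level measure over every postcode in the area."""
    return sum(values.get(pc, 0) for pc in postcodes_in(area, layer, lookup))


# Invented postcode-level measure, e.g. number of new builds.
new_builds = {"LS1 4AP": 12, "LS6 2QH": 3, "LS21 1AA": 5, "YO1 7HH": 7}

print(postcodes_in("Leeds", "town"))
print(total_for_area(new_builds, "Leeds", "district"))
```

Keeping the layers as extra columns against each postcode means a user can ask for "Leeds" once and let the lookup do the selection, instead of listing thousands of postcodes by hand.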

How often can we refresh it?

Our postcode and population datasets now allow us to potentially refresh the data on a monthly basis. On top of this the socio-economic data relating to pensions and benefit claimants are frequently refreshed and most of the property data we found is on a monthly refresh cycle.

Potentially, we could refresh our revised census monthly. However, this appeared to scare a few of the analysts we spoke to, as it would also require them to update their models on a monthly basis! This feedback, combined with how people wanted to consume the data, led us to believe a quarterly refresh would be preferable for most.

The future?

As part of this project, we came across a lot of datasets relating to finance, so we’re looking into the addition of a financial module covering mortgages, loans and insolvencies, to name a few. We’d also like to expand the range to include the other home nations. Wales should be straightforward. Scotland and Northern Ireland, however, are very different kettles of fish! Not impossible, but they will require more thinking.

In answer to our original question, we believe it is possible to heavily augment the census, making it more relevant to 2017 as well as to the years before the next one in 2021.
