Expose the value — protect the data
After the 2011 census, as after every census, users wanted access to the data as soon as possible. However, the processes needed to protect respondents’ confidentiality in the outputs were time-consuming, and data continued to be published at intervals over the following three years. For the next census in 2021, the Office for National Statistics is looking for innovative ways to make these data available to users much more quickly. With our expertise in handling big datasets and complex algorithms, we at Sensible Code are helping the ONS reimagine how to disseminate data from the 2021 census.
“The objective is to provide the census 2021 data quickly, flexibly and more accessibly”
We’ve been doing some discovery work for the 2021 census team. The purpose of discovery is to understand the bounds of a problem so that it can be solved effectively. The first step was to validate whether the innovation being sought was possible and whether it would meet the fundamental user needs.
Power to the user
Government departments, NGOs, community groups and businesses use census data to help shape strategic decisions on policy and investment.
The value for them will be the ability to analyse multivariate data for themselves. They’ll be able to make complex queries and generate tables instantaneously. They’ll also be able to download the tables into popular spreadsheet tools.
In 2011, tables were made by analysts inside the ONS in response to requests from customers. It took time to put the data into a form suitable for users, mainly because the privacy of census respondents had to be protected. This process is called statistical disclosure control, and for 2011 it was a manual task for each user request.
Innovating for 2021
For 2021 the intention is to standardise and automate disclosure control. This will keep the outputs safe while allowing users to create tables on demand. The ONS has been further developing a disclosure control technique called ‘the cell key method’, which was pioneered by the Australian Bureau of Statistics. The ONS has taken this method and is adding a further layer of protection, because it wants to let users view data at a fine level of detail and thus make the data more valuable. The major challenge for us was to implement this successfully as an algorithm which could be validated by the ONS.
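For readers who like to see the idea in code, here is a minimal sketch of the general cell key approach, not the ONS implementation. Each record gets a random record key; the keys of the records falling into a table cell are summed to give a cell key; and the cell key decides how the count is perturbed. The key range, the function names and the toy noise rule are all our own assumptions for illustration. A production system would use a carefully designed perturbation table, and the additional ONS layer of protection is not shown.

```python
import random
from collections import defaultdict

KEY_RANGE = 256  # assumed size of the record-key space (illustrative only)


def assign_record_keys(records, seed=0):
    """Attach a fixed pseudo-random record key to every record."""
    rng = random.Random(seed)
    return [dict(r, record_key=rng.randrange(KEY_RANGE)) for r in records]


def perturbation(cell_key):
    """Toy rule: map the cell key to noise of -1, 0 or +1.

    A real system would look the noise up in a designed table,
    typically indexed by both the cell count and the cell key.
    """
    return (cell_key % 3) - 1


def tabulate(records, variables):
    """Count records per cell, then perturb each count using its cell key."""
    counts = defaultdict(int)
    cell_keys = defaultdict(int)
    for r in records:
        cell = tuple(r[v] for v in variables)
        counts[cell] += 1
        # The cell key is the sum of the record keys, modulo the key range.
        cell_keys[cell] = (cell_keys[cell] + r["record_key"]) % KEY_RANGE
    return {cell: counts[cell] + perturbation(cell_keys[cell]) for cell in counts}


if __name__ == "__main__":
    people = assign_record_keys(
        [
            {"region": "A", "sex": "F"},
            {"region": "A", "sex": "M"},
            {"region": "B", "sex": "F"},
            {"region": "A", "sex": "F"},
        ]
    )
    print(tabulate(people, ["region"]))
    print(tabulate(people, ["region", "sex"]))
```

The useful property is consistency: because the noise depends only on which records are in a cell, the same group of people gets the same perturbed count in whichever table it appears, so a user cannot average the noise away by asking for the same cell repeatedly.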
User needs are paramount
Once we were able to prove that the disclosure control algorithm worked, we needed some form of UI to allow users to test the system. The UI shown here is what we call a straw man. Its value lies in letting users specify exactly which subset of the population should be included and which variables they would like to see in their table. A key deliverable for the proof of concept is to allow a limited number of ONS users to test the system.
During the discovery phase we’ve gone from nothing to an experimental data query builder which can be put into the hands of users. It’s working and it’s fast: users can produce tables in under 5 seconds from the 60-million-row database. But this is just the beginning of the journey! The next step in the discovery is to spend time analysing how users respond to and interact with the proof of concept.
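To make the idea of such a query concrete, here is a rough sketch of what a request for "this subset of the population, these variables" could look like in code. It is not how our proof of concept is built; the function names, the field names and the filter format are hypothetical. It also returns raw counts, which would need disclosure control applied before being shown to anyone.

```python
import csv
import io
from collections import Counter


def build_table(records, population_filter, variables):
    """Cross-tabulate the chosen variables for a subset of the population.

    population_filter maps a field name to the set of values to include.
    """
    selected = [
        r
        for r in records
        if all(r.get(field) in allowed for field, allowed in population_filter.items())
    ]
    return Counter(tuple(r[v] for v in variables) for r in selected)


def to_csv(table, variables):
    """Serialise a table as CSV text that a spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(list(variables) + ["count"])
    for cell, count in sorted(table.items()):
        writer.writerow(list(cell) + [count])
    return buffer.getvalue()


if __name__ == "__main__":
    people = [
        {"region": "North", "age_band": "16-24", "tenure": "rent"},
        {"region": "North", "age_band": "16-24", "tenure": "rent"},
        {"region": "North", "age_band": "25-34", "tenure": "own"},
        {"region": "South", "age_band": "16-24", "tenure": "own"},
    ]
    table = build_table(people, {"region": {"North"}}, ["age_band", "tenure"])
    print(to_csv(table, ["age_band", "tenure"]))
```

Keeping the population filter separate from the list of variables mirrors the two choices the straw man UI asks users to make.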