[Spotlight < Privacy] Where the rubber meets the road in open data
by Paul Stone and Ania Calderon
The wonderful thing about sharing with people around the world is discovering that, despite different cultural settings, many of the challenges we face are the same. What differs is the perspectives and approaches to address those challenges, and this is where we can learn from each other and build on our ideas, like open source minds.
That is the purpose of the Open Data Charter’s Implementation Working Group — a trusted space to support public officials and experts working to deliver the Charter Principles by facilitating the sharing of practical knowledge, drawing on the experience of people actually working on making open data happen — where the rubber meets the road.
The working group has just embarked on a new way of operating to maximise the utility of its meetings, focusing each one on a spotlight topic of significant, nuanced debate. The aim is to exchange knowledge and experience, ask questions and gather insights, then write something like this blog to share the learnings more widely.
With this in mind, the first meeting under the new approach was held in late July and focused on how to embed privacy by design within open by default policies. Some of the key questions raised included:
Where does the public good outweigh the right to privacy? How can we confidentialise data more efficiently? And where is confidentialised data not good enough?
Many government officials around the world are grappling with privacy issues while trying to open up data. They face new regulations that can seem contradictory yet must be complied with equally. The GDPR, which came into force in Europe a year ago, has influenced how countries and cities around the world look to protect data. Meanwhile, recent data scandals making headline news have shifted public opinion and increased demands for greater protection and safety of personal data.
Another issue raised is confidentialisation: publishing useful granular data while keeping people’s identities safe. Many governments are struggling to find the techniques or resources to do this, and are testing methodologies, governance models and new software.
It’s important to be aware that the two words ‘confidentialisation’ and ‘anonymisation’ are used interchangeably, but depending on where you are from, they can mean different things. At some national statistical offices, including New Zealand’s, anonymised equates to “de-identified”: variables that directly identify someone, such as name and date of birth, have been removed. Confidentialised data goes further: additional variables that might help identify someone, known as “quasi-identifiers”, are grouped or aggregated until a set minimum number of records shares the same values for all quasi-identifiers.
De-identified data is suitable for research and linking data together, but still requires strict access control. In this blog we refer to confidentialisation, meaning the higher level of privacy protection more suitable for open release.
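The grouping-until-a-minimum-count process described above is essentially a k-anonymity check. As a minimal sketch of the idea (the field names, the age-banding rule and the threshold k=5 here are illustrative assumptions, not any agency’s actual method):

```python
from collections import Counter

def generalise_age(records, band=10):
    """Replace exact ages with bands (e.g. 23 -> '20-29') so that
    more records share the same quasi-identifier values."""
    out = []
    for r in records:
        low = (r["age"] // band) * band
        out.append({**r, "age": f"{low}-{low + band - 1}"})
    return out

def confidentialise(records, quasi_identifiers, k=5):
    """Drop any record whose combination of quasi-identifier values
    is shared by fewer than k records in the dataset."""
    key = lambda r: tuple(r[q] for q in quasi_identifiers)
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]

# Eight hypothetical accident records: region and exact age are
# quasi-identifiers that could help re-identify someone.
records = (
    [{"region": "North", "age": 23}] * 4
    + [{"region": "North", "age": 27}] * 3
    + [{"region": "South", "age": 45}]
)

# With exact ages, no combination reaches k=5, so nothing survives.
print(len(confidentialise(records, ["region", "age"])))  # 0

# After banding ages, seven records share ('North', '20-29') and
# survive; the single 'South' record is still suppressed.
print(len(confidentialise(generalise_age(records), ["region", "age"])))  # 7
```

In practice agencies iterate: widen the bands or aggregate further until every combination clears the threshold, trading off privacy protection against the usefulness of the released data.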
Fear and trust
Many misuses of open data have arisen too. This has raised new obstacles and increased fear that releasing data may lead to unintended consequences. The status quo is that, if there are questions about what to open and what not to, the default is not to open. The scales of justice tip in favour of safeguarding, while its younger sibling, openness, will always require a fight.
How can we overcome this fear and balance the scales?
Ukraine is prioritising key data that by law should be made open. The new president wants more transparency and to open data containing some personal information while preserving citizens’ privacy; the question is how to do this right. For instance, there have been cases where judges have been robbed because their home addresses are public in their asset declarations.
These kinds of experiences are not unique to Ukraine. Sweden used to publish everybody’s tax returns in an open format, but now you have to identify yourself if you want somebody else’s tax return.
In Argentina, there has been debate over whether to continue publishing data about car accidents from the national agency for road safety. This data includes locality and demographic information about those involved, such as gender, age and place. Recently, an accident ended in the death of a popular singer. The released data led to the re-identification of the person responsible for the crash and left them a vulnerable target of public retaliation.
When publishing data, Argentina is now looking at further aspects of privacy protection laws that prevent sharing personal data that could lead to stigmatizing individuals.
So we need to go beyond anonymisation, consider the context in which data is released to avoid harming individuals, and look at ways to be transparent and maximise the utility of open data in an appropriate way.
Two approaches to support agencies
New Zealand is working on solutions to two privacy-related challenges:
- Tools: How to efficiently publish granular, useful data while still protecting privacy (a machine-learning approach)
- Governance: How to support government agencies to confidently make good decisions on releasing data when challenged about privacy by the people affected
Data Confidentiality as a Service
When it comes to releasing granular data, the benefit is the ability to gain deeper insights; however, the risk of identifying an individual is much higher. Manually processing the data can take significant time, iterating towards that line between protecting privacy and ending up with a useful dataset. A machine-learning approach has proven to significantly reduce the time and effort needed to confidentialise data.
Through a pilot delivering data confidentiality as a service on personal accident data using the software tool, New Zealand has learnt that even when a dataset has been confidentialised, it may not be enough. As debated in Argentina, it depends on the context and on what information already in the public domain could lead to a high-profile individual still being identified; it is what is then learnt about that person that may be sensitive. Work is ongoing and the data has not yet been released. More learnings from this pilot, along with a hypothetical scenario to consider, will be published soon.
Supporting the right decisions on privacy
As in Ukraine, and everywhere, public servants are very fearful of getting it wrong when it comes to protecting privacy. This leads either to total avoidance of releasing data, or to backing down very quickly if anyone challenges a release on privacy grounds.
An example in New Zealand is where a number of agencies have been challenged about intentions to release data about the location of hazardous substances (like asbestos in buildings or poisons in the ground) or biosecurity threats (noxious weeds or plant diseases).
In New Zealand, property ownership is open. Therefore, by correlating the location of hazardous substances or biosecurity threats with property data, people can find out if these things are on their land. Some property owners see this as private information, as it would directly affect the real estate market, and claim it should not be made public. Others think the public good overrides the right to privacy and that the locations should be made open. The question is: what’s more important, people’s health and safety or the right to privacy?
To help address this dilemma, the plan is to collaborate with internal legal advisors from different agencies to examine which aspects of the law may override others, and to develop a resource that works through decisions like this, weighing whether data should be released or not. This will then be shared with the Privacy Commission for their view.
The aim is to publish this resource so that when challenged, agencies can more confidently make the right (and consistent) decision, and be able to point to the reasons why.
We hope that by sharing challenges and solutions about how to open data responsibly, we can start to inform practical approaches to strengthening our data rights in a fracturing digital age. If you want to join the next meeting of the Implementation Working Group (on 27 August), which will delve deeper into privacy issues, focusing on methods used to confidentialise/anonymise data with two guest experts, please write to email@example.com.
We would like to thank the participants of the IWG and Charter adopters for sharing the stories that informed this blog.