Two Contrasting Reasons Why We Need to Rethink Data Privacy

Lauren Toulson · Published in CARRE4 · Mar 29, 2021
Photo by Dayne Topkin on Unsplash

With data-collecting technologies increasingly infiltrating our everyday lives, data privacy is a hot topic, and one marked by contrasting approaches. This article weighs the reasons why we don’t have enough privacy against the reasons to give up some of it for the greater good. How can regulation navigate these tensions, and where do we draw the line? These are key questions the industry needs to consider.

GDPR could deter solutions to Automated Bias

It is well known by now that many AI systems are flawed by biases that significantly disadvantage certain groups by gender, race, religion, class and other variables such as health. In a recent paper (1), researchers explain that in order to identify where biases are emerging in their systems, they need to use exactly the kind of protected ‘special category’ data (like race) that GDPR restricts. The complications of handling such data can deter ML programmers working at speed, who may not build in ways to measure the negative impacts of their systems. Additionally, as Moerel notes, some see a taboo around collecting personal data in these categories.

“Discrimination for the sake of fairness”, they argue, means including personal data and the classification of individuals in data sets in order to find and design out biases. Without such data, it is incredibly hard to identify and compare how an algorithm discriminates against and limits the opportunities of some groups. Many biases, such as the strong racial discrimination found in the COMPAS algorithm that profiles defendants, are only discovered through real-life use, after people have already been disadvantaged. Such bias stems from biased training data; one common cause is unequal representation of groups across the data set, such as having more men than women, or more white faces than black. The model learns statistical patterns that favour one group over another, and so ends up marginalising or discriminating against the under-represented groups. In the case of COMPAS, the training data contained a disproportionate number of arrests of ethnic minorities, itself a product of human bias, and the model reproduced that bias in its output. “The AI issues of bias can worsen if it is not fuelled with quality data,” says Márcio Burity, diplomat and speaker at the AI Summit.

These algorithms follow an ‘anti-classification’ approach that excludes GDPR-protected data, yet other variables act as proxies and recreate the biases. The researchers’ proposal (1) is to use a classification approach instead, including protected data points such as gender and race, which would allow model biases to be identified and corrected before the algorithm is deployed. With most anti-classification models, because patterns are learned from data that is not labelled with categories like gender, any biases that do surface are incredibly hard to pinpoint and fix without further data collection.
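To make the contrast concrete, here is a minimal sketch of the kind of group-level check that keeping a protected attribute in an audit set makes possible; the data, column names and decision threshold are hypothetical and not taken from the paper. With the protected `group` column present, a simple approval-rate comparison exposes a disparity that an anti-classification setup cannot even measure.

```python
# Minimal sketch: why an audit needs the protected attribute at all.
# The data, column names and threshold below are hypothetical.
import pandas as pd

# Hypothetical audit set: model scores plus a protected attribute,
# held separately from the features the model was trained on.
audit = pd.DataFrame({
    "score": [0.81, 0.34, 0.67, 0.12, 0.58, 0.90, 0.25, 0.49],
    "group": ["A",  "B",  "A",  "B",  "B",  "A",  "B",  "A"],
})

THRESHOLD = 0.5                         # hypothetical decision cut-off
audit["approved"] = audit["score"] >= THRESHOLD

# Demographic-parity style check: compare approval rates across groups.
rates = audit.groupby("group")["approved"].mean()
print(rates)
print("Approval-rate gap:", abs(rates["A"] - rates["B"]))
```

Strip out the `group` column and the model produces exactly the same decisions, but the gap above can no longer be computed, which is precisely the auditing problem the researchers describe.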

Complementing this approach of designing out bias with personal data is the UK Equality Act 2010, which offers protection from discrimination, whether by human or AI. The Data Protection Impact Assessment (DPIA) goes hand in hand with this, requiring a risk assessment for any large-scale use of data with the potential to harm individuals, such as discrimination through automated decision-making. The ICO states that a DPIA should always be carried out where there are plans to use profiling, automated decision-making or special category data to help make decisions about someone’s access to a service, opportunity or benefit, or to process personal data without providing a privacy notice directly to the individual, in combination with any of the criteria in the European guidelines.

In a 2019 article by the ICO, the authors state that “If the organisation is replacing traditional decision-making systems with AI, they should consider running them concurrently for a period of time, and investigate any significant difference in the type of decisions (eg loan acceptance or rejection) for different protected groups between the two systems. … In some cases AI may actually provide an opportunity to uncover and address existing discrimination in traditional decision-making processes, and allow organisations to address any underlying discriminatory practices.”
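As a rough illustration of that concurrent-running check, the sketch below compares acceptance rates per protected group between a legacy process and an AI model scored on the same applications; the data, column names and the ten-percentage-point flagging threshold are assumptions for illustration, not part of the ICO guidance.

```python
# Hedged sketch of a concurrent-running comparison: score the same
# applications with the legacy process and the AI model, then compare
# outcomes per protected group. All values here are hypothetical.
import pandas as pd

decisions = pd.DataFrame({
    "group":    ["A", "A", "B", "B", "A", "B", "A", "B"],
    "legacy":   [1,   0,   1,   0,   1,   0,   1,   1],   # 1 = accepted
    "ai_model": [1,   0,   0,   0,   1,   0,   1,   0],
})

# Acceptance rate per group under each system, and the shift between them.
by_group = decisions.groupby("group")[["legacy", "ai_model"]].mean()
by_group["difference"] = by_group["ai_model"] - by_group["legacy"]
print(by_group)

# Flag any group whose acceptance rate shifts by more than 10 percentage
# points between the two systems for human investigation.
flagged = by_group[by_group["difference"].abs() > 0.10]
print("Groups needing investigation:", list(flagged.index))
```

Any group flagged this way would then feed into the DPIA and the mitigation steps discussed below.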

In sum, those who cut corners, or simply choose not to include GDPR-protected data in their auditing processes, must eventually submit a DPIA and take steps to mitigate harms to individuals, such as discrimination. Without a classification approach, that auditing process may be far more difficult. Rethinking how we work within privacy regulations will be essential to bias remediation in the future.

Photo by Marija Zaric on Unsplash

Reasons to share our data

Drawing on Michel Foucault’s concept of biopolitics, Ajana (2) comments on the risks involved in classifying individuals with techniques similar to those described in the first part of this article: “In the context of healthcare and health insurance, risk-based and data-driven management techniques [of ordering, profiling and classification] that rely on practices of categorisation may lead to reinforcing further forms of inclusion and exclusion whereby some citizens are provided access to public and healthcare services while others are denied.”

Health data has long been covered by strict confidentiality rules between patient and doctor, and privacy regulation is now catching up with new IoT devices, including smartwatches, biosensors and even our phones, which generate insights about our health at a rapid rate. However, data points such as step count and frequency, heart rate, sleep patterns and menstrual cycle can reveal and predict underlying health problems; where we have chosen to share that data, it can be used to discriminate against us, for instance through increased insurance premiums.

However, the concept of the quantified self, as explored by Swan (3), is ever more appealing: the insights we gain about ourselves can optimise our wellbeing and feed our sense of self and personal narrative, and sharing this data with others online creates a sense of community and competition (as with Fitbit’s step counts).

Photo by National Cancer Institute on Unsplash

The Dilemma

Companies help us improve our health by allowing us to monitor our activity, and even offer insights about our health from our DNA through services like 23andMe and Vitl. In turn, they offer to keep our health data ‘private’ or to share it with the community and academic research. As Swan argued (3), in sharing our data with research we can contribute massive datasets of a scale that is often too expensive for research institutions to fund, and thereby contribute data from groups typically left out of health research: women (especially pregnant women), people of colour and those with disabilities. Additionally, data from wearables is valuable objective data compared with the often self-reported data of scientific studies. This could substantially benefit scientific knowledge of understudied groups.

This also links back to the idea proposed at the start of the article: sharing our personal data means the designers and auditors of algorithms can identify where discrimination is occurring and adjust for it to reduce inequalities. But in doing so, are we compromising our own autonomy and our freedom from algorithmic discrimination?

One possible solution is to encourage users to share their wearables data for research, while designers take steps to use protected data for auditing purposes.

Aidan Peppin of the Ada Lovelace Institute asks these questions in a recent article (4): “What is more important — improving health or data rights?” Most people, he argues, are happy to share their data when they know it is contributing to good. However, they also fear it may benefit powerful organisations that use it at society’s expense, for instance to make a profit rather than to advance medicine or fight climate change. He proposes that applying Ostrom’s principles to data, including accountability, external oversight and legislation like GDPR, would mean companies that fall out of line are fined. This in turn would help ensure data is used for the public good, and so reinforce public support for sharing it.

ForHumanity believes that the governance of AI and autonomous systems should include mandated, independent, third-party audits conducted annually to assure compliance with a transparent, crowd-sourced set of audit criteria. ForHumanity’s Founding Director Ryan Carrier shared his thoughts on the topic of this article.

“I reckon that people need to pay a lot more attention to what data they provide where and are getting appropriate value for their service cost (the data cost). I do agree there can be a lot more safeguards — I for one provide none of that data.

“Bias can never be completely removed, so mitigation is our goal. We will work to mitigate bias in the data, in algorithmic architecture, and in the disparate impact of outputs. Our most valuable tool to combat bias is Diverse Inputs and MultiStakeholder feedback loops embedded throughout the algorithmic process — not just at completion.”

Ryan is joining our panel discussion on 30th March; you can sign up to hear more thoughts about AI bias.

It is clear that much more consideration is needed around future governance and the use of personal data, and that conversation must happen soon if we are to use real-time user data for the public good while safeguarding people against unfair discrimination by third parties over the data they do decide to share.

This article is part of a series covering the main topics of our upcoming AI Summit on 30th March 2021. You can book your free ticket here.

Sources

(1) Hock et al. (2021) “Discrimination For The Sake Of Fairness — Fairness By Design And Its Legal Framework”.

(2) Ajana, B. (2017) “Digital health and the biopolitics of the Quantified Self”.

(3) Swan, M. (2013) “The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery”.

(4) Peppin, A. (2020) “Doing good with data: what does good look like when it comes to data stewardship?”.


Studying Digital Culture, Lauren is an MSc student at LSE and writes about Big Data and AI for Digital Bucket Company. Tweet her @itslaurensdata