Ethical Artificial Intelligence Frameworks — Privacy

Brass For Brain · Published in Law and Ethics in Tech · Sep 16, 2023 · 7 min read

Google recently amended its privacy policy, announcing that it will collect (and may already have been collecting) publicly available data to train its AI systems. Privacy is a paramount consideration in the realm of artificial intelligence, and respecting and safeguarding individual privacy is of the utmost importance. This principle holds such significance that numerous regulations (e.g., the GDPR, CPPA, and PIPEDA) have been established worldwide to uphold it. In today's era, data has become the new valuable resource, akin to oil. Governments and companies extensively gather personal information, often to share or sell it to third parties, or to use it for diverse objectives, including marketing.


According to the Toronto Declaration, states are obligated to comply with applicable national and international laws and regulations that establish and enforce human rights responsibilities, specifically those aimed at preventing discrimination and similar violations. This includes adherence to data protection and privacy laws, which play a crucial role in safeguarding individual rights.

Moreover, the Asilomar AI Principles include two privacy-related principles:

12) Personal Privacy: People should have the right to access, manage and control the data they generate, given AI systems’ power to analyze and utilize that data.

13) Liberty and Privacy: The application of AI to personal data must not unreasonably curtail people’s real or perceived liberty.

The Montreal Declaration for a Responsible Development of Artificial Intelligence (MDRDAI) also promotes the protection of privacy and intimacy. In particular:

  • Personal spaces must be protected from surveillance and data acquisition systems: AI systems (AIS) and data acquisition and archiving systems (DAAS) should not intrude upon intimate thoughts and emotions, nor use them to pass moral judgements or cause harm.
  • People should have the right to disconnect digitally in their private lives, and AIS should offer the option to disconnect periodically.
  • Individuals must have control over their preference information: AIS may build individual preference profiles only with free and informed consent.
  • DAAS should ensure data confidentiality and personal profile anonymity.
  • Personal data should remain under individuals' extensive control, and access to AIS should not require relinquishing control or ownership of that data.
  • Individuals should be free to donate their personal data to research organisations, while the integrity of their personal identity is preserved and malicious use of AIS to manipulate or damage reputations is prevented.

In the realm of artificial intelligence, data is not merely a desirable asset but an indispensable one. Abundant data is vital to developers and designers, because artificial intelligence rests on statistical analysis in which the input matters as much as the output. Consequently, companies become voracious for data and may go to great lengths to gather various types of information from individuals, often without their awareness. Once acquired, this data is used for training, testing, and validation, enabling the creation of diverse algorithmic models that serve various objectives. With knowledge of an individual's preferences, allergies, health history, and eating habits, companies can deploy targeted strategies through personalised marketing.

Sources of privacy risk

  • Private data: privacy is the ability of individuals or groups to choose what information about themselves to share, asserting ownership and control over that information. It is highly valued in Western cultures and considered a human right. In the context of technology, privacy mainly concerns personal data in its many forms and storage methods. Data can be kept private through selective disclosure, encryption, anonymisation, or limited access. However, emerging technologies like AI challenge privacy because they can process vast amounts of data, make novel discoveries, and infer sensitive information; the sketch after this list shows how easily "de-identified" data can be re-linked to individuals.
  • Secondary use: when data is supplied by customers, users, or third-party organisations, there is typically a specific purpose for its use. This primary use aligns with the intent stipulated by the supplier or outlined by the data collector or handler. There may, however, be situations where using data for secondary purposes is a viable business strategy, as long as it remains aligned with the original intent. It is important to recognise that secondary use carries risk: it expands the scope of potential ethical issues and introduces new entities, locations, networks, processing techniques, and privacy concerns. A fresh risk assessment is therefore necessary whenever secondary uses of data are considered.
  • First-party data vs. third-party data: data can be categorised as first-party data, collected directly from users, or third-party data, obtained from sources that have no direct relationship with the organisation. First-party data is valuable and provides insight into user preferences and behaviour, but users may not always be fully aware of how it is used. Third-party data is often statistical in nature and comes from varied sources, making it less valuable and more prone to privacy risk. Organisations must be cautious with third-party data, as its origin, accuracy, and the consent behind it may be uncertain, requiring extra scrutiny and risk management.
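
To make the re-identification risk concrete, here is a toy Python sketch (all names, records, and field choices are invented for illustration): a dataset stripped of names but retaining quasi-identifiers such as birth year and postal prefix can be re-linked to identities by joining it against a public record.

```python
# Toy linkage attack: "de-identified" records that retain
# quasi-identifiers can be re-identified by joining them with a
# public dataset. All data below is fabricated for illustration.

deidentified_health_records = [
    {"birth_year": 1987, "postal_prefix": "H2X", "diagnosis": "asthma"},
    {"birth_year": 1990, "postal_prefix": "M5V", "diagnosis": "diabetes"},
]

public_voter_roll = [
    {"name": "A. Tremblay", "birth_year": 1987, "postal_prefix": "H2X"},
    {"name": "B. Singh", "birth_year": 1990, "postal_prefix": "M5V"},
]

def reidentify(records, public_data):
    """Link records to names via shared quasi-identifiers."""
    matches = []
    for r in records:
        for p in public_data:
            if (r["birth_year"], r["postal_prefix"]) == (p["birth_year"], p["postal_prefix"]):
                matches.append({"name": p["name"], "diagnosis": r["diagnosis"]})
    return matches

print(reidentify(deidentified_health_records, public_voter_roll))
# [{'name': 'A. Tremblay', 'diagnosis': 'asthma'},
#  {'name': 'B. Singh', 'diagnosis': 'diabetes'}]
```

This is precisely why regulators, as in the Val-des-Cerfs case discussed below, distinguish mere depersonalisation from irreversible anonymisation.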

Now the question arises: why is protection necessary? To illustrate the impact and significance of safeguarding personal information, let's examine the Tim Hortons case in Canada. In May 2019, Tim Hortons began collecting geolocation data from users who had downloaded its smartphone application. According to the app's FAQ, location was purportedly collected only while the application was open. However, it was discovered that users' geolocation was being tracked even when the application was closed, allowing the app to infer personal details such as a user's home, workplace, school, preferred stores, clinics, and even vacation destinations.

This brings to mind the infamous Target case, where buying patterns derived from rebate coupons enabled the retailer to predict a teenage girl's pregnancy before her own father was aware. Now, envision a scenario where Tim Hortons falls victim to a hack. Despite assurances of personal data deletion, the ease with which personal information is traded on the dark web raises concerns about criminal exploitation, including ransom and threat schemes and fraud through social engineering. Ultimately, the Tim Hortons example is a compelling reminder of why protecting personal information matters for the welfare of the general public and consumers alike.

In the realm of generative AI systems, data acquisition is a pressing concern. For example, it is very difficult to determine precisely which datasets OpenAI amassed to train ChatGPT, and this opacity in data sourcing raises serious questions. There is also the risk that users inadvertently divulge confidential or personally identifiable information while interacting with ChatGPT, which underscores the need for strong data privacy measures. Finally, skilled practitioners can extract personal insights from generative AI by carefully framing their queries, which calls for vigilance around information security and ethical usage. Unfortunately, OpenAI and Microsoft have recently been hit with a second class-action lawsuit for allegedly failing to comply with various privacy regulations.

In Quebec, the Commission d'accès à l'information (CAI) investigated the Val-des-Cerfs school board's use of an algorithm, developed with a consulting firm, to identify Grade 6 students at risk of dropping out. The school board used a machine learning method to analyse over 300 types of student data and generate predictive indicators of dropout risk (the "Tool"). The CAI's decision examined whether the school board complied with the Access Act in collecting and using personal information during the project's development phase. The CAI found that the personal information used in the Tool, although depersonalised, could still be identified, as it was not irreversibly anonymised. The CAI also classified the Tool as artificial intelligence, capable of predictive analysis through algorithms. Moreover, the Tool generated new personal information in the form of dropout-risk indicators, which the CAI deemed a collection of personal information.

Mitigating measures (industry best practices)

Beyond complying with applicable privacy regulations, here are some industry best practices.

  • Reduce private data sharing: data sharing is a common business practice, but it carries privacy risks. Tracking data as it moves between organisations is challenging, and consent agreements may vary. Privacy protections may be lost when data is shared, and because digital data can be copied endlessly, the potential for unauthorised access or misuse grows with every transfer. Minimising the amount of data shared is therefore a good strategy. Organisations can be selective about sharing certain types of sensitive data and can apply techniques like differential privacy or split neural networks to protect it (a minimal differential privacy sketch follows this list). These methods may, however, introduce new challenges and ethical risks of their own.
  • Choice for the user: users should be given a clear choice and be well informed about how their data is collected, processed, and shared. Opt-in and opt-out systems let users grant or withhold consent for data collection, and offering granular options for specific tasks or purposes is better than blanket prompts (a consent sketch also follows this list). User inspection interfaces can show the flow of data and help users understand how their data is being used, allowing them to make informed decisions. Implementing these techniques reduces the risk of mishandling private data.
  • Minimise data collection: collecting less user data reduces the risks associated with handling it, and data that is never collected can never leak. Organisations that rely on data for their success may still need some of it, but it is important to avoid collecting personally identifiable information (PII) that could violate users' privacy if exposed. Assessing whether sensitive PII is truly needed, and considering not collecting metadata, further mitigates privacy risk (the redaction sketch at the end of this section shows one way to strip PII before storage). By being selective in data collection, organisations can minimise risk while still delivering business value.
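
To ground the differential privacy mention above, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. The dataset, epsilon value, and query are all hypothetical; a real deployment would need careful calibration of query sensitivity and an overall privacy budget.

```python
import numpy as np

def private_count(values, predicate, epsilon: float = 1.0) -> float:
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one
    person's record changes the true count by at most 1), so Laplace
    noise with scale 1/epsilon is sufficient for epsilon-DP.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical usage: publish roughly how many users are under 30
# without confirming whether any one individual is in the dataset.
ages = [22, 35, 29, 41, 27, 30, 19]
print(private_count(ages, lambda a: a < 30, epsilon=0.5))
```

The smaller the epsilon, the more noise is added and the stronger the privacy guarantee, which is exactly the trade-off between data utility and privacy that the first bullet alludes to.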
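As a companion sketch, here is what the granular, opt-in consent model described in the second bullet might look like in code. The purpose names and the tiny API are invented for illustration, not taken from any real consent platform.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    """Per-user, per-purpose opt-in consent; everything is denied by default."""
    granted: set[str] = field(default_factory=set)

    def opt_in(self, purpose: str) -> None:
        self.granted.add(purpose)

    def opt_out(self, purpose: str) -> None:
        self.granted.discard(purpose)

    def allows(self, purpose: str) -> bool:
        return purpose in self.granted

def process(user_data: dict, consent: ConsentRecord, purpose: str) -> None:
    """Refuse to touch data for any purpose the user has not opted into."""
    if not consent.allows(purpose):
        raise PermissionError(f"no consent recorded for purpose: {purpose}")
    # ... proceed with the consented processing only ...

# Hypothetical usage: the user allows product analytics but not ad targeting.
consent = ConsentRecord()
consent.opt_in("product_analytics")
print(consent.allows("product_analytics"))  # True
print(consent.allows("ad_targeting"))       # False
```

Keying consent to a specific purpose, rather than to a single blanket flag, is what makes granular prompts enforceable in code rather than merely cosmetic.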

There are also commercial solutions (e.g., Private AI) that use AI systems themselves to create synthetic data or redact personally identifiable information.
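
For a sense of what such redaction looks like mechanically, here is a deliberately simple regex-based sketch that masks email addresses and North-American-style phone numbers before text is stored or sent to a model. Real redaction services rely on trained models that catch far more than fixed patterns like these.

```python
import re

# Toy patterns; production PII detection uses ML models, not regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")

def redact(text: str) -> str:
    """Mask email addresses and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(redact("Reach me at jane.doe@example.com or 514-555-0199."))
# Reach me at [EMAIL] or [PHONE].
```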

To the readers: do you agree with the current privacy regulatory regime (e.g., the GDPR, CPPA, PIPEDA)? Is it realistic to obtain the consent of billions of people before collecting their publicly available data? What would you recommend to reduce privacy risk? Is the right to privacy at odds with building robust AI systems? Can machine learning comply with the right to be forgotten? And what is your definition of anonymity, and how does it compare with the GDPR's?

Law and Ethics in Tech
Brass For Brain

Private lab specialising in emerging tech (AI & Blockchain). Ensuring ethical practices and promoting responsible innovation. Writer: Sun Gyoo Kang