GDPR IS RAGING HOT: EXPLAINABILITY & ETHICS IN ML/AI
This writeup is the second of a few based on what I learned about the impact of Machine Learning / AI on business strategy from MIT Sloan School of Management & MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The first article, on how to get started with Enterprise AI, was widely reviewed and shared, which encourages me to dive into the difficult topic of ethics. Writing this helps me validate my understanding of the subject and its impact. I hope it is useful for senior execs starting to think about implementing AI in their business processes.
SETTING THE CONTEXT
As I write this, the CEO of Facebook, Mark Zuckerberg, is finishing 2 days of grueling Q&A with about 100 US lawmakers on Facebook’s data collection and sharing practices, how they influenced the election outcomes of the world’s oldest democracy, and how they reduced the trust between Facebook and its members. However I try to slice it, the importance of harvesting and synthesizing data in today’s business cannot be overstated. Facebook, Amazon, Microsoft, Google, and Alibaba (FAMGA) all offer free services in exchange for your data. Going by the trend, I can confidently predict that in the near future more companies will collect our data and offer us services, because they will find a business model around using our data meaningfully to serve us better. We expect those companies to be ethical in their treatment of our data and of the insights they draw from it. Hence regulations (laws) around data collection and usage are an urgent necessity.
Debates continue over whether Facebook’s CEO deserves the criticism or whether this is simply how business gets done when distributed trust is hard to control. While Facebook may still be on the fence about what privacy policies and ethical practices it will put in place for its American users, there is no doubt in anyone’s mind that Facebook and the other members of FAMGA will do everything needed to comply with GDPR for EU countries and their citizens.
WHAT IS GDPR?
The General Data Protection Regulation (GDPR) is an EU-specific regulation focused on data protection and privacy for all individuals in the EU. It takes effect on 25 May 2018.
Organizations are setting aside special funds for GDPR compliance, as the fines for non-compliance can be as high as 4% of annual worldwide turnover or €20 million, whichever is higher. This means that a small company could go out of business with a €20 million fine, and for a company with revenue of $10 billion, the fine could be a staggering $400 million.
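The "whichever is higher" rule is easy to misread, so here is a minimal sketch of the arithmetic. The function name and parameters are my own, and I am assuming the fixed amount is 20 million (the regulation states it in euros); turnover and the fixed amount must be in the same currency, and you can pass a different `fixed_amount` if you work from a converted figure. This gives only the upper bound, not what a regulator would actually levy.

```python
def max_gdpr_fine(annual_turnover, fixed_amount=20_000_000, percent=4):
    """Upper bound of the fine: the higher of a fixed amount or a share of turnover.

    Assumption: `fixed_amount` and `annual_turnover` use the same currency.
    """
    turnover_based = annual_turnover * percent / 100
    return max(fixed_amount, turnover_based)


# A small company with 5 million in turnover still faces the fixed amount.
small_company_cap = max_gdpr_fine(5_000_000)

# A company with 10 billion in turnover faces the 4% figure instead.
large_company_cap = max_gdpr_fine(10_000_000_000)

print(small_company_cap, large_company_cap)
```

For the small company the fixed amount dominates (4% of 5 million is only 200,000), while for the large one the percentage gives the 400 million figure mentioned above.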
No wonder that this week I, like many others, received an email from Google Analytics about the controls and permission settings they are changing to become GDPR compliant; no one wants to be fined. What surprises me is that not many such emails are hitting our inboxes yet.
WHAT IS INCLUDED IN GDPR?
In light of the Facebook and Cambridge Analytica episode, GDPR seems like the most apt regulatory response to big-tech data monopolies treating consumer data unethically and benefiting from it without remorse. It will also give all EU businesses a clear legal framework for the right use of data.
A few major items in the regulation are worth mentioning.
- The definition of personal data has been expanded significantly, and online identifiers such as IP addresses now qualify as personal data. Additional data, including economic, cultural and health information, is also treated as personally identifiable information.
- Controllers (those who determine how and why personal data is processed) must ensure that personal data is processed lawfully, transparently, and for a clearly stated purpose. Once the purpose is fulfilled, the data must be deleted.
- The controllers need to keep a record of how and when the individual provided consent, allow withdrawal of said consent at any time, and permit access to the data at “reasonable intervals”.
- The controllers must also describe what is occurring with regards to data in plain language so that an understanding is accessible to everyone.
- Further, EU citizens can now request correction of their data if it is found to be incomplete or incorrect, and can have their data deleted (the right to be forgotten) if they believe it is no longer necessary or is being used for purposes other than those for which it was collected.
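To make the consent-record item above concrete, here is a minimal sketch of what a controller's consent ledger could look like. Everything in it (the `ConsentRecord` and `ConsentLedger` names, the fields, the in-memory list) is my own illustration and not something GDPR prescribes; a real system would need durable storage, access controls and an audit trail.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    subject_id: str
    purpose: str                 # the clearly stated purpose of processing
    method: str                  # how consent was given, e.g. "signup checkbox"
    given_at: datetime           # when consent was given
    withdrawn_at: Optional[datetime] = None

    @property
    def active(self) -> bool:
        return self.withdrawn_at is None


class ConsentLedger:
    """Hypothetical in-memory ledger of who consented to what, how and when."""

    def __init__(self):
        self._records = []

    def give(self, subject_id, purpose, method):
        record = ConsentRecord(subject_id, purpose, method,
                               given_at=datetime.now(timezone.utc))
        self._records.append(record)
        return record

    def withdraw(self, subject_id, purpose):
        """Mark matching active consents as withdrawn; return how many changed."""
        changed = 0
        for record in self._records:
            if (record.subject_id == subject_id
                    and record.purpose == purpose and record.active):
                record.withdrawn_at = datetime.now(timezone.utc)
                changed += 1
        return changed

    def may_process(self, subject_id, purpose):
        """True only if an active consent exists for this exact purpose."""
        return any(r.active for r in self._records
                   if r.subject_id == subject_id and r.purpose == purpose)


ledger = ConsentLedger()
ledger.give("user-1", "newsletter", "signup checkbox")
print(ledger.may_process("user-1", "newsletter"))   # consent is active
ledger.withdraw("user-1", "newsletter")
print(ledger.may_process("user-1", "newsletter"))   # consent was withdrawn
```

Keeping withdrawn records instead of deleting them is a deliberate choice in this sketch: it preserves the history of how and when consent was given, which is the part the controller has to be able to show.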
WHAT WILL BE THE IMPACT?
“If you can’t explain it simply, you don’t understand it well enough.”
Post GDPR, it is widely believed that the buying and selling of third-party data will change, and that advertisers will be forced to look inward and foster their first-party relationships.
GDPR will empower data controllers to make more specific agreements upfront with their supply chain partners, including legal clauses on data protection. This follows from a mandated requirement that processors help their controllers fulfill data subject requests and cooperate in the case of a breach.
GDPR is widely read as mandating a “right to explanation” for machine learning models: people significantly affected by such a model can ask for an explanation of how it reached its decision, say, whether or not to give a person a loan. While this “explainability” definitely helps end consumers grasp the ethical implications of feeding their data into ML systems, it is going to be really hard for the ML gurus (and the companies who hire them), who now have to do the following before using their algorithms on EU citizens:
a. publish enough technical detail about model selection and training, including the origin and type of the data set used for training;
b. understand the importance of the model in public deployment, that is, work out and document the impact of false positives and false negatives; and
c. set up systems to educate data subjects on why they should not opt out of the model's prediction process.
Say you are creating bespoke offerings for your customer segments using data science algorithms. Make sure you invest in intelligent logging that explains the automated decision-making process: how you arrived at the probability scores, which factors were or were not taken into consideration, and why.
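As an illustration of the kind of logging described above, here is a minimal sketch for a simple logistic scoring model. The weights, feature names, excluded factor and threshold are all made up for the example, and real models are rarely this easy to decompose; the point is only that each decision gets a record of its score, the contribution of every factor used, and the factors left out along with the reason.

```python
import json
import math
from datetime import datetime, timezone

# Hypothetical weights of a hypothetical offer-propensity model.
WEIGHTS = {"visits_last_30d": 0.08, "past_purchases": 0.5}
BIAS = -2.0
# Factors deliberately not used, with the reason kept for later review.
EXCLUDED = {"postcode": "may act as a proxy for protected attributes"}


def explain_decision(subject_id, features, threshold=0.5):
    """Score one customer and return a loggable explanation of the decision."""
    # Each factor's contribution to the logit is weight * value.
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return {
        "subject_id": subject_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "probability": round(probability, 4),
        "threshold": threshold,
        "decision": "offer" if probability >= threshold else "no_offer",
        "contributions": contributions,
        "excluded_factors": EXCLUDED,
    }


record = explain_decision("customer-42",
                          {"visits_last_30d": 10, "past_purchases": 3})
# In practice this line would go to an append-only log, not to stdout.
print(json.dumps(record, indent=2))
```

Writing the record as JSON keeps it readable for whoever has to answer a data subject's question later, and it can be stored alongside the model version that produced it.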
If your organization is collecting data about EU citizens, you need to be prepared for GDPR as it defines and strengthens data protection for consumers and harmonizes data security rules within the EU.
You need to invest in controls on data processing and consumer profiling, and also figure out ways to reasonably explain automated decisions that affect individuals in the EU.
A proactive way to get started on this is to add a data ethicist to your team.
Say you are building predictive analytics for real-time media buying. You now need a corporate ethicist on your team who can review the processes (for data collection, training and prediction) and make sure that these automated ad buying and selling processes are not biased for or against a particular segment or company.
If you do not, you may end up paying far more than the ethicist would have cost you in the first place. Worse, you could lose your entire business in the EU.
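One simple check such a reviewer might start with is comparing how often each segment is selected by the automated process. The sketch below computes per-segment selection rates and the ratio of the lowest rate to the highest. The sample data and the 0.8 review threshold are assumptions of mine (0.8 is a common rule of thumb, not a GDPR requirement), and a low ratio is a prompt for a closer look, not proof of bias.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: iterable of (segment, was_selected) pairs."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for segment, was_selected in decisions:
        totals[segment] += 1
        selected[segment] += int(was_selected)
    return {segment: selected[segment] / totals[segment] for segment in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Made-up sample: segment A is shown the ad 8 times out of 10, segment B 4 out of 10.
sample = ([("A", True)] * 8 + [("A", False)] * 2
          + [("B", True)] * 4 + [("B", False)] * 6)

rates = selection_rates(sample)
ratio = disparity_ratio(rates)
needs_review = ratio < 0.8   # assumed threshold, see note above
print(rates, ratio, needs_review)
```

Here segment B is selected half as often as segment A, so the process would be flagged for the ethicist to investigate why.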
If you like what you read and want to use this content for a presentation, a business case, or anything else that makes sense for you, please let me know how you plan to use it. I am open to critical comments and constructive suggestions.