The impact of data & algorithms — the law of unintended consequences
“After months of trying to find a new position, applying to an average of five positions per week, Maria was totally defeated. She did not understand: she had the qualifications, she had the experience, her resume was written by a professional, and she was submitting a customized cover letter for each position. Yet she had not received a single callback, until that decisive call from a recruiter who asked, ‘Which Maria are you?’ That is when she wondered: how many Marias are there?”
Multiple data breaches, and the compromised identity data now flowing freely through the vast digital ecosystem, have created a data provenance problem, one that will have a growing impact on individuals as that data makes its way into the data supply chain (much like money being laundered and cleaned through legal channels). Once the data enters the regular supply chain, where its use by a conglomerate of organizations lends it validity, it will be consumed by algorithms to make data-driven decisions, and those decisions will have a ripple effect on unsuspecting parties and actors.
The age of AI, ML, and smart algorithms is here, especially in key areas such as finance, recruiting, and healthcare. The issue is that these algorithms are being deployed en masse in an effort to streamline and improve certain processes, and they are making opaque decisions that have a long-term impact on the very individuals whose data was compromised in the first place.
For example, in recruiting, algorithms are used to score candidates, drawing on data from many sources, including Equifax.
But what if the data the algorithm uses has been contaminated? What if the underlying data point that carries a large weight is not accurate? What is the impact of that decision?
Maria represents the millions of individuals whose personal data has been compromised in the many breaches and is now making its way back through the data supply chain to those same organizations, with its context altered.
For example, how many Marias exist? How do you determine which is the real Maria if multiple identities have been created for the same person, all using the same relationship information with a slight change of address?
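To make the "which Maria?" question concrete, here is a minimal Python sketch of how one might flag such near-duplicate identities. It assumes a hypothetical record layout (`record_id`, `name`, `relationship`, `address`) and made-up sample data; real identity resolution is far more involved, and this only illustrates the idea of grouping records that share a name and relationship information but disagree on address.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityRecord:
    # All fields are hypothetical; real identity data will differ.
    record_id: str
    name: str
    relationship: str  # e.g. a spouse or emergency contact on file
    address: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def find_conflicting_identities(records):
    """Group records by (name, relationship) and return only the groups
    that carry more than one distinct address."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec.name), normalize(rec.relationship))
        groups[key].append(rec)
    return {
        key: recs
        for key, recs in groups.items()
        if len({normalize(r.address) for r in recs}) > 1
    }


# Made-up sample data for illustration only.
records = [
    IdentityRecord("r1", "Maria Lopez", "spouse: J. Lopez", "12 Oak St"),
    IdentityRecord("r2", "maria  lopez", "Spouse: J. Lopez", "12 Oak Street Apt 2"),
    IdentityRecord("r3", "Maria Lopez", "spouse: J. Lopez", "12 oak st"),
    IdentityRecord("r4", "Ana Ruiz", "parent: C. Ruiz", "9 Elm Ave"),
]

conflicts = find_conflicting_identities(records)
for (name, relationship), recs in conflicts.items():
    print(name, "->", sorted(r.record_id for r in recs))
```

A check like this can only say that several records claim to be the same Maria at different addresses. It cannot say which one is real, and that gap is exactly the provenance problem.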
The data supply chain is an interwoven process of originators, suppliers, aggregators and consumers, where provenance is difficult to ascertain and validate.
The issue is that in the coming years, the impact of the ongoing data breaches will be felt by everyone who tries to open a bank account, find a new job, request a new service, and so on.
Algorithms will make opaque decisions based on compromised data points, affecting the lives of individuals who will have little recourse and little awareness of how those decisions were reached.
It is then that the full effect of the data breaches will be felt and understood by the many individuals whose private data was captured, aggregated, and compromised with total disregard for the impact that such data will have in the future.
AI-driven organizations must understand the provenance of the training data and the decision data they employ. They must audit and validate it; otherwise they could become liable for decisions they do not understand, based on data they do not control.
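As a rough illustration of what auditing a data point could involve, the Python sketch below attaches provenance metadata to each value and checks it before use. Everything here is an assumption made for the example: the `DataPoint` fields, the `TRUSTED_SOURCES` allowlist, and the idea that a SHA-256 fingerprint was recorded when the value was first captured. It is not a description of any particular organization's process.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical allowlist of sources the organization has vetted.
TRUSTED_SOURCES = {"internal_hr_system", "verified_bureau_feed"}


@dataclass
class DataPoint:
    value: str
    source: str         # who supplied the value (hypothetical field)
    recorded_hash: str  # SHA-256 of the value as captured at origin


def fingerprint(value: str) -> str:
    """Return the SHA-256 hex digest of a value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def audit(point: DataPoint) -> list:
    """Return a list of provenance problems; an empty list means none found."""
    problems = []
    if point.source not in TRUSTED_SOURCES:
        problems.append("untrusted source: " + point.source)
    if fingerprint(point.value) != point.recorded_hash:
        problems.append("value changed since it was recorded")
    return problems


# Made-up examples: one untouched value, one altered on its way through the chain.
clean = DataPoint("12 Oak St", "internal_hr_system", fingerprint("12 Oak St"))
altered = DataPoint("12 Oak Street Apt 2", "unknown_aggregator", fingerprint("12 Oak St"))

print(audit(clean))
print(audit(altered))
```

A data point that fails either check would be held back from the decision algorithm until someone can account for where it came from.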
Data mastery is no longer an afterthought but a core component of the organization, and its impact will be felt for years to come.