Source: unsplash.com

The Hero’s Dark Side

Why Data Science needs to face its demons to begin to deliver on its promise to save the world

Racchit Thapliyal
6 min read · Nov 4, 2020


By Racchit Thapliyal, Casey Berman, Ekaterina Lyapustina and Thomas Tran

Introduction

Data science is an amazing practice that enables businesses to analyze vast amounts of information to identify meaningful insights and optimize operations; it also enables scale like never before by automating manual effort. However, a nuanced “dark side” to data science is now emerging. Whether intended or not, data-science models risk violating privacy rights, perpetuating systemic biases, and making automated decisions that could harm a person’s personal & financial well-being. Besides violating consumer rights and trust, this can have a negative impact on organizational brand and reputation as well.

The purpose of this article is to provide a perspective on data science vis-a-vis its ethical considerations. With great data comes even greater responsibility!

Data Science — A Quick Primer

When computers were “born” in the middle of the last century, so too was the concept of data. Years before the internet grew in usage, scientists were looking for ways to collect, analyze, and model the data they gathered through the aid of computers to guide the framing of hypotheses and subsequent tests. With the growth of online activity, there was an explosion of data. The methods previously used by scientists were adopted by technology giants to propose and test hypotheses about user characteristics and behavior. Thus was born Data Science.

Data science soon became powerful, “sexy,” and the (ostensible) panacea to all of the world’s problems. In one form or another, data science began to power healthcare records, internet search engines, targeted advertising, website recommendations, weather predictions, consumer buying preferences, advanced image and speech recognition, airline route planning, and many other activities we now take for granted. The Alexa-powered Echo sitting on your kitchen counter is the quintessential example of data science in action.

A 360-degree view of data science, from the perspective of our discussion of ethical considerations, can be forged out of three main ingredients:

1. The Data and its Source: This includes where the data came from, how it was collected and stored, and what controls and privacy checks are currently in place.

2. The Data Scientist: This covers who collects and controls the data (including their name and contact details), what agendas, desires, or aspirations drove the data to be collected in the first place, and how the “right” analysis is determined.

3. The Data Consumer: (1) who is the intended audience (such as the media, the general public, the ordinary citizen, or a specific subset of individuals) and (2) how the data, and its corresponding analysis, will impact this group of people.

Source: unsplash.com

Data Science in the COVID and Post-COVID World

Data science’s influence has only increased since the onset of the COVID-19 pandemic: It drives the estimates of demands on local health systems, shows up-to-date results of confirmed cases, fatalities, and recoveries, determines the efficiency of contact tracing, measures the spread of the virus, and even assists in finding cures faster. It has emerged as the key component in providing a “more accurate picture” of how many people have COVID-19, and it also helps in planning how best to look after them.

But every hero has its Kryptonite. Data science is no exception. The rapid expansion in monitoring and disease “surveillance” has exposed many cracks in the iron-clad reputation of Data Science.

Gaps and Cracks in the Data Science Veneer: The Need for Ethical Guardrails

We have identified six core issues with the current practice of Data Science that pose serious risk of ethical transgressions or privacy breaches:

  • Human error: Data Science and its underlying algorithms are not objective — humans drive them, and they can thus be manipulated.
  • It’s all relative: Data science can be a double-edged sword — its importance and impact are often found (and battled) in the interpretation of its results, which can vary considerably. The idea that data (and data science) is context and belief independent is erroneous.
  • Propagating biases: Data Science can reinforce and propagate prejudice (by use of biased data and analyses), if one is not careful.
  • Profiling: Data science’s powerful predictive capabilities can be used as a profiling tool to triangulate an individual’s highly sensitive financial status, health, personal interests, and location/movements.
  • Data misuse: Combining & processing datasets in ways that exceed reasonable expectations, or the original scope of data collection, is quite common.
  • Vulnerable populations: Currently, there aren’t widespread controls to guide the handling and processing of data generated by potentially vulnerable subjects, such as children and the elderly.

These are not theoretical dangers; they are unfolding now:

  • Police may rely on “fundamentally flawed,” racially-biased, and discriminatory artificial intelligence such as ‘PredPol’ to predict crime.
  • Organizations can implement differential pricing, which favors some populations over others.
  • An individual’s historical “data footprint” can negatively impact their employment status or ability to get hired.
  • Government monitoring can begin to appear Orwellian.
  • Targeted marketing can result in adverse psychological impacts on children.
  • Exposure of one’s personal information can result in drastic professional and social consequences.

Thankfully, we are seeing a movement toward a more thoughtful and conscious approach to data science. Naysayers may feel that data cannot discriminate, that ethical guardrails will stifle innovation and hurt global progress, or that limiting Data Science will hamper commerce and the economic growth we so desperately need. These are certainly valid concerns, but nothing bars them from being managed within a new data science paradigm that recognizes that ethical gaps can lead to real dangers, bias, and discrimination.
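To make the question of whether data can discriminate more concrete, here is a minimal sketch of one common way to check a model’s decisions for uneven outcomes across groups: compare each group’s rate of favorable decisions and take the ratio of the lowest rate to the highest. The field names (group, approved), the sample decisions, and both helper functions are hypothetical and exist only for illustration; a real audit would use more data and more than one metric.

```python
from collections import defaultdict


def selection_rates(records, group_key, outcome_key):
    """Return the share of favorable outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest selection rate (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favorable outcome
    return min(rates.values()) / highest


# Hypothetical, made-up decisions used only to exercise the functions.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rates = selection_rates(decisions, "group", "approved")
ratio = disparate_impact_ratio(rates)
print(rates, round(ratio, 2))
```

A ratio well below 1.0 does not prove discrimination on its own, but it is a signal worth investigating; a frequently cited rule of thumb treats ratios under 0.8 as a reason to look closer.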

Ethical Guardrails: Deep Dive into Privacy

As data science becomes an ever-larger component of enterprise reporting and strategic decision-making, ethical and privacy concerns become even more important. Understanding and implementing data science with ethics & privacy laws as guardrails leads to safer handling of private information. These protections also help shield an enterprise’s brand & reputation from the fallout of a potential breach, and can actually serve as a means to gain the trust of customers.

Consider any business that uses personal information to inform product strategy, branding, and targeted marketing. Breaking a privacy law can result in large fines for the organization that misuses data, may require organizational restructuring, absorbs resource capacity, and usually damages the organization’s brand & reputation. Therefore, it is imperative that data scientists understand their organizations’ privacy vision in order to carry out their roles responsibly.

Proposed Solutions

At Slalom, we are day-to-day practitioners of the craft and have encountered many of the challenges we’ve delineated above. We’ve also surfaced numerous workarounds and solutions that have enabled our clients to better understand data science, its applications, and its power, and to find a thriving middle ground that unleashes the power of data science while falling comfortably within a safe, ethical sandbox.

Here’s how we help our customers:

  1. Slalom can help you understand what privacy compliance requirements your organization must follow in accordance with your industry and the locations of your data processing sites.
  2. We can drive privacy impact assessments on new technologies, systems, and products your organization plans to launch. These can be simple and will save you time and resources in the long run.
  3. We can help you begin taking steps to implement Privacy by Design throughout your organization.
  4. Slalom can assess your organization’s current data culture capabilities and begin developing a Modern Culture of Data by leveraging Slalom’s expertise and frameworks.
  5. We can help you gain clarity around your third-party service providers’, data processors’, and partners’ relationships with your organization’s data, and find opportunities to collaborate on enhancing your privacy capabilities. Check out Slalom’s Moonshot website to explore how we approach strong alliances with data partners.
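As one illustration of what a small Privacy by Design step might look like in practice, the sketch below replaces direct identifiers with keyed hashes before a record reaches an analytics dataset. It is an assumption-laden example, not a description of Slalom’s frameworks: the field names, the pseudonymize function, and the placeholder keys are all hypothetical, and a real deployment would load the key from a secrets manager.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as direct identifiers.
PII_FIELDS = frozenset({"email", "phone"})


def pseudonymize(record, secret_key, pii_fields=PII_FIELDS):
    """Return a copy of record with PII fields replaced by keyed hashes.

    The same value and key always produce the same token, so records can
    still be joined and counted without exposing the raw identifier.
    """
    cleaned = {}
    for field, value in record.items():
        if field in pii_fields:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()
        else:
            cleaned[field] = value
    return cleaned


# Hypothetical record and placeholder key, for illustration only.
customer = {"email": "jane@example.com", "phone": "555-0100", "plan": "premium"}
safe_customer = pseudonymize(customer, secret_key=b"replace-with-a-managed-secret")
print(safe_customer["plan"], len(safe_customer["email"]))
```

Pseudonymized data can often still be linked back to a person by whoever holds the key or enough surrounding context, so a step like this reduces exposure but does not by itself take the data out of scope of privacy obligations.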

Conclusion

Data science is a discipline that will continue to evolve and change the way enterprises engage with the world. Privacy is a global issue that touches all individuals, and therefore organizations must adopt a culture of doing the right thing, especially when it comes to personally identifiable information (PII). Ultimately, data enables businesses to achieve spectacular results, but this power comes with a new-found responsibility that should be taken on by all employees of the organization. Culture is at the heart of what drives change and the ongoing commitment to use data responsibly. What is your organization doing today to spark that shift?

Racchit Thapliyal is a Principal Consultant specializing in AI/ML and Data Science in Slalom’s Data and Analytics Practice.

Casey Berman, Ekaterina Lyapustina and Thomas Tran are Consultants in Slalom’s Business Advisory Services Practice, and are part of Slalom’s Privacy Center of Excellence.

