Does a Data Scientist Have to Be Ethical?

Samuel Taiwo
Published in The Startup
6 min read · Sep 5, 2020
Photo by Nathan Dumlao on Unsplash

We live in an age where big data has become an asset (also referred to as an organization’s unrefined gold) to organizations and individuals. Data science has been a hot topic among organizations that aim to collect meaningful data to enhance business growth. Until 2010, organizations focused on building infrastructure that could process, store and access consumer data, so that they could analyze it, make decisions based on it and gain business insight.

Given the great impact data can have on an organization, and considering the rapid advancement in technology, companies that process consumer data have turned to software such as the Hadoop framework and Business Intelligence tools, along with Artificial Neural Networks and Machine Learning algorithms, to process and understand data. To use this software efficiently, an organization needs to employ a data scientist who has a solid understanding of how to analyze data with it and gain maximum insight to make algorithm-based decisions.

When analyzing this data, data scientists face ethical challenges such as data collection bias, algorithmic bias, explainability of results, privacy and so on. Formulating and producing morally good solutions involves working through various steps such as data generation, creation and processing, and examining discrimination and the algorithm itself. To build such an ethical system, there are four guidelines to consider:

  1. Do good
  2. Minimize harm
  3. Be just and fair
  4. Respect privacy

** The terms Data Scientist and Data Practitioner are used interchangeably.

Who is a Data Scientist?

Before we go down the road of who a data scientist is, let’s consider what data science is. Data science is a vast field that blends mathematics, statistics, programming, computer science and so on. It brings in scientific methods, processes and algorithms to extract insight from both structured and unstructured data. The term can be traced back to 1974, when Peter Naur proposed it as an alternative name for Computer Science. However, the professional term “Data Science” has been attributed to DJ Patil and Jeff Hammerbacher. To date, there is still no consensus among scientists on the definition of data science, and some still consider it a “buzzword”.

A data scientist, on the other hand, is someone who harnesses and processes huge volumes of data to extract insight, interprets data effectively and is capable of presenting results in non-technical terms. A data scientist is also able to collect a large amount of data (usually consumer data collected or stored by an organization) and gain meaningful insight by combining elements of mathematics, statistics and computer science with analytical techniques such as Machine Learning and BI software.

You might be wondering why ethics and laws matter in relation to the advancement of AI. Laws cannot move faster than technology and innovation, so rather than waiting for the laws to catch up, we work with the technological innovations and problems we are encountering right now to mitigate future impacts. Laws and ethics are not meant to constrain AI but to encourage more innovation and creativity, which helps in preparing for the unknown and being able to do good.

How Far Are We from Building an Ethical Environment?

If you’ve ever googled what ethics means, you’ll have seen a lot of definitions pop up, but they all boil down to ethics being concerned with human well-being: the well-being of others.

Data ethics, on the other hand, is a new branch of ethics that studies and evaluates moral problems related to data in order to formulate and support morally good solutions. Data plays an important role in investigating, accessing and understanding previously unknown human and consumer behavior. Because of this value, data has brought a competitive marketing strategy to the workforce.

With great power comes great responsibility

However, with these great opportunities come ethical and moral challenges that data practitioners face when dealing with consumers’ data. Data has had a competitive impact on the market and has enhanced the development of intelligent products and services. Yet the use of AI in those products and services raises ethical challenges that threaten human privacy, and human privacy is very important.

Over the last few years we’ve seen various examples of data breaches and of consumer data being used without consent to develop advanced AI products. A popular and recent example is a tech company called Clearview AI. This company devised a groundbreaking facial recognition app: you take a picture of a person, upload it and get to see public photos of that person, along with links to where those photos appeared. The backbone of the system is a database of more than three billion images that Clearview claims to have scraped from Facebook, YouTube, Venmo and millions of other websites (New York Times, Jan. 18, 2020). This software is powerful, and it could help solve crimes such as shoplifting, identity theft, murder and child sexual exploitation, but all at the expense of eroding privacy.

Big tech companies such as Google have refrained from this kind of work: in 2018 the company put the kibosh on Project Maven (a contract awarded by the US Pentagon), deciding not to continue once the contract expired. The company said the project was too unethical, and about 12 employees left Google because of it. The aim of the project was to support the advanced development of human-identifying drone technology by analyzing drone footage using AI trained on billions of data sets derived from the company’s other products. Not long after, a company named Palantir took over the project. Another recent example is the Cambridge Analytica case.

From all the above examples, we can see that the future is bright for AI, provided the ethical and moral side of these advancements is taken into account.

Dr. Ewa Luger (Chancellor’s Fellow, Digital Arts and Humanities, University of Edinburgh) said the most recurrent ethical problems faced by data scientists are:

  1. Algorithm
  2. Prejudice/Bias
  3. Explainable AI (XAI)
  4. Privacy

What Should Inspire a Data Practitioner to Work Ethically?

As every revolution has its good and bad sides, the data revolution will inflict harm, whether intended or not, as the Clearview case shows. To avoid exacerbating the harm the data revolution will bring, it is important for data scientists to be ethical when handling consumer data.

How, then, can we make a data scientist more ethical? What could inspire a data scientist to do ethical work, and what ethical or moral laws or rules have been laid down to inspire an ethical environment?

To date, there hasn’t been a law or rule to inspire an ethical environment for data scientists. However, to inspire such an environment, Ben Olsen, a Sr. Content Developer at Microsoft, drafted a data oath modeled on the Hippocratic oath. He proposed what a modern data oath might look like:

I, a Data Practitioner will promote the well-being of others and myself while striving to do no harm with data through:

a. Professional application of analytical technique

b. Humility in analytical claims

c. Anticipation of legal and regulatory scenarios

d. Transparency in computation and documentation

e. Fidelity to this oath beyond the bottom line.

Another way a data practitioner can inspire an ethical environment is by asking him/herself critical questions when handling consumer data. These questions include, but are not limited to:

  1. Is the data biased, for example in terms of gender or prejudice?
  2. How much relative importance should be given to the data?
  3. Can the process of getting the result be explained?
  4. Is the algorithm biased? What basis or intuition is the algorithm built on?
  5. What factors or features did my algorithm consider to reach this conclusion?
  6. Will my result inflict harm or do good, and how much weight should be given to each?
  7. What laws and regulations should be considered when handling this data?
  8. What consumer rights might I have infringed while handling the data?
  9. Does having been given consent mean I no longer need to respect privacy?
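
To make the first question a little more concrete, here is a minimal Python sketch of one possible check. It is only an illustration, not a complete fairness audit: the records, the field names `gender` and `approved`, and the helper functions `group_report` and `parity_gap` are all made up for this example. It compares how well each group is represented in a dataset and how often each group receives a positive outcome.

```python
from collections import Counter, defaultdict


def group_report(records, group_key, outcome_key):
    """Return {group: (share_of_rows, positive_outcome_rate)}."""
    counts = Counter(r[group_key] for r in records)
    positives = defaultdict(int)
    for r in records:
        if r[outcome_key]:
            positives[r[group_key]] += 1
    total = len(records)
    return {g: (n / total, positives[g] / n) for g, n in counts.items()}


def parity_gap(report):
    """Difference between the highest and lowest positive-outcome rates."""
    rates = [rate for _, rate in report.values()]
    return max(rates) - min(rates) if rates else 0.0


# Hypothetical toy data: field names and values are assumptions.
records = [
    {"gender": "female", "approved": True},
    {"gender": "female", "approved": False},
    {"gender": "male", "approved": True},
    {"gender": "male", "approved": True},
    {"gender": "male", "approved": True},
    {"gender": "male", "approved": False},
]

report = group_report(records, "gender", "approved")
for group, (share, rate) in report.items():
    print(f"{group}: share={share:.2f}, positive rate={rate:.2f}")
print(f"parity gap: {parity_gap(report):.2f}")
```

A large gap does not prove the data is unfair on its own, but it is a signal that questions 1 and 4 above deserve a closer look before any model is trained on the data.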

These questions go on and on, and asking them helps to create an ethical environment. As they say, with great power comes great responsibility, and being an ethical data practitioner will go a long way toward paving the way for the safe and responsible integration of AI into society.
