Why every data scientist should care about AI ethics — and where to start

Leo Dray
Published in neoxia
Nov 24, 2022

While Artificial Intelligence has already proven highly useful in many fields and is increasingly used to replace human workflows, AI engineers seem more fascinated by their achievements than concerned about the ethical risks of the models they build. As a data scientist, I firmly believe AI offers many opportunities to solve some of the world’s major issues, as long as industry players are willing to engage in the ethical debates their work raises. Hence, I am convinced every data scientist should care about current research on AI ethics.

AI Raises Ethical Concerns Data Scientists Should Care About

The way current machine learning models are developed raises several concerns about the rise of Artificial Intelligence.

First of all, the fact that those models are trained on ever-larger amounts of data raises questions about how the training datasets are produced. Many case studies have shown how an unbalanced dataset composition can introduce sociological bias into a model’s predictions, particularly gender and racial bias. To illustrate how harmful those biases can be, take the example of a Google computer vision service that produced different results depending on the skin color of the person in an image. In an experiment published on Twitter, the non-profit research and advocacy organization AlgorithmWatch showed that Google Cloud Vision labeled an image of a dark-skinned person holding a thermometer “gun”, while the same object in the hand of a light-skinned individual was labeled “electronic device”. The company apologized and apparently fixed this issue, but the problem is likely much broader. It points to the need for transparency about what models are built from. To this end, the BigScience project has shown great initiative in publishing the dataset on which the BLOOM language model was trained. This should become common practice in AI, and this article will come back to it further on.
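To make the idea of auditing dataset composition a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the records, the `skin_tone` attribute, the helper names and the 30% threshold are all hypothetical and do not come from any real dataset or from the tools mentioned above.

```python
from collections import Counter


def group_shares(records, attribute):
    """Return the share of records for each value of `attribute`."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(shares, min_share):
    """List the groups whose share falls below `min_share`."""
    return sorted(group for group, share in shares.items() if share < min_share)


# Toy, made-up metadata for an image dataset (hypothetical, not real data).
toy_records = [{"skin_tone": "light"}] * 8 + [{"skin_tone": "dark"}] * 2

shares = group_shares(toy_records, "skin_tone")
# The 0.3 threshold is an arbitrary choice for this illustration.
flagged = underrepresented(shares, min_share=0.3)

print(shares)
print(flagged)
```

A count like this says nothing about label quality or about how a model will actually behave, but it is a cheap first check that is only possible when the dataset and its metadata are published.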

Just as creating ethical models requires total transparency about dataset composition, it requires transparency about the models themselves.

The decision-making processes of machine learning models are often described as black boxes, and it is true that we lack understanding of many models currently used in production by industry leaders. Of course, those models can have billions of parameters, and explaining them is no easy task. But it seems essential to me that we find ways to understand how AI models work and to explain them clearly, especially when regulatory compliance calls for it or when this understanding can help assess fairness. The risk also grows as machine learning algorithms are increasingly used to predict many facets of life: not understanding an AI that predicts whether it will rain tomorrow is not such a thorny issue, but when it comes to deciding whether you should get a loan, are fit for a job or are eligible for insurance, the matter is far more sensitive and can lead to what some call algorithmic prisons.
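As one possible illustration of what assessing fairness can look like in practice, the sketch below compares approval rates between groups for a loan-style decision. This is a minimal example, assuming made-up decisions and hypothetical group names; the gap it computes (often called a demographic parity difference) is only one of several fairness measures, and a small gap alone does not prove that a model is fair.

```python
def approval_rates(decisions):
    """Map each group to the fraction of its applicants that were approved."""
    totals, approved = {}, {}
    for group, was_approved in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Largest difference in approval rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Toy, made-up model decisions as (group, approved) pairs (hypothetical data).
toy_decisions = (
    [("group_a", True)] * 6
    + [("group_a", False)] * 4
    + [("group_b", True)] * 3
    + [("group_b", False)] * 7
)

rates = approval_rates(toy_decisions)
gap = parity_gap(rates)

print(rates)
print(gap)
```

A check like this treats the model as a black box: it reveals that groups are treated differently, but not why, which is where explainability methods come in.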

Machine learning engineers should be the first to be aware of those issues. But do engineers currently care about the ethical issues of AI? Above all, do they know about the existing debates and seek to educate themselves on the topic?

Towards New Approaches To Building Tomorrow’s AI

First, here is a brief overview of existing resources and current concerns. It is very difficult to be exhaustive when listing the works and institutions that have helped establish ethical fundamentals, as they are varied and scattered. However, the Recommendation on the Ethics of Artificial Intelligence adopted by UNESCO’s General Conference on 23 November 2021 provides a solid basis. Developed over a two-year process, it is the first global standard-setting instrument on the ethics of artificial intelligence. The main concerns on which its principles are based are the following:

  • Implications of AI in terms of privacy and security
  • Reliability and safety
  • Fairness and inclusivity
  • Transparency and accountability

But knowing these basic principles is not sufficient. It is essential for AI engineers to follow current debates and to keep an eye on ongoing research and initiatives.

To go further and explore a wide range of current AI ethics issues in depth, one could consult the State of AI Ethics Reports published regularly by the Montreal AI Ethics Institute. You may also find their free weekly newsletter, the Brief, an interesting read. I would also like to mention the very interesting work of Giada Pistilli, Principal Ethicist at Hugging Face, including the ethical charter she wrote for the BigScience project, and the various promising initiatives of this company founded by French entrepreneurs, notably BLOOM, the world’s largest open multilingual language model. Over one year, more than 1000 researchers from 60 different countries and more than 250 institutions worked together on this very large neural network language model and on a very large multilingual text dataset, using the 28 petaflops Jean Zay supercomputer. The participants strove to investigate the dataset and the model from all angles, including bias, social impact, ethics, and even carbon impact. All the knowledge and information gathered during this workshop is openly accessible and can be freely explored. This approach to building AI models should be an example for every data scientist.

Conclusion

Many other contributors and initiatives could have been mentioned, but the ones above are easily accessible and reliable resources to start with. To conclude, I hope this article has convinced you of the need to care about the ethics of AI and, if you are a data scientist, has given you a basis for engaging with the topic. By considering ethical issues and by changing our approaches to and ways of thinking about machine learning, we can aim for better and more transparent AI and stop dragging an unfair and biased reality into our future. Data Scientists, Unite!
