
A matter of data management: avoiding bias while democratizing AI

Łukasz Leszewski
Innovation at Scale

--

There are many issues around the introduction of artificial intelligence (AI), many of them ethical. They include the importance of avoiding bias, and understanding what we are building, so that we do not create unaccountable ‘black boxes’. However, for me, one of the biggest questions is about the impact of democratization on developing ethical AI. In other words, what happens when we take the development process out of the hands of data scientists and give it to business users instead?

There is no question that new ‘drag and drop’ tools enable business users to handle, visualize and manipulate data more easily. There is also no question that businesses need to become more data-driven — and with the shortage of data science skills, democratization must be at least part of the answer. However, there are many tales of problems created by well-meaning but ill-informed citizen data scientists.

Getting it right from the start

Marcin Ramotowski, based in Poland, is Head of Data Solutions at the consultancy firm JT Weston, where he is responsible for R&D of the Neula platform. He notes that algorithms have been used effectively to automate processes and speed up work. However, he has some concerns about the process used to create them.

“Where algorithms exist, they can definitely allow you to improve work in many areas and dimensions: measurement itself, support for decision-making, image analysis, chat. They can make business processes more effective. However, I am concerned about algorithms used without any validation process or verification of how they work. There are solutions where you throw in some data, set a thesis, and the algorithm will show you conclusions. Up to a certain point there might be benefits from such an approach, but I have some concerns about the quality of decision-making based on these results in the long term. And the democratization of AI will take this further.”

He notes that problems with data can start from a very early stage in the process.

“The first issue is data availability itself. Customers are more sensitive about their data, and do not necessarily want to share it — and app developers don’t help themselves. There are known cases where applications tracked the location of customers, even though these applications were not used for navigation, and didn’t really need map access. Instead, they tracked customers so that the company could later sell data about where there were more customers in any given city for marketing purposes. We have seen this with Facebook too, and the scandal about harvesting data. Such events cause people to be even more reluctant to share data, and that affects the quality of models that are developed from those data.”

He notes that many people are now more aware of data privacy issues.

“It’s not just the scandals. Cookies and ads also raise awareness. You search for a phrase in Google and then suddenly you start to see ads for that thing. It increases users’ alertness, and people start to notice that perhaps their data have been used without their knowledge. GDPR has also had an impact on awareness, because people see statements about data sharing, and have to consciously consent.”

Marcin’s second point is about how data must be handled.

“Even when data are available, preparing those data may take a lot of effort. Generally, in my experience, when you break the work down into data collection, preparation, algorithm construction, and so on, preparing the data for modelling may take up to 80 percent of the time, and only the remaining 20 percent is building the algorithms themselves, even though that is what most people think of as ‘analytics’. We need a stronger focus on the importance of data preparation.”

Thinking ethically

Bias can also be introduced during the model-building process, simply through the choice of methods or the selection and preparation of training data. To protect against machine learning bias, you need processes that validate data against technical and business criteria, and you need to ensure transparency at all levels. Transparency is one of the main topics in many AI ethics guidelines published by governmental and private institutions.
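One such validation check, as a minimal sketch, is verifying that no group in the training data is badly underrepresented before any model is trained. The field name, records, and threshold below are hypothetical, chosen only to illustrate the idea:

```python
from collections import Counter

def check_representation(records, field, min_share=0.1):
    """Return the groups whose share of the training data falls below
    min_share, a warning sign that a model may underserve them."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()
            if n / total < min_share}

# Hypothetical training records with an 'age_band' attribute
data = ([{"age_band": "18-30"}] * 70
        + [{"age_band": "31-50"}] * 27
        + [{"age_band": "51+"}] * 3)

underrepresented = check_representation(data, "age_band")
print(underrepresented)  # {'51+': 0.03}
```

A real validation process would go further, checking business criteria such as label quality and data freshness as well, but even a simple gate like this makes the data-preparation step auditable rather than implicit.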

“You need to be able to explain why your model made one decision and not another. Transparency is an important issue. The EU and many other bodies are currently working on frameworks to regulate AI, and explainable AI is one of the topics under discussion. Issues with transparency may discourage companies from developing or adopting advanced models based on neural networks, because such models may have a significant impact on decisions, and the reasons may not be clear. We need to solve these issues, because I think everyone now sees how widely AI is used, and it will expand into new areas. This is only going to become more important.”

This interview is part of a recent interview study by SAS on how the pandemic has accelerated digitalization. Read more conclusions from the study on post-pandemic transformation.

--

Łukasz Leszewski
Innovation at Scale

I support organizations across many sectors in solving data management problems effectively, so that decision-makers can make fast decisions based on accurate information.