Responsible AI is the responsibility of the entire organization!

Many are now talking about becoming more data-driven, and thereby more innovative. An organization that aims to excel in data-driven innovation must develop a strong customer focus (end-user focus) and obtain and use the data needed to understand the customer. To achieve this, significant parts of the organization may need to change. Here we explain how we believe this can and should happen.

Espen Haukeland Kristensen
Kantega
11 min read · Feb 16, 2024

--

(This article was originally written in Norwegian by my great colleagues Lars Nuvin, Magnus Oplenskedal and Kristin Wulff. You can find the original version on our blog or on Medium.)

Using data to provide insights often involves artificial intelligence (AI). In this work, the data scientist and the data engineer are central, because they ensure that the data is available and that it is analyzed.

Figure pointing inwards to “What we mean by a data-driven organization”: 1) An agile organization that focuses on learning and improving the organization and its services, 2) is client-focused and 3) listens to what clients say (is insight-/data-driven)
The components needed for data-driven innovation

The field, the technology, and the regulatory aspects of AI are all evolving rapidly, which makes it challenging to make the right decisions. Technologically, many excellent tools have become available, so modeling and computing power are not always the biggest challenges anymore. At Kantega, we have seen that it is often other aspects of AI development that are most painful.

Responsible AI

Those of us with AI as our field of expertise at Kantega have in recent years worked extensively with the topic of responsible AI. We want our assignments and projects to be conducted in a manner that provides reassurance and trust throughout the rest of the organization. There should be openness and transparency in what we do.

Therefore, we want to use this article to highlight key elements to ensure that we create responsible AI.

Figure showing the key cycle of elements to ensure responsible AI creation. From top left: Understand business needs and consequences, Get insights, and assess data​, Define data sources and extract data, Transform and augment data​​, Create models​, Evaluate models​, Deploy models​, Serve data in a way that provides insights​, Measure and monitor​ the service​, Produce data​, then back to step 1 and repeat.
Key elements to ensure responsible AI creation.

This figure demonstrates a typical cycle for a data science project. If we look at the different boxes, we understand that this is not something that only concerns data scientists, but the entire organization.

Just as a data science project involves the whole organization, responsible AI is also something the entire organization is responsible for. In the sections below, we present the measures that should be taken at each stage to conduct AI projects in a responsible and safe manner.

1. Producing Data

Data is used and created in almost everything we do in an organization. For example, a case worker needs data to assess whether a user should receive support or not, and produces data by storing these assessments. To determine whether artificial intelligence can assist the case worker in this work, one needs an understanding of what AI is and how it can potentially make the work easier, with even higher quality.

Questions we should answer:

  • Do we have knowledge of what AI can be used for?
  • Do we have an overview of the limitations that exist?

2. Understanding Business Needs and Consequences

No AI model is perfect; even with many iterations and fine-tuning it will make incorrect predictions. Therefore, as part of responsible AI, it is important to understand what the business need is and what the consequences are for both the company and the users. For instance, the tolerance for errors should be far lower in fraud prediction than in, say, an email dispatch as part of a marketing campaign. Both can be done with the help of AI, but the consequence of an error is significantly more serious in fraud prediction. We must also assess what needs to be done, and how demanding it is for the organization, to handle situations where errors occur. In other words, we can ask ourselves:

“Can this project withstand a negative article in a national newspaper, and how big would the potential loss of reputation be for us as an organization?”

Questions we should answer:

  • Is it okay for the system to be wrong sometimes?
  • How do we handle situations where errors occur?

3. Insight and Assessment of Data

Just as data is used and produced throughout the organization, gaining insight into and assessing the data is also something the entire organization must take part in. As a data scientist, you have good knowledge of the possibilities and limitations of different models, but you cannot possess all the in-depth knowledge of the products and the data.

In a situation where we, for example, are going to use AI to create sales models, we will need input from sales to understand what information and mechanisms are used in contact with customers. There will also be a need for input from a product manager to know if the product has changed its nature significantly, which will affect how far back in time we can use input data for the model.

If you’re using personal data, it is also very important to consider GDPR. Has the user given consent for us to store data? Is our purpose compatible with what we have told the user we will use the personal data for?
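
To make the consent question concrete, here is a minimal sketch of how such a check could look in code before data is used for analysis. The column names (consent_purposes, consent_withdrawn) are hypothetical; the point is that the purpose is checked explicitly for every row rather than assumed.

```python
# Minimal sketch: keep only the rows where the user has consented to the
# purpose we intend to use the data for. Column names are hypothetical.
import pandas as pd

def select_rows_with_consent(df: pd.DataFrame, purpose: str) -> pd.DataFrame:
    """Keep only rows with an active consent that covers the given purpose."""
    has_consent = df["consent_purposes"].apply(lambda purposes: purpose in purposes)
    not_withdrawn = ~df["consent_withdrawn"]
    return df[has_consent & not_withdrawn]

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "consent_purposes": [{"marketing", "analytics"}, {"analytics"}, set()],
    "consent_withdrawn": [False, True, False],
})

analysis_data = select_rows_with_consent(customers, purpose="analytics")  # only customer 1 remains
```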

A final, but important element of responsibility:

Even if one has data that is legal to use and there are strong indications that it can provide good predictive power, should it be used if it is not ethically right?

Questions we should answer:

  • What do the data contain?
  • What considerations must we take regarding privacy/GDPR?
  • Is it ethically right for us to use the data?

4. Defining Data Sources and Extracting Data

Once we have answered the questions above, we typically reach the point where we need to extract and gather data from various source systems. Even though the data has not yet been used in an AI model, we are already handling personal information.

Therefore, we must have control over where data is collected and possibly stored. With sensitive data, we must be extra careful about where it is stored to prevent a data breach. Additionally, we must consider whether there is a requirement for data anonymization. Data should not be stored indefinitely unless there is a legal requirement, so we must have deletion routines in place. With deletion routines, it’s also important that a person can withdraw consent for the use of personal data at any time: it must be possible to delete personal data in an easy manner.
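
As a minimal sketch of what such routines could look like, assume (with hypothetical table and column names) that we pseudonymise direct identifiers before storage and have a deletion routine that can remove a person's data on request:

```python
# Minimal sketch of two routines mentioned above: pseudonymising a direct
# identifier before storage, and deleting a person's data when consent is
# withdrawn. Table and column names are hypothetical.
import hashlib
import sqlite3

SALT = b"secret-salt-stored-outside-the-dataset"  # never store the salt with the data

def pseudonymise(national_id: str) -> str:
    """Replace a direct identifier with a salted hash before the data is stored."""
    return hashlib.sha256(SALT + national_id.encode("utf-8")).hexdigest()

def delete_personal_data(conn: sqlite3.Connection, customer_id: int) -> None:
    """Deletion routine: remove a person's rows from every table holding personal data."""
    for table in ("customer_features", "raw_transactions", "model_scores"):
        conn.execute(f"DELETE FROM {table} WHERE customer_id = ?", (customer_id,))
    conn.commit()
```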

Questions we should answer:

  • Where should data be collected, and how long should it be stored?
  • Have we anonymized the data sufficiently?
  • Is it possible to delete data in an easy manner?

5. Data Transformation and Refinement

Once data is extracted, the job of transforming and refining it begins. This is the most time-consuming job for a data scientist; some claim that up to 80% of the time is spent preparing data for modeling. This step is about extracting as much information from the data points as possible, so we test various ways of performing these transformations. A known problem here is not having good enough control over the consequences of these transformations, which can lead to unexpected behavior in the trained models or to the introduction of biases that were not anticipated and that negatively affect the model. It is therefore important to have version control procedures for data transformations and datasets, so that the original datasets can be recreated. For more on bias, read Nora’s blog article: “Assume bias. Always.”
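
A minimal sketch of what such traceability could look like in practice, without a dedicated tool such as DVC or lakeFS, is to store a content fingerprint of the raw data together with a description of the transformation steps. The income column and the steps are made up for illustration.

```python
# Minimal sketch: record enough metadata to trace a transformed dataset back
# to the exact raw data it was built from. The "income" column is illustrative.
import hashlib
from datetime import datetime, timezone

import numpy as np
import pandas as pd

def fingerprint(df: pd.DataFrame) -> str:
    """Content hash of a dataset, so the exact input can be identified later."""
    return hashlib.sha256(pd.util.hash_pandas_object(df, index=True).values.tobytes()).hexdigest()

def transform(raw: pd.DataFrame) -> tuple[pd.DataFrame, dict]:
    """Example transformation plus the metadata needed to recreate and audit it."""
    transformed = raw.dropna(subset=["income"]).copy()
    transformed["log_income"] = np.log(transformed["income"].clip(lower=1))
    metadata = {
        "raw_fingerprint": fingerprint(raw),
        "transformed_fingerprint": fingerprint(transformed),
        "steps": ["drop rows with missing income", "log_income = log(max(income, 1))"],
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    return transformed, metadata  # store the metadata (e.g. as JSON) next to the dataset
```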

Questions we should answer:

  • Do we have version control on transformed data?
  • Can we trace transformations back to our raw data/source?
  • Are we introducing bias in the transformation/selection of data?

6. Producing Models

With access to transformed and refined data, we can finally begin to produce models!

Although useful tools have recently appeared that test many models for you and tell you which one gives the best predictions, custom modeling is still, in our opinion, the best option.

Is it really true that the most accurate model is the best one to use?

It is generally understood that the more complex the model, the better the accuracy you can achieve. But complexity can also compromise how easy it is to explain what actually happens in a prediction. In most cases, it will be easier to explain how a model arrived at an answer with logistic regression than with a deep neural network. Here, we must make our own assessment in every project to find a good balance between accuracy and explainability. It is also important to have version control on your models, so that it is possible to go back in time and explain why an outcome was given, even for a model that is no longer in use.
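
To illustrate the explainability point, here is a minimal sketch using scikit-learn: with logistic regression, each feature's pull on the prediction can be read directly from the coefficients. The features and numbers are made up, and the versioning comment only hints at one possible approach.

```python
# Minimal sketch: an explainable model where the coefficients show how each
# (made-up) feature pulls the outcome. A deep neural network gives no such
# direct reading.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[25, 2.5], [40, 4.8], [35, 3.9], [52, 6.1]])  # [age, income in 100k]
y = np.array([0, 1, 0, 1])                                  # 1 = application approved

model = LogisticRegression().fit(X, y)

for name, coef in zip(["age", "income_100k"], model.coef_[0]):
    print(f"{name}: {coef:+.3f}")  # sign and size show each feature's pull

# Version control of models can be as simple as saving each fitted artifact with
# a timestamp and data fingerprint, or using a model registry such as MLflow.
```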

Another aspect — that is unfortunately often overlooked — is sustainability. The field is rapidly evolving these days, and we see that new, massive models such as GPT-4 are being developed. These large models require a lot of computing power to train. Computing power = electricity consumption, which in turn is strongly correlated with CO2 emissions.

Questions we should answer:

  • Have we considered explainability in the model?
  • Can we go back in time and reconstruct why a person received the score they did at a given time?
  • Have we considered the electricity consumption related to training large models?

7. Model Evaluation

After producing a model, it is natural to evaluate how well it performs. It is common to measure the model's accuracy on a dataset that is entirely new to the model, known as validation (or test) data. There are several ways to measure accuracy, and which one to use often depends on the cost or severity of making a mistake.

For example, if you want to make a sales push by sending out emails to potential customers, the cost of being wrong is relatively low. Then it is important to have a model that captures as many interested customers with a high probability of purchase (“true positives”) as possible. If there are many false positives, people the model thinks will buy but who in reality will not, it is not as detrimental from an economic perspective. If, however, you use the model to decide which potential customers a salesperson should visit or call, the precision requirement will be significantly higher, because a failed sales attempt has a higher cost. In practice, this means that we optimize the model so that the selection of interested customers with a high probability of purchase is smaller, but more accurate, than in the email example.
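
A minimal sketch of this trade-off, with made-up numbers: the same predicted probabilities can be turned into an email list (favouring recall) or a visit list (favouring precision) just by moving the decision threshold.

```python
# Minimal sketch: tune the same model towards recall (cheap email campaign)
# or precision (expensive sales visits) by moving the decision threshold.
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0])                        # actual buyers
y_prob = np.array([0.1, 0.3, 0.4, 0.8, 0.2, 0.6, 0.5, 0.9, 0.35, 0.15])  # model output

for threshold, channel in [(0.3, "email campaign"), (0.7, "sales visit")]:
    y_pred = (y_prob >= threshold).astype(int)
    print(f"{channel}: precision={precision_score(y_true, y_pred):.2f}, "
          f"recall={recall_score(y_true, y_pred):.2f}")
```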

Another important aspect of responsible AI is fairness and discrimination. Even if we do not include gender or ethnicity as a variable in the model, patterns in other data can still indirectly discriminate on those grounds. There are methods for checking whether we discriminate indirectly, and this is something we should have control over when evaluating a model. It also means that management may need to be involved to discuss “how fair” we aim to be.
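
One simple such check, sketched below with made-up data, is to compare the model's outcomes and error rates across groups even though the group variable was never a model input. Dedicated libraries such as Fairlearn offer more complete fairness metrics.

```python
# Minimal sketch: compare approval rate and error rate per group. Large gaps
# between groups are a signal to investigate, not an automatic verdict.
import pandas as pd

results = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "y_true": [1, 0, 1, 1, 0, 0],
    "y_pred": [1, 0, 1, 0, 0, 1],
})

per_group = (
    results.assign(error=results["y_pred"] != results["y_true"])
    .groupby("group")
    .agg(approval_rate=("y_pred", "mean"), error_rate=("error", "mean"))
)
print(per_group)
```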

Questions we should answer:

  • Have we optimized for the right metric?
  • Have we considered if the model is equally fair for all users?

8. Model Deployment

Our model is now created and evaluated. It performs well, can be explained, and is fair. We are now ready to deploy it and create even more business value for the company!

When the model transitions from being a local project to being exposed to the whole world on the internet, you must consider security. If the infrastructure is not secure enough, it could lead to a data leak. A data leak is serious from a privacy perspective, and the more sensitive the data, the greater the damage to reputation. Securing the model itself is also important. If you have a model for, say, fraud detection, you can be fairly certain that malicious actors want to get their hands on it. If they manage to do so, the model will no longer have any value, and the fraudsters can also uncover underlying business secrets, which would be highly problematic.
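
As one small, concrete illustration (far from a complete security setup), a model endpoint should at the very least not be open to arbitrary callers. The sketch below assumes a FastAPI service and a hypothetical API key kept in an environment variable; in practice you would add TLS, proper authentication and authorization, rate limiting, and logging on top.

```python
# Minimal sketch: require an API key before serving predictions, so the model
# is not exposed to anyone on the internet. Endpoint and key handling are illustrative.
import os

from fastapi import FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.environ["MODEL_API_KEY"]  # never hard-code secrets in the source

@app.post("/predict")
def predict(features: dict, x_api_key: str | None = Header(default=None)):
    if x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    # score = fraud_model.predict_proba(...)  # the real model call would go here
    return {"score": 0.87}
```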

Questions we should answer:

  • Is the infrastructure supporting the model secured enough to prevent data leakage?
  • Can malicious actors access the model and underlying secrets?

9. Delivering Insights in an Understandable Manner

After the model has been deployed, the next natural step is for users to begin interacting with it. It then becomes important to understand whether users comprehend the model's message satisfactorily. To deliver insights in an understandable way, it is natural to involve UX people (those who work with user experience) in this process. A good illustration of how much the presentation matters is this rejection of a credit card application:

“We regret to inform you that we cannot approve your application.

The rejection is based on the information you provided in the application, and information we have obtained from credit bureaus and available public records.

Please contact us if you would like more information.”

Versus:

“We regret to inform you that we cannot approve your credit card application.

The main reason for this is a combination of your young age and your relatively low income.”

In these two examples, you can see how a result can be presented. Both are rejections, but text number two gives the user a better understanding of why the outcome is negative.
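
How could the second, more understandable text be produced? A minimal sketch, assuming a logistic regression behind the decision: pick the features that pulled the applicant's score down the most and map them to plain-language reasons. The feature names, coefficients and phrasing are made up for illustration.

```python
# Minimal sketch: turn the largest negative contributions (features pushing the
# score towards rejection) into a human-readable explanation. All values are made up.
import numpy as np

REASON_TEXT = {
    "age": "your young age",
    "income": "your relatively low income",
    "payment_history": "previous payment remarks",
}

def main_rejection_reasons(feature_names, coefficients, applicant_values, top_n=2):
    contributions = np.array(coefficients) * np.array(applicant_values)
    worst = np.argsort(contributions)[:top_n]  # most negative contributions first
    return [REASON_TEXT[feature_names[i]] for i in worst]

reasons = main_rejection_reasons(
    feature_names=["age", "income", "payment_history"],
    coefficients=[0.9, 0.8, -1.2],        # from a (hypothetical) logistic regression
    applicant_values=[-1.2, -0.9, 0.1],   # this applicant's standardized feature values
)
print("The main reason for this is a combination of " + " and ".join(reasons) + ".")
```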

Another example is a model that flags suspicious transactions for manual review. As mentioned earlier, a model will sometimes be wrong. And when a machine provides the answer, we humans tend to trust it more. In this example, a human will then sit and look for errors even where there are none. The consequence can be that the transaction is mistakenly marked as suspicious, with everything that may entail. It is therefore important to convey to the consumers of the model that it can also be wrong, and that one should not always trust the outcome 100%. How you convey this message is key, and you should user-test different presentation proposals and observe how the testers react.

Questions we should answer:

  • Are the results from the model presented in an understandable way to the system’s user?
  • Can the user use the result as it is presented?
  • Can the user report their concerns to someone if they find the answer to be incorrect?

10. Monitoring and Measuring

The last link in a model's cycle is monitoring and measuring. An important premise for actually creating business value with AI is that users trust the model. If they don't trust the model, they won't use it, and then it won't provide any value; in fact, quite the opposite. We must therefore have monitoring in place that tells us how often, and how, the system is used. Additionally, customers' usage patterns change over time, so the model will gradually perform worse. It must therefore be re-trained regularly.

A convenient solution is to automatically update the model with fresh data at a given interval (nightly, weekly, monthly), but if we do this, there’s also a risk that we re-train the model so that it performs worse. A good tip here is to “lock” the model that is in production, meaning that it is not updated with fresh data, and instead have a shadow model that is continuously re-trained with new data. When we see that the shadow model performs better than the model in production, the shadow model is “locked” and put into production. If we do re-training in this way, we will have better control, and we will avoid uncertainty within the organization while also ensuring safe AI serves our customers.
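
A minimal sketch of this "locked production model plus shadow model" idea, assuming scikit-learn-style models and AUC as the comparison metric (both illustrative choices), could look like this:

```python
# Minimal sketch: re-train a shadow copy on fresh data and promote it only if
# it beats the locked production model on a common hold-out set.
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def retrain_and_maybe_promote(production_model, fresh_X, fresh_y, holdout_X, holdout_y):
    shadow_model = clone(production_model).fit(fresh_X, fresh_y)

    production_auc = roc_auc_score(holdout_y, production_model.predict_proba(holdout_X)[:, 1])
    shadow_auc = roc_auc_score(holdout_y, shadow_model.predict_proba(holdout_X)[:, 1])

    if shadow_auc > production_auc:
        return shadow_model, "promoted"  # the shadow model is "locked" and deployed
    return production_model, "kept"      # production stays locked and unchanged
```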

Questions we should answer:

  • Do we have an overview of usage patterns and trust in the model?
  • Do we have sufficient insight into how the model performs, both with and without updates?

Conclusion

To ensure responsible use of AI, it is necessary for the entire organization to take responsibility.

This includes the technical challenges such as data production, data handling, modeling, security, and model evaluation — but also the more organizational aspects such as understanding business needs and consequences, ethical questions, sustainability, privacy, and GDPR.

Ensuring that data and models are used in a responsible and safe manner requires an understanding of both the technology and the consequences of using AI. By following the measures presented in this article, organizations can succeed in data-driven innovation and create trust and safety in their AI projects.
