Bias in, bias out

Podcast: Addressing the impact of bias in AI

Cherry Ventures
Jun 7, 2023


“Knowing that we have skews and biases in the data, knowing that there is a technical explanation for it and that this is a problem, knowing that this problem isn’t easily solved: what does it mean for startups?”

Jasper: Yay.

Lutz: Yeah, another podcast.

Jasper: Hello, Lutz. How’s your coffee?

Lutz: Coffee is good. I’m actually pretty impressed by this title music. Wow, you got the groove man.

Jasper: Yeah, it’s AI. It’s AI, but unfortunately it’s the same as with Midjourney: you have to play for a long time until you get something. Yeah, I didn’t have coffee. It’s evening here again. Are you still in Canada?

Lutz: I’m still in Canada just-

Jasper: Very good, so we get-

Lutz: … celebrated Victoria Day yesterday.

Jasper: We get a Canadian perspective for today’s podcast, which is actually quite interesting, because we heard a lot in the past weeks about people wanting to control AI, to regulate AI. Even Sam Altman was speaking publicly in front of politicians, admitting that if this goes wrong, it can go wrong really badly, and asking to be regulated, which sounds interesting, from an American perspective at least. Then the EU is going to regulate AI, at least that’s what they tell us, by the end of this year. So pretty fast for the European Union. But what we ask ourselves, and probably many people are asking themselves, is: why do you actually have to control AI? What’s going wrong there? What has gone wrong?

Lutz: What could possibly go wrong? I think the good news is that none of this scary stuff we have seen is new. What goes wrong with ChatGPT or large language models, which, as a reminder, we always say are a very neat interface, is the same thing that goes wrong with so-called Narrow AI, and that is skews and biases which are reflected in the model. We should talk about it. We see that happening with language models as well, and therefore the more people use them, the more dangerous it is.

Jasper: I think most people will understand what a bias is. But what’s the difference between skew and bias? Is there actually one?

Lutz: There is one. Skew means that a certain kind of data is over-represented; bias means that this over-representation leads to something that is actually not true. But let’s give examples. Midjourney is extremely good, with generative AI, at creating human faces, and they’re all smiling. The smile is actually a very American smile, and if you create old-looking photographs, World War II photographs, they will still have today’s American smile. So there is a skew in the data which generates one type of smile. That’s not too bad and you can probably correct it over time, but there is no cultural sensitivity about the smile, for example.

Jasper: Now you can make the joke that Midjourney is even able to make Germans smile, so we should really try this out. Yeah, picture a smiling German. We do smile sometimes.

Lutz: Yes. Now this could be actually… Sometimes those biases have dramatic consequences. So let’s take a Narrow AI, one AI that decides whether somebody should get out on parole, meaning somebody is put into jail and can get out on parole or not. There is a process where you go to the judge and make your case. Now judges have started to use computers. First of all, judges are biased as well. They try not to be, but it’s human to have a certain bias, and we try to take those away. However, if you look at the data, it’s less likely that you get out of jail on parole if the judge did not have lunch yet. So after lunch you are more likely to get out than before lunch.

Jasper: Yeah, I read about it. That’s pretty scary when you-

Lutz: It’s scary.

Jasper: … get your appointment and you’re like, oh shit.

Lutz: Yes, so the idea to actually use a computer and a machine learning program seemed very good. Give it to a computer which estimates the likelihood that something bad happens once you’re outside of jail. How scary for society is the person who is in jail, and is it okay for us to release them or not? And ask the computer to do this. Now the problem is, when the data feeding this computer is biased, for example towards a minority, for example towards black people, then the actual model is biased and will keep more of one type of minority in jail compared to the rest, and that is bias.

Jasper: I think that was also the case even in the early AI models, IBM Watson… Sorry to mention IBM Watson here. Obviously they made it public and fixed it, but they used medical data from very, let’s say, privileged hospitals in Boston, and that was mostly white people. So that’s why there was a bias towards white people, which makes total sense because that’s what you feed in. Is that an AI problem? Sorry to ask this off schedule, but when you do statistics, regression analysis, it’s the same problem, right? Shit in, shit out, at the end of the day.

Lutz: Yes, that is well said. BS in, BS out. Absolutely, it’s a problem for humans. We get trained on those tiny clues. We react to tiny clues. We react by thinking fast, like the thinking-fast idea. Obviously the computer, based on its training data, shows the same effect, and yes, it’s a problem which we have in society. So for example, if you have a Narrow AI to process loan applications, but there are very few women among the loan applications, then you have a skew in your data because you don’t have enough women. Now how do you fine-tune your machine learning program or your AI to correctly reflect this setup? There are many ways to do this, but you have to be aware of it. These issues are all over the place and we see them so many times.
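
One common way to handle such a skew, sketched here purely as an illustration and not taken from the episode, is to reweight the underrepresented group so the model does not only learn the majority group’s patterns. The column names and numbers below are made up.

```python
# Minimal sketch (not from the episode): reweight the underrepresented group
# in a skewed loan dataset. Column names and numbers are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "income":   [42, 55, 38, 61, 47, 52, 70, 33],
    "gender":   ["m", "m", "m", "m", "m", "m", "f", "f"],  # few women in the data
    "approved": [1, 1, 0, 1, 0, 1, 1, 0],
})

# Weight each application inversely to its group's share, so both groups
# contribute equally to the training loss.
group_share = df["gender"].value_counts(normalize=True)
weights = df["gender"].map(lambda g: 1.0 / group_share[g])

model = LogisticRegression()
model.fit(df[["income"]], df["approved"], sample_weight=weights)
print(model.predict_proba(pd.DataFrame({"income": [50]}))[0])
```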

Jasper: I think that’s why… And we don’t want to talk too much about regulation here. What I liked when I read about the whole regulatory development is that the Swiss… Switzerland actually said, “Guys, we have enough laws in place, also against, let’s say, biases, so we don’t need special AI rules. We take care of our courts. We take care of our selection processes. So we don’t need extra rules for that.” That might make sense, because if you regulate the AI, you would also have to regulate normal statistics and regulate human decision-making more.

Lutz: What I think makes sense is this: you do not want to have biases and prejudices in your processes, but if you hand those processes over to a machine, then they get enforced at a larger scale. We should be happy that we have AI, because suddenly we can actually look at scale at where those skews and biases are. Let me give you another example, the Stop-and-Frisk program; I wrote about this in one of my Forbes articles. Stop-and-Frisk in New York was a program where you stop somebody on the street, without needing a reason, and search them for contraband and weapons.

In that program, the precinct, the area the police controlled, was selected by a computer, but that computer was racially very unjust. Then it becomes a question of how you deal with it, and we should talk about that. In this case, they took out color, skin color, and my article actually showed the system became even more skewed and more biased, for which there is a very logical data science explanation.

Jasper: I think the other part is when you’re predicting something; that’s also what… We spoke about it. That’s what AI is doing. So again, maybe bias in, bias out, but also the data you feed in is the data that defines the outcome, or the likely outcome. I think a very good example: if we just look at the stock market in the past years, which always went up, obviously I’m predicting that it will continue going up. There were some leading indicators where people said, “Hey, it will turn”, but then more money was pumped in, and probably also nobody was predicting Covid, nobody was predicting the Ukraine war. So, yes, you can predict something, but if you don’t have all the data, which you will never have, then it’s just a prediction. It won’t be the reality.

Lutz: Totally. And I think, so actually by the way, I’m closing the door for a second. Normally this is [inaudible]-

Jasper: Dear editor, this is a very special scene that we have to keep and we should use this for social media.

Lutz: That makes a lot of sense, what you described here. I think there are two areas. One area is that we have data which reflects our society, but it’s not a correct reflection of how we actually want the model to work. And racial views, views on minorities, are one of-

Jasper: Yeah, opinions.

Lutz: Opinions are one of those areas. The other part is that there is not sufficient data because a minority doesn’t get reflected. The third part is that your data is all correct, but your label, what you focus on, is wrong. And you had this in the example of the program which tries to select, in healthcare, who should get certain treatments.

If the only label is how you reduce cost, then you tend to offer treatments more to rich communities, which in the US in this case will very often be white, because they have created the highest cost, because their contracts are the most costly. Therefore suddenly all the focus will be on those communities instead of the communities which might need the treatment just as urgently, but whose overall cost pool has been smaller.

So it’s not so much about how effective the treatment is, it’s more about where I save costs, because that is what I set the model to optimize. But let’s shift gears a little bit. So what we described is that bias and skew are an issue for [inaudible] models.
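
As a purely illustrative aside, not from the episode and with invented numbers: ranking patients by a cost label rather than a need label is already enough to flip which group gets prioritized.

```python
# Minimal sketch (illustrative numbers): cost as the label favours the group
# with the bigger spending pool, even when another group is sicker.
import pandas as pd

patients = pd.DataFrame({
    "group":              ["A", "A", "B", "B"],
    "past_cost":          [12000, 9000, 3000, 2500],  # group A has costlier contracts
    "chronic_conditions": [1, 1, 3, 4],               # group B actually has greater need
})

# Ranking by the cost label picks group A ...
by_cost = patients.sort_values("past_cost", ascending=False).head(2)
# ... ranking by a need-based label picks group B.
by_need = patients.sort_values("chronic_conditions", ascending=False).head(2)

print(by_cost["group"].tolist())   # ['A', 'A']
print(by_need["group"].tolist())   # ['B', 'B']
```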

Jasper: And I want to highlight, still, we haven’t talked about AI yet and we will, but this happens with everything where you use predictive analytics, where you try to predict the future. This is not just AI.

Lutz: Well, actually it happens with everything. It happens with, like Obama: during his time in office, Obama had been very careful talking about any racial discrimination topics, and in his last few months in office he actually addressed them. He said he grew up in a world where he has been faced with that. He knows the feeling, from before he became president, of entering a bus and an older lady just pulling her purse a little bit closer to her chest. It’s a very small gesture and it’s probably not even meant as a racist move. But in her mind there was an indication which made her do this. And I think we humans have the same problem. We are based on data, and that data might be skewed. Now, the good news about AI: we can actually make this more apparent and we can control it better.

Let’s talk for a moment about large language models. So what we discussed: we discussed healthcare, we discussed loans, we discussed [inaudible] time. Now let’s take a large language model. A large language model, we always said, is an interface, and as an interface it creates the most logical next sentence. Now, that can have all types of problems.

Jasper: And I also expect a certain output after my question as an example, and I don’t want a long explanation. I think that’s very important. It’s a very short output that I can comprehend in a short amount of time.

Lutz: Yeah. So, for example, people were testing those large language models with “three Muslims went to a bar” and then dot, dot, dot. And the completion they got was very often more violent; the sentence ended more violently than if they said “three Christians went to a bar.” So in many of the models we see an anti-Muslim sentiment.

Jasper: And it was trained on the whole worldwide web, if I understand it correctly.

Lutz: You never train on the whole worldwide web.

Jasper: Not on the dark net, obviously.

Lutz: These models are trained on massive amounts of data. So now the problem is: oh, so the data is skewed, and that’s the reason the model is. Yes. The data was probably created by people who were skewed or had a certain worldview. And this data is baked in, so it actually becomes complicated to get it out.

Jasper: Because we know for a fact that at least some of those large language models have been fine-tuned by human beings to actually avoid those kinds of biases and any very, very bad worldviews, but you still get it.

Lutz: And we can talk in a second about how you avoid it, but that is one of the typical problems you see in large language models. In generative image modeling, it’s the same problem. Say I would like to get a picture of a nurse; what type of nurse do I get? You can test this out several times, and those image generation models are skewed: you will get more female, white nurses in this case. And again, we need to figure out what to do with it.

Jasper: Also, Midjourney shows mostly beautiful people, slim and beautiful people. That’s what also was criticized.

Lutz: Yes. So we talked about the actual issues, and we gave a bit of an idea that the underlying data, or the underlying label, is either biased or skewed. Now, what can we do about it?

Jasper: I think one thing to start with is definitely understanding what’s happening in the model.

Lutz: How do we show that there is bias? Well, we create a hundred images of nurses and then we count how many are men or women. We run a loan application a hundred times and we figure out how many loans get accepted from men or women. We take all those features, which should not have an impact, and calculate whether they really did not have an impact.

Interesting part: this is nothing new. Science does it all the time. If you do a study on whether a medication works or not, you already use features like gender, or other features which should not have an impact, and you run them through the model to show that they really don’t have an impact. So we now do the same for our AI model: we are actually trying to figure out what has an impact and what does not.
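
A minimal sketch of what that counting looks like in practice. The model here is only a placeholder standing in for whatever system is being audited, and the approval rates are invented for illustration.

```python
# Minimal sketch (assumed model interface): surface bias by running the model
# many times and counting outcomes per group, as described above.
import random

def predict_approval(income, gender):
    # Placeholder for the real model under audit (invented behaviour).
    return random.random() < (0.7 if gender == "m" else 0.55)

def approval_rate(gender, n=1000):
    hits = sum(predict_approval(random.gauss(50, 10), gender) for _ in range(n))
    return hits / n

rate_m, rate_f = approval_rate("m"), approval_rate("f")
print(f"male: {rate_m:.2%}, female: {rate_f:.2%}, gap: {rate_m - rate_f:.2%}")
# A gap on a feature that should have no impact is the signal to investigate.
```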

Jasper: Yeah.

Lutz: That in itself gives transparency. How many-

Jasper: I think the simple ones were the moderating and the mediating factors; I think that was one approach. And obviously then you can do several tests, which are different for AI. But a lot of people are talking about the explainability of those models. It’s nice that you test for biases, but I think the core question, especially for self-driving cars but also for others obviously, is: why is the model actually doing that? Can I control this?

Lutz: So explainability is a huge research area, but it’s far from solved. I have a very interesting blog post, a Forbes article, on why Angela Merkel is a boy. At one point in time you could feed Microsoft’s model a picture of Angela Merkel and it would spit out “a boy wearing a blue shirt.” And I could not figure out what triggered the model to actually think it’s a boy. My assumption, my hypothesis, is that it’s because it was a campaign photo: she was 65 years old at that time but had the skin of a 14-year-old.

So I think there was some technical prep done on that picture which kind of led the computer to think it’s a boy. But it’s very hard to actually see that in the model, because the whole point of deep learning is that we allow the computer to decide what data, what structure, to use. Meaning it’s out of our control and therefore we cannot really see it. What we can do is make it apparent. How many people smile like Americans? How many women get accepted? How many white people get the treatment versus minorities, and so on and so forth.

Jasper: But then, for the actual users of the model, or the creators of the model, and people like founders and others who want to apply the model, this wouldn’t be helpful enough. I know that my model has biases and is spitting out skewed data. But then I obviously want to change it. So can I put rules in my model and say, “Don’t do this, don’t do that”?

Lutz: So the answer is yes, that’s how it is done. There are essentially three approaches. You can help the user to understand that there might be a potential bias. You can help to fine-tune, and we should talk about fine-tuning, prompting, and retraining in a second, so you can actually help to change the data. Or you can put in guardrails, which are often called guardrails, constitutions, safety rules. There are many, many ways of doing that, and we can go through all three. So: help the user. That’s a UX term.

Jasper: If it is a UX term, because it sounds… Sorry, my first reaction was: oh yeah, I get this little number, and then it tells me this model is biased, like terms of service that you just scroll through and accept. And by the way, there are 100 million biases in this model and nobody reads it. So that’s how they would solve it at the end of the day.

Lutz: Yeah. I hope there are not 100 million biases. There are certain biases, and we need to know that they are there. So if somebody says, “I want an image of a nurse,” then one UX approach would be that it changes your input and says, “I want an image of a male nurse.”

Jasper: Yeah, it should ask you, right? That’s what you mean. It should ask: what sex? What, let’s say, racial background? Yep, that makes sense.

Lutz: Or it makes an assumption for you what it thinks you want and spells it out. Right?

Jasper: The question then for me is, it’s a little bit like when I’m prompting Midjourney or MusicLM or other tools: does it actually get the input? I mean, we spoke about this attention topic, or the tokens. Does it actually get this input, and can it fine-tune or make the output more specific?

Lutz: In this case, the model actually knows that there is skew in those areas and it randomly, based on a certain set of percentages, adjusts your input. Like: 50% of the nurses should be male nurses, 50% should be female nurses. Actually, good question: should it be 50% male nurses? Is 50% of the population male? Well, for the whole general population, that is true. But if you think about nurses, the profession is dominated by women. Therefore a good question would be: should the model do 50% or the actual percentage? And how does that percentage look in Germany versus China versus the US?

Jasper: That’s assuming perfection here, right? Precision, recall, first- and second-order mistakes. Does the model actually know what’s right and wrong all the time?

Lutz: So remember, there are three steps. The first step is identify: okay, we get more female nurses, probably because of how our training set was built. The second is to define a UX approach where you want to make a change. So now you actually wire a rule: if somebody says “nurse,” we change it into “female nurse” or “male nurse.” That is a rule, and it is meant to make the user aware of that potential bias and allow the user to be specific.
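
A minimal sketch of such a rewrite rule, under the assumption that you have chosen a target split (50/50 or the real-world occupational share). The helper name and the split are illustrative, not taken from any specific product.

```python
# Minimal sketch of the UX rule described above: if the prompt says "nurse",
# surface the hidden assumption by sampling an attribute from a configurable
# target distribution. TARGET_SPLIT is an assumption you would have to choose.
import random

TARGET_SPLIT = {"female": 0.5, "male": 0.5}  # or the actual occupational share

def rewrite_prompt(prompt: str) -> str:
    lowered = prompt.lower()
    if "nurse" in lowered and not any(w in lowered for w in ("male", "female")):
        gender = random.choices(list(TARGET_SPLIT),
                                weights=list(TARGET_SPLIT.values()))[0]
        return prompt.replace("nurse", f"{gender} nurse")
    return prompt

print(rewrite_prompt("a photo of a nurse in a hospital"))
# e.g. "a photo of a male nurse in a hospital" -- the assumption becomes visible
```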

Jasper: Yeah.

Lutz: The third approach would be to correct the data. Now, there is a lot of discussion currently in the world about how we work with data and what potential datasets could be, right? So if you have a model today, you can use prompts to actually change the outcome. For example, you can ask the large language model to complete a sentence about abortion, and you prompt it with “I am a Republican” versus “I am a Democrat.” It will give you different outputs. So you actually prompt the model differently.

Now, if you have three, five examples, prompting is exactly what people do. If you have more, let’s say a dataset of ten examples, you’re probably into prompt tuning, meaning you work with the model to actually improve the prompt. If you have a hundred examples, then you’re into fine-tuning: you’re actually trying to use that data to train the model on what you think is correct. Now, what we have seen is that prompting helps to reduce bias, and prompt tuning as well. It doesn’t change the underlying problem that the dataset which went into the model was biased.
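
To make the prompting-versus-fine-tuning distinction concrete, here is a minimal sketch with invented counter-examples: the same records can be prepended as a few-shot prompt, or, once there are on the order of a hundred, exported as fine-tuning records. No real provider API is used; the helper names are illustrative.

```python
# Minimal sketch (no real API calls): reuse counter-examples either as a
# few-shot prompt or as fine-tuning records.
counter_examples = [
    ("Three Muslims went to a bar", "and had a quiet evening catching up."),
    ("Three Christians went to a bar", "and had a quiet evening catching up."),
]

def build_few_shot_prompt(user_input: str) -> str:
    # A handful of examples: prepend them directly to the prompt.
    shots = "\n".join(f"Prompt: {p}\nCompletion: {c}" for p, c in counter_examples)
    return f"{shots}\nPrompt: {user_input}\nCompletion:"

def build_finetune_records(examples):
    # With ~100 examples you would switch to fine-tuning; many APIs expect
    # prompt/completion (or chat-message) records roughly like these.
    return [{"prompt": p, "completion": c} for p, c in examples]

print(build_few_shot_prompt("Three Buddhists went to a bar"))
```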

Jasper: That was my point. I mean, with narrow models in the old days, or maybe we still have some of them out there, where we had supervised learning, it felt a bit easier, because with good UX I would get customer feedback, as an example. We spoke with Reetu from Ultimate.ai. So the customer would say: that’s the wrong answer, choose this answer, so I can directly input the data and change those biases or implications in the model. Now with these large language models, because they are so large, it feels a bit like a huge ship where I’m saying, yeah, there is an iceberg, and now we all have to agree that this is an iceberg.

Lutz: Exactly. We talked about how generative AI works: you use images and descriptions. Now, if you use public data, you are probably using Instagram. Suddenly you have a certain type of user base with a certain type of images they post, and a certain way of describing them. And I don’t want to perpetuate any-

Jasper: Well, people are using Instagram. I think they know what’s going on there.

Lutz: Exactly. So that kind of creates a skew. How do you get this out? You cannot, not so easily. That’s why you have to prompt it, you have to fine-tune it, because changing the whole underlying dataset is actually hard. Now, the third approach we saw was on the data. And I talked about three approaches, but there are actually four: there is identify, there is mitigate through the user, there is change the data or the prompt, and the last one is create those guardrails. And with these guardrails, the idea is to kind of give the computer a constitution.

Jasper: Yes, constitutional AI. One company at least that is publishing a lot about this is Anthropic, with their chatbot, Claude, and they make it very public. They train it on the UN charter; I think they even included some of Apple’s terms of service. They have their own research around it, and they very publicly try to open a debate about what is right and what is wrong. But I think that’s one challenge: who defines what is right and what is wrong?

Lutz: Totally. They actually say this in their publication: this can be used in a negative sense as well. By the way, in our podcast with Legal OS, he called it guardrails. Doesn’t sound as sexy as a constitution, but it is essentially the same thing. You create guardrails that ask: does what you just said make sense? And you let the AI critique itself. Anthropic has described this very neatly. The traditional generative AI models we saw used human input to actually say whether an output works or not, so they get trained with reinforcement learning from humans. Now we use an AI to control the AI based on a set of rules. The first question is: did we get our rules correct? And the second: are our rules complete? These are two different questions we need to ask ourselves. To come back to the example of Republican versus Democrat, I can get a rule or a prompt down to what a Democrat wants versus what a Republican wants, and based on that, my model will be biased in answering me, because the model will answer with what I want.
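
A minimal sketch of that critique loop, with `call_llm` as a placeholder for whichever model client you actually use and an invented two-rule constitution; it is not Anthropic’s implementation, just the general shape of the idea.

```python
# Minimal sketch of a guardrail/constitution loop: a second model call
# critiques the first answer against written rules and asks for a revision.
CONSTITUTION = [
    "Do not stereotype religious, ethnic, or gender groups.",
    "Flag answers that present one political party's view as fact.",
]

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM client (assumption, not a real API).
    raise NotImplementedError

def answer_with_guardrails(question: str) -> str:
    draft = call_llm(question)
    critique = call_llm(
        "Rules:\n" + "\n".join(f"- {r}" for r in CONSTITUTION)
        + f"\n\nDoes this answer violate any rule? Answer: {draft}"
    )
    if "yes" in critique.lower():
        # Ask the model to revise its own draft against the rules.
        return call_llm(f"Rewrite this answer so it follows the rules above:\n{draft}")
    return draft
```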

Jasper: You will probably get a very surprising answer in Germany if you ask a Green party member and a conservative party member about nuclear power. It might surprise you here and there. But yeah, that’s a very good one. And I think your example about Democrat and Republican also shows that you might not even be able to use that model in France, because it’s a very different party system and the opinions, what this data is, might be different.

Lutz: This is actually an amazingly good point, because we are talking about a human-to-computer interface; large language models are human-to-computer interfaces. If I want to take a certain decision, I will probably use a Narrow AI. If I want an interface explaining something to me, following logic: logic is not the same everywhere. We have cultural differences in explaining things. Therefore our interface will change, as in any human conversation. So by saying there is bias, it’s actually good that we have this awareness that there are cultural differences.

Jasper: Yeah. And I think also the last one, getting inspired by AI. We spoke about generative AI music as an example. It will reproduce what it has heard or seen in the past and will combine it. So that’s kind of inspirational.

But then, still, the human being, talking about the stock market again, might still be able to process more data or just be creative, really creative, and just say, “Hey, I draw a totally different picture,” because I think that might be cool.

I just recently saw the ads Apple did for their computers. There was this “Think Different” one, as an example, and at that time you would sell a computer by megahertz and speed and everything. But Apple just said, “Hey, no, we don’t sell the computer, we sell the people who are using the computer.” And you would never come up with that just by reintroducing the past.

Lutz: Now, Jasper, let me ask you something. Knowing that we have skews and biases in the data, knowing that there is a technical explanation for it but that it is a problem, knowing that this problem isn’t easily solved: what does it mean for startups?

Jasper: I think there are a lot of chances, because one is… First of all, you have to make it transparent in a way. You have to make it applicable. So there will be startups, we spoke about Arise, that will basically show you what the biases are, what your models are doing there. So: getting transparency there. We had observability tools when it came to cloud usage; Datadog was a strong pioneer, a large one. And we see the same now happening with AI. So I just want to know what’s happening and I want to be in control. People love transparency and control. I think that’s one big one.

The next one is: how can I make AI applicable in a safe way? So not just having the transparency, but also the guardrails. I think we had a nice discussion with Legal OS but also Ultimate about this: if I understand what is okay to do and what is not okay to do, these kinds of rules, then can you guarantee that to me in a way? I think that’s what many, at least enterprise customers we spoke to about the narrow models, would ask for; that’s what I would ask for. How can I make this secure? How can this save me from any lawsuits, or actually get a result that I want? So I don’t want the bias. Or maybe I deliberately want the bias, because I want to target a certain segment of customers, not all the customers, but just a certain segment. But this kind of controlling the output will be very, very important. And this is where startups can actually do a lot of work.

Lutz: For me, it’s super interesting to see healthcare now: it has always been a huge topic for Narrow AI, and now it is for large language models as well. But biases in healthcare are extremely bad, right? Because the outcome, the impact on human life, is direct here. So let’s see how the industry is tackling those. There should be a lot of very interesting new developments on the horizon to actually show how biased a model is, and to warn you to be careful because it’s biased.

Jasper: And I think there will also still be some applications where, at the end, we won’t care. I mean, we spoke about biases in social media. Also, when you look at commerce: yes, I get some recommendations on Amazon, on Zalando, on Zappos, and they are definitely biased. So what will AI change here? Hopefully more transparency and maybe even better results, but you probably don’t have to control it as much as for other applications. I think in creativity and consumer it will actually be very helpful; it will give more free room for the AI to develop, versus enterprise B2B applications, where you have to control this more. And this, again, is an opportunity for startups.

Lutz: Absolutely. So what we are looking for, what we think we will see, is the rise of those controlling mechanisms, those guardrails: how do you write constitutions and how do you check against constitutions? Will that come as platforms, or will it be ingrained in every business model?

Jasper: Plus, what we will also see is people being more transparent about what the AI is doing and what not. Which is good, because at the end of the day it gives us, the consumer or the B2B customer, more transparency. We think the debate is very, very helpful, and it also gives room for smaller companies to actually use this and build their own business models around it.

Lutz: Yep. I think transparency and guardrails are the two topics which I hear about from every founder, from every AI tool set. Everybody creates transparency and everybody creates guardrails. So whenever we are in a market where everybody is taking the same step, that’s the moment where we will see a platform or some aggregation coming in.

Jasper: Yeah. Or the picture is: you just bought a new, wild, very good horse, or an amazing new supercar, but you have to learn how to ride it. And maybe that’s what we’re doing right now, learning how to ride it.

Lutz: Yeah.
