Transcript: Women Who Code — Silicon Valley Full Interview with Julie Novic

Ethics in AI and Machine Learning

Dianne Jardinez
WomenWhoCode Silicon Valley
Jul 16, 2021 · 21 min read

--

Julie Novic received her PhD in Anthropology from the School of Human Evolution and Social Change at Arizona State University. There she explored research questions on consumer economics, identity, and urban space through the lens of quantitative archaeology. After a post-doc at Colorado College, Julie realized that she wanted to have an impact on modern-day society. This led her to the Case Western Reserve Data Analytics bootcamp, where she learned the Python, SQL, and machine learning skills needed to take her from researcher to Data Scientist. Within the data science world, Julie is interested in the ethical application of Artificial Intelligence in solutions to important business challenges. Her focus on ethical AI led her to Pandata, where “approachable, ethical, human-centered AI” is not just a slogan but a core part of company culture. Julie is a Data Scientist at Pandata.

Dianne Jardinez of Women Who Code had the pleasure of interviewing Julie Novic on “Ethics in AI and Machine Learning,” where we discussed overcoming obstacles with ethics and shared tech industry advice on AI and machine learning. The main content of the interview is below. On behalf of Women Who Code — Silicon Valley, we thank Julie Novic and appreciate her taking the time to be a part of our #ShoutoutSaturday series.

Dianne: What is Data Ethics?

Julie: That’s a very good question, and a complex one. When a lot of people think about data ethics, their first thought is fairness: whether an algorithm treats individuals, or disadvantaged segments of society, differently from other groups. One of the most powerful examples is the COMPAS recidivism model, which was disproportionately wrong for African Americans. When it predicted the chance of recidivism, it flagged more African American individuals as likely recidivists when in fact those were false positives. That is a very powerful example of an AI algorithm that could be construed as unfair. And there are all sorts of debates on what fairness means and which definitions of fairness we should use, so it’s a complex issue even in that regard. But fairness is not the only part of data ethics; there’s more to it. The definition I use is putting humans at the center of machine learning. By doing that we focus on human needs: the need for privacy, the need to trust that the algorithm is making the right decisions, the need to understand what the algorithm is trying to tell them, in addition to fairness. There are probably more facets of data ethics that I’m missing, but those are the ones that come to mind off the cuff. To have data ethics also means being transparent about what your model is trying to accomplish and what it is trying to do. All of that ties back to being human-centered in your approach to machine learning: putting people, not the algorithm, at the center of what you’re trying to do. That’s my definition of data ethics.

“To have data ethics is to also be transparent in what your model is trying to accomplish, and what it is trying to do.”

You talked a little bit about the algorithm. Do you have a definition of what it means for an algorithm to be ethical?

The algorithm itself is not ethical or unethical in and of itself; it’s the way that it’s used and applied that creates an ethical situation. You can start with data that’s biased and feed it into a mathematical model, and the end result will be biased whether or not the algorithm itself is biased. It’s not the math; it’s the way the model has been trained, taught, and applied where the ethical conundrums lie. So, in a sense, algorithms in and of themselves are neutral. It’s the way they’re used and applied that raises the ethical question.

What are some challenges that could be found with AI and machine learning?

Ethics in data is really about understanding the inputs and the outputs. So there’s data bias; that’s one major problem area. There’s the application of the model, making sure that it’s not harming individuals it could be harming. There’s whether a model violates privacy: are you using data that really should be private? Does the individual have the right to have their data forgotten? Can you understand the model? Those kinds of issues. A good example of this would be a model that was built to distinguish between huskies and wolves and was very accurate at distinguishing between them. But when researchers tried to explain why the model was making its decisions, what was discovered was that it wasn’t focusing on the husky or the wolf; it was focusing on the snow in the background. That was what the model was actually basing its decision on mathematically. So “Is the model itself doing what you intended it to do?” is a focus of ethics in AI and machine learning models. Are the people who are using the model able to trust it?

I often encounter individuals who say, “Oh, that person doesn’t trust data; they don’t understand data.” Really, it’s not that they don’t trust or understand how to use data. It’s that they have valid and reasonable concerns about how the model comes to its decisions, how the model is used, how the model is developed, and what goes into the model, and that transparency is missing. So they don’t trust the model because of that. And that is not just a data ethics issue; it’s also an implementation issue: how do you get people to trust the model? Those are the kinds of concerns that fall into the field of data ethics and that become challenges, especially if you’re not thinking about them, because there is this tendency to view math as value-neutral. The math may be value-neutral, but the data people feed into it is not. These are things that we need to think about and try to mitigate when creating our algorithms and our machine learning solutions, and when working in data science.

Do you see a trend occurring where the future of technology will be about transparency in terms of AI and machine learning?

I think what people want is a trustworthy algorithm, something they can trust. And for that, you need to understand how it’s used. This issue of transparency needs to be part of the conversation because, in order to understand whether your models are fair, whether they preserve privacy, and whether they are explainable, you have to be transparent about them. And in cases where people are transparent about their models, we can learn where the failings are and how we can improve the AI solution so that it is actually an ethical AI solution.

When it comes to transparency for data, with all algorithms, I could see it being manipulated in a way that could be unethical; in all the calculations you’re making and trying to predict, it’s what you put into those predictions or calculations that might affect them and make them unethical. Is that my understanding of it all correct?

No, that’s pretty accurate. It’s not the algorithm itself; a lot of times it’s the data that goes into the algorithm and the imperfections in that data. We have hundreds of years of structural racism, and women didn’t have the right to vote until the last century, and all this data that we have available has biases built into it that may not be what we want to continue into the future. A lot of times people want to use data science and AI because they want something that is value-neutral, but then they feed it data that’s not value-neutral, and the end result is something that violates our own ethical beliefs and systems.

So, beyond the COMPAS case we talked about, there was the case of a major tech company that built a model to help screen job applicants. They used data from their own hiring history, and they ended up scrapping the AI solution because the model was penalizing women when it produced its results and predicted success at the company. The reason was that it was picking up on gendered terms; for example, while “chess club” scored well, “women’s chess club” did not. So even if you took gender out of the equation, there were gendered terms, gendered words, and gendered behavior patterns within the resumes that it was picking up on, and it was downgrading women. That wasn’t the goal of the solution; when it was being created, it was trying to eliminate that problem, and yet it exacerbated it, because the data that was fed in was not examined in advance for the biases that exist in it.

The Gender Shades project looked at facial recognition software, and if you haven’t heard about it, I recommend everybody take a look; it’s phenomenal. It was done by a group of women out of MIT. What they looked at was how accurate different commercial facial recognition AI solutions were on different skin tones and different genders. And what they found was that the darker the skin tone, and especially for dark-skinned females, the models were not nearly as accurate as they were on light-skinned males. Think about that: one everyday example is that you often open your phone with your face, and if the facial recognition has a hard time identifying you, then it has a hard time opening your phone. But that’s small. Think about the solutions applied to policing: when facial recognition software enters policing and you’re trying to identify individuals, but you’re making incorrect identifications because the model wasn’t trained on enough data to identify darker-skinned individuals, that becomes an even deeper societal problem.

If you want other examples of AI gone wrong, there’s a site called Awful AI, where you can look at cases where well-meaning artificial intelligence solutions were created without taking a moment to look at why there might be bias or why they might not be ethical, and as a result the end result was something that did harm rather than the good that was intended. I don’t think people go out there intending to make that AI. I think it’s an accidental thing.
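[Editor’s note: the disaggregated evaluation Julie describes here, reporting accuracy per subgroup rather than one overall number, can be sketched in a few lines of Python. The file name and column names below are hypothetical placeholders, not code from the talk.]

```python
import pandas as pd

# Hypothetical evaluation file: one row per example with the subgroup it
# belongs to, the ground-truth label, and the model's prediction.
preds = pd.read_csv("predictions.csv")  # columns: group, y_true, y_pred

overall = (preds["y_true"] == preds["y_pred"]).mean()

by_group = (
    preds.assign(correct=preds["y_true"].eq(preds["y_pred"]))
    .groupby("group")["correct"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "accuracy", "count": "n"})
)

print(f"Overall accuracy: {overall:.3f}")
print(by_group.sort_values("accuracy"))
# A model can look strong overall while being far less accurate for particular
# subgroups, which is exactly the gap the Gender Shades audit exposed.
```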

Attendee questions

How do you encourage practical dialogue among teams when it comes to ethical AI?

It starts with having one person who brings it up. Honestly, what encourages dialogue is, one, educating yourself, and then bringing up those conversations in meetings when you’re talking about what an algorithm can do. Say you have an algorithm that recommends majors to students at a university, and it’s helping relieve some of the workload on a career manager. But because there’s bias in the data, because there are gendered behavior patterns where there are more women in the health services industry and more men in engineering, all of a sudden the model isn’t giving women the option to pursue those careers. If you’re building that algorithm, start by asking: what are some biases in this data? Is the data biased, how is it biased, and will that affect our outcomes? Another conversation to have is: great, we have this model that’s 80, 90, or even 99% accurate, but is it identifying the things that we really want it to identify? How do we use Python tools like LIME and SHAP to identify where these decisions are being made? What’s the decision point? Is it doing what we intended? Bringing those conversations up just takes one person. That’s how things started at Pandata: we started small and we started talking about ethics. One person brought it up and got other people thinking about it, and once people start thinking about it, they start doing their own research and forming their own thoughts about what it means to be ethical, and then that becomes part of who we are as a brand and identity. So I think the key to having these discussions is to bring them up and start them.

“What encourages dialogue is, one, educating yourself, and then bringing up those conversations.”
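[Editor’s note: a minimal sketch of the kind of pre-modeling bias check Julie describes for the hypothetical major-recommendation scenario. The file name, column names, and pandas workflow are illustrative assumptions, not code from the talk.]

```python
import pandas as pd

# Hypothetical historical records of declared majors; one row per student.
students = pd.read_csv("declared_majors.csv")

# Share of each gender that ended up in each major in the historical data.
base_rates = (
    students.groupby("gender")["major"]
    .value_counts(normalize=True)
    .rename("share")
    .reset_index()
)
print(base_rates)

# Large historical gaps (e.g. far fewer women in engineering) will be
# reproduced by any recommender trained naively on this data, so they are
# worth surfacing and discussing before the model is built.
engineering_rate = (
    students.assign(is_eng=students["major"].eq("engineering"))
    .groupby("gender")["is_eng"]
    .mean()
)
print(engineering_rate)
```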

How do we improve on this bias in datasets? And how does the industry build trust when these biases exist in the data?

There is an industry developing around generating unbiased datasets; that’s a start. Unfortunately, I don’t remember the companies off the top of my head, but when I attend ethical AI conferences, they’re there. There are also tools like IBM’s AI Fairness 360 that help you look at the data to see if it has biases, and that also offer ways to modify the data to mitigate some of those biases. So the conversation is happening, and we’re starting to develop the tools we need to make the future that we want rather than the future that we had. Those tools are available to you, and there are others out there that I don’t have at the top of my head, unfortunately, but they are there. Another way is being upfront about what your results are; that’s part of building trust. If your tool is only 75% accurate on one population and 98% accurate on all the other populations, be upfront about that. That builds trust, and then people know that when they’re dealing with the population the model sometimes gets wrong, they need to delve deeper.
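[Editor’s note: a minimal sketch of the kind of check the AI Fairness 360 toolkit supports, under assumed column names and encodings; the dataset and the choice of privileged group are illustrative, not from the talk.]

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Hypothetical, fully numeric applicant data: gender encoded 1 = male,
# 0 = female; hired encoded 1 = favorable outcome, 0 = unfavorable.
df = pd.read_csv("applicants.csv")[["gender", "years_experience", "hired"]]

data = BinaryLabelDataset(
    df=df,
    label_names=["hired"],
    protected_attribute_names=["gender"],
)

privileged = [{"gender": 1}]
unprivileged = [{"gender": 0}]

metric = BinaryLabelDatasetMetric(
    data, privileged_groups=privileged, unprivileged_groups=unprivileged
)
# Disparate impact near 1.0 and statistical parity difference near 0 suggest
# the favorable label is distributed similarly across groups in the raw data.
print("Disparate impact:", metric.disparate_impact())
print("Statistical parity difference:", metric.statistical_parity_difference())

# One built-in mitigation: reweigh examples so groups are balanced before training.
reweighed = Reweighing(
    unprivileged_groups=unprivileged, privileged_groups=privileged
).fit_transform(data)
```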

Educating your end users about the limits of your software, the limits of your AI, is another big step. People have a tendency to want to provide perfection, and what I’ve learned is that sometimes honesty about what you can provide is such a relief to the clients who receive it. They say: okay, you’re being honest with me, you’re telling me what the limits are, and that makes me able to make the best decisions with the solution that I have.

Google is currently undergoing these challenges with ethics. What are ways that we can mitigate unintentional bias in our code? And what kinds of methods can we use to analyze this in the models that we create?

I mentioned AI Fairness 360 from IBM. I think Microsoft also has some tools available in their Azure workspace to look at group accuracy rates, so you can break results down by the populations you’re looking at and see whether you are as accurate within each population as you are for the group as a whole. That’s a tool that’s available out there. There are also Python libraries called LIME and SHAP. Those are post hoc algorithms. What I’ve done is take my black box, use an intermediate step to capture what the black box has been labeling, and then put that into a LIME model. The LIME model gives you the local reasoning; it doesn’t give you an explanation for why every single decision is made, but why that specific decision was made. And you can use that to explore whether or not the decisions you’re making are based on fair reasoning. The Shapley-value algorithm, SHAP, does something similar in a different way. And there are some really great research articles out there that talk about methods you can use to improve your results in terms of ethics as well. Those are some solutions for building ethical code. The start is really that first question: is this code ethical? You may not get it perfectly right the first time, and that is okay, because you’re asking the question and trying to be part of the solution. Before I discovered some of these tools, I would take a black-box model’s outputs and cluster them to see which features clustered together, just trying to understand what my algorithm was telling me. That was the start. And as this conversation gets deeper and stronger, we’ll have more tools available to us, I think.
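[Editor’s note: a minimal post hoc explanation sketch in the spirit Julie describes, assuming the open-source lime and shap packages and a scikit-learn model; the dataset and model choice are illustrative, not from the talk.]

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer
import shap

# Train a "black box" on a standard dataset.
data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# LIME: fit a simple local surrogate around one prediction and report which
# features drove that single decision.
explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(explanation.as_list())  # (feature condition, local weight) pairs

# SHAP: Shapley-value attributions for the same model, summarized as the mean
# absolute contribution of each feature across a sample of rows.
shap_values = shap.TreeExplainer(model).shap_values(X[:100])
# Depending on the shap version this is a list per class or a 3-D array;
# take the attributions for the positive class either way.
pos = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
mean_abs = np.abs(pos).mean(axis=0)
for name, val in sorted(zip(data.feature_names, mean_abs), key=lambda t: -t[1])[:5]:
    print(f"{name}: {val:.3f}")
```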

According to a March 2020 article by Boston Consulting Group, an American management consulting firm, titled What’s Keeping Women Out of Data Science?, “Consensus across various surveys is that only about 15% to 22% of all professionals in data science-related roles are women.” This indicates that the field is male-dominated. Any tips for success for women in this field?

I’ll tell you a little bit about my story and how I chose the company that I chose. Archaeology and anthropology are male-dominated fields, especially archaeology. So I was coming from a background of already having had experience with a male-dominated profession, and I knew that what I wanted most was a culture that was amenable to women in the workplace. So when I was looking for jobs, I looked to see who was in the leadership, and I saw that at Pandata two-thirds of the leadership was female. There were women in leadership, which was a great sign. I looked through the company values, and that was a great sign too. Those are two things to do when you’re on the job market: look at who your leaders are, look at what they’re saying, look at what the culture of the company is, and that will help you find an organization where you can find a mentor.

“When you’re on the job market, look who your leaders are, look at what they’re saying, look at what the culture is of the company”

That helped me a lot when I started out on this journey. I was scared of code, and I was terrified of what the future was going to hold. And I found a women’s coding group, PyLadies, for which I am now an organizer. I met a data scientist there, a woman who was strong and skilled, and she was willing to welcome me into the organization with open arms, and she helped mentor me. Having a strong woman mentor meant a lot. Having that person be outside of my company also helped, because it creates a different dynamic: you can talk about things that you can’t really talk about internally. Or you could, but it’s always a risk.

So having a strong mentor was very important and has always been important. Within the company, having strong relationships with people in positions of authority and finding internal mentors also helps. And they don’t have to be women; sometimes you can find a good male mentor, a man can be a good mentor to a woman, that’s possible as well. Having more than one mentor is good, and different genders and different perspectives are good. So that’s the second thing: find a good mentor. The third thing is to help others. I think that’s very important: after you’ve gotten established within your career, once you become a senior data scientist or a senior software developer or reach even higher positions, open your arms to women who are trying to make that jump themselves, and encourage them and help them and mentor them. “Be the change you want to see in the world” is a very, very good mantra. I think that helping other women is important.

What advice do you have for someone who wants to get into the data science field but doesn’t have a data background? In addition to that, what made you want to choose data science?

I’ll start with the second question first: what made me want to choose data science? I had recently been doing research, and yes, in archaeology my data set was huge: it had 500 data points. I mean, those are small numbers; most archaeologists are dealing with 10. If they have 10 data points, they’re dealing with a lot. So it’s small data. But it was research, it was interesting, and I liked doing research. So when I was looking to transition careers, I thought about the things that I liked about my current career. I liked helping people, as a teacher. I liked solving problems; I liked putting the puzzle pieces together. And I thought data science seemed a very good fit for that, especially the more applied aspects of data science, which is what I do. As an applied data scientist, I take great ideas other people have and apply them to business solutions. So that is the answer to the second question.

For the first question, what advice do I have for people who are switching over? One, do your research. There’s a concept that few people have really internalized, and it made a world of difference to me: the informational interview. People think of interviews and automatically think, oh, I’m going to get a job. But in an informational interview you’re going in to talk to someone just to pick their brain about their job, with no expectations whatsoever. You’re asking someone to share what their day-to-day life is like. And I did hundreds of them. I talked to teachers, I talked to administrators, I talked to data scientists, I talked to whoever I could get a hold of to try to figure out what career would be a good path for me. Maybe you don’t need to do that many, but if you’re interested in being a data scientist, talk to a data scientist; if you want to know what life is like for a woman data scientist, talk to a woman data scientist. I found that even cold-calling people and sending them LinkedIn messages saying, “Hi, this is my background, I just want to talk to you and have an informational interview. Do you have some time?” works. People are willing to talk. They’re willing to share information with you. They’re willing to help each other out. And I do it for people all the time; I’m often contacted by bootcamp grads for informational interviews about how to make that transition. That’s why the informational interview is great.

Then there’s networking, related to that: making connections. I heard about my current position at Pandata because I met a woman at a Python conference who happened to know someone at the company and posted that they were looking for people. I thought, wow, that’s a perfect job, and I applied. I wouldn’t have heard about that job had I not met that woman, a personal contact who has occasionally been a mentor to me as well. Meeting people, talking to people, sharing your excitement about things: that’s how you break into an industry, so that’s very important.

I also know, not from my personal experience but from others who are self-taught data scientists, that your web presence is important. Make a website, show off your software, show off your algorithms, show off your skills, do 100 Days of Code; just have a web presence, be out there and be seen. That’s important.
So those are the three tips that I would have for someone who wants to transition.

“People are willing to talk. They’re willing to share information with you. They’re willing to help each other out.”

In terms of building a presence, what kind of technical skills would someone need in order to show that they know data science? What would you want to see, in terms of technical skills or even soft skills?

We’ll start with the soft skills. Soft skills are so important; people don’t realize how important it is to be able to communicate with clients and with people. When you’re a data scientist, you’re talking to people at the VP level, CEOs, the C-suite, and you’re trying to help them make decisions based on data that you’ve analyzed through the algorithms that you’ve produced. You need to be able to communicate, and communicate well. Communicating with an executive means getting straight to the point: don’t be super wordy in written form, and use bullet points. I remember learning to write for business, and the bullet point was hammered into me; it is your friend. Learn how to have a conversation with someone you don’t know, which is really hard. Build those skills by making small talk at the grocery store or when you go out; just learning how to talk to people you’ve never spoken to before is an important skill to have. And then working with teams: being collaborative and cooperative are important skills to have.

In terms of technical skills, that’s a big question, because it depends on what kind of data scientist you want to be. Do you want to be someone who specializes in cloud architecture? Then that’s a particular set of skills you need to develop. If you want to be someone who does research-based machine learning, then that set of skills applied to machine learning is another set. So really, understanding what kind of position you want to be in, what kind of data scientist you want to be, is important for understanding what kind of skills you want to develop. I focused on developing Python skills and SQL skills; I know a little bit beyond that, but I’m not very proficient in it. For me, it was Python and SQL that were the main things I focused on developing for myself. I know other people who wanted to know Python but also wanted to build their own website from scratch, so they learned HTML, CSS, JavaScript, that suite of materials, and they also wanted to learn how to use it on their site and speed up their algorithm design. So it really is what you want to become in the end, what kind of data scientist you want to be, that drives what you want to learn.

When you got your first data science job, what would have been your number one piece of advice to yourself?

If I could go back in time and give myself any advice about getting that first job, I would tell myself to breathe. You’ll get it. It’s overwhelming, even in your first job after a bootcamp. The bootcamp gets you part of the way there; the job takes you the other 20–30% in terms of the knowledge base. I remember being asked to do NLP work. I was a bootcamp student, four months into my bootcamp, and we hadn’t covered NLP, and the first project I was assigned was an NLP project. I remember my supervisor going through a list of concepts to explain what was going on with the algorithm, and I had no clue what she was talking about. And she said, we’ll write these words down and look them up, because you’re going to have to have at least a basic understanding of what they mean to be able to work on this algorithm. That freaked me out. I felt so dumb, frankly; I felt stupid. I would tell myself: you’ll learn it, you’ll get there, and you’ll grow. I went from a junior data analyst to a data scientist in a year. Give yourself permission to breathe. It’s okay, you will learn it, you’ll get there is what I would say.

“Give yourself permission to breathe. It’s okay, you will learn it, you’ll get there.”

How did you juggle Bootcamp and finding a job?

Juggling the bootcamp and finding the job was tough. Really, what it comes down to is finding time to go to meetups, professional meetups like this one, and talking to people. That was the start, along with going to conferences, local and regional conferences in the field or the language that you’re focusing on. That was a big help. My advice any time anyone is doing multiple things at the same time, like class and work, is: your calendar. You live and die by that calendar. Keep that schedule, make time for the people you need to make time for, make time for yourself, and make time to do these things. It’s tough, but live and die by the calendar.

And what are some questions you suggest asking mentors?

How did you get started in this field? What would you tell yourself if you were doing it over again? What do you like about your job? What is an average day like? How would you suggest someone break into this field? Another conversation to have, after you secure the person who is your mentor, is a discussion about expectations: how often will you meet, and what kind of mentorship are you looking for? Those are the kinds of questions I would ask.

Any last words of wisdom for those in Data Science positions?

This is something that I learned early on in my professional life, and it applies to a lot. The thing about wisdom is that we are human and we fail, right? We’re not perfect. There is no such thing as good and evil; there are just people trying to be decent. Try to be that. I think the biggest piece of wisdom I’ve ever received is, by far, just try to be decent.

This blog is the transcript of an event run by the Women Who Code Silicon Valley on April 17, 2021. You can view the recording of the event below.

Ethics in AI and Machine Learning by Julie Novic

View and register for upcoming events by our chapter at http://bit.ly/siliconvalley_events

Want to hear more about our #ShoutoutSaturday series? Follow our Official Blog for the WomenWhoCode Silicon Valley chapter!

To get more updates about this event and our series, follow our social media platforms at linktr.ee/wwcodesv
