DIGETHIX: DIGITAL ETHICS TODAY — EPISODE 11

Shared Computing, Data Analysis, and Boston University’s Computing and Data Sciences Unit

A talk with Azer Bestavros

Published in

DigEthix

47 min read · Sep 28, 2022

“This whole evolution of us as humans and society, with the computer, the computer also evolves because we are changing the way we work with it.” (8:45)
— Azer Bestavros

Edited by Carli Johnston

DigEthix is a podcast that looks at issues in the ethics of technology from a philosophical perspective. This article follows a conversation with Dr. Azer Bestavros about his contributions to projects involving both public and private entities, a description of mathematical methods, and his insight on how to analyze data without compromising its privacy or security.

This conversation explores these central questions:

How can computational methods and technologies be used to address social challenges?
How might data be integrated into other academic disciplines?
What role do computer scientists have to play in larger research questions?

About the guest:

Dr. Bestavros grew up in Egypt and attended Alexandria University. Later, he went to Harvard, where he continued to explore computer science’s evolution and how it can be studied from different perspectives. Throughout his studies, he has realized that computers are much more than theoretical issues or the mathematics behind them. Today, Dr. Bestavros is the Inaugural Associate Provost for Computing and Data Sciences at Boston University, a William Fairfield Warren Distinguished Professor in the Computer Science Department, and the founding director of the Hariri Institute for Computing.

Dr. Bestavros is passionate about how his technical work in engineering and computer science is closely linked to the care of society and its development.

Azer Bestavros

azer.bestavros.net

“I think computer science to succeed needs to think of itself also as a social science. We are certainly a mathematical and an engineering science, but unless we have this empathy to think, with lawyers, with social scientists, with humanists about how to get the technology to work, we will not come up with the right solution.” (29:09)
— Azer Bestavros

Transcript

Seth Villegas:

I’m very pleased to say that I had a chance to sit down with Dr. Azer Bestavros. Azer is the Inaugural Associate Provost for Computing and Data Sciences, also known as CDS (which we will hear a lot in this interview), here at Boston University. He is also the William Fairfield Warren Distinguished Professor in the Computer Science Department, which he joined in 1991 and chaired from 2000 to 2007, and the Founding Director of the Hariri Institute for Computing. As he will share in this interview, he has worked on several major projects involving both public and private entities, looking at mathematical methods for analyzing data without compromising the privacy or the security of the data being examined.

It is hard for me to overstate just how important Azer’s support and influence have been for getting this podcast off the ground. I first met Azer about a year and a half ago, when I began working as a summer research fellow for CDS. During that summer, I developed several case studies cataloguing incidents from the past several years involving technologies and tech companies. I also prototyped a few different frameworks for doing ethical analysis of situations involving technology. I really do feel like the work that I did that summer has laid the foundation for so much of what we’re doing now on this project here at DigEthix. So I’m really grateful for the opportunity to do that, and to Azer specifically for helping me to do that.

So, in this episode, Azer and I will be talking about how he initially got involved with computer science, and about the development of important topics in the field, such as massive shared computing systems and privacy-preserving analytics. As he will explain, the mathematical and technological problems are sometimes easier to solve than the social challenges of trust and adoption. With that in mind, the key questions for this episode are: How can computational models and technologies be used to address social challenges? How might data be integrated into other academic disciplines? What role do computer scientists have to play in the analysis of larger research questions? This podcast, of course, would not have been possible without the help of the DigEthix team: Nicole Smith and Luis Salinas. The intro and outro track, Dreams, was composed by Benjamin Tissot through Bensound.com. You can find more info about DigEthix on our website, DigEthix.org. You can also find us on Facebook and Twitter @DigEthix and on Instagram @DigEthixfuture. If you have any questions or comments, you’re always free to email us directly at digethix@mindandculture.org. And now I’m pleased to present you with my conversation with Azer Bestavros.

(3:11) Thank you so much for taking the time to talk to me today, Azer. I really just wanted to start with a little bit about you and your story. If you could, tell me a little bit in your own words about your journey through computer engineering and computer science, especially when you were getting educated, say, when you were an undergrad and through your Ph.D. program.

Azer Bestavros:

Well, that’s a question that can take a long time. I have to say, of course, you know, my background, I grew up in Egypt. And I went to Alexandria University. My father is a lawyer and so was my mother, who dropped out of college sort of years into the program, (3:51) because I decided to come into the world. So I think a little bit opening up for that. So we don’t have actually engineers in our family except for an uncle of mine who studied mechanical engineering and actually immigrated to the US back in the ‘60s. And he would come semi-regularly around Christmas, maybe every other year or so to visit with family. And he was always impressive just when I saw him. So I sort of liked engineering as an area, but I had no idea what to do. Growing up, I thought I would be an architect because I really liked drawing and painting and visual design. But when I was trying to decide, you know, he was visiting one of these times, and I asked him so well, you know, what should I look forward to in the US? What are they doing? And he said, Well, you know, there’s this new field computer science and it’s really hot. And that’s where I started knowing anything about what computers are to start with.

(4:50) My dad used to come back from trips to Europe, where he worked on oil and international law. So once he came back with an Apple II, and that was sort of the beginning of my getting into programming and computer science, which I have to say I didn’t know what it was until I was a junior or something. So I realized: oh, that’s what science and engineering is. I’d like to stop here and say that this is probably the story of my whole introduction to computer science. Even when I went to Harvard, I kept discovering what it is, partly because computer science itself changed, right? So when I look back from my engineering times as an undergrad to when I went to Harvard, I realize that I have seen so many perspectives of it. I started in Egypt with more of the engineering side of it, because that’s the easy part. That’s the part where you can read books and understand, well, this is how you build the system, this is how you organize a computer. And at the time, that was maybe the prevalent view of computer science, because we had to build them, and that was a big feat. And then when I came to Harvard, Harvard was the very opposite of Alexandria University: it was far more mathematical and theoretical. So I realized, well, actually, computer science has nothing to do with the computer itself. It’s really about the mathematics behind it. And that was sort of a renewal of my view of what computer science is.

(6:16) Fast forward to my career at BU: you know, the internet was not anywhere to be found when I started. I left Harvard, and then, okay, well, the internet became part of the field, and that changes what computers are. And today, with AI and all the things that we are doing, computer science is starting to be more of a social science as well as an engineering science. So I feel like discovering what computer science is, is the story of computer science itself.

“Discovering what computer science is, is the story of computer science itself.”

I think computer science is still trying to discover this. It started from its roots in math and engineering, and I got an introduction to both of these with my undergraduate and graduate studies, and now it’s really very different in many ways. So I don’t know if I’m answering your question, but that’s maybe how I would say it. But I would emphasize that there’s a lot that I didn’t know when I started, and I believe there’s a lot that I still don’t know. That’s part of what’s fun about being in academia in particular.

Seth Villegas:

(7:21) Well, thanks for indulging my question a little bit, in part because I’ve also seen some radical changes even over the course of my career from undergrad to PhD. And I think it’s really important for people to realize how new a lot of these capabilities are, and that we don’t really know what it’s going to be like to be turning over lots of these systems to, say, machine learning or AI processes. Even hearing you talk about the development of these larger networks that make up something like the internet that people can use: for instance, you know, I have to read about it in a textbook, that the internet was just a thing between, say, big universities here and there, right? But it’s almost unimaginable to me, because it’s always been this broad, spread-out thing. So I think it’s important for someone like me, and especially for undergrads today, to really keep that in mind. I don’t know how you feel about that as an educator, but I feel like that history situates the context a little bit of what exactly we’re dealing with, and how new it is.

Azer Bestavros:

(8:24) It’s a co-evolution. Computer science itself is evolving, but society is actually evolving along with computer science. So you introduce a device, and then people start using it. And then they use it in a way that you didn’t expect. And then, because of that, you change the device itself, right? So there’s this co-evolution, and then we start developing capacities that we didn’t have before because we have the device, right? I call it the device, but I mean, the internet is a device too, just at a much, much bigger scale. So this whole evolution of us as humans and society, with the computer, the computer also evolves because we are changing the way we work with it.

I think one has to be a little humble about making predictions, because if you had asked me 10 or 15 years ago whether we would have any of the problems we’re talking about today, I wouldn’t have expected that. People want to design technology, especially engineers. Actually, this is an interesting point. As an educator, I always try to tell it to my students. I say, look, engineering is about building something cheaper, faster, more efficient, etc. You have a goal, you have a system you want to build, and you set out to build it. So you start with the specs, basically. Then you come up with what you want to build. And then you give it to society, and society, you know, will use it in a very different way. Think about the internet; that’s exactly it. The internet started as this thing that was going to connect scientists and so on, then came America Online, and the rest is history, a different mode. So we designed it for a very different customer. And then the customer comes in, and all of a sudden we have to use our email addresses as our ID. And that’s ridiculous. If you had told me back in the ’70s or ’80s that email was going to be used as an ID, I would have done something very different as a designer, right? But as engineers, we’re actually not given that opportunity. That’s actually very important, right? We are not told what the system is going to be; we build it and throw it out there. And then we get surprised by how it’s used. And then we try to fix it. Think about security, privacy: all these things could have been designed into the system had they been an issue at the time. They were not, because we were talking with our friends and colleagues, and issues of privacy and security were not on our mind.

(10:46) So this co-evolution is really unique in computer science. You don’t co-evolve when you create, I don’t know, bridges, or maybe a little bit with cars, but to a very, very low extent compared to what we’re doing today with computer science. So what I tend to tell students, from the engineering perspective (and I’m not an engineer, I’m a scientist, but I crossed over: I got started as an engineer, I ended up a computer scientist, and now even more so a social scientist to some extent), is this. If you think of it from the engineering side, we have to really take a step back a little bit and realize that when we design systems and put them out there for use, we have no idea how they are going to be used. And we should actually educate ourselves in the ways in which people may use them. And that’s actually (I’m sure it’s gonna come up later in our discussion) why issues of ethics and all come into the picture.

Seth Villegas:

(11:44) Yeah, no, I’m glad you’re already laying the groundwork for our conversation around those things. And I know lots of the conversations we’ve had over the past year or so have really centered around that, you know, anticipating things, but also the ways in which systems get deployed in really innovative ways, so that it’s really hard to say what’s going to happen. And this is a final sort of background question. I know you’ve been teaching for a long time. Have you noticed any changes in the sorts of students who come into CS? Are their expectations different than they would have been, say, when you first got into this, you know, decades ago?

Azer Bestavros:

(12:29) That’s a good question. I think that, yes, there is a difference. I mean, computer science also went up and down; it was not always popular. We are now living at a time when there are lots of jobs, lots of opportunities, etc. But when I graduated from Harvard, it was actually a very slow time for computer science. It was the end of what’s called the AI winter, you know, where people basically said, “oh this whole machine learning thing is never going to work and AI and all that.” And it happened again with the dot-com crash, right? So I think that maybe originally, the students were more technically interested in just the programming, the tools and so on. I think, more recently, it’s really about empowerment, in that students who go into computer science feel they can change the world– whether they change it with technology, or they change it because they use technology for something else (that’s basically two sides of the same coin to some extent).

“Students who go into computer science feel they can change the world– whether they change it with technology, or they change it because they use technology for something else”

But I think students who started in engineering, maybe when I started at BU for example, thought more, “this is a place where you can get a job”, like going into an engineering field to some extent. And now it’s different. Now the students coming from high school are a little bit more empowered, and you see them developing stuff on the side while they’re undergrads; they are not just going to the classes. They question a lot of established ways of doing things in computer science. That’s healthy, in my opinion, but it also makes the teaching more challenging.

Seth Villegas:

(14:07) Yeah, and I think part of the way I’ve sort of seen that in my own life, so you know, I went to Stanford and was in Silicon Valley. So it was, especially kind of during the tech boom. So it’s kind of unthinkable that you wouldn’t be involved with tech at all. So you know, like, I know a few CS languages, just, you know, kind of, as you’re saying, as just a side thing. It wasn’t even my specialization, but just because it was something that I felt like I had to at least understand to be able to have conversations with other people.

Before we get into the bigger issues, though, I would like to talk to you a little bit about your research, because I think that it’s really interesting. I think it’s also different from, say, people’s experience with their personal computer. I know you’ve worked a lot on, say, big shared computing systems. And I guess I was wondering if you could just explain a little bit about that: what those systems are used for and how it is that they work?

Azer Bestavros:

(15:01) Oh, yeah, it’s a good question. I didn’t start out working on that. Back when I was at Harvard, my thesis was on robotics and was really about how to control robots and so on. This is a good point to bring up the importance of being open-minded about where the wind blows, especially as a researcher. I give all the credit in the world for where my research ended up throughout the years to a graduate student by the name of Carlos Cunha, who went back to Brazil and is now leading the equivalent of our AT&T or Verizon there.

I was at the time years into my tenure track at BU, and I was doing research on real-time systems and control, basically systems research, building computers. And then he walks in. I was this new faculty, and I hadn’t committed to working with him; I was just discussing things with a class of mine. And he said, “Well, there’s this thing called Mosaic out of UIUC that is very interesting, it’s a browser,” and I had no idea what he was talking about. And he says, “Well, you know, it just puts FTP and all these things together.” So he explained the concept to me and said, “Well, I think there’s some research to be done.” And the research, he says, is, “Well, you know, these people are doing something stupid when it comes to the design of caching systems for Mosaic.” And I said, well, you know, that seems to be just an implementation detail; who cares? He insisted, and I’m glad he did, because the work I started with him came at just the right time. We were there just as the web browser started. I was doing some work on the internet, but certainly not on the web side of things.

(16:57) Within a year, the work I’d done with him was getting the highest citations of any of my work. So all of a sudden, that has a way of moving you to work on something different. The good news, obviously, was that there were very few of us looking into this problem, so we became “pioneers”, the first to do that work. And that sort of moves you, right? So for my tenure track, this is what I ended up doing. I’m going to get to your question; I haven’t answered it yet. I’m just telling you how I ended up doing this work, because that’s important. And then the realization about connecting information and distributing content, that’s actually what started it. At the time, if I recall correctly, there was the report on Bill Clinton, the report that came out of the investigation of his affair with Lewinsky, I think. That was put on the internet, and the internet crashed, because everybody wanted to go FTP and get the stuff out. And all of a sudden our work started becoming interesting, which is: how do you distribute content that’s very popular?

(18:04) So let’s say right now there’s a report. And you can store it ahead of time, because that’s just starting, right? How do you do content distribution at scale? The internet was already growing, not to anything close to what we have today, but still, think of the idea of having all the computers in the world try to access one file on one server at the same time. That’s a disaster. So the question is, how do you do that? And at the time, we were there. We started doing work on scaling services. That’s what it is. It’s like, how do you distribute content to possibly hundreds of thousands of people? Well, you can’t have a single computer do that. So you need possibly hundreds of computers, possibly thousands. And then you take it from there, and we’re talking now about the mid-’90s. You know, it’s not just content distribution: content changes started, with personalization, with putting in ads. So it’s all about touching content. And you cannot have a single computer do that; you have to have lots of computers work together.

And in every one of these steps, I was just one step ahead. So at the time when everybody was doing content service and advertisement, I started saying, well, maybe we need to do computation, and that got me into what basically ended up being cloud computing. And that’s basically about how, instead of thinking of the computer as maybe a machine, or 10 machines in a room, or 100 machines in a big room, you think of it as thousands of machines in basically a warehouse. But it’s still a computer: those thousands of machines, to the programmer, to the users, are still just a computer. And that’s just the evolution of the field to get the capacity we need. And of course, there is a little bit of a vicious cycle here.

(19:54) When you put a computer to work, it produces data, and then when you have data, you need more computers online to analyse it. There’s a little bit of a positive feedback loop, right? I mean, we all get phones and watches, and these things are just data generators, and the data they produce gets put somewhere. And then people say, “Well, I want to compute on that data; well, I need computers for that.” And the scale at which data is generated drives the scale at which you need to build computers to process the data. So there is a little bit of a positive feedback loop. It’s not necessarily good; that’s just how it is. So basically, what is happening out there is just supply meeting demand. The supply of information at some point in the ’90s was somebody putting a file on a server, and the demand was thousands of people wanting it. And we had to make supply meet demand by designing systems to do that. And then fast forward to today, with all the data that we are pumping into fiber. Well, that’s the supply, and the demand is going to be that people want to leverage this data one way or another.
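As an illustration of the scaling problem described in this answer (and not something taken from the conversation itself), the sketch below contrasts one server handling every request with a pool of replicas. The hashing policy, the server names, and the client counts are all assumptions chosen for the example; real content distribution systems use more elaborate placement and caching strategies.

```python
import hashlib
from collections import Counter


def pick_replica(client_id: str, replicas: list[str]) -> str:
    """Map a client to one replica by hashing its ID (an illustrative policy)."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


def load_per_server(num_clients: int, replicas: list[str]) -> Counter:
    """Count how many clients each server ends up serving."""
    return Counter(
        pick_replica(f"client-{i}", replicas) for i in range(num_clients)
    )


# Hypothetical scenario: 100,000 clients all want the same popular file.
single = load_per_server(100_000, ["origin"])
spread = load_per_server(100_000, [f"replica-{n}" for n in range(100)])

print("busiest server, one origin:   ", max(single.values()))
print("busiest server, 100 replicas: ", max(spread.values()))
```

With a single origin, every request lands on the same machine. With many replicas, the busiest machine sees only a small fraction of the total, which is the basic reason a popular file needs many computers working together.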

Seth Villegas:

(21:07) Yeah, thanks for sharing that story. It’s a really funny example. I guess something going viral? I mean, I don’t know if that’s how, how you would have put it at the time? And I think that’s interesting, because if I’m being honest, partially how I got into, say, talking about this sort of stuff, is because there’s kind of a disconnect at times between the people who sort of see the way the systems are or the way that they work, and the people outside of that, who just know that it works, right? You know, it’s kind of ethereal, so to speak. And I think, you know, as we’re using our personal computers we’re not always thinking about what the backend stuff looks like. And honestly, how miraculous it is that we are able to network machines in these ways and get them to do kind of complicated processes. There are instances where say, servers will experience serious shutdowns if there’s enough traffic or depending on what you’re doing. I know this happens a lot, actually, during game launches, I have many a friend who very much complains about those sorts of things.

It’s interesting though, this story that you’re telling about this positive feedback loop: you have more machines out of which you can generate more data, and then you need more machines to look at that data. I know one of the other things that you’ve talked a lot about is how you maintain privacy within this sort of data-rich environment, especially if you’re doing, say, an important audit of something that’s really sensitive. So I know something that you’ve worked on in the past is looking at, say, wage data, right? Data that’s extraordinarily sensitive, but offering to go in and look at lots of data, and doing it in such a way that protects the privacy of the individuals involved. When you’re approaching something like privacy, is that simply a kind of mathematical problem? Are there other issues that come into play? What’s the right approach for thinking about how that actually works as a technical issue?

Azer Bestavros:

(23:09) Mathematics is going to be there, but what started it is something totally different. And this is another example, my work on privacy. You know, I started by talking about content distribution, that’s how I got into it, and then cloud computing. So how did I end up doing some privacy work? It’s related to the notion that there is data that you need to analyze. So okay, another thing about this notion of the internet and large-scale systems is that data is also becoming large-scale, right? So if you think about, let’s say, hundreds of employers in the city of Boston, every one of them has a database of their employees. So you have large-scale data, but the data does not have one single owner. If you go back to the ’80s, you have a floppy disk of some sort, and that’s what it is: it’s yours, you have it in your hand. And if I want to give you the data, I make a copy of it, right? Fast forward to what we have now. Data has owners, and the owners do not want to share it; it may be very sensitive. And it’s at a scale that you cannot copy. Think about this, again: you just can’t copy the data. Google has a service where, if you want to move your massive data, they actually put it on a hard disk and you ship it by UPS; that’s faster than trying to push it over the internet. So literally, it’s faster to move data with airplanes and trucks than to use the internet for it, because the data is so huge that the network can’t hold it. And certainly, you don’t want to copy it, because there’s too much of an overhead.

(24:48) So think about the scale of data and how data is now owned by different people, and so on. Thinking about all this data as just one file, and thinking “I just want the computer to process it”, that’s the wrong model. Because it’s not one file, it’s a bunch, let’s say 100, of separate databases. The question you want to answer may be one question, “what is the gender pay equity, or inequity, between men and women, or minorities?”, and you can think of it in the abstract: if somebody gives me the data, I can compute what I want. That problem is one a sophomore can do. But that’s not how data is; data is not going to land on your desk. So you have to design systems that take as input the constraints put on the data. One of these constraints is privacy, but there could be others. One of them is literally scale. One of them is “I can’t move all the data into one place”. Privacy actually prevents moving the data into one place, because nobody wants to give you their data to start with. Given these constraints, you then write your system, right? You write your solution in a way that recognizes the challenge that is supposed to be met. And that was interesting: I had never thought of privacy as a constraint. I always thought about scale. I told you before about content distribution, you know, the Lewinsky report and how there were hundreds of thousands of people wanting to access a single file. It was all about performance, maybe security.

(26:23) Privacy showed up almost by accident, again, when I was at BU. There was a former mayor of Boston who had stepped down, and I got to know the challenges he faced in trying to do a particular study, a wage analysis. I hadn’t realized that privacy was an issue with the data. And the interesting piece here is that they thought it was not solvable, because that’s the constraint. You know, nobody can imagine getting data from companies that includes CEO and officers’ pay and stock options and everything else. Even if individuals were to contribute the data, the companies won’t, because that’s a trade secret. That’s just something they don’t want to do. So it was a no-go to start with.

And this is where the math comes in. So I remembered: wait a second, when I was at Harvard, there was this algorithm that I was taught by my (grave and) at the time; I think there’s a way to do that. Now, the mathematicians started proving that it could be done back in the ’70s; that’s how early it was that they saw there’s a way you can do that. But you know, the mathematicians did the theory, they proved the theorem, right? Fast forward 20 years, and here comes the application. The lawyers had no idea, the CEOs had no idea, and actually nobody, even in computer science at the time, had even bothered to talk to them. I did. That was the accident, the lucky accident that got me to talk about the problem and then realize, wait a second, there is a way to do that. And then I had to spend a year convincing the lawyers, the owners of the data, that it can be done, because it sounds like magic. It was like, how can you compute on data that is never shared, that never leaves the confines of a single owner of the data?

(28:15) And this is actually going back to your question about the math. Yes, the math was how to solve the problem, but it was convincing the lawyers that took the longest. And you know, the lawyers, I don’t blame them; they did exactly what they had to do. They kicked the tires to make sure that they were not going to be in trouble, that there is no way on earth that the data could be reached, or for that matter seen, by anybody other than them. I had to assure them, “even though I’m doing the computation, I actually have no idea what the data is, and I can prove that to you with math”. Now, that’s what took a year, and honestly, I think that’s what changed me. From that point, honestly, I think computer science to succeed needs to think of itself also as a social science. We are certainly a mathematical and an engineering science, but unless we have this empathy to think, with lawyers, with social scientists, with humanists about how to get the technology to work, we will not come up with the right solution, there is no way. And I can go on and on about this, because I did spend a good portion of my last six, seven years working on this. A lot of it is about design: how to make sure that the people who enter the data don’t have their world changed for them, because they have other things to do. So there is the ease with which you introduce technology like this, which is fairly sophisticated on the mathematical side. But believe me, the biggest challenge was not technology, it was the people.
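To give a flavor of how a total can be computed without any owner revealing its own number, here is a minimal sketch of one textbook technique, additive secret sharing. The interview does not name the algorithm that was actually used, so this is an assumption made purely for illustration, not a description of the deployed system. The payroll figures are made up, the function names are hypothetical, and the sketch assumes every party follows the protocol honestly.

```python
import secrets

# All arithmetic is done modulo a large prime so that individual shares
# look like uniformly random numbers and reveal nothing on their own.
MODULUS = 2**61 - 1


def make_shares(value: int, num_parties: int) -> list[int]:
    """Split `value` into random shares that add up to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Simulate parties jointly computing a sum without revealing inputs."""
    n = len(private_values)
    # Step 1: each owner splits its value and hands share j to party j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Step 2: party j adds up the shares it received. Each share is random,
    # so party j learns nothing about any single owner's value.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Step 3: only the partial sums are published and combined.
    return sum(partial_sums) % MODULUS


# Made-up payroll totals for three hypothetical employers.
payrolls = [1_250_000, 980_000, 2_400_000]
total = secure_sum(payrolls)
print(total == sum(payrolls))
```

Only the aggregate is ever reconstructed. In this simplified setting, no party sees another party's input, which matches the spirit of computing on data that never leaves its owner, though a production system would need to handle dishonest parties, dropped connections, and richer statistics than a single sum.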

Seth Villegas:

(29:48) Yeah, and one of the things I really want to highlight here is the kind of analysis that you’re talking about doing, especially around something like inequities in wages due to sex and gender or race. It’s extraordinarily sensitive, and I can imagine that almost no company would want to go through an audit of that kind. But your introduction of this kind of technology maybe allows for a kind of analysis that wouldn’t otherwise be possible, because it affords some sense of protection. While I do think that some people really want to, you know, hone in and really break doors down and everything like that, I appreciate what you’re saying about a sense of trust and cooperation that allows for a new kind of analysis to happen, so you can actually get into the details and the specifics of the problem.

Azer Bestavros:

(30:42) Actually, you bring up a very good point, which I want to discuss; I hinted at this just minutes ago. Folks with real problems that really affect society don’t know what’s possible with technology. Actually, they make assumptions. So when I told you they said it could not be done, that was an assumption. They never asked me; it was only when they asked me that I said, “well, it can be done.” And why? Because they have every right to make these assumptions, because the world in front of them was working in a particular way. If I want to do analysis, I can roll up my sleeves and use R or whatever statistical package I can use, but you have to give me the data, and if the data is not going to be available, there is no way to answer it. One thing I realized after doing this work, when I started talking to social scientists a lot, is that the first thing I tell them is: ignore, forget the fact that you don’t have the data, tell me what question you want to answer. Just the question, independent of whether or not you have the data. Just tell me as if we live in a nirvana now, this is going to be amazing. Just tell me what question you want to ask. And now we can go together and figure it out.

(31:54) First of all, maybe there is a way to do it, like I did with (WWE). But even if not, there are ways to have proxies for the data you think you don’t have because, for example, a favorite of mine, we can use traffic data about where cars are and speeds, and so on, as a proxy for pollution due to CO2. And this is other research I happen to talk about. So this is a good example of how data can become a proxy for other data that you may not have. And then we can answer the question and then we can test the hypothesis. And so what I feel is that collaboration helped me realize that social scientists actually self-censor, that they do not ask me the question they should be asking, simply because they think it cannot be answered. So I feel like that’s quite important, and it excites me. And going back to your point about the sensitivity of the data and so on, I just want to be fair to these companies: they did want to help the mayor figure out the signal, and the mayor was not interested in making one company look better than the other. That’s not the point. The point was, can we have a scorecard for the entire business, everybody who employs people in Boston, and we can compare it year to year to see if we’re making progress, and we do that at an aggregate level for different companies. So there could be a social good that comes out of analysis that everyone is interested in having. And we can do that without risking the privacy issues, or in this particular case, it was privacy, but also really confidentiality of information at the companies.

“There could be a social good that comes out of analysis that everyone is interested in having. And we can do that without risking the privacy issues…”

(33:36) So this is actually another piece which I learned, which is, individual data may actually not be private. But when you compute on it in a way that affects the community, the community may want to protect it. So there’s an interesting example with pay equity. The data itself was very confidential per company, every company wanted to hold on to this data and didn’t want to share, but the aggregate is a social good and everybody wanted to do it. Oftentimes there could also be other examples where it’s the opposite. The data itself is available and it could be made public if you ask for it. However, if you compute on it, and you compute something that affects the community, for example, say, homelessness in parts of the city. You know, if you come up with results that say, well, you have a big problem in this part of town versus this part of town, real estate value in this part of town will suffer and then the owners of houses would complain. So even though the data itself may not be private, when you use it to compute something that affects communities, you really have to be sensitive because the communities may be adversely affected by the aggregate computation. What do you call that? It’s the privacy of a community, if you are going to get, you know, results at that level. So privacy becomes this thing that can be at the individual level, as individual people in the companies, but it could be communities, which are defined however the community defines itself. So I feel like this is one example of the importance of overlaying this with students with, “What do you mean by privacy?”, “Privacy of who?” And privacy of who depends on the question you’re asking. So it becomes very intertwined in a way.

Seth Villegas:

(35:22) Yeah, I just love your orientation, and kind of sensitivity towards the targets of the analysis and of the data. I think that that’s really important, especially if you’re asking these sort of social questions, and even what you’re saying about research and questions. It’s funny, because the way I’ve usually seen this, it seems kind of backwards, the ways in which a lot of times we think about this in which there’s kind of “Oh, like, let’s see what we can get data about”, but the data doesn’t have a direction necessarily, right? It’s just a matter of, well, we can gather it, we could have it, it might be useful, but there doesn’t seem to be an actual sort of question there. And I just love what you’re saying about that because in these kinds of social science situations in which people don’t bother to ask the question, because they haven’t actually imagined a solution, right? They haven’t imagined a way in which to get it.

And this actually reminds me of, you know, Wesley Wildman, he’s my advisor, he’s how we know each other. He often asks us to think about things in that way. He usually thinks of it as problem-oriented research, in which there is a particular problem, and then we can figure out what we need to do in order to be able to solve it. And it would seem that that’s actually one of the few ways in which we can get at these really important kinds of questions. It is really interesting, again, in this sort of wage inequity situation, in which perhaps people are actually invested in solving the problem if you could find a way to do it. Right? So you have this question, you kind of invite people into that question. And then you kind of help the community to start to find ways to solve it and I think that’s just a great approach.

Azer Bestavros:

(37:13) Yeah, data driving the questions to be asked is the wrong way to do it. As a matter of fact, I think that data should be hidden from the researchers. Just ask the question, and throw it over to somebody who uses data to give you the result of your hypothesis, you know, whether it’s, “oh, yeah, you’re right, there’s a correlation” or “no, there’s no correlation.” Going back to the privacy question, and by the way, this is something that I have long discussions with social scientists about. Many of them don’t like this idea of not seeing the ground truth, because the whole point of the gender pay equity work is actually that you don’t see the ground truth. So from their perspective it’s, “well, but I cannot formulate a question until I see some patterns, and then I can.” So in other words, they want to look at the data and say, “Look, isn’t that what happens in biology, and what happens in physics? You see something, and then you come up with a theory to explain what you saw.” I think that’s the perfect way to do research in natural sciences and physical sciences, and so on. But when it comes to society, I really worry about the bias, the experimenter bias and so on.

When I give you data, you’re gonna find things in it and then say, “Aha! Look, here’s my hypothesis.” And my point is, this is really not the right approach. The right approach is to have a hypothesis about whether it’s this or this or that, ask it, and let the data validate it. Now, you know, these things can go side by side, maybe I give you a little bit of data so that you can form your opinions. So I feel like this is a very interesting question: when data drives the question, I think this is where bias creeps in. Because if data drives the question and your data is bad, then you come up with very bad conclusions. So I feel like this is an area where we’re nascent: what is data-driven research and how do we do it the right way? I don’t have an answer, but it’s…

Seth Villegas:

(39:18) Yeah, definitely. And one of the other things that can happen is, what if you have a bad signal? What if you have things that just aren’t there?

Azer Bestavros:

Yeah, an anomaly.

Seth Villegas:

Yeah, what if you created a signal? I mean, this is something that we actually talk about in our lab a lot, just kind of messing with the data to create a signal that’s not there. And these things are all problematic, especially if you’re trying to get like a so-called objective picture of what’s going on on the ground.

(39:49) So I kind of want to turn the conversation to the Faculty of Computing and Data Sciences, in part because I think one of the ways in which BU is trying to address a lot of the things we’ve talked about so far is through CDS. So could you tell us a little bit about what CDS is, and why it is you’re kind of spearheading it now?

Azer Bestavros:

Okay, so I’ll answer the two questions. There’s a question of why now, maybe why BU, then there’s a question of what is it– I’ll start with what is it. So CDS is an academic unit, and an academic unit is a unit that can appoint faculty and can offer degree programs. Typically, in a university, these functions are done in departments, which are in colleges, and a university has a bunch of colleges and schools to do that. If you think about it, that organization of the provost, deans of different colleges, departments within each of these colleges or schools, and then chairs within these, is the definition of hierarchy and siloing. Then, how universities traditionally have done interdisciplinary work is to create centers of research that connect faculty from different departments together, but it’s usually around research. Faculty are always still in their departments, but they come together in interesting ways. And, for example, in my area, BU started something called the Hariri Institute for Computing, I led that in 2010, and that was a research center and continues to be very helpful.

(41:30) The problem with that is that it’s just, so to say, bringing faculty together to do research: connect, I don’t know, biology with chemistry, and we’ll do something at that intersection. Beyond research, think even about where computer science and data science are: you have to do that also in education. And that’s a function that is usually done through these academic units. Right? So then the natural next question is, well, if you want to do something on data science, then who should own it, where should we put it? And here I’m going to start by giving BU credit, because around the country, the model is, “well just create a College of Computing, and put everything under there”, or “take the computer science, the computer engineering piece, take it out of whatever it is, whether it’s College of Engineering or College of Arts and Sciences, and then create a college of its own”, or create something like a division, which is similar to a research center, except that maybe you can offer the program, but the faculty still sit in their own departments. Now, all these are fine models, except that none of them does one thing, which is to create a culture, a transdisciplinary culture, where people actually are not feeling like second-degree citizens outside of the departments.

So this is where BU says, “we can’t just bring all these departments together in creating the unit, because then we’ll inherit all the cultures of these existing units. And if you want to create something new, that’s not the way to do it.” And the other bit is actually, there is value in computer science being in Arts and Sciences and engineering being in engineering, and statistics being in Arts and Sciences, and finance in Questrom. There is value and every one of these schools needs that unit in it. The BU solution was we’ll leave the departments (I like to call them incumbents), we’ll leave the incumbent academic disciplines where they are because these cultures are important, these traditions are important, and deep theoretical research in computer science is very important. Leave it where it is, let’s create a new academic unit that has a different culture and a different way to look at data science. And that will hire faculty and offer undergraduate and graduate degree programs. So that’s what CDS is. And it’s novel in its organization, there is no other university that has done this. Now why is it novel? Because we are not part of any college, we are part of all colleges. And this is why my title is not that of a dean, because I don’t lead a school or a college, I lead a faculty, and the faculty is really the people that make up the faculty in that field, and I am part of the Office of the Provost; this is why my title is Associate Provost.

And if you think about it from a university organization perspective, the provost’s office oversees everything that is intercollegiate. So under the provost, there is an Office of Graduate Studies and also of Undergraduate Studies or Faculty Affairs. And these are basically like, you know, the federal and state government: at the level of the Provost are those who are really interested in what connects colleges together. Because under each college, you have your own Dean and you have your own sort of school, etc. So if you believe that data science is something that actually is cross-cutting, that is a language spoken at all these different colleges, then the provost’s office is the right place. And that’s what BU decided, that’s what CDS is. And the reason BU is doing this is because it believes, and we believe, that data science is really a different, highly interdisciplinary field, where everybody should contribute in its formation and in its evolution. I think we started this whole discussion saying that things change. So the only constant here is that things will change. And, you know, it’s hard to imagine disciplinary things that change in academia, you know, physics doesn’t change very quickly, computer science doesn’t change very quickly, but here we have an area where we have to be very agile, given the speed with which things are happening. And that’s really what CDS was created to do: to become sort of the place where BU develops programs and hires faculty at the nexus of very interesting areas. The common part is going to be obviously computing and data sciences, but that’s sort of with its tentacles across most of the university.

(46:15) So that’s what CDS is, and your second question was, you know, why now? Maybe I answered it already; think about all the answers I gave you before about this whole evolution. I mean, we’re talking about AI, machine learning, and everything is changing because of this. And I’ll go back to something I’ve said, and I’ll repeat. I think as, you know, an engineer, as a scientist, I realize that if I want to ask questions that have to do with society, then at least I want to ask these questions with others who know the problems. Actually I have something I always like to say, which is, I look at my life going back. Personally, about half of my career, a little bit more than half, was about my field, computer science, just developing a better computer. That was about systems, the internet, you know, and so on. And then sort of, we come to around 2005, after the dot-com, we had all this data, and all of a sudden the field itself says, “well, we don’t need to build any computer any faster, we have all this in one. I mean, my phone is as powerful as any supercomputer you could ever imagine. And it’s so cheap, you can’t imagine it being much cheaper than this, right?” So in a way, computer scientists already have achieved the goal of building the internet, and you know, it can’t be any more efficient and it can’t be any more scalable. We can improve it, but we’re already there. And then we pivoted; we realized that because of all this data we have, because of all these devices, we now have to start solving problems that other people have, not our problems. So building this was our problem. That was something that, you know, computer scientists have to do, operating systems, nobody else can help them with that. But now that we’ve built the devices, we have to change totally and flip to answer other questions, and this is where CDS comes into the picture. And now is the time that’s just happening. And it’s happening because so many disciplines are changing how they work, whether it’s the professions, or science, or humanities even, I think. So anyway, long answer.

Faculty of Computing & Data Sciences

Our Faculty of Computing & Data Sciences (CDS) is a transdisciplinary, degree-granting academic unit that lives outside…

www.bu.edu

Seth Villegas:

(48:42) No, thank you for giving us that. And I guess two other things I wanted to highlight that CDS is doing: first off, just having a bit more of a broader emphasis on, say, ethics, on the kind of social aspects of the kind of work that they’re doing. And on being able to use their discipline with people from other disciplines to solve these kinds of bigger problems. And I think that that’s really novel. I mean, people kind of understand that it might be possible, but it seems like one of the goals of CDS is allowing for a new kind of academic work to happen between people who can, say, work on teams on these bigger sorts of issues. And as kind of a final question for you: what do you kind of hope for a person who goes through CDS? What kinds of skills does that person have? What kinds of orientation does that person have?

Azer Bestavros:

(49:47) Yeah, I mean, it’s hard to answer that question. I’ll just say that, you know, we don’t have a sort of canonical picture of what kind of person we expect, partly because that got us into trouble with computer science, right? So when you think about who the canonical computer scientist is going to be, it’s the kid who, you know, in fifth grade was writing code and doing games, with a hacker mentality, and so on. And that’s not good, because it sort of colors the field, and as a result, you end up not being as diverse as you want, and you don’t bring in people who can have amazing impact because, you know, their brains are wired differently, and they have life experiences that are different.

So one of the things I always say about CDS is, because we are starting, to some extent, from a blank slate, this is important, because we can reset expectations about who ends up being there. I’ll give you one specific: we just launched our undergraduate degree, starting small this year and then starting next year, we’ll have our first incoming class. Typically, when you think about a STEM area like data science, usually you start by saying, “Well, you know, you’re expected to have taken calculus in high school, and taken AP credits, left and right.” And typically, of all the computer science students we get, many of them have it. So you miss out on the kids who hated calculus, because they had a bad teacher, or maybe because they didn’t do well in high school, or hated programming, because, you know, they saw the kids doing that as sort of not the kind of kid they want to be.

So our approach is really to start with how data science can be empowering. So we’re starting with courses where, very quickly, you’ll be able to put things together and see how data can affect you. We’re introducing a new class called “Data Speaks Louder than Words”, and the point is that, in a semester, you can get students to appreciate how data can answer questions relevant to them. Now, if we’re successful, then a course like that can get somebody who never thought of themselves in data science. They can say “Wait a second, this is cool. I can do this.” And then we introduce them to the math and what is needed, but by hiding it. We actually don’t call it calculus, we don’t have calculus in our program, but it’s there; we have to talk about probability, calculus, algebra, and these things. But we integrate them in this sort of sequence of courses where there is this positive reinforcement to use it, and very quickly, you’re also using Python at the same time to do something that shows you the value of calculus, or shows you the value of linear algebra, or shows you the value of regression. So these things are not taught as the prerequisites to do machine learning, which is how everybody else thinks about it: “oh, you have to do these five courses in order to be able to start thinking about algorithms and machine learning. Go do it and then come talk to me.” But, no, no, you can talk machine learning from the first class, and you can start doing it. And every time you do it, we can show you why you need the next level or need to learn something else.

(53:01) So this kind of approach to developing data science is also pedagogically very different, because we are hoping to attract different students. To answer your question, rather than a canonical type of student, our students, they may take one class, and that’s it– great, they got their Hub credits, and they are happy with that one class they took, but maybe that might whet the appetite to take the next class, maybe that leads them to a minor. But maybe they become majors. Right? Now, I also think that the opposite can happen. Somebody comes in, says “I want to do data science”, and they take a few classes. And then the second or third class they take is on ethics, and they say, “Wait a second, this is great. Maybe I want to minor in data science and major in philosophy,” whatever it is they want to do, right? So I always like to say that, CDS being this cross-cutting unit, I also want the programs in it to act as sort of ramps and exits. And people can come in –it’s like a highway– come in from other fields; you may have bumped into us because you take a class or this or that. And then you major, and it’s like a ramp into the major, or exits: you start doing it and then decide, you know, “I really want to specialize in computer science.” Great. So go get into computer science, or political science or whatever else. So I’m not giving an answer. But my answer is, if you think about data science as this really, truly interdisciplinary field, then I shouldn’t have a single point of view of what the students will look like.

Seth Villegas:

(54:34) Yeah, I really appreciate you saying that. I’m hoping that people who consider CDS will say that. But I will throw out one thing though, that I think that people will hopefully experience which is, “Hey, like I have, you know, maybe they have a driving question already”, and “hey, this might be really useful to me. I didn’t see myself doing this,” and I think this is actually part of your story, but “I think I’ll pick some of this up because I think it could help me or it could help me to work with someone who can and help me further.” Honestly, I think that kind of growing room for collaboration like that, or for people to kind of pursue questions in that way, you know, kind of this sort of big picture stuff because especially if we’re talking about undergrads, right, they have all these, they have all sorts of questions. Actually, when I teach graduate students, they have far less questions than the undergrads. The undergrads are the ones asking all the off the wall things, and so that would at least personally be my hope, as they come to CDS, right, is that they would sort of see that.

Azer Bestavros:

(55:28) Well, I totally agree with you, let me share with you anecdotally. So we are now open for intra-university transfer. So these are students who may have clarity on what to do, if they’re in the College of Arts and Sciences, or COM, or one of the colleges, you know. Now we want to allow some of these students, if they want to declare a data science major, to move into CDS. We asked them to write a little essay, like “why do you want to do that?” because these students are probably going to be sophomores or maybe juniors. And you know, just because it’s new, because I’m interested, I asked my assistant, “hey, you know, share with me some interesting essays.” And some of them are a very short paragraph or two, but there were students who actually wrote long essays, like two or three pages, like “I really care about the environment”. And, you know, this is an example of a student who sort of was thinking of themselves as maybe doing something at the intersection of statistics, or data science, or earth and environment. And they feel that they really are environmental as far as what they want to do as an undergrad. But they saw that data science gives them a far better tool and sort of preparation for what they want to do beyond that. So I agree 100%, this is really about data science as the sort of common language in all the different disciplines. And it’s what the students want to do.

“[CDS] is really about data science as the sort of common language in all the different disciplines.”

I’ll say something else about CDS also, which is, in a way it’s democratizing access to the technology. What am I saying here, democratizing access? Well, when I say democratizing access to technology, access by whom? It’s actually multiple pieces here. You have access by students, so students coming in, they want to do environment, they should have access to data science as they prepare for that. So there is the student side of it. There’s also the employer side, which we didn’t talk about. You know, if you’re Google or Amazon, you get to pick, you pick the best graduates, you know, everybody applies there. But what if you are a nonprofit? Or you’re the CEO, boss, or you are, you name it, like, you’re not this shiny technology, Silicon Valley company? Don’t you think these people have a huge need for data scientists? What about education? What about people who do social work? Right? So they have a hard time. And I talk to corporate, as well as nonprofit, and they say, “we just want to hire in that space, but nobody wants to come and work for us.” The question is, how do you create these new career paths? Right? How do you get somebody who actually is passionate about working in education, but does data science, right? So if you put them in a computer science or engineering department or statistics, for very good reasons, they will get snatched by Google and Facebook, and they will actually be prepared culturally, through the education, through the pedagogy, to just be the perfect engineers or the perfect scientists who work in technology. They haven’t had introductions that want to do for things to do social sciences, and ethics, and so on. So I feel like democratizing access to data science, for this broader field of employers, students, and I can go on and on, there is that aspect about what we’re trying to do at CDS that helps with this.

Seth Villegas:

(58:51) Yeah, and thank you so much for sharing that vision, especially all these things about culture, you’ve definitely given us a lot to think about. So thank you so much for your time. I really appreciate you talking to me about all this stuff today.

Azer Bestavros:

Oh, my pleasure. And happy to answer any other questions. There’s a lot I see that’s happening, and you would help us spread the gospel. Let’s put it this way.

Seth Villegas:

Yeah, exactly. I mean, that’s partially why we’re doing this too, is because I think that these sorts of initiatives are really important to get out there.

Azer Bestavros:

(59:20) And I think that, you know, one last thing: we didn’t talk a lot about ethics in the curriculum, which is quite important. And you know, let me just say that part of this pedagogy, I mentioned something about how you want to integrate, for example, math and programming with what data can do. So, usually when people think of data science there is always the math, there is the programming part, there is stats, also engineering, whatever, but then the third piece is the question. What is the question I’m trying to ask? So we feel like the question should be leading in getting you to learn the math you need and the programming you need, all the software engineering or computer science. But rather than think of them as prerequisites or anything like that, it is integrated, so connected. Ethics is a perfect example of that, and we learned the hard way in computer science. Remember I said the internet was designed to connect scholars or friends, and we never thought about security. And we, I mean, I was at Harvard, working on some aspect of this, studying it, and then came to BU and so on, we never thought of the internet being used except through the National Science Foundation. And then all of a sudden it gets used differently. And it becomes a tool not just for good, but for bad: people attack you, steal your data, ransomware, all these things. All of a sudden, back in the 90s and 2000s, it became very hard: “Oh, we have to teach security,” because our curriculum, the whole pedagogy, was designed to teach, you know, organization, operating systems, programming languages, and so on. So where is computer security in this? You are teaching all these kids to go write software for Facebook, for Apple, etc., and they have no idea how to write software in a way that cannot be taken down by viruses, you know? How do you teach it?

(1:01:10) The first sort of solution? Well, you have to do a computer security class. Just one class; they take it, maybe it’s an elective, maybe they take it because it’s required, actually, I’ve not seen that course ever being required. And of course, that wouldn’t solve anything. Because security is not something you add as a class. Security is something you learn the first day you learn to write “hello, world”, right? So now, when you look, because for 20 years that’s what we’ve done, you teach security when you teach programming, and security when you grade, right? When you grade the kids’ homework: “this code, you made a problem because you didn’t document this, or because you didn’t clean up your memory, or because you used pointers,” whatever it is that you do. And slowly, with a lot of programming languages, you’re going to be secure, and all that stuff took us maybe two, three decades to get to this point where now if you take a computer security course it’s because you want to do a master’s. It’s actually what we’ve been teaching all along. So you integrated it into the curriculum.

And I think with ethics we are at a moment like that. And, of course we have an ethics class, but the question is, how is ethics in everything we do? I hope it doesn’t take us two decades to get to the right pedagogy here. But I think that that’s really important. And how do you do that? It’s not straightforward, right? So I wanted to highlight this because, remember, I cannot hire a philosopher to teach in a class that’s teaching about statistics, but I actually do need ethical consideration to be part of what you hear in a class that talks about programming or statistics and so on. So that’s a challenge, because you really have to teach the teachers to some extent. So how do we create that community where ethics is almost like, you know, somebody teaching operating systems has now to teach security? Well, it was hard to do 20 years ago, now it’s not. So the question is how to get to a point where things like ethics are integrated, which is very important, simply because now data science is really about, you know, how it affects society and health. I don’t think we’ve talked about this, but I think it’s really important. And it’s a yet-to-be-answered question, how to do this.

Seth Villegas:

(1:03:32) Yeah, I think that’s an important backdrop for why we’re having these kinds of conversations, and for why CDS is really trying to put that at the forefront of the program. So I’m really grateful that you summed that up nicely for us. It sounds like we’ll have to keep having these kinds of conversations around BU, and I’m looking forward to seeing how CDS develops. I look forward to it.

Azer Bestavros:

I look forward to it!

Seth Villegas:

Thank you for listening to this conversation with Azer. If you have any questions or comments, you can reach us on Facebook and Twitter @DigEthix and on Instagram @digethixfuture. You can also email us directly at digethix@mindandculture.org. In this episode, Azer takes us through many important developments in the history of computer science and engineering. He also talks to us about why something like CDS, Boston University’s new initiative in computing and data sciences, is necessary. In part, data is a unique kind of thing that has many applications. However, Azer warns us that we can be misled by data, even though we can also gain insights from it. Given the potential sensitivity of certain topics of investigation, such as those involving salary data and potential salary disparities, how can we pursue a fair and ethical investigation of these topics? Azer proposes that it might be necessary to separate the question asker and the analyzer of the data in order to get the best research. I know that Azer and I both have really high hopes for CDS, and we hope that we can become a place where people can do both ethical research and research with important societal implications. I hope to hear from you before our next conversation. This is Seth, signing off.

Transcript

Faculty of Computing & Data Sciences

Our Faculty of Computing & Data Sciences (CDS) is a transdisciplinary, degree-granting academic unit that lives outside…

Important Links

Written by Center for Mind and Culture