The Future of Data Science — An Interview with Sarah Catanzaro

Ben Newton
Mar 18 · 7 min read

In this episode of the Masters of Data podcast, I speak with an emerging leader in the spaces of data, human and venture capital. Sarah Catanzaro is a principal at Amplify Partners, a venture capital firm that specializes in early-stage companies innovating with machine learning and AI. Sarah guides founders and innovators by drawing on expertise built over years of using data science to innovate in the private sector and to protect US national security. We sit down to review a number of topics linked to data and AI, including how many people have perceived the outcome of the big data movement, how data is often used as a weapon rather than an asset, the role that bias plays in today’s technology climate, and the importance of diversity for growing teams. Sarah also speaks candidly about her experience as a woman leading at an expanding VC firm, something that has come with its own unique challenges but also benefits.

To kick off the conversation, we discuss a little more about Sarah’s background and how she found her way to her current role as a leader in the data and VC space. As Sarah explains, she started out in the defense and intelligence sector to learn more about why people show violent behavior and why terrorist organizations form and advance. “What I found was that the statistical approaches could be used to get better insights into these organizational dynamics, so right after college, I ended up going to the Center for Advanced Defense Studies, and there the focus of the research program I was directing was on computational models of adversary behavior.” From here, Sarah began to see that software and data could be used to better understand people’s behavior in a very different way, which ultimately led her to ask the question, “How do we use data to predict organizational behavior?” So how did she find her way from the public sector into the private VC world? Well, as she explains, “It’s actually kind of the same thread. I’ve always been trying to answer this question, how does this small, covert organization disrupt an incumbent. How do we predict their behavior with data?”

At the heart of Sarah’s professional work is this idea of data. But data is a bit of a social buzzword these days, especially in the world of technology. So how do we rightly understand what data is? As Sarah notes, “I think what we need to remind ourselves always, is that data is a representation of the world. Data is a representation of human behavior.” Building on this, she notes, “It really hinges much more on privacy, and around the ways in which data can be used as a weapon to really oppress, again, those underrepresented populations. I see far fewer efforts to use data as a tool of enablement, or use data to really highlight those gnarly elephants in the room that we’re not talking about.” But what about bias and its role in data? It’s hard to talk about data without talking about the role bias plays in procuring the data. So how does bias impact the data being gathered? “In a sense, there is no such thing as bias in our data, there is only bias in society, bias in individuals, and that gets reflected in our data sources. I think what worries me sometimes is that we see these approaches, we see these conversations around removing bias from our datasets. If we remove the bias from our datasets, then we have no way of actually identifying that bias in society,” Sarah notes.

Another topic of discussion is diversity: its importance in data gathering and in safeguarding against bias, and also its role in building a sustainable, growing team. As Sarah notes, “Diverse teams produce better analysis… When you have different types of perspectives you can imagine different edge cases, you can imagine different outcomes, you can imagine different interpretations, and so in fact, diverse teams are just better. Especially in the world of data science.” But she adds, “I think another problematic trend in Silicon Valley is this tendency to want to hire for diversity as an end in itself.” So while diversity is crucial to maintaining integrity, it cannot be an end in itself. In a very real way, diversity is something that Sarah has herself seen play out in her role as a woman leading at an emerging VC firm. Sharing her experience, she notes, “I won’t pretend that it’s easy. I think in fact as a female in venture capital I do need to hustle harder. You just don’t have the same social structures that your male colleagues may have, but there’s more opportunity to grow in a sense.”

Bringing the discussion to a close, we discuss the future of AI, what these talking points mean for the years ahead, and where time and attention will be given across the board. For Sarah, the focus is clear. “One question that I like to ask myself often is, what is something I believe that other people don’t necessarily believe. Like, what are the opinions I hold that are relatively contrarian? And frankly, if there was something I was very excited about that seemed rather science fictiony, or perhaps not even necessarily overhyped, but just weird, it would be this area of augmented intelligence. So if I unpack that and think about what precedes augmented intelligence, I think it is a more sophisticated understanding of what humans are good at, and what machines are good at, and how we develop the right interfaces between humans and machines.” She continues, “There are a lot of things that we do today that we’re just not super well suited for, I mean think about certain elements of memory, certain routine tasks that we do. Like, how can we fill in those gaps in the future.”

Outbound Links & Resources Mentioned

Masters of Data Podcast Episode

Learn more about Sarah:

Follow Sarah on Twitter @sarahcat21

Connect with Sarah on LinkedIn:

Learn more about Amplify Partners:

https://amplifypartners.com/about/

Follow Amplify on Twitter @AmplifyPartners

Follow Amplify on LinkedIn:

Takeaways

  • Statistical approaches can be used to get better insights into organizational dynamics and why terrorist groups organize themselves in the way that they do.
  • The Center for Advanced Defense Studies focuses on questions like “How can we use software, how can we use statistics, to understand an insurgent group, or understand a terrorist or criminal group of some other nature.”
  • People go out and build algorithms, they collect data, they make these applications, but they’re not really thinking about the human side of the process.
  • The hard thing about studying human behavior is maintaining objectivity. You’re always inclined to think about what you would do in a certain position, or how you would act. It’s just instinct, it’s empathy.
  • Data is a representation of the world. Data is a representation of human behavior. So, in a sense, there is no such thing as bias in our data, there is only bias in society, bias in individuals, and that gets reflected in our data sources.
  • Data can be used as a weapon to really oppress underrepresented populations rather than as a tool of enablement.
  • Diverse teams produce better analysis. When you have different types of perspectives you can imagine different edge cases, you can imagine different outcomes, you can imagine different interpretations, and so in fact, diverse teams are just better. Especially in the world of data science.
  • It’s not easy being a woman in VC since you just don’t have the same social structures that your male colleagues may have, but there’s more opportunity to grow.
  • Just knowing that bias exists does not absolve you of anything. The next step is thinking about how we can leverage the set of tools we have to change things.
  • The goal has to be to get rid of that bias in society, or at least to mitigate it in some way.
  • There’s this gap between big data and AI. In making a transition from big data to AI, we almost treat AI as a silver bullet, and we kind of forgot the fact that data preparation, that analysis, that engineering practices, like these problems need to be solved.
  • A lot of the tools that data scientists are using today, they were designed for individual contributors not for teams.
  • There is a more sophisticated understanding of what humans are good at, and what machines are good at, and how we develop the right interfaces between humans and machines.
  • There are a lot of things that we do today that we’re just not super well suited for, I mean think about certain elements of memory, certain routine tasks that we do.

Newtonian Nuggets

Thoughts on what's going on in technology, data, analytics, culture and other nerdy topics

Written by

Ben Newton

Proud Father, Avid Reader, Musician, Host of the Masters of Data Podcast, Product Evangelist @Sumologic
