by Kaitlyn Henry
My day-one investment thesis when I started working in VC was about finding better tools for data scientists. Before shipping up to Boston to join OpenView, I worked on the finance & BI side of a machine learning team at Amazon. It was me and a team of 50+ engineers, data scientists, researchers and economists all working together in a crazy, creative, collaborative fervor on just one core model (the often criticized and more often misunderstood featured merchant algorithm, for those who care to know).
But even with the luxury of a massive team and all of Amazon’s resources, I found that managing the end-to-end lifecycle for a machine learning project was still pretty difficult. There are a lot of cross-functional stakeholders, and instead of just managing a code base like you do in software development, everyone has to juggle changes in the data, the model and the application code all at once. Managing such a dynamic project across all those functional silos took a ton of time and led to a lot of miscommunication. As a result, projects and experiments moved slowly or even failed to get into production entirely.
There’s certainly no shortage of startups and open source projects out there trying to make machine learning projects easier. Fast forward to my first year in VC and I’ve probably already seen a hundred “tools for data scientists” pitches. But the more I talked to fledgling data science teams or CEOs at the latest ML-powered startup, the clearer it became that bringing a machine learning application to life is still really, really hard.
So where’s the disconnect? Why haven’t all these data science tools made machine learning any easier?
Designing for the end user in data science
When most of the founders I’ve met in this space set out to build their data science tool, they look to the product-led growth strategy championed by some of the world’s best open source software development tools: design for the individual end user and make something that can be adopted bottom-up. Founders conjure up the image of a lone data scientist hacking away at a side project and ask themselves, “How do I make this person’s life easier?” After all, that’s how you build viral products in the end user era, right? Build something an individual wants to use in their personal life, and eventually they’ll pay to use it at work (think of the adoption model behind tools like Slack, Trello, Zoom or Calendly).
But here’s the issue: data science is inherently different from software development. Unlike a typical software project, an ML application can’t be managed by a single stakeholder. Data scientists can’t build something by themselves. They need a lot of help to make things happen.
ThoughtWorks recently put out a piece on continuous delivery for machine learning that does a great job of explaining the different stakeholders in machine learning and how the work differs from just writing code. In short, software development has one major axis of change (the code), while machine learning has three: the data, the model and the application code. In software development, developers can bring something to life largely on their own. In data science, it’s a group effort. And it’s this “group” mentality that a lot of teams building data science tools overlook.
It’s not just about the data scientist
Here’s the thing. While the end-to-end machine learning lifecycle can be made a tiny bit easier by building tools just for data scientists, data scientists aren’t the only end users founders should be thinking about. Building for the end user in data science means building for more than one person, and there’s a lot of work left to do in improving the way data scientists, data engineers and developers work together.
Of course, there are some enterprise plays out there taking the top-down approach. They’re forcing major process overhauls and resource-heavy implementation cycles that don’t make sense for smaller, scrappier teams. As we’ve seen in software development and in everyday tools like Slack and Zoom, some of the best products are adopted bottom-up. I don’t think data science is any exception.
So to all the entrepreneurs out there building tools that data scientists love, I challenge you to think differently about the end user. Instead of picturing the lone data scientist hacking away at a model, think about the entire data team. Think about how they work together, how they annoy each other and how they wish they could communicate. By building for the entire team, you’ll break down silos, help ML teams get to better outcomes faster, and play a role in unlocking the power of machine learning for teams of all shapes and sizes.