The Purpose of Platforms in Data Science

kevin novak
5 min read · Apr 12, 2016

Much has been written and discussed about the field of data science and its role in industry: what it is specifically, how to become a practitioner, and the relative attractiveness of the field. However, comparatively little has been written about scaling an organization of data scientists and the organizational support necessary to do data science well, at scale.

I’ve spent the past year at Uber working on exactly this question, and almost 5 years working on data science at Uber generally, and I wanted to share some of my learnings on the power of platforms when it comes to data science.

Uber Data’s Early Days

First, some history. Data has always been fundamental to Uber’s story as a business — two of our first ten technical employees were data scientists — and to our product, where we’re building a system that provides effective transportation around the globe that’s so reliable people forget about the complexity of the underlying optimization problem.

Uber’s always been proud of hiring data scientists who’re comfortable straddling both sides of the divide between an engineer and a researcher. In fact, many data scientists regularly write and ship their own features. This, coupled with a highly entrepreneurial culture, gave us the ability to create an embedded organization that scaled quickly; Uber Data Science grew from a team of 2 to a team of 35 in less than 9 months. We were able to create platoons of scientists going deep as individuals on problems ranging from public policy to dynamic pricing to uberPOOL, yet who still benefitted from having a distinct organizational identity and leveraging the collective wisdom of the whole organization.

“Scaling a startup quickly as a leader is essentially an exercise in identifying and iterating on the aspects of your product and company that must evolve to survive.”

Scaling a startup quickly as a leader is an exercise in identifying and iterating on the aspects of your product and company that must evolve to survive while simultaneously holding the line on other aspects that cannot change. Essentially, a constrained optimization problem. Growing a data science org quickly meant growing from recruiting scrappy entrepreneurial data scientists comfortable with ambiguity and eager to install structure, to finding domain experts with the experience to iterate quickly and zero in on the non-obvious areas for improvement. At the same time, we couldn’t compromise on the level of technical rigor we expected (as we felt it was necessary to keep us nimble) nor our requirement that we find people with a visceral passion for our mission and the fearlessness and empathy necessary to engage in hard debate as a team.

In many ways, the decision to form data science platforms is a natural extension of this process. Platforms allow us to appoint owners over critical data science problem domains, give them non-linear leverage, and give us a space to tackle strategic, broad-impact projects.

Introducing the Data Science Platform

When we made the decision to create distinct platform teams a year ago, Uber Data Science was in need of evolution. Our pattern of sharing best ideas via communally owned libraries, wikis, and Jupyter notebooks wasn’t scaling, and each platoon started accumulating “tribal knowledge”. The engineering infrastructure necessary to query data at scale, process experimental data, or build and maintain machine learning models wasn’t built with data scientists in mind; we became our database admins’ worst nightmare. Duplicative work was happening, or worse, we missed opportunities to leverage data because of friction.

The genesis of our platforms came from a simple idea: find the fundamental problems that every Uber data scientist faces, and form cross-functional teams dedicated to solving each of them awesomely. Collective ownership is great for agility and quickly identifying needs within the organization, but there’s real power in creating an organizational role where somebody comes to work every day with the mission of making machine learning at Uber incredible.

The Power of the Platform

Where does this power come from? Multiplicative impact. In a high-intensity, mission-driven workplace, it’s fair to assume that the team is operating at maximum individual intensity. Ergo, the only way to get more work done next week than last week, short of hiring, is to invest in efficiency. Leadership can achieve wins in efficiency through refining organizational process or mentoring and employee growth, but a platform is an investment in technological efficiency.

“The only way to get more work done next week than last week, short of hiring, is to invest in efficiency … a platform is an investment in technological efficiency.”

Uber operates under the seductive technology principle — no team can be coerced into adopting a technology — and technological efficiency is what makes our platforms seductive. Our machine learning platform has developed tools to make Uber’s features reusable and shareable across teams and is investing in push-button deployment of models to production. Our experimentation platform makes it simple to deploy experiments “over the wire” and collect analytics automatically. Our anomaly detection group makes outage detection accessible and root cause analysis faster. All of these represent investments in the company’s acceleration through platforms, not just its speed.

Yet efficiency, while critical to the success of the company’s platforms and a way of providing immediate value, is only half of the story. Uber’s customer-facing programs operate in a mission-driven style, focused on high speed of execution against a mission and delighting their proximate customer. It naturally follows that their execution style becomes focused on rapid iteration and agility, with an emphasis on what we can do in the next few weeks to move the needle. However, not all data science efforts fit into this paradigm. As most of us know, research is an iterative process involving indeterminate amounts of discovery time, blind alleys, and tangents. Platforms provide a great opportunity for Uber to incubate some of its more strategic projects that have a scope or development cycle that simply wouldn’t be possible in a more mission-driven environment.

Where We’re Headed

One year in, Uber’s made great strides in developing three foundational data science platforms: we’ve built a machine learning platform from the ground up, pushed our experimentation platform’s functionality and customer base into new business units, and turned anomaly detection from a passion project into a general service platform open to all of Uber’s engineers.

However, our story is far from complete. We’re working on a metrics platform that’ll scale our business concepts across all our streaming and batch data infrastructure, for example, and the existing platforms continue to grow. On that note, we’re always on the lookout for new additions to the group. If you’re interested in working with us, check out the job openings below; we’d love to have you join us.

Data Science Platform roles:

Other Data Science roles:

  • t.uber.com/DSjobs

kevin novak

GP @RackhouseVentures. First Head of Data Science @uber. Midwestern data scientist living in the Bay Area.