Centralized data science is a road to nowhere

Patrick Dougherty
3 min read · Dec 2, 2022

Inspired by “strong opinions, loosely held”, here’s a strong opinion.

Strong Opinion:

No company should have a “centralized” data science / machine learning team (that is, one that rolls up to the CIO / CTO). Data scientists should always be embedded within a business function such as product, marketing, or finance.

Loosely Held:

This one is actually held pretty tight. Maybe there are exceptions where this works out well? I haven’t come across them.

Why?

As a data scientist, you’re constantly weighing two objectives that are directly opposed in most cases:

  1. Business value… decreasing costs or increasing revenues
  2. Technical sophistication… applying interesting and complex techniques to data

In a centralized data science team, there is no north star to guide your decisions along these axes, so you tend to gravitate toward #2. After all, you were hired to “do data science”, and everything you read online is about how other data science teams are using the latest and greatest techniques. How will you get your next job if you don’t have some of that experience on your resume? Other data scientists rarely celebrate delivering business value (maybe there’s not much to celebrate?), so you naturally deprioritize it.

Here’s a representative example based on a few of my real experiences:
Imagine you’re a data scientist working for a large e-commerce shop. You recently got hired, have access to some of the marketing and orders data, and your boss tells you, “Marketing wants you to build some models to help us improve efficiency of our ad spend — it’s a top priority for us this quarter.”

“Sweet!”, you think to yourself. Just the kind of project I’m interested in. You tell marketing that you’re planning to use a random forest or neural network type model to predict campaign performance based on attributes describing each campaign, including budget amount, ad type, ad platform, ad text, product / product line, etc. Marketing is excited — they think the model will help them improve their campaigns and look smart doing it.

Fast forward 6 weeks… you’ve trained some models, iterated on them to improve accuracy, and they’re doing… okay. You can predict the performance of Google AdWords campaigns with some accuracy, especially on branded keywords. Your best variable is a flag for whether or not the campaign uses retargeting. You’ve also noticed that ad campaigns on Twitter are incredibly unpredictable, and no model seems to predict them well.

You share your findings with the marketing team, and they are a little confused. They tell you that they have to stop the retargeting campaigns by the end of the year because of GDPR, and that they only run brand awareness campaigns on Twitter since it’s notoriously bad for conversion. They ask, “So what should we do differently to be more efficient with our spend?”

A few things went wrong for our data scientist hero / heroine in this example:

  • Not understanding the ins and outs of marketing well enough
  • Not aligning the specifics of the model (what to train on and what to predict) with a concrete decision that the team is regularly making
  • Not pushing back on management who came up with a data science use case right out of a generic Forbes article

Let’s revisit our potentially opposed objectives… business value and technical sophistication. This data scientist was given carte blanche on the second, and trusted that management understood the first. I’ve personally seen this combination many times, and the negative outcome is always the same: once an organization has watched this play out with multiple data scientists on multiple projects, it completely discounts the value of data science.

How should a company avoid this scenario? Easy:

Fix the org chart. Only hire data scientists within business functions. Incentivize and reward them for delivery of business value to their business function. Models are the means, not the outcome.

Treat data science as a tool, not a job. Statistical modeling is one of many techniques you can use to inform better business decisions. Hiring data scientists to exclusively do data science is giving them a hammer and sending them searching for nails. They’ll hit something, but for what?

Patrick Dougherty

Co-Founder and CTO @ Rasgo. Writing about AI agents, the modern data stack, and possibly some dad jokes.