The Data Scientist profession: hyped, hard and inevitably specialized?

In the era of Big Data and Analytics, the “new” role of Data Scientist is arguably the most hyped. It’s been joked that “a data scientist is a statistician who lives in San Francisco”, but that’s more fun than accurate. I think the following definition is more instructive:

“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”

Inevitably, more than a few Venn diagrams have also been proposed, mostly looking something like this:

I intentionally put “new” in quotation marks earlier, because data science isn’t new aside from the name. Statisticians, business analysts and operations researchers have been doing this type of work for decades. What’s changed in the last 5–10 years is that more data is now available, and the tools and languages used to access, transform and analyze that data are changing. A new generation of data scientists has embraced open source tools, such as Python and R, and these professionals are better programmers than the traditional statisticians, analysts or operations researchers (whether they’ve sacrificed some math or domain knowledge in the process I’ll leave for debate or a later post).

With that as background, here’s the point I want to make: whenever you have a profession defined through the intersection of multiple fields, you’ll end up with a challenging “jack of all trades” role. Rewarding, yes. Easy, no. Amid all the confusion, I think Data Scientists are going through an evolution similar to the one we’ve seen for Product Managers, for example.

  • Lots of hype? Check.
  • Talent scarcity? Check.
  • T-shaped skillsets? Check.

If you’re hiring for data science roles, it’s currently nearly impossible to find individuals who are great at all aspects of data science. Yes, you can try to find that unique individual who is a “10x coder”, has a PhD in Applied Math, has spent 10 years in your industry and has TED-talk-level communication skills. Good luck. More likely, you’ll need to be very thoughtful about how you recruit and compose data science teams of T-shaped individuals.

One way to look at the work at hand is through a value chain. Very simplified, it looks something like this:

Simplified value chain for Data Science

The deep skills required map roughly to this diagram. That is, to properly staff a project or a data science team, you’ll need a combination of:

  • data wizards who can access and clean data,
  • statisticians/operations researchers who can build or train analytical models,
  • domain experts/business analysts who can communicate strategic business recommendations, and/or
  • software architects who can embed models as part of operational systems.

For a small project, you might have a team member of each type, and maybe some can even serve dual roles. For a large project, or to build organizational knowledge and resilience, you’ll need to double up (or more) on each.

Naturally, having the right tools and a culture of collaboration is essential. When things go wrong, it’s often due to gaps between these roles and tasks. For example, the data wasn’t clean enough for the stringent demands of analytical models, or the models weren’t robust enough to handle changing conditions in operational processes. When things go right, it’s usually due to a team that has a shared goal, partially overlapping skills, and integrated tools that allow its members to work on the same data and see the same results.

Whether you’re looking to hire data scientists, or aspire to be one, ignore the popular notion of data scientists as unicorns who can do it all. As data science continues to mature, and projects grow more complex and operational, it will become a team sport of T-shaped individuals.