Why Calling Yourself a Data Scientist is Like Calling Yourself a Doctor

One of the most loaded terms I come across in my professional life is the title I most often use to describe myself: Data Scientist. What is a data scientist? I’m not really sure. I know what I do, I know what some other self-described data scientists do, and I know what my wife tells people I do — none of which gives me a consistent description of a data scientist.

If you look across the internet, job boards, and think pieces describing the hottest jobs of the future, you invariably come across descriptions of data scientists. Depending on the site, these descriptions conflict, confuse, and often make no sense whatsoever. They’re littered with terms like data mining, machine learning, R, and Python, and job postings consistently ask for advanced degrees that qualify an applicant as a unicorn, a rockstar, or, most terrifyingly, a data ninja who can do everything from cloud architecture to deep learning. But if I stroll up to a person on the street and tell them I’m a data scientist, I’m giving this unlucky person as much information as if I had said, “Hello, I’m a doctor.”

At first glance, saying you’re a doctor sounds pretty descriptive — white coat, stethoscope, cold yet soft hands. But consider the following interaction:

Two people meet at a party, and Person A asks Person B, “What do you do for a living?”

Person B responds, “I’m a doctor.”

Person A, seeing their opportunity for some free medical treatment, exclaims, “Great! I’ve been having the most annoying pain in my knee.”

Person B responds, “Probably can’t help you too much — I’m a cardiologist.”

Think about the doctors you’ve encountered in your lifetime. There are research doctors, internists, urologists, and those are just my college roommates. All of those doctors do different things on a day-to-day basis, and if they were to try to do one another’s jobs, things might not go so well.

To me, the same is true of data scientists. So much information and so many skills fall under the umbrella of data science that it would be extremely hard for a single person to do everything on a day-to-day basis. And because anyone working in data science will invariably encounter so much, people tend to gravitate to the topics and issues that interest them. This leads most data scientists I’ve met to do what doctors do: specialize. There are a few rockstar-ninja-unicorns, in both medicine and data science, who can do it all. But it would be just as untenable for an organization to employ only all-purpose data scientists as it would be for a hospital to hire only all-purpose doctors.

But what should people be able to expect from data scientists? In the example above, the doctor isn’t useless. Being the friendly doctor that Person B is known to be, she would probably say something to the effect of, “I know what the knee does, I know what it’s connected to, and I could probably tell by looking at it if there’s some serious issue with how it’s functioning. But odds are I’ll need to refer you to a specialist.” A data scientist should be able to provide a similar experience. Data scientists should have a working knowledge of the underlying components of the discipline, but expecting a single person to understand all of them deeply is asking too much.

In the real world, the most important thing a data scientist can do is know when to refer you to a specialist.