You’re Not a Data Scientist




Many of my friends, colleagues and contacts have started calling themselves data scientists. Several resumes that have crossed my desk recently suggest we’re minting data scientists faster than expected. I’ve seen this movie before.

The IT biz has historically rebranded job titles based on what’s trending: today’s Software Architects were once known as Designers or Systems Engineers. Nothing is trending faster and louder than predictive analytics, machine learning, deep learning and AI. So it’s our turn to rebrand data geeks as data scientists.

Now don’t get me wrong: some of these folks are legit data scientists, but the majority are not. I guess I’m a purist. Calling yourself a scientist indicates that you practice science by following the scientific method. You form hypotheses, test them against experimental results, and, after confirming or refuting each conjecture, move on or iterate.

The Mind of a Data Scientist

Data science is an applied science. So, as an applied scientist, you create things: models, methods, and algorithms that provide practical utility.

These ‘things’ are valuable because they predict future outcomes from relatively few data inputs. In some cases your models are black-box enigmas: you might not understand how a prediction is derived, only that the model is accurate.

So, in the spirit of maintaining an unadulterated definition of data science, I make the following assertions that might indicate you’re not really a data scientist:

  • Expertise with the business intelligence stack doesn’t make you a data scientist. You’ve spent much of your time predicting the past by performing time series analysis of historical data. That’s not data science: you rarely run experiments, and your predictive power is illusory.
  • Hadoop, R, Python, Octave, MATLAB and Mathematica are data science tools, but programming experience with them doesn’t, on its own, give you data science cred.
  • An advanced degree in mathematics, statistics or econometrics doesn’t mean you’ve earned the right to call yourself a data scientist. Hopefully you’ve developed the skills to apply descriptive and predictive techniques while maintaining a strong grasp of the underlying theory. But data science is an applied discipline focused on data from a specific subject area, and you most likely didn’t get enough real-world experience while earning your degree.
  • Evangelizing that big data, little data, any data is the future of the predictive enterprise looks relevant on your resume, may land you a few conference speaking gigs and entertains your friends at cocktail parties, BUT you’re not a data scientist. You’re a big data groupie.
  • The 8-week course you took on Coursera or the data science boot camp you attended no more makes you a data scientist than my recent golf lessons make me a golf pro. I believe in lifelong learning, and I’m all for self-improvement, but this is self-delusion.
  • You’re a subject matter expert, an Excel wizard capable of creating incredible charts, graphs and pivot tables. Those skills, while valuable, don’t make you a data scientist.
  • You’ve recently acquired a data science platform from SAS, IBM or Microsoft. With no prior experience beyond reading the manual, watching the 10 intro videos or taking the 5-day training course, you believe you can create predictive or explanatory models of subject matter data by dragging and dropping algorithmic widgets onto a canvas and pressing the ‘LEARN’ button. You’re not a data scientist; in fact, you’re dangerous.

I’ve written this short post in snarky fashion, and I apologize if I’ve offended. But I think it’s time we clearly defined what a data scientist is and ‘is not’.

I know that I’ve omitted other data science subdisciplines, such as experimental design and sampling. Maybe we’ll discuss what data science IS next time…