Fake Data Scientists? Or Just a Field in Its Infancy?

On Quora, someone recently asked “Why are there so many fake data scientists and machine learning engineers?” While the hype surrounding data science is undeniable, is it really causing an influx of fakes, or are we just witnessing a field trying to find its identity? Let’s take an honest look at where we are in Data Science, and what truly defines a “real” Data Scientist.

To say there are a large number of fake Data Scientists is to assume you know what a real version of this position looks like. If you were to pose your question as “How do we know a real data scientist when we see one?”, you would be met with the same amount of uncertainty.

For example, which one of these is the “most real” Data Scientist?

  • A PhD in artificial intelligence who has never worked on enterprise, production software;
  • A physicist with deep knowledge in physics simulations, but no experience in data-driven modeling;
  • A statistician who is an expert in sampling, interpretation and estimation, but who has only worked on parametric models;
  • A neuroscientist with vast experience in designing research experiments, who has never written a line of code;
  • A graduate with a Masters in Machine Learning who has never validated a model in front of real-world users;
  • A software developer who understands best practices, who has never tried testing a piece of software with nondeterministic output;
  • A newly minted Data Science graduate whose “experience” comes from courses and workshops;
  • A mathematician who focuses on optimization theory but cannot explain its relevance to stakeholders with no technical background.

Who would you choose? Nobody can rightfully say who in this or any other list would make the best Data Scientist, or any other kind of machine learning practitioner. Data Science is in its infancy, becoming the field it needs to be to support the next generation of products. Product development is much more complex than a set of basic theories that only make sense in a vacuum. More challenging than some academic approach to designing a learning algorithm that never sees the light of day. More intense than a workshop analysis of how to create a model using clean datasets.

This is why there is no ONE person or ONE background that can be defined as “real.” Like any successful system in nature, Data Science benefits from variety, where different backgrounds and opinions weigh in on how to solve problems. The only metric for “real” is the authentic passion one needs to bring to the process of solving problems, and getting this new kind of product in front of real people. Data Science isn’t where mathematical elegance or upfront academic design lives. It’s a messy world that is far more complex than anything a degree or specific background could possibly prepare you for. The only “fakes” that live here are those who chase salary above learning, or who think “smart” is defined by your ability to toss naive formulae on a whiteboard nobody uses. Those individuals don’t last long, getting filtered away by the natural process of keeping our efforts accountable to the only thing that matters: building a product people want to use.

Data Science is coming into its own, as we are only now laying the foundations of this field. I can tell you, if you come into Data Science thinking it will work like academic machine learning, you are going to fail. If you think building machine learning products looks exactly the same as mainstream Agile, you are going to fail. If you think statistical validation is the true signpost to getting data working inside a predictive application, you are going to fail. And if you think math is more important than high-level concepts everyone can understand, you are going to fail.

Getting real-world ROI from our efforts comes from variety and abstraction. It takes a blend of many skills and many backgrounds to arrive at a tangible piece of software that improves the experience of its users. No company has Data Science “figured out”, or can claim they own the innovation behind tomorrow’s game-changing products. If you are passionate about using data and working with teams of people to create something that is changing the very nature of how we use technology, then join us. It’s complex, messy, full of mistakes, and a far cry from the idealized environments behind ivory towers. But it’s also damn worth it. That’s what’s real.