There is no shortage of articles attempting to lay out a step-by-step process of how to become a data scientist. “It’s easy! Are you a recent graduate? Do this… Are you changing careers? Do that… And make sure you’re focusing on the top skills: coding, statistics, machine learning, storytelling, databases, big data… Need resources? Check out Andrew Ng’s Coursera ML course, …”. Although these are important things to consider once you have made up your mind to pursue a career in data science, I hope to answer the question that should come before all of this. It’s the question that should be on every aspiring data scientist’s mind: “should I become a data scientist?” This question addresses the why before you try to answer the how. What is it about the field that draws you in and will keep you in it and excited for years to come?
In order to answer this question, it’s important to understand how we got here and where we are headed. Because by having a full picture of the data science landscape, you can determine whether data science makes sense for you.
Where it all started…
Before the convergence of computer science, data technology, visualization, mathematics, and statistics into what we call data science today, these fields existed in siloes — independently laying the groundwork for the tools and products we are now able to develop, things like: Oculus, Google Home, Amazon Alexa, self-driving cars, recommendation engines, etc.
The foundational ideas have been around for decades… early scientists dating back to the pre-1800s, coming from wide range of backgrounds, worked on developing our first computers, calculus, probability theory, and algorithms like: CNNs, reinforcement learning, least squares regression. With the explosion in data and computational power, we are able to resurrect these decade old ideas and apply them to real-world problems.
In 2009 and 2012, articles were published by McKinsey and the Harvard Business Review, hyping up the role of the data scientist, showing how they were revolutionizing the way businesses are operating and how they would be critical to future business success. They not only saw the advantage of a data-driven approach, but also the importance of utilizing predictive analytics into the future in order to remain competitive and relevant. Around the same time in 2011, Andrew Ng came out with a free online course on machine learning, and the curse of AI FOMO (fear of missing out) kicked in.
Where we are now…
Companies began the search for highly skilled individuals to help them collect, store, visualize and make sense of all their data. “You want the title and the high pay? You got it! Just please come and come quick.” With very little knowledge of what they were looking for, job postings went up.
If you searched ZipRecuiter today, you’d find over 190k open data science positions currently open, each one looking for their own data unicorn. Thus, in an effort to get talent in the door, the definition of what it meant to be a data scientist soon widened with definitions varying from company to company and person to person.
On the other hand, candidates saw a great opportunity: a career with high pay, high demand, and the promise of job security and glory. Everyone rushed to develop all the right skills with one goal in mind: to hold the “sexist job of the 21st century”.
We have the demand and we have the supply, so what’s the problem? Well, the problem isn’t a shortage of programs to support that demand and capitalize on the hype. It feels like every day there are new courses being developed to satisfy the cravings from aspiring data scientist to break into the field: master’s programs, boot camps and online courses. It’s an arms race to make the right courses with the promise of a Machine Learning job at the end of it. “No PhD? No problem. Just three to six months and a small investment of ~10–15k and you’ll be guaranteed a well-paid job upon graduation.” (wink)
These programs are designed to be a one-stop-shop for everything data science: you learn the programming, the visualization, the modeling — it’s all there. What you soon discover is that many (surely, not all) of the business problems being faced can be solved using similar approaches, so if you’re looking to apply some algorithm, chances are there’s a library that already exists to help you do just that. Simple right?
If you’ve been paying attention so far, you will have picked up on a few important things so far:
1. By getting ahead of themselves, companies are hiring data scientist before they have even started collecting the right data (i.e. they are suffering from the Cold Start Problem of AI), meaning you will need to be involved in every step of the data pipeline including data collection, storage, and visualization before you can get to the modeling.
2. Rushing to get a job in data science (going through one of the above-mentioned methods) means you will be competing against hundreds of thousands of others in the same exact position. Expect that they will have similar projects to yours and similar experience. To get yourself noticed, you will need to find a way to differentiate yourself: showing your creativity and grittiness.
3. Chances are you won’t be developing algorithms from scratch. Unless you have a lot of extra time on your hands, you’ll most likely on the existing and well-trusted libraries. Why compete against a group of PhDs that helped develop the library and risk putting something less than optimal into production unless you had to develop something specific to your use-case.
What’s to come…
I believe, a few major things will change from here.
First, the data scientist role will become well-defined in terms of the skills needed to do the job and the expected impact on businesses. Second, automation of AI models will come. Don’t believe me? It’s already happening with products like Google AutoML and software packages like TPot. What seemed like the most technically challenging and appealing part of the job will be automated. As they say now: “No ML expertise? No problem!”
The implications of the trajectory on which we are headed is that:
- No longer will data engineers or data analysts be mistakenly given the title or the pay of a “real” data scientist, which will likely dwindle down the number of data scientists in the market.
- Human insight and oversight will become increasingly more important. Not to say that technical expertise will become less important, but the importance of connecting the analytics to the business and knowing how software interacts with people will increase (i.e the human side of data science).
- Lastly, since the Machine Learning part of the job behaves most consistently, it will be the first part to be automated (and we see that happening already). This means, that most of the value that data scientists will bring will involve the unification of messy data, preparation for modeling, and evaluation.
Therefore, when trying to answer that question for yourself “should you become a data scientist?”, imagine yourself through the entire hype cycle. Could you see yourself growing in fast growing and evolving field that will challenge you in every way? Can you embrace the change and adapt yourself just as fast? If so, then this is the career for you…welcome aboard.