Don’t be a data science generalist, yet
Pick a specialization and double down
Every now and then I’ll see a “Data Scientist roadmap” that is jam-packed with what seems like every data tool and ML concept under the sun. Though these are useful for getting a high-level view of the ecosystem, I think they are often counterproductive. They can lead beginners to think they have to learn everything before they are hireable: every algorithm, every tool, every machine learning and data science library.
Some trends in the job market reinforce this notion. Job titles are inconsistent and blurry. What one company calls a Data Scientist, others will call a Data Analyst. Do enough interviews and you will get asked everything, from LeetCode-style questions to case studies to being grilled on statistics.
The truth is, to be an effective employee, you don’t need to know everything.
Yes, some job descriptions make it seem like you do. But job descriptions are wish lists. You do not need to satisfy every prerequisite. What you do need is a solid grasp of the fundamentals and a growth mindset.
The learning won’t stop when you get a job. In fact, that’s where most of your learning will happen. You just need to convince your interviewers that you are capable of learning the things you are missing.
When you’re just starting out, don’t go too broad. Pick a specialization and focus on it. No job is perfectly categorized and the industry is constantly evolving. But there are a few core archetypes to consider: Data Analyst, Data Scientist, Machine Learning Engineer, and Data Engineer. Your first goal should be determining which of these archetypes fits you the best (this short quiz might help).
To be fair, this is hard to truly know without actual experience. You might land your first job as a Data Scientist and only then learn that you prefer the work of a Data Engineer. That’s fine. That’s natural. You will learn your preferences over time, and your career path will change accordingly. But your initial job search will be much easier to manage if you can narrow the search space, despite your own uncertainty.
You can make up for a lack of experience by getting your hands dirty and building things. Build end-to-end projects that touch all aspects of the data pipeline: collect and clean some data, train a model, deploy it. Pay attention to which parts of the pipeline you enjoy and which parts you do not. Dig deeper into the things you enjoy.
Once you have an idea of the kind of work you enjoy, double down. Again, you don’t have to learn everything. Realistically, pick 2–3 technologies. Collect a bunch of job descriptions that appeal to you and see what comes up most often in the requirements. Talk to folks in the industry and see what tools they use and what they think is important. Focus on those things.
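If you like, you can make the job-description exercise concrete with a few lines of Python. This is only a rough sketch: the sample postings and the list of technologies below are made up for illustration, and `count_mentions` is a hypothetical helper, not part of any library. Swap in the postings you actually collected and the tools you care about.

```python
import re
from collections import Counter

# Hypothetical snippets; replace with the job descriptions you collected.
job_descriptions = [
    "We need strong SQL and Python skills. Experience with Tableau is a plus.",
    "Python, SQL, and Spark required. Familiarity with Airflow preferred.",
    "Looking for an analyst fluent in SQL and Tableau.",
]

# Hypothetical list of technologies to look for (lowercase).
technologies = ["python", "sql", "spark", "tableau", "airflow"]


def count_mentions(descriptions, terms):
    """Count how many descriptions mention each term at least once."""
    counts = Counter()
    for text in descriptions:
        # Split the posting into lowercase word tokens.
        words = set(re.findall(r"[a-z0-9+#]+", text.lower()))
        for term in terms:
            if term in words:
                counts[term] += 1
    return counts


counts = count_mentions(job_descriptions, technologies)

# Print the most frequently requested technologies first.
for term, n in counts.most_common():
    print(f"{term}: mentioned in {n} of {len(job_descriptions)} postings")
```

The matching here is deliberately naive (whole-word, single-token terms only), so treat the counts as a rough signal for where to focus rather than a precise measurement.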
There are some companies out there that do need a data science generalist, someone who can basically do it all. This is common at startups or on brand new teams at larger companies. These are typically not good roles for beginners, however.
Don’t be a data science generalist. At least, not yet.