You’re Telling Me This Is Data Science?
Misconceptions About Working In Data
--
Data work is a stable, respectable professional field, with pretty decent pay in exchange for staring at a glowing box much of each day. But before you dive in, be realistic about what it isn’t.
Things you might think, and the reality…
“I’ll do a bootcamp and some Kaggle competitions, and then I’ll be a data scientist!”
No one will hire you. Those two-week training sprints might make you a script kiddie who can drop the hottest buzzwords or call an AutoML library on whichever cloud platform is hot this month. But will you know how to wrangle the mess of real data you’re given, in some outdated binary format? Or the spaghetti code your predecessor hacked together? Or the business needs of your grumpy VP of sales? The hype of big data is over, and companies want experience more than warm bodies.
“Data science and analysis are totally different, and data science is the top dog, so that’s what I’ll be!”
These terms often mean very little. Pay and daily tasks might be the same, or worlds apart. A data “scientist” in one company may run SQL and do ad-hoc reporting nine days out of ten, while an “analyst” is creating a random forest model. And both of them are scrubbing data half the time. Look for specific duties and the needs of the company, not titles.
Also, data science unicorns are disappearing, as is the very term “data scientist.” Companies realized that a random PhD isn’t as helpful as a BS in comp-sci, and are hiring ML engineers or data engineers instead.
“I will work for Google and make six figures!”
Yeah, maybe. But most doctors don’t work at the Mayo Clinic, and most graphic designers aren’t at Pentagram. There are tons of ordinary companies that have ordinary, workaday data needs, not to mention all the non-profits and government agencies. These places won’t pay as much and aren’t prestigious in the least. But they may be the places that offer you a job.
“I’ll be at the cutting edge of AI!”
Most crunchy data work is cleaning and wrangling data to prepare it for the actual “serious” work. Most of that serious work is exploratory, and simple summary statistics can address many problems. Advanced AI is seldom needed (does your random company really have unstructured images it needs to categorize?), and more basic ML methods like regression or a decision tree will be perfectly adequate.
“My amazing math and tech skills are all I need to succeed!”
Sorry, but you will need to deal with other people. You will have to interpret business needs and their context. You will have to work with a team of co-workers, bosses and other stakeholders, who are probably not computer geeks. You will need to present your results in human-readable form: in writing, in presentations, and in visualizations.
“Data speak for themselves!”
They do not. Data are collected in a specific way, for a specific purpose. People want to bend and warp those data to suit a conclusion they’ve already made. Every analysis you perform will lead to a result someone won’t care for, or will be one someone wishes you weren’t doing at all. And you had better frame those results carefully, for the particular audience you have, or they will be ignored. Only in rare circumstances are you charged with merely optimizing a narrow and well-defined technical problem without any political ramifications.
In the end, data work might be for you. But it might not be, and just because it’s hot doesn’t mean it’s the only choice.