Lost in Translation: Data Industry Professionals
What You Are And What You Are Not (or Maybe).
“Can you believe he only knows Python and R and dares to write Data Scientist on his LinkedIn Profile?”
“What is he supposed to write, then?”
“I don’t know, but I would feel like an imposter calling myself a Data Scientist when my PhD friend rocks algorithms in his sleep.”
To be or not to be a Data Scientist?
Definitions overlap, and they shift depending on the company or industry. Some job postings even blur the lines between titles in the positions they offer. Truth is, if you’re in a job hunt, what you write (or leave out) on your LinkedIn is mostly for SEO purposes and filters. What matters is how well you can back up your claims with work you’ve done, how eager you are to learn the tools and skills, and how curious and quick to learn you are on the job.
Whether you are a Data professional, or a manager who seeks to hire one, or just a curious one confused by all the overwhelming, mind-numbing jargon avalanche you’re exposed to, here are some infographics and data I was able to glean from DataCamp to cure your headache.
Let’s start with the jack of all trades.
Statistician is the traditional label for the wizards who use statistical methods and theory to collect, analyze and interpret data and to evaluate models. Statistics historically emerged as a branch of mathematics applied to scientific, social, industrial and economic problems. Some guys just woke up one day and realized the name was not sexy enough. “Let’s confuse people more and throw a bunch of names around.”
These wild names appeared shortly thereafter, amplified by the fast-paced media race for attention.
A Data Analyst typically combines a background in statistics, business and programming to take data from its raw form to one that enables actionable insights for decision-making, applying statistical and machine learning algorithms along the way.
A Data Scientist job is a Data Analyst+ job, for when the data’s volume and velocity call for a more robust skill set. Tasks here involve dealing with large streams of (often unstructured) data or Big Data, and cleaning and questioning the data, toward the same end goals as the data analyst role. On many occasions, data scientists are handed a dataset that management has little idea what to do with, and it is up to them to identify possible uses for the data.
The Data Architect is usually tasked with data management. Think of your data being spread across different sources that you need to bring into one location (hence the term “Data warehousing”) while ensuring functionality. Data Architects are responsible for designing database systems and solutions according to the business needs.
Data Engineers work hand in hand with the Data Architects on data management. However, the role further requires software engineering skills to develop, test and maintain these database systems.
Database Administrators make sure the data is organized, safely stored and easily retrieved. They also handle troubleshooting, backup and recovery.
A Business Analyst is able to explain data in layman’s terms to management and clients. They do not necessarily have a data analytics background.
While you may find Business Analyst and Data Analyst positions used interchangeably, the difference is that business analysts are mainly focused on interpreting the data for business decision-making, while data analysts gather and analyze the data to support those decisions. Northeastern University professor Martin Schedlbauer explains the difference: “In the simplest terms, data is a means to the end for business analysts, while data is the end for data analysts”.
Typically, it takes anywhere between 1–5 years to hold this position depending on the size of the company, experience, and managerial skills.
If you’re in it for the buck, you’re in the right place.
Money talks for now, but salary reflects how demand is outpacing supply, and it is likely to show a similar trend for the next decade or two, until new technology skills and needs arise.
In my humble opinion, the tables may turn not in favor of data scientists, but rather in favor of the folks who provide the infrastructure, capacity and velocity that data scientists need to carry out their work. The focus today seems to revolve around end products (applied machine learning etc.), but the real roadblock (and opportunity) down the line is in what supports those products when we hit big technology adoption numbers, for both business and personal use, that call for big or sustained capacity (cloud, servers, computing power etc.).
This post was originally written as a contribution to the Duke Artificial Intelligence Society blog. We are a Duke University — Fuqua Business School affiliated organization that aims to provide space for students and professionals to get together, exchange, discuss and learn about topics in AI and Applied Machine Learning in the 4th Industrial Revolution.