What is a data scientist anyway?

Maybe you’ve heard the term “data science” here or there in the news, or mixed into conversations about the job market. Whenever I mention I’m studying data science, the response is usually “Good for you! Don’t they say that’s one of the fastest-growing fields these days?”

But what is a data scientist? And who is the “they” reporting skyrocketing growth?

Let’s begin with what a “science” is. According to the Oxford English Dictionary, science is:

“the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment.”

Science is the systematic study of some aspect of the world through observation. In the case of data science, these observations are the data that is collected. Data science has methods and involves experimentation to derive results, just like any other science. In fact, it is often used in combination with other scientific disciplines because of its power to apply statistics to large and complex data sets.

Becoming a scientist requires a lot of training — so what sort of training or knowledge is required for a data scientist? Data science spans a wide range of skills, summarized in the image below, encompassing capacities in statistics, coding, communication and business problem solving.

Source: https://blog.zhaw.ch/datascience/the-data-science-skill-set/

Another useful distinction is what a data scientist is not. While there may not be a canonical definition, a data scientist is different from both a software engineer and a data analyst.

One breakdown is as follows:

Software Engineer: creates systems that capture and store data, making it easy for others to access and use

Data Analyst: uses data to diagnose business problems or to convey data-driven narratives that help solve them

Data Scientist: uses large amounts of data to create models that predict future outcomes

Obviously, there can be a lot of overlap, but these roles are different and require some level of differentiated expertise.

And, finally, on the topic of data science as one of the fastest-growing jobs: “data science” comes in second only to “Machine learning engineer” according to LinkedIn’s 2017 U.S. Emerging Jobs Report. Of course, the #1 job also involves data science, possibly as a hybrid of the Data Scientist and Software Engineer categories I broke out above. As the report states, “comprehensive sets of skills that cover multiple disciplines are seemingly in higher demand”, so the growth of data science could be part of a trend toward jobs that require a broad range of skills in addition to technical ones.

Like any good data scientist, I was curious where LinkedIn got the data for the study. As stated in its “Methodology” section, the study is based on user-provided data, so the job titles are not necessarily strictly defined. In any case, technical skills, including those of data scientists, are great to have now and in the coming years.

OK, so it’s a great day to be a data scientist. But aren’t those the people behind the annoyingly accurate ads for what I just purchased on Amazon, and behind the algorithms under fire for bias when they are used to help determine bail? These are just a taste of the crucial ethical considerations that I hope to highlight in this blog, along with results from the data science projects I complete on my path to becoming a data scientist.

Stay tuned.