3 reasons why we don’t need data scientists and why we do

Salvatore Cagliari
Published in Technology Hits
8 min read · Dec 30, 2020

As a Business Intelligence consultant, I consider myself capable of looking at data and gaining knowledge from it. In that sense, I think of myself as something like a data scavenger.

So, if I can gain knowledge from data, why do I need a Data Scientist? Let’s take a closer look at what we can and cannot do without one.

Photo by Myriam Jessier on Unsplash

First, we need to know what a Data Scientist is.

Here are two complete definitions:

The short definition is:

A Data Scientist is a specialist capable of using advanced mathematical and statistical methods to analyse data sets and find hidden patterns in them.
These methods can also be used to build models that try to predict the future.

OK, what can I do by myself?

1. I can read my data to understand the past with the help of a time axis. I can see how the data changes over time and try to extract information out of such a report:

Chart created by the author — Sample data from Microsoft

The table on the left shows the Sales Amount compared only to the Sales Amount of the previous year. The chart on the right adds a little more information by splitting the sales by Brand.

These simple charts help me to see how my business has changed over time.
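The previous-year comparison behind such a table can be sketched in a few lines of Python. This is only an illustration: the yearly amounts and the function name `year_over_year` are made up, and the real report calculates this inside Power BI.

```python
# Hypothetical yearly sales amounts, not taken from the Contoso sample data.
sales_by_year = {2017: 120.0, 2018: 150.0, 2019: 135.0}


def year_over_year(sales):
    """Map each year to (amount, previous year's amount, change in percent)."""
    result = {}
    for year in sorted(sales):
        amount = sales[year]
        previous = sales.get(year - 1)  # None when the prior year is missing
        if previous is None or previous == 0:
            change_pct = None  # no meaningful comparison possible
        else:
            change_pct = (amount - previous) / previous * 100
        result[year] = (amount, previous, change_pct)
    return result


yoy = year_over_year(sales_by_year)
for year, (amount, previous, change_pct) in yoy.items():
    print(year, amount, previous, change_pct)
```

The first year has no predecessor, so its change stays empty instead of showing a misleading zero.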

But I need more information to find the factors that can help me change something to make my business more successful.

2. I can find patterns in my data through further analysis of the data.

In the example above, after years with excellent sales, the sales go down for a few years. Then one year with high sales follows. This effect could be due to marketing campaigns that significantly impacted my sales, or to other factors that I may or may not know about.

I may want to know how many customers buy what kind of products and where. Such an analysis is an easy task and can be done with little effort.
One example is to use a Scatter chart to analyse the sales data and get the required insight:

Chart created by the author — Sample data from Microsoft

Here I can see that Europe is the region with development potential, as it has the lowest sales compared to the other geographical areas. Furthermore, I see that the Brand “Fabrikam” has the most significant gap between the three Regions: Asia, Europe and North America.

Another important fact is that I have high sales with Fabrikam from a small number of customers (the size of each bubble reflects the number of customers).
The opposite applies to Southridge Video and Tailspin Toys, which have low sales with many customers.

I may need a new Measure to calculate the Margin, to get a better picture of my business’s success.
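As a rough sketch of what such a Margin measure calculates, assuming the margin is simply the sales amount minus the total cost: the brand names come from the sample report, but the figures and the function names `margin` and `margin_pct` are invented for illustration. In Power BI, the measure itself would be written in the tool, not in Python.

```python
# Hypothetical (sales amount, total cost) per brand. The brand names come from
# the sample report; the figures are invented for illustration.
brand_figures = {
    "Fabrikam": (1000.0, 600.0),
    "Tailspin Toys": (200.0, 150.0),
}


def margin(sales, cost):
    """Absolute margin: what remains of the sales amount after cost."""
    return sales - cost


def margin_pct(sales, cost):
    """Margin as a percentage of sales, or None when there are no sales."""
    if sales == 0:
        return None
    return margin(sales, cost) / sales * 100


margins = {
    brand: (margin(sales, cost), margin_pct(sales, cost))
    for brand, (sales, cost) in brand_figures.items()
}
for brand, (absolute, percent) in margins.items():
    print(brand, absolute, percent)
```

Looking at the percentage next to the absolute value shows whether a brand with high sales is also a profitable one.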

This knowledge is something that I can use to start a marketing campaign if needed.

Another example is the following chart, which shows CO2 emissions between 1920 and 1955. Here you can see a drop in CO2 emissions after WWII:

Chart created by the author — Data from Our World in Data

I know the historical background, and I’m able to explain this drop.

But without such background information, I would struggle to explain this drop. I could come to the wrong conclusion or not be able to explain it at all.

If I cannot explain such an effect, I need to dig into the available data or perform historical analysis to find possible causes for this drop in emissions.

3. I can use A.I. and Machine Learning tools, like a Decomposition tree or the automated components available in different BI tools, to get more out of my data.

Here are some examples of such tools:

I can use Microsoft Power BI and the built-in Key Influencers tool to gain more knowledge about and insights into my data:

Chart created by the author — Sample data from Microsoft

Now I know that the categories Audio, Games and Toys, and Music, Movies and Audio Books have the lowest sales amounts by a large margin. But why?

I can use the chart in the Segments page to find more information.

The process for finding more information depends on the tool used. But now I have a direction for further analysis to improve my business.

I built the next example with data from Our World in Data:

Microsoft Power BI offers an automated analysis of the data, called Quick Insights.

The following chart, from Quick Insights, shows the amounts of the greenhouse gases relative to each other.

Chart created by the author — Data from Our World in Data

But I need to know more about the effect of each greenhouse gas. When I consider that Methane is about 28 times as potent a greenhouse gas as CO2, I should be concerned about this chart.
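To see why the raw amounts can mislead, here is a small sketch that weights Methane with the factor of 28 quoted above. The emission amounts are hypothetical, and the function name `co2_equivalent` is my own; the only number taken from the text is the factor.

```python
METHANE_FACTOR = 28  # "about 28 times as potent", as quoted in the text


def co2_equivalent(co2_tonnes, methane_tonnes, methane_factor=METHANE_FACTOR):
    """Total emissions expressed in tonnes of CO2-equivalent."""
    return co2_tonnes + methane_tonnes * methane_factor


# Hypothetical amounts: ten times more CO2 than Methane by mass.
co2 = 100.0
methane = 10.0

total = co2_equivalent(co2, methane)
methane_share = methane * METHANE_FACTOR / total

print(total)          # total in tonnes of CO2-equivalent
print(methane_share)  # share of the warming effect caused by Methane
```

With these made-up numbers, Methane is a small slice of the chart by mass but accounts for the larger part of the weighted total.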

The following chart shows the GDP per capita and per continent:

Chart created by the author — Data from Our World in Data

The problem with these two charts is that there is no time relation. This makes them less useful, as it is not visible when this “Insight” applies or how it changed over time.

Chart created by the author — Data from Our World in Data

As you can see, with a little luck, I can get useful knowledge out of these tools.

But with bad luck, I will get useless or misleading results.

For example, mixing data from countries and continents can cause confusion or wrong assumptions if you don’t look carefully at the result:

On the Spurious correlations site, you can find many more misleading correlations. They are funny, but they show how dangerous it is to rely on a machine to find correlations in data.
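A minimal sketch of how such a misleading result comes about: two series that have nothing to do with each other but both grow over time will show a strong Pearson correlation. The series and their names are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Two hypothetical, unrelated yearly series that both happen to grow over time.
ice_cream_sales = [1.0, 2.0, 3.0, 4.0, 5.0]
software_licences = [10.0, 12.0, 15.0, 19.0, 24.0]

r = pearson(ice_cream_sales, software_licences)
print(r)  # close to 1, although neither series causes the other
```

The coefficient only says that the two series move together. It says nothing about why, which is the part a tool cannot answer for me.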

In any case, you need to invest a large amount of work and knowledge to find the right information in your data.

Can a Data Scientist help me with these problems?

Here are three reasons why the answer is yes.

1. I can’t find patterns and relations that are not obvious

With their training in methodical and systematic approaches, Data Scientists analyse the data at their disposal and can find new ways to uncover patterns.

Advanced mathematical and statistical methods support their approach even more.

Data Scientists use appropriate programming languages like R or Python to perform their analysis and to search for patterns or, possibly more importantly, outliers in the data.

Data Scientists also know how to find useful relations within the available data set.
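To give an idea of what a search for outliers can look like in Python, here is a minimal sketch using the interquartile range rule. This is one common textbook technique, not necessarily what a Data Scientist would choose for a given data set, and the sales values and the name `iqr_outliers` are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily sales counts with one suspicious day.
daily_sales = [10, 12, 11, 13, 12, 11, 95]

outliers = iqr_outliers(daily_sales)
print(outliers)
```

Whether a flagged value is an error or the most interesting data point in the set is still a question for someone who knows the business.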

2. I can’t understand the tools

When I use the tools shown above, I can try to interpret the result. But sometimes, I struggle to understand how the tools calculated the results.

This means that I cannot fully trust the result until I have checked it against the data. This mistrust costs me time and effort, which is exactly what I wanted to avoid by using these tools.

Data Scientists are trained in different methods, so they understand what is going on inside the tools or can build useful results with their own skills.
They will also be able to explain the result to me, improving my confidence in and understanding of the report.

3. I may not know how to enrich data sets with external data sets and functionalities

Data Scientists think outside the box and enhance existing data sets with external data sets to gain even more information and knowledge.

One example is to use national or international statistical data about populations and demographics in my countries and regions to add a new level of data to my existing data set.
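A minimal sketch of such an enrichment, assuming the two data sets share a region name as the matching key. The region names follow the sample report, but all figures and the function name `sales_per_capita` are invented for illustration.

```python
# Hypothetical sales amounts per region from my own data set.
sales_by_region = {"Asia": 500.0, "Europe": 300.0, "North America": 700.0}

# Hypothetical external population data in millions. "North America" is left
# out on purpose to show what happens when the data sets do not match.
population_by_region = {"Asia": 50.0, "Europe": 20.0}


def sales_per_capita(sales, population):
    """Join both data sets on the region name.

    Returns the sales per million inhabitants for matched regions and the
    list of regions without usable population data.
    """
    enriched = {}
    unmatched = []
    for region, amount in sales.items():
        inhabitants = population.get(region)
        if inhabitants:  # skips missing and zero populations
            enriched[region] = amount / inhabitants
        else:
            unmatched.append(region)
    return enriched, unmatched


enriched, unmatched = sales_per_capita(sales_by_region, population_by_region)
print(enriched)
print(unmatched)
```

The list of unmatched regions matters as much as the result: it shows where the external data has to be cleaned or matched before I can trust the new figures.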

They know how to use Cloud services and the latest technologies to perform advanced and predictive analytics on my data.

Combining external data sets and Cloud services can lead to massive gains in new knowledge and understanding of existing data sets.

Conclusion:

I’m able to do a lot of analytics on my data. I know my data and can find patterns and relations in my data. I can use tools to find non-obvious data-patterns and use them as a starting point for further analysis.

I know Microsoft Power BI very well and know how to use it to maximise my analytics and reporting power. Several other tools, like Tableau, Qlik and others, can also produce useful reports and help analyse data sets.

I know where I can get external data sets to enrich my data. But I have to make sure that the data is clean and directly related to my existing data.
If this is not the case, I must use traditional techniques to clean and match the data. A Data Scientist can use more sophisticated approaches to reach the same goal.

But when it comes to advanced mathematical, statistical or predictive analytics, I need the help of a Data Scientist colleague.

In the end, we work hand in hand to create the best possible results for our customers and provide them with useful information.

Disclaimer:

I’m an employed consultant, and I work for an independent consulting firm. I use Microsoft Power BI as my primary reporting tool for my customers and my studies. I do not get any incentive from Microsoft when I use or recommend Power BI.

The data in the charts comes from the Contoso Demo data set from Microsoft and a combination of different data sets from Our World in Data.

You can access the reports used in this article with the following links:

· Development of Humanity with data from Our World in Data

· ContosoDW report from the Microsoft Contoso BI Demo Dataset for Retail Industry dataset (Here the source in GitHub: https://github.com/microsoft/sql-server-samples/tree/master/samples/databases/contoso-data-warehouse)

Salvatore Cagliari
Technology Hits

Nerd passionate about technology and space science, and interested in many other things. On the Business side, an expert in Data Analytics and Power BI.