SQL, Python and Tableau For Data Analytics

amoakomensa
7 min readNov 14, 2021

--

This is an approach to tackle a relational database project using sql to clean the data, python to analyse the data and BI tool(tableau) to visualise and highlight primary KPIs and predicted metric(revenue).

Let’s begin by firing up datasets via excel just to look at how the data is. The dataset is Revenue, Marketing and Visitors data.

The tool used to query dataset is mysql, i am using datagrip from Jetbrains.

create database sqlanalysis;
use sqlanalysis;

-- use describe to look at the data type of Revenue Datasets
desc `Revenue Data 2`;
desc `Revenue Data 1`;

-- now lets look at the dataset
select * from `Revenue Data 1`;
select * from `Revenue Data 2`;

The query above creates a database and uses the database. We then look at the datatypes in Revenue datasets and look at the columns and rows in the datasets. Now lets look at the existence of null values in both datasets. — now the first thing to do is to look if there are any empty rows in the dataset

-- now the first thing to do is to look if there are any empty rows in the dataset
-- there are null values in WeekID, Month Number, MonthID, Day Name and Revenue
select
sum(IF(Date is null, 1, 0)) as NullValuesDateColumn,
sum(if(Week_ID is null, 1, 0)) as NullValuesInWeekID,
sum(if(`Month Number` is null, 1, 0)) as NullValuesInMonthNumber,
sum(if(Month_ID is null, 1, 0)) as NullValuesInMonthID,
sum(if(YEAR is null, 1, 0)) as NullValuesInYear,
sum(if(Day_Name is null, 1, 0)) as NullValuesInDayName,
sum(if(Revenue is null, 1, 0)) as NullValuesInRevenue
from `Revenue Data 1`;
-- lets look at Revenue Data 2
-- null values in Revenue Column
select
sum(IF(Date is null, 1, 0)) as NullValuesDateColumn,
sum(if(Week_ID is null, 1, 0)) as NullValuesInWeekID,
sum(if(`Month Number` is null, 1, 0)) as NullValuesInMonthNumber,
sum(if(Month_ID is null, 1, 0)) as NullValuesInMonthID,
sum(if(YEAR is null, 1, 0)) as NullValuesInYear,
sum(if(Day_Name is null, 1, 0)) as NullValuesInDayName,
sum(if(Revenue is null, 1, 0)) as NullValuesInRevenue
from `Revenue Data 2`;

Now after running this query you will observe that there are null values in some columns. There are some null text values which will be deleted. Those null values will be deleted and the query below does exactly that.

-- after running this query, there are null text values which will be deleted
select * from `Revenue Data 2`;
-- QUERY THAT DELETES TEXT NULL VALUES
delete from `Revenue Data 2` where Week_ID = 'NULL';

-- query to look at the dataset
select * from `Revenue Data 2`;

Now lets look at the tables again

-- now lets look at the tables again
select * from `Revenue Data 1`;
select * from `Revenue Data 2`;

Now let’s create a new table joining both revenue datasets.

-- now we create a new table join both revenue data based on Date column
create table `FinalRevenueData` select * from `Revenue Data 1` union all select * from `Revenue Data 2`;
select * from FinalRevenueData;

Now let’s check null values in FinalRevenueData.

-- now lets check if there null values in this final revenue data
-- null values in weekid, monthnumber, monthid, day name, revenue column
select
sum(IF(Date is null, 1, 0)) as NullValuesDateColumn,
sum(if(Week_ID is null, 1, 0)) as NullValuesInWeekID,
sum(if(`Month Number` is null, 1, 0)) as NullValuesInMonthNumber,
sum(if(Month_ID is null, 1, 0)) as NullValuesInMonthID,
sum(if(YEAR is null, 1, 0)) as NullValuesInYear,
sum(if(Day_Name is null, 1, 0)) as NullValuesInDayName,
sum(if(Revenue is null, 1, 0)) as NullValuesInRevenue
from `FinalRevenueData`;

Lets look at marketing and Visitors data.

-- now lets look at Marketing data and Visitors data
desc `Marketing Data`;
desc `Visitors Data`;

-- now it can be observed that there are null values in last three rows of Marketing and Visitors data
select * from `Marketing Data`;
select * from `Visitors Data`;

Now let’s check for the null values in Visitors data and Marketing data.

-- now lets check for null values
-- 5 null values in visitors column
select
sum(if(Visitors is null, 1, 0)) as NullValuesVisitorsColumn from `Visitors Data`;
-- null values in marketing spend
select
sum(if(`Marketing Spend` is null, 1, 0)) as NullValuesInMarketingSpend,
sum(if(Promo is null, 1, 0)) as NullValuesInPromo from `Marketing Data`;

Now there are some duplicates.

-- now before we can see there are duplicate rows in 130 - 132
select * from FinalRevenueData;
-- we are going to remove it via week id
delete from FinalRevenueData where Week_ID is null;
select * from FinalRevenueData;

Now let’s proceed in joining Marketing and Revenue Data.

-- now lets proceed to joining Marketing and Revenue data
-- create a new table
create table `TheeFinalDataset`
select FinalRevenueData.Date, Week_ID, `Month Number`, Month_ID, Year, Day_Name, Revenue,`Marketing Spend`, Promo, Visitors from FinalRevenueData
inner join `Marketing Data` `M D` on FinalRevenueData.Date = `M D`.Date
inner join `Visitors Data` `V D` on FinalRevenueData.Date = `V D`.Date;

Lets look at the final dataset

select * from TheeFinalDataset;

Now let’s check for null values in TheeFinalDataset.

-- now lets check for null values in TheeFinalDataset
-- there are null values in Revenue, Marketing Spend and Visitors
select
sum(IF(Date is null, 1, 0)) as NullValuesDateColumn,
sum(if(Week_ID is null, 1, 0)) as NullValuesInWeekID,
sum(if(`Month Number` is null, 1, 0)) as NullValuesInMonthNumber,
sum(if(Month_ID is null, 1, 0)) as NullValuesInMonthID,
sum(if(YEAR is null, 1, 0)) as NullValuesInYear,
sum(if(Day_Name is null, 1, 0)) as NullValuesInDayName,
sum(if(Revenue is null, 1, 0)) as NullValuesInRevenue,
sum(if(`Marketing Spend` is null, 1, 0)) as NullValuesInMarketingSpend,
sum(if(Promo is null, 1, 0)) as NullValuesInPromo,
sum(if(Visitors is null, 1, 0)) as NullValuesInVisitors
from TheeFinalDataset;

Now we have a problem, do we impute null values using mysql or python.

We we can use both tools but let’s use python out of popularity, feel free to research and use mysql if you want.

So let’s export the theefinaldataset in a csv file and let’s continue our analysis using python and Jupyter. You can download Anaconda or use Jupyter extension in vscode. Feel free to download repository and run on your computer.

In part 2, we will perform some data visualisations in tableau and understand the return on investment(ROI) to know which promotion works and where to invest more.

PART 2

Now when it comes to BI, we need to what we want out of the data. What questions are we looking to get answered and in this part we will be looking at the Revenue and Marketing . Some questions we might ask ie What promo efforts is most effective and least effective and how much is being invested in each?

Now let’s tableau it, we will be answering some of these questions via dashboards.

Revenue Dashboard

Total Revenue from all promotions is close to £2million. Promotion Blue dominates revenue composing 46% of total revenue standing at £876,514. Promotion Red comes in second and No promo comes in third respectively. Now year to year comparison shows promotion blue has a higher percentage increase of +157.82 increase from 2020 to 2021. This is followed by Promotion Red and No Promo respectively. Now based on this dashboard we can say that Promotion Blue is more effective but is it? We can not conclude based on one dashboard. Now let’s look at Marketing Spend Dashboard to understand how much we spend on each promotion.

Marketing Dashboard

Total Money spent on marketing is £254k. 39% percent of marketing investment which stands at £99k went into Promotion blue followed by No Promo and Promotion red respectively. Now year to year comparison shows a +118% increase in marketing investment in promotion blue from 2020 to 2021. This is followed by No Promo and Promotion red respectively. Now even though we are investing more in No Promo than Promotion red, revenue dashboard indicates we are making more money in from promotion red than no promo. Now the only way to be certain of this is to look at return on investment on all marketing promotions.

Return on Investment

Return on investment is used to evaluate the efficiency of investment or compare the efficiency of different investments. Now 0% ROI means no profit and no loss, 100% ROI means spend X and earned 2X in revenue. 1000% ROI means you spent X and earned 11X in revenue. -100% ROI means you spent X and earned 0 in revenue. The higher the percentage the better the investment.

Return on Investment Dashboard

ROI dashboard shows a strong rate of return indicating resources are being used efficiently. Promotion blue has a higher ROI indicating we are receiving 782% return in revenue, followed by promotion red and no promo. Now we need to determine how well marketing promotions have done over years. Now looking at Promotion blue we can see an increase in Return on investment from 686% to 826%. Promotion red and no promo however recorded a decrease in return on investment. No Promo recorded a -15.79% reduction in return on investment from 2020 to 2021 and Promotion red recorded a -2.50% reduction in return on investment from 2020 to 2021. Promotion blue recorded +20.49% increase in return on investment from 2020 to 2021 meaning promotion blue is a solid investment and all efforts and investments should be focused on promotion blue.

Now the data can be explored further to reveal average daily marketing spend, revenue etc. Feel free to clone the repository from my GitHub page: https://github.com/mensalytics.

See ya

--

--

amoakomensa

Engineering and Analytics and some random life thoughts