Using Waterfall Charts in Python to analyze Iowa Liquor change
GitHub
https://github.com/melodyyip/Iowa_liquor_sale_analysis_waterfallchart
Waterfall chart shows a running total as values are added or subtracted. It’s useful for understanding how an initial value is affected by a series of positive and negative values.
The columns are color coded so you can quickly tell positive from negative numbers. The initial and the final value columns often start on the horizontal axis, while the intermediate values are floating columns. Because of this “look”, waterfall charts are also called bridge charts.
Data
I used the Iowa Liquor Sales data from Jan 2016 to Jan 2022 only for the sale analysis.
Column Descriptions
Goal
On the basis of Sales data, I am going to find the answer to the following question
1. What is the overall sales trend?
2. What is the Monthly growth?
3. What is the overall YoY growth?
4. What is the YoY growth by liquor category?
Q1. What is the overall sales trend?
import plotly.express as px
dfg = df_totalsale[['date','sale_dollars']]
dfg = dfg.groupby(pd.Grouper(key='date',freq='M')).sum().reset_index()
dfg.rename(columns = {"sale_dollars": "sum"}, inplace=True)fig = px.line(dfg, x="date", y="sum"
,hover_data={"date": "|%B %d, %Y"}
,markers=True
,color_discrete_sequence=px.colors.diverging.PRGn
,template = "plotly_white"
)
fig.update_layout(
title="Total Sale Over Time(2016-2021)"
,xaxis_title="date"
,yaxis_title="sale_dollars")fig.update_xaxes(
dtick="M1",
tickformat="%b\n%Y",
ticklabelmode="period")fig.show()
import plotly.express as px
dfg = df_totalsale[['year','month','month_','sale_dollars']]
dfg = dfg.groupby(['year','month','month_']).sum().reset_index()
dfg.rename(columns = {"sale_dollars": "sum"}, inplace=True)fig = px.line(dfg, x="month_", y="sum"
,color='year'
,markers=True
,color_discrete_sequence=px.colors.diverging.PRGn
,template = "plotly_white")
fig.update_layout(
title="Total Sale Over Time(2016-2020)"
,xaxis_title="month"
,yaxis_title="sale_dollars")fig.update_xaxes(
dtick="M1",
ticklabelmode="period")fig.show()
The line graph above shows that there has been a steady increase in the sale of Iowa liquor since 2016. There is a seasonality pattern, such as sales are always low at the beginning of the year and high at the end of the year. There is always an upward trend within any single year with a couple of low months in the mid of the year. For the recent year, it seems the customer would like to purchase liquors on March, June and December
Q2. What is the Monthly growth (2021–2022)?
import plotly.graph_objects as gotext = list(MoM_Data['MoMGrowth_text'].values.flatten())
text = [element /1000000 for element in text]fig = go.Figure(go.Waterfall(
name = "20", orientation = "v",
measure = ["relative", "relative", "relative","relative","relative", "relative", "relative", "relative","relative", "relative", "relative", "relative","total"],
x = list(MoM_Data['year_month'].values.flatten()),
textposition = "outside",
text = text,
y = list(MoM_Data['MoMGrowth_amount'].values.flatten()),
connector = {"line":{"color":"rgb(63, 63, 63)"}},
))fig.update_layout(
title = "Monthly Sales Change(2021-2022)",
showlegend = False
)fig.show()
2021’s growth is largely attributed to 7 million sales in March, 5 million sales in June, and 4 million sales in December. In general, these three months contributed to the growth in 2021.
Q3. What is the YoY growth (2020–2021)
The YoY sales growth is calculated by comparing monthly sales between 2020 and 2021. First, the data must be prepared.
I am not going to show the step here, you can check my GitHub for this part.
Check my GitHub to took a look at all my projects🍀
Add a column of incremental changes compare with 2021 by month:
Now, we’re ready to create a waterfall chart:
import plotly.graph_objects as gotext = list(Data['text%'].values.flatten())
text = [round(element,2) for element in text]
text = [str(int)+'%' for int in text]
text.insert(0, "")
text.insert(len(text),"" )
textfig = go.Figure(go.Waterfall(
name = "20", orientation = "v",
measure = ["relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"relative"
,"total"],
x = [['Total'
,'Month'
,'Month'
,'Month'
,'Month'
,'Month'
,'Month'
,'Month'
,'Month'
,'Month'
,'Month'
,'Month'
,'Month'
, 'Total '],
['2021'
,'Jan'
,'Feb'
,'Mar'
,'Apr'
,'May'
,'Jun'
,'Jul'
,'Aug'
,'Sep'
,'Oct'
,'Nov'
,'Dec'
,'2022']],
textposition = "outside",
text = text,
y = y,
connector = {"line":{"color":"rgb(63, 63, 63)"}},
))fig.update_layout(
title = "YoY Liquor Sales Growth from 2020 to 2021",
showlegend = False,
# uniformtext_minsize=9, uniformtext_mode='hide'
width=950, height=700
,yaxis_range=[ly-10000000,cy+4000000]
)fig.show()
In November of 2020, the Iowa reported a 19.4% increase in sale over previous year November. Throughout the year of 2020, there is only September report a sightly reduce on sale by 2.57% compare with the previous year.
Q4. What is the YoY growth by liquor category?
I analysis the sale growth by comparing the sale growth on each liquor category.
Compare with 2020, the 2021 mezcal/tequila sales increased 29.6%, temporary liquor offer increased 12.8% and spirits increased 32.73%.
These are a significant growth in these three type of liquor.
Reference :
https://www.storiesondata.com/post/data-analysis-using-python-sales-analysis