The DAP Journey: Alcohol Analytics

A study of liquor sales in Iowa

Kaihui
SMUBIA
5 min readDec 31, 2019

--

Our group and mentor! :)

In this Medium series, BIA extracts the introspection of our Data Associates as they recall their academic exploration. This post features an analytics project on visual analysis and forecasting, directed by Kai Hui, Zi Feng and En Shing, mentored by Wa Thone.

Introduction

“I hear and I forget. I see and I remember. I do and I understand”.

As part of the SMU Business Intelligence and Analytics (BIA) Club’s Data Analytics Fast Track Programme, our team embarked on a 4-month experiential project. We collaborated on a data visualization and forecasting project examining Iowa liquor sales between 2012 to 2019.

Initially, we obtained the required data sets from Google BigQuery using SQL, which amounted to a total of 31 CSV files (3.65GB).

Snapshot of our data set

Aim and Approach

Our team had two key goals for this project.

Firstly, we wanted to study general trends of alcohol sales in Iowa, creating a dashboard on Tableau to visualize key metrics.

Second, we aimed to build a forecasting model predicting future liquor sales in specific cities based on the historical data we obtained. This will be done through SARIMA modelling and the Prophet package on Python 3.

Project Deliverables

Data Cleaning

Using a combination of Python and Tableau, we managed to import, clean, analyze, train, test, predict, and visualize the data set to facilitate effective communication.

Analytics Dashboard (Tableau)

We had the opportunity to be part of SMU BIA’s inaugural Tableau Certification workshop and be certified as a Tableau Desktop Specialist. Using the knowledge attained from the program, we ventured and introduced easy-to-understand visualizations to conduct our data analysis. We eventually created an interactive dashboard for users to explore the trends in Iowa liquor sales. This means we can easily obtain analytical insights, such as the store with the highest liquor sales.

Snapshot of our interactive Tableau Dashboard

Forecasting Models (Python 3)

We explored using SARIMA (Seasonal Autoregressive Integrated Moving Average), VAR (vector autoregressions)and Prophet models to conduct our forecasting of alcohol sales. Ultimately, only the SARIMA model gave reasonable results for some of the forecasts.

One of the outputs from our SARIMA model

Challenges Faced

Like any other data analytics project, we faced many challenges in our journey. Some of these include:

Missing data. Some fields were lacking value which hindered our analysis. We resolved this during our data cleaning process by researching the missing information and included it in our final data set. We also picked up geocoding to impute the missing longitude and latitude values in the data frame.

Geocoding to replace missing longitude & latitude values
Geocoding to replace missing longitude & latitude values

Tracebacks and unprocessable code. We were at times unsure of the codes’ capabilities to perform data cleaning, analysis, and forecasting. However, we overcame it by combing through various online resources (thank you StackOverflow!), learning as best we can and adapting the solutions.

Sunk Cost Fallacy. The outcome of a data analytics project will always be uncertain in the beginning. However, we have already committed weeks into the data cleaning, code writing and building our forecasting model. Hence, even though we generated less-than-ideal predictions, we still stuck through with the results and found it difficult to pivot.

Incorrect forecasting models used. Our hypothesis that wine sales were affected by its previous months’ sales might not have held, which could be a reason why some of our forecasts were inaccurate (worse than the sample mean). In the future, we can look to explore other forecasting models apart from pure time-series forecasts.

Final Reflections

“A failure is like fertilizer; it stinks to be sure, but it makes things grow faster in the future.”

Our project might not have given us the most ideal results we could hope for, but we have gained tremendously from our journey. Through this project, we have acquired skills like

  • geocoding
  • building interactive dashboards using Tableau &
  • developing deeper proficiency coding with Python.

At the end of the day, we honed many crucial skills and improved as Data Associates compared to when we first began. We are confident of being able to better tackle the next data analytics project we will undertake.

Till our next data analytics endeavour, cheers! 🥃

Is DAP worth the shot? Let’s hear it from our Data Associates

Kai Hui, SOSS

“The DAP journey has truly been an extremely fulfilling one. Within a short span of a few months, I gained a deeper understanding of data science, as well as applying these skills to real-world projects. However, what I cherish most is the dedicated and supportive community. It has been a privilege to be part of SMUBIA and DAFTP 2019.”

Zi Feng, SOE

“Having the opportunity to work on a project forced me to really understand and explore viable machine learning concepts, which bolstered my knowledge, especially in forecasting. Also, I learnt many data cleaning techniques which I would not have come across without this project.”

En Shing, SOB

“The DAP gave me the opportunity to explore various analytical methods with like-minded and talented individuals across SMU. The program gave us a safe and nuturing platform to try things out, with no fear of failure.”

--

--