The Journey into Time Series: Beginning with Descriptive Analytics
Data Mastery Series — Episode 11: The Art of Forecasting (Part 2)
If you are interested in articles related to my experience, please feel free to contact me: linkedin.com/in/nattapong-thanngam
Today we’re trying something different: instead of a purely technical walkthrough, we’ll follow a short story that ties into our theme of time series forecasting. Storytelling and visual aids should make the subject more engaging and easier to follow.
Our narrative revolves around Donato, a regular patron of the Robust Roast coffee shop who enjoys their signature brew, the “Optimizer Shot”. As Zara, the shop owner, notices Donato’s regular visits, she strikes up a friendship. Through their casual conversations, she discovers that Donato is a data science and analytics aficionado. Recognizing the potential of these disciplines to uncover customer behavior insights and enhance profitability, Zara asks Donato to mentor her.
Let’s try a thought experiment. If you were Donato, where would you begin coaching Zara?
I propose that the optimal launching pad for our data-driven exploration is descriptive analytics. Let’s delve deeper:
What is Descriptive Analytics?
Descriptive analytics summarizes historical data to identify patterns and trends, essentially answering the question, “What happened?” A core technique within descriptive analytics is Univariate Analysis, which examines one variable at a time to describe its patterns and distribution, without considering interactions with other variables.
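To make univariate analysis concrete, here is a minimal sketch in plain Python (standard library only) that summarizes a single variable on its own. The quantities are hypothetical and `describe` is an illustrative helper name, not code from the article’s repository; in pandas, `Series.describe()` produces a similar summary.

```python
import statistics


def describe(values):
    """Summarize a single numeric variable, ignoring every other column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical daily quantities sold for one SKU (not the article's data).
daily_qty = [3, 5, 4, 6, 2, 5, 7]
summary = describe(daily_qty)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```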
What are the Objectives/Benefits of Descriptive Analytics?
- It provides valuable insights into past behaviors, helping infer their potential influence on future outcomes.
- It illuminates patterns and trends within the data, enabling more informed decision-making.
- It highlights areas requiring improvement, laying a robust foundation for strategic future planning.
To breathe life into our narrative, I’ve simulated transaction data, a common type of dataset across various industries. The table consists of the transaction date, the SKU that was sold, and the quantity sold.
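The frequency tables that follow describe monthly sales per product, so the raw transactions first have to be rolled up by month. Assuming each record is a (date, SKU, quantity) tuple, a minimal standard-library sketch of that aggregation might look like this; the records are invented for illustration and `monthly_totals` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (transaction date, SKU, quantity sold).
# These values are made up for illustration; they are not the article's dataset.
transactions = [
    (date(2023, 1, 5), "A", 3),
    (date(2023, 1, 19), "B", 2),
    (date(2023, 1, 28), "A", 1),
    (date(2023, 2, 2), "C", 4),
    (date(2023, 2, 14), "A", 2),
    (date(2023, 2, 20), "B", 5),
]


def monthly_totals(records):
    """Sum the quantity sold per (year-month, SKU) pair."""
    totals = defaultdict(int)
    for day, sku, qty in records:
        month = day.strftime("%Y-%m")  # e.g. "2023-01"
        totals[(month, sku)] += qty
    return dict(totals)


totals = monthly_totals(transactions)
for (month, sku), qty in sorted(totals.items()):
    print(month, sku, qty)
```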
To kickstart our exploration of Descriptive Analytics, let’s begin with some fundamental visualization techniques. All of the code and raw files are available in my GitHub repository [Link]
1. Frequency Distribution Tables
These tables count how often each value or category occurs in a dataset. Key Components: Resembling a pivot table, each table has one column per product (A, B, and C) and one row per monthly sales level. Each cell shows how many months a product sold at that level.
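A frequency distribution table of this shape can be built with a counter per product. The sketch below assumes monthly sales volumes are already available as one list per product; the numbers are hypothetical, and `frequency_table` is an illustrative name (pandas users would typically reach for `pd.crosstab` or `pivot_table` instead).

```python
from collections import Counter

# Hypothetical monthly sales volumes (units per month) for three products.
monthly_sales = {
    "A": [3, 4, 4, 5, 3, 4],
    "B": [2, 2, 3, 2, 4, 3],
    "C": [5, 4, 5, 5, 3, 5],
}


def frequency_table(sales_by_product):
    """Return {sales_level: {product: number_of_months}}, like a pivot table."""
    counts = {p: Counter(v) for p, v in sales_by_product.items()}
    # Every sales level seen for any product becomes a row.
    levels = sorted(set().union(*(c.keys() for c in counts.values())))
    # Counter returns 0 for levels a product never hit.
    return {lvl: {p: counts[p][lvl] for p in sales_by_product} for lvl in levels}


table = frequency_table(monthly_sales)
products = list(monthly_sales)
print("level", *products, sep="\t")
for level, row in table.items():
    print(level, *(row[p] for p in products), sep="\t")
```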
2. Heat Maps
A heat map presents the same kind of information as a frequency distribution table, but encodes each value as a color on a gradient instead of a number. High-intensity cells stand out immediately, which makes patterns, relationships, or correlations within a dataset easier to spot than in a table of raw counts.
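The core of a heat map is mapping each cell value onto a position along a gradient. To stay within the standard library, this sketch substitutes a ramp of text characters for real colors; the counts are hypothetical, and with matplotlib you would instead pass the same grid of numbers to `imshow` (or use seaborn’s `heatmap`) to render actual colors.

```python
# Hypothetical frequency table: rows are monthly sales levels, columns are
# products, and each cell counts the months in which a product sold at that level.
products = ["A", "B", "C"]
table = {
    2: [0, 3, 0],
    3: [2, 2, 1],
    4: [3, 1, 1],
    5: [1, 0, 4],
}

# Light-to-dark characters stand in for a color gradient.
SHADES = " .:*#"


def shade(count, max_count):
    """Map a count onto one of the shade characters (0 -> lightest)."""
    if max_count == 0:
        return SHADES[0]
    idx = round(count / max_count * (len(SHADES) - 1))
    return SHADES[idx]


max_count = max(max(row) for row in table.values())
print("level", *products)
for level, row in table.items():
    print(f"{level:>5}", *(shade(c, max_count) for c in row))
```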
3. Pie Charts
A pie chart is a circular graphic showing the proportion of different categories within a dataset. Each slice of the pie chart represents a category, and the size of the slice indicates the category’s proportion or percentage within the dataset.
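Each slice’s size follows directly from its share of the total: a category holding a fraction p of the data gets p × 360 degrees of the circle. A minimal sketch, assuming hypothetical per-product unit totals and an illustrative `pie_slices` helper:

```python
def pie_slices(totals):
    """Map each category to (percentage share, slice angle in degrees)."""
    grand_total = sum(totals.values())
    return {
        category: (100 * value / grand_total, 360 * value / grand_total)
        for category, value in totals.items()
    }


# Hypothetical total units sold per product over the whole period.
unit_totals = {"A": 50, "B": 30, "C": 20}
slices = pie_slices(unit_totals)
for product, (share, angle) in slices.items():
    print(f"{product}: {share:.1f}% of units -> {angle:.1f} degree slice")
```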
4. Bar Charts
A bar chart compares categories or values using rectangular bars. Bars can be vertical or horizontal, and each bar’s length represents its value or frequency. A bar chart can also display the values from the Frequency Distribution Tables, giving a clearer view of each product’s behavior.
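The only real computation in a bar chart is scaling each value to a bar length. This text-only sketch draws horizontal bars for one product’s frequency counts; the counts and the `text_bar_chart` helper are hypothetical, and a plotting library such as matplotlib would handle the drawing in practice.

```python
def text_bar_chart(values, width=20):
    """Render {label: value} as horizontal text bars; the largest value gets `width` chars."""
    peak = max(values.values())
    lines = []
    for label, value in values.items():
        # Bar length is proportional to the value (guard against an all-zero input).
        length = round(value / peak * width) if peak else 0
        lines.append(f"{label:>7} | {'#' * length} {value}")
    return lines


# Hypothetical frequency counts for product A: monthly sales level -> number of months.
product_a = {"2 units": 0, "3 units": 2, "4 units": 3, "5 units": 1}
chart = text_bar_chart(product_a)
print("\n".join(chart))
```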
5. Histograms
A histogram shows the distribution of continuous numerical data by grouping it into ‘bins’ (intervals). Each bar represents a bin, and its height is the number of observations that fall within that bin. The number of bins controls how much detail the histogram reveals: too few bins hide structure, while too many add noise. In this case, we use 15 bins.
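Binning is easy to get subtly wrong at the edges, so here is a minimal equal-width sketch. It assumes the common convention that each bin includes its left edge and the last bin also includes the maximum, which is how NumPy’s `numpy.histogram` documents its bins; the daily quantities are hypothetical.

```python
def histogram(data, bins=15):
    """Equal-width histogram: returns (bin_edges, counts)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for x in data:
        idx = int((x - lo) / width)
        counts[min(idx, bins - 1)] += 1  # the maximum value belongs to the last bin
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical daily quantities sold (not the article's dataset).
daily_qty = [3, 5, 4, 6, 2, 5, 7, 4, 5, 3, 6, 8, 4, 5, 5, 6, 3, 4, 9, 5]
edges, counts = histogram(daily_qty, bins=15)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.2f} to {right:5.2f} {'#' * count}")
```

With only 20 observations, 15 bins leave many bins empty; trying a smaller `bins` value on the same data is a quick way to see how the bin count changes the picture.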
6. Kernel Density Estimation (KDE)
KDE plots estimate the probability density function of a continuous variable by placing a smooth kernel function (often a Gaussian) over each data point and summing the results. The plot shows a continuous curve representing the estimated density, which acts as a smoothed version of the histogram.
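The estimate at a point x is the average of Gaussian “bumps” centred on each observation, each with width h (the bandwidth). The sketch below assumes a Gaussian kernel and the normal-reference rule of thumb h = 1.06 · σ · n^(-1/5) for the bandwidth; `kde_estimator` is a hypothetical helper and the data are invented. Libraries such as SciPy (`scipy.stats.gaussian_kde`) provide the same kind of estimator ready-made.

```python
import math
import statistics


def kde_estimator(data, bandwidth=None):
    """Return a function that estimates the probability density of `data`."""
    n = len(data)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sample stdev * n^(-1/5)
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian kernel centred on each observation, then normalize.
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical daily quantities sold (the same kind of data a histogram would bin).
daily_qty = [3, 5, 4, 6, 2, 5, 7, 4, 5, 3, 6, 8, 4, 5, 5, 6, 3, 4, 9, 5]
kde = kde_estimator(daily_qty)
for x in range(0, 12):
    print(f"{x:2d} {kde(x):.3f}")
```

Because each kernel integrates to 1 and the sum is divided by n, the resulting curve integrates to 1, which is what lets it be read as a density rather than a count.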
7. Box Plots
A box plot (or box-and-whisker plot) exhibits data distribution through quartiles and identifies outliers. The plot features a box representing the interquartile range (IQR), whiskers indicating the data’s range excluding outliers, and points or lines highlighting outliers.
Here’s how to read a box plot by comparing it with a histogram: the graph overlays dotted lines marking the minimum, 25th percentile, 50th percentile (median), 75th percentile, and maximum values. (A box plot normally draws outliers as individual points, but this data doesn’t have any.)
Box plots make comparisons easy. Comparing three products through three separate histograms can be cumbersome, whereas three box plots placed side by side on a shared axis show the differences at a glance.
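Everything a box plot draws can be computed from a few order statistics. This sketch assumes the widely used Tukey convention (whiskers reach the most extreme points within 1.5 × IQR of the box, anything beyond is an outlier) and linearly interpolated quartiles via `statistics.quantiles(..., method="inclusive")`. The three products’ figures are hypothetical, and `box_plot_stats` is an illustrative name.

```python
import statistics


def box_plot_stats(data, whisker=1.5):
    """Quartiles, IQR, whisker ends and outliers using Tukey's 1.5 x IQR rule."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),  # most extreme points still inside the fences
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


# Hypothetical monthly sales for three products, compared side by side.
products = {
    "A": [3, 4, 4, 5, 3, 4],
    "B": [2, 2, 3, 2, 4, 3],
    "C": [5, 4, 5, 5, 3, 5],
}
for name, sales in products.items():
    s = box_plot_stats(sales)
    print(name, s["whisker_low"], s["q1"], s["median"], s["q3"], s["whisker_high"], s["outliers"])
```

Printing one row of five numbers per product is exactly what makes the side-by-side comparison cheap: each product collapses to a handful of values on a shared scale.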
8. Line Charts
A line chart shows trends and changes in a variable over time. It connects consecutive data points with line segments, with time on the x-axis and the variable’s values on the y-axis.
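Because a line is drawn between consecutive points, the data must be in time order before plotting; matplotlib’s `plot`, for example, connects points in the order it receives them. This sketch, using hypothetical monthly totals and an illustrative `line_series` helper, sorts the points chronologically and computes the change between neighbouring months, which is what the slope of each segment shows.

```python
# Hypothetical monthly totals for one product, keyed by "YYYY-MM" (deliberately unsorted).
monthly = {"2023-03": 41, "2023-01": 35, "2023-02": 38, "2023-04": 37}


def line_series(monthly_totals):
    """Return time-ordered (month, quantity) points and the change between neighbours."""
    # Zero-padded "YYYY-MM" strings sort in chronological order.
    points = sorted(monthly_totals.items())
    changes = [
        (points[i][0], points[i][1] - points[i - 1][1])
        for i in range(1, len(points))
    ]
    return points, changes


points, changes = line_series(monthly)
for month, qty in points:
    print(month, qty)  # these (x, y) pairs are what a line chart connects
for month, delta in changes:
    print(f"{month}: {delta:+d}")
```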
In today’s episode, we’ve seen what descriptive analytics can do and how much data visualization techniques can reveal. These tools have helped us identify historical patterns and trends in Zara’s coffee shop sales, giving her a solid foundation for strategic decision-making. All of the code and raw files are available in my GitHub repository [Link]
However, our journey into the realm of data science is just beginning. In our next episode, we’ll delve into inferential statistics, aiming to understand why these patterns occur and unearthing the hidden relationships in Zara’s data. See you next time on our Data Mastery Series!
Thank you for taking the time to read this article! If you found it enjoyable, we recommend checking out these other articles for your reading pleasure.
Please feel free to contact me; I am happy to share and exchange ideas on topics related to Data Science and Supply Chain.
Facebook: facebook.com/nattapong.thanngam
Linkedin: linkedin.com/in/nattapong-thanngam