Bakery Sales Analysis

Anggi Setyawan Riyadi
7 min readJan 22, 2024

--

This dataset comprises the daily transactions of a French bakery’s customers from January 1, 2021, to September 30, 2022, totaling 234,005 entries across 6 columns. These columns include order date, time sequence, transaction ticket number, product name (in French), quantity sold, and unit price.

Taking into account both annual and weekly seasonal patterns, the primary objective of this dataset is sales prediction. The aim is to facilitate bakery production planning. It’s noteworthy that products labeled as “COUPE” indicate customers’ requests for slicing their whole bread using a machine. In a general overview, the focus is on data analysis to enhance the efficiency and productivity of this French bakery.

image from Rasheed Khamis

Analysis Using Python And Visualization With Tableau

Load Dataset

import pandas as pd
import numpy as np

df = pd.read_csv('D:Bakery sales.csv')
df

output:

dataset

In the output above, the dataset comprises 234,005 rows and 7 columns.

Explore Data

df.info()

output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234005 entries, 0 to 234004
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 234005 non-null object
1 time 234005 non-null object
2 ticket_number 234005 non-null float64
3 article 234005 non-null object
4 Quantity 234005 non-null float64
5 unit_price 234005 non-null object
dtypes: float64(2), object(4)
memory usage: 10.7+ MB

The displayed output represents a Pandas DataFrame with 234,005 entries organized into 6 columns. These columns include “date” and “time” for temporal information, “ticket_number” as a unique identifier, “article” describing the item, “Quantity” indicating the amount, and “unit_price” denoting the cost per unit.

One notable point is the “unit_price” column being labeled as an object, potentially requiring further examination. The output also provides details on non-null counts and data types for each column, offering a concise summary of the dataset’s structure.

# Convert the 'date' column to the datetime data type
df['date'] = pd.to_datetime(df['date'])
df.info()

ouput:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234005 entries, 0 to 234004
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 234005 non-null datetime64[ns]
1 time 234005 non-null object
2 ticket_number 234005 non-null float64
3 article 234005 non-null object
4 Quantity 234005 non-null float64
5 unit_price 234005 non-null object
dtypes: datetime64[ns](1), float64(2), object(3)
memory usage: 10.7+ MB

The code snippet uses Pandas to convert the ‘date’ column in a DataFrame (‘df’) to the datetime data type. This is done with the pd.to_datetime() function, which transforms date strings into datetime objects. The subsequent df.info() provides a brief summary of the DataFrame, confirming the successful conversion.

# Check missing Value
df.isnull().sum()

output:

date 0
time 0
ticket_number 0
article 0
Quantity 0
unit_price 0
dtype: int64

This observation indicates that all variables in the dataset have complete and available data without any instances of missing values.

#Check value in articel
df.article.unique()

output:

array(['BAGUETTE', 'PAIN AU CHOCOLAT', 'PAIN', 'TRADITIONAL BAGUETTE',
'CROISSANT', 'BANETTE', 'BANETTINE', 'SPECIAL BREAD', 'COUPE',
'SAND JB EMMENTAL', 'KOUIGN AMANN', 'BOULE 200G', 'BOULE 400G',
'GAL FRANGIPANE 6P', 'CAMPAGNE', 'MOISSON', 'CAFE OU EAU',
'BRIOCHE', 'CEREAL BAGUETTE', 'SEIGLE', 'COMPLET',
'DIVERS PATISSERIE', 'GAL FRANGIPANE 4P', 'COOKIE', 'FICELLE',
'PAIN AUX RAISINS', 'GAL POMME 6P', 'GAL POMME 4P', 'FINANCIER X5',
'VIK BREAD', 'DIVERS VIENNOISERIE', 'GACHE', 'SANDWICH COMPLET',
'PAIN BANETTE', 'GRAND FAR BRETON', 'QUIM BREAD',
'SPECIAL BREAD KG', 'GD KOUIGN AMANN', 'BOULE POLKA',
'DEMI BAGUETTE', 'CHAUSSON AUX POMMES', 'BAGUETTE GRAINE',
'DIVERS CONFISERIE', 'SUCETTE', 'DIVERS BOULANGERIE',
'BOISSON 33CL', 'PATES', 'FORMULE SANDWICH', 'DIVERS SANDWICHS',
'CROISSANT AMANDES', 'PAIN CHOCO AMANDES', 'SACHET VIENNOISERIE',
'NANTAIS', 'CHOCOLAT', 'PAIN S/SEL', 'FONDANT CHOCOLAT',
'GAL POIRE CHOCO 6P', 'GAL POIRE CHOCO 4P', 'GALETTE 8 PERS',
'SAND JB', 'SACHET DE CROUTON', 'GRANDE SUCETTE', 'DEMI PAIN',
'TARTELETTE', 'FLAN', 'PARIS BREST', 'SAVARIN', 'FLAN ABRICOT',
'BAGUETTE APERO', 'MILLES FEUILLES', 'CHOU CHANTILLY', 'ECLAIR',
'ROYAL 4P', 'TARTE FRUITS 6P', 'TARTE FRUITS 4P', 'NOIX JAPONAISE',
'THE', 'BRIOCHETTE', 'ROYAL 6P', 'ECLAIR FRAISE PISTACHE', '.',
'GD FAR BRETON', 'TRIANGLES', 'TROPEZIENNE',
'TROPEZIENNE FRAMBOISE', 'ROYAL', 'TARTE FRAISE 6P',
'TARTELETTE FRAISE', 'TARTE FRAISE 4PER', 'FRAISIER',
'NID DE POULE', 'TARTELETTE CHOC', 'PAIN DE MIE', 'CRUMBLE',
'FINANCIER', 'DIVERS BOISSONS', 'CAKE', 'VIENNOISE', 'TRAITEUR',
'PAIN GRAINES', 'PLATPREPARE6,50', 'PLATPREPARE5,50',
'PLATPREPARE7,00', 'FORMULE PLAT PREPARE', 'ST HONORE', 'BROWNIES',
'RELIGIEUSE', 'PLATPREPARE6,00', 'DELICETROPICAL',
'CRUMBLECARAMEL OU PISTAE', 'PT NANTAIS', 'GD NANTAIS',
'DOUCEUR D HIVER', 'TROIS CHOCOLAT', 'ARTICLE 295', 'TARTE FINE',
'ENTREMETS', 'BRIOCHE DE NOEL', 'FRAMBOISIER', 'BUCHE 4PERS',
'BUCHE 6PERS', 'GD PLATEAU SALE', 'BUCHE 8PERS', 'PT PLATEAU SALE',
'REDUCTION SUCREES 12', 'PAIN NOIR', 'REDUCTION SUCREES 24',
'BOTTEREAU', 'MERINGUE', 'PALMIER', 'PAILLE', 'PLAT 6.50E',
'PLAT 7.60E', 'PLAT 7.00', 'PLAT', 'PLAT 8.30E', 'FORMULE PATE',
'GUERANDAIS', 'PALET BRETON', 'CARAMEL NOIX', 'MACARON',
'12 MACARON', 'ARMORICAIN', 'PLAQUE TARTE 25P', 'SABLE F P',
'PAIN SUISSE PEPITO', 'TULIPE', 'TARTELETTE COCKTAIL',
'SACHET DE VIENNOISERIE'], dtype=object)

Above is the unique value of the article row

Preprocessing

# Change the unit_price column so that it can be calculated
df['unit_price']=df['unit_price'].apply(lambda x: float(x.replace(',', '.').replace('€', '').strip()))
df
updating unit_price columns

In updating the ‘unit_price’ column, we transformed the values within the column to make them suitable for calculations. The steps involved the removal of commas as thousands separators, replacing the euro symbol (€) with an empty string, and eliminating any surrounding spaces in the values. Subsequently, we converted these values into floating-point numbers.

How much income per month?

# Added new column 'total_price' by multiplying 'Quantity' by 'unit_price'
df['product_income'] = df['Quantity'] * df['unit_price']

# Group data by month and calculate total monthly income
monthly_income = df.groupby(df['date'].dt.to_period("M"))['unit_price'].sum()
monthly_income

# Reset index in monthly_income data
monthly_income = pd.DataFrame(monthly_income)
monthly_income = monthly_income.reset_index()
monthly_income

output:

no date unit_price
0 2021-01 11636.27
1 2021-02 11758.05
2 2021-03 14214.45
3 2021-04 17274.35
4 2021-05 19825.25
5 2021-06 17078.50
6 2021-07 28167.20
7 2021-08 31551.65
8 2021-09 16021.40
9 2021-10 16126.55
10 2021-11 13108.45
11 2021-12 13447.60
12 2022-01 12139.00
13 2022-02 12998.55
14 2022-03 14636.34
15 2022-04 18017.85
16 2022-05 21223.95
17 2022-06 18245.55
18 2022-07 31272.32
19 2022-08 34492.15
20 2022-09 15726.40

In this data analysis, a new column ‘product_income’ was created by multiplying the ‘Quantity’ and ‘unit_price’ columns. The data was then grouped by month, and the total monthly income was calculated. The resulting output provides a clear overview of the monthly unit prices, showcasing trends and fluctuations over the specified time period from January 2021 to September 2022. This type of analysis is valuable for understanding revenue patterns and making informed business decisions.

What month is the highest and lowest income?

# Sorting the monthly_income dataframe by 'unit_price' in descending order
monthly_income.sort_values(by='unit_price', ascending=False)

output:

no date unit_price
19 2022-08 34492.15
7 2021-08 31551.65
18 2022-07 31272.32
6 2021-07 28167.20
16 2022-05 21223.95
4 2021-05 19825.25
17 2022-06 18245.55
15 2022-04 18017.85
3 2021-04 17274.35
5 2021-06 17078.50
9 2021-10 16126.55
8 2021-09 16021.40
20 2022-09 15726.40
14 2022-03 14636.34
2 2021-03 14214.45
11 2021-12 13447.60
10 2021-11 13108.45
13 2022-02 12998.55
12 2022-01 12139.00
1 2021-02 11758.05
0 2021-01 11636.27

The output displays the monthly_income dataframe sorted in descending order based on the 'unit_price' column. It highlights the months with the highest income, with August 2022 at the top, followed by August 2021 and July 2022. This provides a quick overview of the months that contributed the most to the total income, assisting in identifying trends and making strategic decisions based on revenue performance over time.

High Product by QTY

# Grouping the dataframe 'df' by 'article' and calculating the total quantity (MPS_qty) for each article
MPS_qty = df.groupby(df['article'])['Quantity'].sum()

# Resetting the index in the MPS_qty data
MPS_qty = pd.DataFrame(MPS_qty)
MPS_qty = MPS_qty.reset_index()

# Sorting the MPS_qty dataframe by 'Quantity' in descending order
MPS_qty.sort_values(by='Quantity', ascending=False)

output:

no article Quantity
140 TRADITIONAL BAGUETTE 117463.0
32 CROISSANT 29654.0
85 PAIN AU CHOCOLAT 25236.0
31 COUPE 23505.0
7 BANETTE 22732.0
... ... ...
91 PAIN NOIR 1.0
3 ARTICLE 295 1.0
112 REDUCTION SUCREES 24 1.0
45 DOUCEUR D HIVER 1.0
100 PLAT 6.50E 0.0


149 rows × 2 columns

The output offers a brief snapshot of the best-selling bakery items, showcasing ‘TRADITIONAL BAGUETTE’ as the top performer with 117,463 units sold. Close behind are ‘CROISSANT’ and ‘PAIN AU CHOCOLAT’ with 29,654 and 25,236 units, respectively. On a contrasting note, products like ‘PLAT 6.50E’ with zero units sold suggest minimal or no impact on sales, emphasizing the importance of focusing on popular bakery items to optimize inventory and drive overall sales strategy.

High Product by income

MPS_cost = df.groupby(df['article'])['product_income'].sum()

# Reset index in monthly_income data
MPS_cost = pd.DataFrame(MPS_cost)
MPS_cost = MPS_cost.reset_index()

MPS_cost.sort_values(by='product_income', ascending=False)

output:

no article product_income
140 TRADITIONAL BAGUETTE 144756.05
57 FORMULE SANDWICH 34710.50
32 CROISSANT 33579.50
85 PAIN AU CHOCOLAT 31100.45
7 BANETTE 24555.15
... ... ...
22 CAKE 3.00
91 PAIN NOIR 0.60
100 PLAT 6.50E 0.00
3 ARTICLE 295 0.00
0 . 0.00


149 rows × 2 columns

The provided output represents a financial summary of bakery product sales. Notably, ‘TRADITIONAL BAGUETTE’ leads in product income with 144,756.05 units, followed by ‘FORMULE SANDWICH’ and ‘CROISSANT’ with 34,710.50 and 33,579.50 units, respectively. On the flip side, certain items like ‘PLAT 6.50E’ and ‘ARTICLE’ show zero income, indicating limited or no contribution to overall sales.

The goal is to make strategic decisions based on sales trends and patterns. The conclusion of this analysis is quite intriguing, revealing that there is a standout product contributing approximately 26% of the total revenue. This star product is none other than the ‘Traditional Baguette’. The popularity of this product presents an opportunity to delve deeper into marketing strategies and potential development to strengthen market share and enhance overall profitability.

Visualization

In the course of conducting this analysis, we take pride in presenting a comprehensive visual report through the Tableau Public platform. The data visualizations we have generated go beyond mere numbers, opening the door to a deeper understanding through dynamic graphical elements. By leveraging Tableau Public, we are committed to transparently and interactively delivering this analysis, enabling stakeholders to effortlessly explore and comprehend every facet of our findings. Visit our Tableau page here to explore further.

klik here

dashboard

--

--

Anggi Setyawan Riyadi

Data enthusiast Anggi Setyawan: Unveiling insights, sharing tips, and exploring the world of data together! | https://anggise2023.github.io/ 🚀 #DataExploration