Data Analysis Project on the Mint Classics Company Database using CRISP-DM Methodology with SQL Workbench

Wandi Ramadan
8 min readOct 31, 2023

Data plays a crucial role in decision making within a company. With numerous datasets at hand, it’s essential to process data in such a way that it becomes valuable. In this project, I will attempt to put into practice how to conduct data analysis from a company’s database. The database used is a fictional one from a classic vehicle and car company known as Mint Classics Company. The project’s goal is to explore the data to provide insights and address issues within the company. I am using SQL Workbench software to facilitate data analysis from the database.

I will explain this project using the CRISP-DM (Cross Industry Standard Process for Data Mining) theory, which is a set of steps used in the data mining process. Here is the application of the CRISP-DM method in this project.

1. Business Understanding

On this step, we need to understand the business needs, such as goals, business planning, and strategies. The company’s objective is to optimize inventory usage to increase revenue. Additionally, the company needs to evaluate customer and employee performance to provide better service. Therefore, there are several topics or issues that we need to explore in this analysis:

  • What is the current state of the company’s inventory? Does the company need to add or reduce the products it currently holds?
  • How have the company’s sales been so far?
  • How much revenue has the company generated? What is the revenue generated by each product sold?
  • What type of customers have purchased the company’s products? What actions need to be taken with regard to the company’s customers?
  • How has the employee performance been so far?

2. Data Understanding

This step is a phase for learning data formats, such as data collection, identifying data issues, and detecting data attributes to be used. In this section, I will display the Mint classic company’s database structure in the form of an EER (Extended Entity-Relationship) Diagram

In the above EER Diagram, it is found that there are 9 tables in the Mint classic company’s database with the following explanations:

  • Warehouses compile data on the company’s warehouses used to store product inventory.
  • Products compile data about the product types, stock quantities, sales quantities, purchase prices, and more.
  • Product lines compile data about the descriptions of each product line they sell.
  • Customers compile data about the company’s customer profiles, such as customer names, addresses, credit limits, and others.
  • Payments compile data about payments made by customers, including payment dates and amounts.
  • Orders compile data regarding customer orders for specific products.
  • Order details compile data about orders for specific products, including the quantity of products ordered and the price of each product.
  • Employees compile data about company employees, including names, addresses, offices, and more.
  • Offices compile data about company office profiles.

3. Data preparation

Data preparation involves preparing data for analysis in SQL Workbench. The database used is a fictional database, created and populated using the “mintclassics” database script. If we open and run this script in SQL Workbench, it will create a database named “mintclassics,” as shown in the image below.

Then reverse engineer the database from the script above, which allows us to create an ER Diagram as shown in the previous image.

4. Modeling

In this step, we determine the mining techniques to be used in accordance with the existing issues. However, I won’t be conducting complex analysis or mining. The analysis in this instance will be relatively straightforward. I will attempt to explore and address the issues presented earlier. In this explanation, I will also provide an overview of the analysis techniques that can be used for more in-depth analysis if desired.

a) Company inventory

We need to understand the current state of the company’s inventory, whether there is need to add or reduce the company’s products. This to ensure the company’s inventory is optimal. In this section, we can view the inventory quantities based on the warehouse, product line, and products.

From the query written above, I aim to obtain the total stock data held by the company based on three categories: warehouses, product lines, and each individual product item. I merge two tables to obtain this data. The query used is INNER JOIN to merge the warehouses table with the product table using the related column, which is the product code. After the table merge, grouping (GROUP BY) is performed based on the warehouse, product line, and productName.

  • Total stock based on each warehouse
  • Total stock based on product line
  • Total stock based on each product item

b) Product sale

The company’s product sales can illustrate the extent of product sales from the company. However, to better connect with the previous issue, I will attempt to compare it with the total stock and the reduction of the company’s inventory.

The query above is intended to obtain data such as the total sales, orders, and inventory for each product. The query used combines the products table with the order details table using a LEFT JOIN, using the related column, which is the product code. Next, I perform sorting (ORDER BY) based on the current inventory. With this data, we can compare which products have a significant stock but relatively few orders or vice versa.

c) Sales trends

In the query above, I aim to obtain data on the number of products sold over time. Therefore, I need to combine tables with a LEFT JOIN between the orders and order details tables. Then sorting (ORDER BY) based on the order date.

This is done to determine the company’s product sales trend. If further analysis is needed, we can attempt to forecast the quantity of product sales in the future. This is to anticipate the amount of stock that the company needs to fulfill.

d) Company revenue

To obtain the total company revenue, I gather company revenue data based on three categories: warehouses, product lines, and individual product items.

From this query, there are three tables that I merge: the products table, order details table, and warehouses table. The related columns used are adjusted according to the tables being merged

  • Total revenue based on each warehouse
  • Total revenue based on each product line
  • Total revenue based on product

From the data obtained above, more in-depth analysis can be conducted, such as assessing the impact of the price of each product (priceEach) on the quantity of orders using regression or association analysis.

e) Customer

Customer data is necessary for the company to evaluations and determine what actions need to be taken with customers in the future.

In this query, I’m trying to obtain the company’s customer profiles, including customer number, customer name, country, credit limit, total orders, and total payments. I combine three tables: customers, payments, and orders using JOIN. Then, I perform grouping based on customer number and sorting based on total payment.

f) Employee Performance

Company needs to evaluate each employee to assess the extent to which they contribute to the company. In this section, I retrieve employee profiles along with employee performance, which is assessed based on their total sales.

From this query, I obtain the data on the number of employees in each office. This data is obtained by merging the employees and offices tables. To gauge employee performance, I merge the employees, customers, and orders tables. From these three tables, I obtain employee profiles with the total sales they have completed.

  • Number of employees
  • Employee performance

5. Evaluation

In evaluation step, I won’t provide an evaluation of the analysis techniques used, as I reiterate that the analysis techniques employed were primarily for data exploration. However, in this stage, I will present some findings :

  • There are warehouses with a significant product stock but generate relatively low revenue.
  • Some products have a high stock quantity but low sales or vice versa.
  • Certain product lines have substantial stock quantities but yield low revenue.
  • There are customers with good payments, but there are also customers with payment issues.
  • Some employees exhibit high sales performance, while others have lower sales performance.

6. Deployment

In deployment step, the task is to present the findings. Therefore, I will attempt to recommend some policies that the company may consider, including:

  • Evaluating which products should be added and which should be reduced
  • Promoting in the countries with the highest order rates
  • Adjusting product prices so that the products can be sold
  • Implementing policies for customers with payment issues
  • Recognizing employees with good performance and providing training for those with lower performance

In closing, if you’d like to see a summary of projects I’ve worked on, please visit my GitHub.

--

--