Database Management System for Used Car Marketplace
Project’s objectives
Many areas in Indonesia does not have efficient and reliable public transportation, its leading people to prefer using private transportation such as four-wheeled vehicles or two-wheeled vehicles. However, this project specifically concentrates on four-wheeled vehicles or cars. In line with the inadequate public transportation in Indonesia, the demand for cars is increasing, and buying and selling used cars has become quite popular for people who need a four-wheeled vehicle but have a limited budget. In this digital era, the online buying and selling of used cars provide sellers to advertise widely through a platform and connects with potential buyers. This project focuses on establishing database for the sale of used cars.
The aim of this article is to design a database system for selling the used cars with well-organized structure.
Designing The Database
In designing the database, the first step is to determine the mission statement. The mission statement for this project is as follows:
- Users can advertise used cars.
- Users can place bids on the advertised cars.
This project has some limitations, including:
- Each user of the application can advertise more than one product.
- Every user can search for offered cars based on the seller’s location, car brand, and body type.
- If a potential buyer is interested in a car, they can bid on the product if the seller allows the bidding feature.
- Purchase transactions are conducted outside the application and are not within the project scope.
After that, we create tables related to the e-library system and determine their relationships through the definition of primary keys and foreign keys. The following is the ERD that I made, with a description of each table.
cities
table : this table stores information about locations. It has a primary key (PK) in columncity_id
, that uniquely indentifying each city. it also hascity
columns for the name of city, columnlatitude
andlongitude
for geographic coordinate.products
table : this table stores information about the the product or cars. It has a primary key (PK) in columnproduct_id
, that uniquely indentifying each products. Others columns aremerk
,model
,body_type
,car_type
,years
andprice
to store informations about product’s data.users
table : this table stores information about the the user of this platform. It has a primary key (PK) in columnuser_id
, that uniquely indentifying each users. It also hascity_id
as foreign key (FK) referencing fromcity_id
incities
table. Others columns arefirst_name
,last_name
andphone
to store informations about user’s data.advertisements
table : this table stores information about the advertisement that has been posted by the user. It has a primary key (PK) in columnads_id
, that uniquely indentifying each ads. It also hasproduct_id
as foreign key (FK) referencing fromproduct_id
inproducts
table,user_id
as foreign key (FK) referencing fromuser_id
inusers
table,title
columns contains the title of the ads,price
for the price of the products anddate_post
for time the ads posted.bids
table : this table stores information about the bids from user as potensial buyers of the products. It has a primary key (PK) in columnbid_id
, that uniquely indentifying each bids from users. It also hasuser_id
as foreign key (FK) referencing fromuser_id
inusers
table,product_id
as foreign key (FK) referencing fromproduct_id
inproducts
table,date_bid
columns for the time the bid is given,bid_price
for price for bid that has been submited by the user ,bid_status
for the status of bid, it contains sent or failed.
Implementing The Design
To implementing the design of the database or ERD using PostgreSQL, we can use Data Definition Language (DDL) syntax to generate the table and define the relationship. here are the steps in implementing the database design.
- Connect to PostgreSQL Database
- Create database
CREATE TABLE users
(
user_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
city_id INTEGER NOT NULL,
phone VARCHAR(25) NOT NULL,
CONSTRAINT fk_city
FOREIGN KEY (city_id)
REFERENCES cities(city_id)
);
this query is for creating table users. This is a example for table that has a foreign key (FK). full Table Structure can be seen on this page
Populating the database
To retrieve data from the database, data in the form of tables is required. Therefore, I use dummy data obtained by combining data from Excel and the Python library Faker. Here is an example of the code:
#import library
from faker import Faker
import pandas as pd
import random
from datetime import datetime, timedelta
The first step is to import the libraries that will be used, such as:
pandas
: Used for data manipulation and analysis in Python.Faker
: A library used for generating fake data.random
: Used to generate random values.timedelta
: Used to calculate time.
Next, set the instance localization to Indonesia to obtain fake data such as names and phone number.
#set localization into Indonesia
fake = Faker('id_ID')
this step is creating the cities DataFrame. In this step, I am using data that already available in xlsx format.
# read the excel file
loc_data = pd.read_excel('city.xlsx')
loc_data.head
def cities(loc_data):
"""
create a DataFrame for cities.
Parameters:
- loc_data (pd.DataFrame): DataFrame containing location data.
Returns:
- pd.DataFrame: DataFrame containing city information including city_id,
city name, latitude, and longitude.
"""
city = {
"city_id": [],
"city": [],
"latitude": [],
"longitude": []
}
n_city = len(loc_data)
for i in range(n_city):
post_code = fake.postcode()
city["city_id"].append(loc_data['kota_id'].iloc[i])
city["city"].append(loc_data['nama_kota'].iloc[i])
city["latitude"].append(loc_data['latitude'].iloc[i])
city["longitude"].append(loc_data['longitude'].iloc[i])
city_df = pd.DataFrame(city)
return city_df
city_df = cities(loc_data)
print(city_df)
this step is creating the users DataFrame using faker. This DataFrame containing user information such as user_id, first_name, last_name, city_id and email.
def users(n_users, city_df):
"""
create a DataFrame for users.
Parameters:
- n_users (int): number of users.
- city_df (pd.DataFrame): DataFrame containing city information.
Returns:
- pd.DataFrame: DataFrame containing user information including user_id,
first_name, last_name, city_id, and phone.
"""
users = {
"user_id": [i + 1 for i in range(n_users)],
"first_name": [],
"last_name": [],
"city_id": [],
"phone": []
}
for i in range(n_users):
first_name = fake.first_name()
last_name = fake.last_name()
city_id = random.choice(city_df['city_id'])
phone = fake.phone_number()
users["first_name"].append(first_name)
users["last_name"].append(last_name)
users["city_id"].append(city_id)
users["phone"].append(phone)
user_df = pd.DataFrame(users)
return user_df
n_users = 100
users_df = users(n_users, city_df)
print(users_df)
the full code of this dummy data can be seen on this page
After creating all the necessary tables in the database using dummy data, the next step is to import these dummy datasets into the database. Here is how to import to the database:
COPY
cities
FROM
'C:\Users\HP\Downloads\city.csv'
DELIMITER ','
CSV
HEADER;
full importing query can be seen on this page
Retrieve data
To find out if the database can run properly, I create several questions which are divided into transactional queries and analytical queries.
A. Transactional query
- Finding cars manufactured >= 2015
SELECT
product_id,
merk,
model,
years,
price
FROM
products
WHERE years >= 2015
ORDER BY 4;
2. Viewing all cars sold by ‘Wasis Januar’
SELECT
p.product_id,
p.merk,
p.model,
p.years,
p.price,
a.date_post
FROM products as p
JOIN advertisements as a
ON p.product_id = a.product_id
JOIN users as u
ON u.user_id = a.user_id
WHERE CONCAT(u.first_name, ' ', u.last_name) = 'Wasis Januar'
ORDER BY 6 DESC;
full transactional query can be seen on this page
B. Analytical query
- Rank the popularity of models based on number of bids
SELECT
p.model,
COUNT(distinct p.product_id) as count_product,
COUNT(b.bid_id) as count_bid
FROM
products as p
LEFT JOIN advertisements as a
ON a.product_id = p.product_id
LEFT JOIN bids as b
ON b.product_id = p.product_id
GROUP BY 1
ORDER BY 3 DESC;
2. Comparing product prices based on the average price per city
SELECT
c.city,
p.merk,
p.model,
p.years,
p.price,
AVG(price) OVER(PARTITION BY city) AS avg_city_price
FROM advertisements as a
LEFT JOIN users as u
ON u.user_id = a.user_id
LEFT JOIN cities as c
ON u.city_id = c.city_id
LEFT JOIN products as p
ON a.product_id = p.product_id
ORDER BY 1;
full analytical query can be seen on this page
Thank you for taking the time to read my article!
References