Database Management System for Used Car Marketplace

Virga Novayanti
7 min readJan 7, 2024
image by verazinha on freepik

Project’s objectives

Many areas in Indonesia does not have efficient and reliable public transportation, its leading people to prefer using private transportation such as four-wheeled vehicles or two-wheeled vehicles. However, this project specifically concentrates on four-wheeled vehicles or cars. In line with the inadequate public transportation in Indonesia, the demand for cars is increasing, and buying and selling used cars has become quite popular for people who need a four-wheeled vehicle but have a limited budget. In this digital era, the online buying and selling of used cars provide sellers to advertise widely through a platform and connects with potential buyers. This project focuses on establishing database for the sale of used cars.

The aim of this article is to design a database system for selling the used cars with well-organized structure.

Designing The Database

In designing the database, the first step is to determine the mission statement. The mission statement for this project is as follows:

  • Users can advertise used cars.
  • Users can place bids on the advertised cars.

This project has some limitations, including:

  • Each user of the application can advertise more than one product.
  • Every user can search for offered cars based on the seller’s location, car brand, and body type.
  • If a potential buyer is interested in a car, they can bid on the product if the seller allows the bidding feature.
  • Purchase transactions are conducted outside the application and are not within the project scope.

After that, we create tables related to the e-library system and determine their relationships through the definition of primary keys and foreign keys. The following is the ERD that I made, with a description of each table.

  1. cities table : this table stores information about locations. It has a primary key (PK) in column city_id, that uniquely indentifying each city. it also has city columns for the name of city, columnlatitude and longitude for geographic coordinate.
  2. products table : this table stores information about the the product or cars. It has a primary key (PK) in column product_id, that uniquely indentifying each products. Others columns are merk , model , body_type , car_type , years and price to store informations about product’s data.
  3. users table : this table stores information about the the user of this platform. It has a primary key (PK) in column user_id, that uniquely indentifying each users. It also has city_id as foreign key (FK) referencing from city_idin citiestable. Others columns are first_name , last_name and phone to store informations about user’s data.
  4. advertisements table : this table stores information about the advertisement that has been posted by the user. It has a primary key (PK) in column ads_id, that uniquely indentifying each ads. It also has product_id as foreign key (FK) referencing from product_id in productstable, user_id as foreign key (FK) referencing from user_id in userstable, title columns contains the title of the ads, price for the price of the products anddate_post for time the ads posted.
  5. bids table : this table stores information about the bids from user as potensial buyers of the products. It has a primary key (PK) in column bid_id, that uniquely indentifying each bids from users. It also has user_id as foreign key (FK) referencing from user_id in userstable, product_id as foreign key (FK) referencing from product_id in productstable, date_bid columns for the time the bid is given, bid_price for price for bid that has been submited by the user , bid_status for the status of bid, it contains sent or failed.

Implementing The Design

To implementing the design of the database or ERD using PostgreSQL, we can use Data Definition Language (DDL) syntax to generate the table and define the relationship. here are the steps in implementing the database design.

  1. Connect to PostgreSQL Database
  2. Create database
CREATE TABLE users
(
user_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
city_id INTEGER NOT NULL,
phone VARCHAR(25) NOT NULL,
CONSTRAINT fk_city
FOREIGN KEY (city_id)
REFERENCES cities(city_id)
);

this query is for creating table users. This is a example for table that has a foreign key (FK). full Table Structure can be seen on this page

Populating the database

To retrieve data from the database, data in the form of tables is required. Therefore, I use dummy data obtained by combining data from Excel and the Python library Faker. Here is an example of the code:

#import library
from faker import Faker
import pandas as pd
import random
from datetime import datetime, timedelta

The first step is to import the libraries that will be used, such as:

  • pandas: Used for data manipulation and analysis in Python.
  • Faker: A library used for generating fake data.
  • random: Used to generate random values.
  • timedelta: Used to calculate time.

Next, set the instance localization to Indonesia to obtain fake data such as names and phone number.

#set localization into Indonesia
fake = Faker('id_ID')

this step is creating the cities DataFrame. In this step, I am using data that already available in xlsx format.

# read the excel file 
loc_data = pd.read_excel('city.xlsx')
loc_data.head
def cities(loc_data):
"""
create a DataFrame for cities.

Parameters:
- loc_data (pd.DataFrame): DataFrame containing location data.

Returns:
- pd.DataFrame: DataFrame containing city information including city_id,
city name, latitude, and longitude.
"""
city = {
"city_id": [],
"city": [],
"latitude": [],
"longitude": []
}

n_city = len(loc_data)

for i in range(n_city):
post_code = fake.postcode()
city["city_id"].append(loc_data['kota_id'].iloc[i])
city["city"].append(loc_data['nama_kota'].iloc[i])
city["latitude"].append(loc_data['latitude'].iloc[i])
city["longitude"].append(loc_data['longitude'].iloc[i])

city_df = pd.DataFrame(city)
return city_df

city_df = cities(loc_data)
print(city_df)

this step is creating the users DataFrame using faker. This DataFrame containing user information such as user_id, first_name, last_name, city_id and email.

def users(n_users, city_df):
"""
create a DataFrame for users.

Parameters:
- n_users (int): number of users.
- city_df (pd.DataFrame): DataFrame containing city information.

Returns:
- pd.DataFrame: DataFrame containing user information including user_id,
first_name, last_name, city_id, and phone.
"""
users = {
"user_id": [i + 1 for i in range(n_users)],
"first_name": [],
"last_name": [],
"city_id": [],
"phone": []
}

for i in range(n_users):
first_name = fake.first_name()
last_name = fake.last_name()
city_id = random.choice(city_df['city_id'])
phone = fake.phone_number()

users["first_name"].append(first_name)
users["last_name"].append(last_name)
users["city_id"].append(city_id)
users["phone"].append(phone)

user_df = pd.DataFrame(users)
return user_df

n_users = 100
users_df = users(n_users, city_df)
print(users_df)

the full code of this dummy data can be seen on this page

After creating all the necessary tables in the database using dummy data, the next step is to import these dummy datasets into the database. Here is how to import to the database:

COPY
cities
FROM
'C:\Users\HP\Downloads\city.csv'
DELIMITER ','
CSV
HEADER;

full importing query can be seen on this page

Retrieve data

To find out if the database can run properly, I create several questions which are divided into transactional queries and analytical queries.

A. Transactional query

  1. Finding cars manufactured >= 2015
SELECT
product_id,
merk,
model,
years,
price
FROM
products
WHERE years >= 2015
ORDER BY 4;

2. Viewing all cars sold by ‘Wasis Januar’


SELECT
p.product_id,
p.merk,
p.model,
p.years,
p.price,
a.date_post
FROM products as p
JOIN advertisements as a
ON p.product_id = a.product_id
JOIN users as u
ON u.user_id = a.user_id
WHERE CONCAT(u.first_name, ' ', u.last_name) = 'Wasis Januar'
ORDER BY 6 DESC;

full transactional query can be seen on this page

B. Analytical query

  1. Rank the popularity of models based on number of bids
SELECT 
p.model,
COUNT(distinct p.product_id) as count_product,
COUNT(b.bid_id) as count_bid
FROM
products as p
LEFT JOIN advertisements as a
ON a.product_id = p.product_id
LEFT JOIN bids as b
ON b.product_id = p.product_id
GROUP BY 1
ORDER BY 3 DESC;

2. Comparing product prices based on the average price per city

SELECT
c.city,
p.merk,
p.model,
p.years,
p.price,
AVG(price) OVER(PARTITION BY city) AS avg_city_price
FROM advertisements as a
LEFT JOIN users as u
ON u.user_id = a.user_id
LEFT JOIN cities as c
ON u.city_id = c.city_id
LEFT JOIN products as p
ON a.product_id = p.product_id
ORDER BY 1;

full analytical query can be seen on this page

Thank you for taking the time to read my article!

References

  1. https://github.com/virganov/used_car/tree/main

--

--