Database Design for E-Library Application — Sekolah Data Pacmann Project

fariz zulfadhli
7 min readMay 8, 2024

--

The objective of this project is to develop a robust database system for an e-library application that efficiently manages multiple libraries, book collections, user interactions, loans, and holds. The database will support functionalities such as book borrowing, holds management, user registration, and transaction tracking while ensuring data integrity and adherence to business rules.

Designing The Database

The mission is to create a database that enables effective management of an e-library application, ensuring seamless user experiences in borrowing and accessing books across multiple libraries.

Design ERD

  1. Identify Tables and Fields:

Determine the essential entities (tables) needed for the database: Library, Book, User, Loan, and Hold. Define attributes (fields) for each table based on requirements, including primary keys and foreign keys.

2. Establish Relationships:

Define relationships between tables (one-to-many, many-to-many) using foreign keys. Ensure referential integrity to maintain data consistency.

Library to Books:

  • Relationship Type: One-to-Many (One library can have multiple books)
  • Foreign Key: library_id in Books references library_id in Library

User to Loans:

  • Relationship Type: One-to-Many (One user can have multiple loans)
  • Foreign Key: user_id in Loans references user_id in Users

Book to Loans:

  • Relationship Type: One-to-Many (One book can be borrowed multiple times)
  • Foreign Key: book_id in Loans references book_id in Books

User to Holds:

  • Relationship Type: One-to-Many (One user can have multiple holds)
  • Foreign Key: user_id in Holds references user_id in Users

Book to Holds:

  • Relationship Type: One-to-Many (One book can have multiple holds)
  • Foreign Key: book_id in Holds references book_id in Books

3. Implement Business Rules:

Apply constraints (NOT NULL, CHECK) to enforce data validity and business rules within the database schema.

Books:
title: NOT NULL
author: NOT NULL
category: NOT NULL
total_quantity: NOT NULL, CHECK(total_quantity >= 0)
available_quantity: NOT NULL, CHECK(available_quantity >= 0)

Users:
username: NOT NULL
email: NOT NULL
password: NOT NULL
first_name: NOT NULL
last_name: NOT NULL
address: NOT NULL

Loans:
loan_date: NOT NULL
due_date: NOT NULL
return_date: Can be NULL (if book hasn’t been returned yet)

Holds:
hold_date: NOT NULL
expiry_date: NOT NULL (to indicate when the hold expires)

Implementing The Design

The result of this Database Design is an Entity Relationship Diagram (ERD). After creating the ERD, implement the ERD results into the database using PostgreSQL and Data Definition Language (DDL)

  1. Data Definition Language (DDL)
CREATE TABLE Library (
library_id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
location VARCHAR(255) NOT NULL
);

CREATE TABLE Books (
book_id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
author VARCHAR(255) NOT NULL,
category VARCHAR(255) NOT NULL,
total_quantity INT NOT NULL CHECK(total_quantity >= 0),
available_quantity INT NOT NULL CHECK(available_quantity >= 0),
library_id INT REFERENCES Library(library_id)
);

CREATE TABLE Users (
user_id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(255) NOT NULL,
password VARCHAR(255) NOT NULL,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
address VARCHAR(255) NOT NULL
);

CREATE TABLE Loans (
loan_id SERIAL PRIMARY KEY,
user_id INT REFERENCES Users(user_id),
book_id INT REFERENCES Books(book_id),
library_id INT REFERENCES Library(library_id),
loan_date DATE NOT NULL,
due_date DATE NOT NULL,
return_date DATE
);

CREATE TABLE Holds (
hold_id SERIAL PRIMARY KEY,
user_id INT REFERENCES Users(user_id),
book_id INT REFERENCES Books(book_id),
library_id INT REFERENCES Library(library_id),
hold_date DATE NOT NULL,
expiry_date DATE NOT NULL
);

2. Entity Relationship Diagram (ERD)

Populating the Database

  1. Create a Dummy Dataset

To populate the database for the e-library application using Python with Faker

!pip install pandas faker

import pandas as pd
from faker import Faker
import random
from datetime import datetime, timedelta

# Initialize Faker to generate fake data
fake = Faker()

# Define the number of records for each table
num_libraries = 3
num_books_per_library = 20
num_users = 50
num_loans = 100
num_holds = 50

# Generate Library Data
libraries_data = {
'library_id': list(range(1, num_libraries + 1)),
'name': [fake.company() for _ in range(num_libraries)],
'location': [fake.city() for _ in range(num_libraries)]
}
libraries = pd.DataFrame(libraries_data)

# Generate Books Data
books_data = []
for library_id in range(1, num_libraries + 1):
for book_id in range(1, num_books_per_library + 1):
title = fake.catch_phrase()
author = fake.name()
category = random.choice(['Fiction', 'Non-Fiction', 'Science Fiction', 'Mystery', 'Romance'])
total_quantity = random.randint(5, 20)
available_quantity = random.randint(1, total_quantity)
books_data.append({
'book_id': len(books_data) + 1,
'title': title,
'author': author,
'category': category,
'total_quantity': total_quantity,
'available_quantity': available_quantity,
'library_id': library_id
})
books = pd.DataFrame(books_data)

# Generate Users Data
users_data = {
'user_id': list(range(1, num_users + 1)),
'username': [fake.user_name() for _ in range(num_users)],
'email': [fake.email() for _ in range(num_users)],
'password': [fake.password() for _ in range(num_users)],
'first_name': [fake.first_name() for _ in range(num_users)],
'last_name': [fake.last_name() for _ in range(num_users)],
'address': [fake.address() for _ in range(num_users)]
}
users = pd.DataFrame(users_data)

# Generate Loans Data
loans_data = []
for _ in range(num_loans):
user_id = random.randint(1, num_users)
book_id = random.randint(1, len(books))
library_id = books.loc[book_id - 1, 'library_id']
loan_date = fake.date_time_between(start_date='-30d', end_date='-1d')
due_date = loan_date + timedelta(days=14)
return_date = fake.date_time_between(start_date=due_date, end_date='now') if random.random() < 0.8 else None
loans_data.append({
'loan_id': len(loans_data) + 1,
'user_id': user_id,
'book_id': book_id,
'library_id': library_id,
'loan_date': loan_date,
'due_date': due_date,
'return_date': return_date
})
loans = pd.DataFrame(loans_data)

# Generate Holds Data
holds_data = []
for _ in range(num_holds):
user_id = random.randint(1, num_users)
book_id = random.randint(1, len(books))
library_id = books.loc[book_id - 1, 'library_id']
hold_date = fake.date_time_between(start_date='now', end_date='+7d')
expiry_date = hold_date + timedelta(days=7)
holds_data.append({
'hold_id': len(holds_data) + 1,
'user_id': user_id,
'book_id': book_id,
'library_id': library_id,
'hold_date': hold_date,
'expiry_date': expiry_date
})
holds = pd.DataFrame(holds_data)

# Save DataFrames to CSV files
libraries.to_csv('libraries.csv', index=False)
books.to_csv('books.csv', index=False)
users.to_csv('users.csv', index=False)
loans.to_csv('loans.csv', index=False)
holds.to_csv('holds.csv', index=False)

2. Input Dummy Dataset into the Database

Download all CSV file created, and then input it to the database with this query :

COPY Library FROM 'path_to_file/libraries.csv' CSV HEADER;
COPY Books FROM 'path_to_file/books.csv' CSV HEADER;
COPY Users FROM 'path_to_file/users.csv' CSV HEADER;
COPY Loans FROM 'path_to_file/loans.csv' CSV HEADER;
COPY Holds FROM 'path_to_file/holds.csv' CSV HEADER;

Retrieve Data

To learn more about this data, we can look into some questions that relate to this business :

  1. What are the top 5 most borrowed books across all libraries?
SELECT b.title, COUNT(l.loan_id) AS num_borrowed
FROM Books b
JOIN Loans l ON b.book_id = l.book_id
GROUP BY b.title
ORDER BY num_borrowed DESC
LIMIT 5;

Output :

Insights:

  • The top 5 most borrowed books are identified based on loan transaction counts.
  • These books are likely popular among library users across all libraries.

Recommendations:

  • Ensure sufficient stock of these popular books to meet demand.
  • Consider promoting similar titles or genres to capitalize on user preferences.

2. How many books are currently available in each library?

SELECT l.name AS library_name, SUM(b.available_quantity) AS total_available_books
FROM Library l
JOIN Books b ON l.library_id = b.library_id
GROUP BY l.name;

Output :

Insights:

  • Provides a breakdown of available books by library.
  • Highlights libraries with higher or lower book availability.

Recommendations:

  • Balance book distribution among libraries based on demand and usage patterns.
  • Implement inter-library transfers to optimize book availability across locations.

3. What is the average loan duration of books by category?

SELECT
b.category,
AVG((l.return_date - l.loan_date)::int) AS avg_loan_duration_days
FROM
Books b
JOIN
Loans l ON b.book_id = l.book_id
WHERE
l.return_date IS NOT NULL
GROUP BY
b.category;

Output :

Insights:

  • Indicates the average time books of each category are borrowed before being returned.
  • Shows potential differences in loan durations across book genres.

Recommendations:

  • Adjust loan policies based on category-specific loan duration trends.
  • Offer extended loan periods for categories with longer average durations.

4. Which users currently have the maximum number of holds?

SELECT u.username, COUNT(h.hold_id) AS num_holds
FROM Users u
JOIN Holds h ON u.user_id = h.user_id
GROUP BY u.username
ORDER BY num_holds DESC
LIMIT 1;

Output :

Insights:

  • Identifies users with the highest number of book holds.
  • Indicates users who are actively engaging with the library’s hold system.

Recommendations:

  • Provide personalized recommendations or notifications to highly engaged users.
  • Ensure timely communication and support for users managing multiple holds.

5. How many loans have exceeded the due date across all libraries?

SELECT COUNT(loan_id) AS num_overdue_loans
FROM Loans
WHERE return_date IS NULL AND due_date < CURRENT_DATE;

Output :

Insights:

  • Quantifies the number of loans that are currently overdue across all libraries.
  • Indicates potential issues with loan return compliance.

Recommendations:

  • Implement automated reminders for overdue loans to encourage timely returns.
  • Consider adjusting loan policies or fines for late returns to improve loan management.

Conclusion

The database design for the e-library application aims to meet the project’s objectives by implementing a well-structured schema, establishing data relationships, enforcing business rules, and enabling efficient data retrieval. The implementation and population of the database involve executing SQL commands to create tables, insert data, and query information, ensuring that the database functions effectively to support the application’s functionalities.

References

--

--

fariz zulfadhli

Entrepreneur & Data Enthusiast 💰 Sharing insights on data, business, and life 📺 Life is short, strive to be better every second 💪