Exploring A Hospital’s Database with SQL

10 min readFeb 3, 2024

The primary objective of this project is to manage the data of a hospital database, specifically focusing on the Nursing Foundation’s database.

The dataframe contains essential information regarding doctors, hospital departments, patients, appointments, medical records of each patient, and prescribed medications.

ER- Model design

Firstly, to ensure effective data management, we design an Entity-Relationship Model (ER Model).

This graphical representation provides a clear visualization of the data and the structure of our database, including primary keys (PK), foreign keys (FK), and associations between entities and their properties. To create this model, we utilized the Miro program, which offers specialized tools for table creation.

The ER Model for our database includes the following entities: “Appointments,” “Departments,” “Doctor,” “Drug,” “Medical_Folder,” and “Patient.” In the diagram, we have added the properties of each entity and their respective associations and connections. The primary keys are underlined and marked with “PK” to indicate the unique identifier for each base.

The diagram connections in the Entity-Relationship Model (ER Model) are essential for visualizing the relationships between different entities in the database. Each box in the diagram represents an entity, such as ‘Drug,’ ‘Patient,’ ‘Departments,’ etc.

The lines connecting these boxes depict the associations between the entities. Lines with an arrow at the end, indicate a ‘one’ side relationship, suggesting that one entity is associated with only one instance of the other entity. On the other hand, lines without arrows indicate a ‘many’ side, showing that one entity can be associated with multiple instances of the other entity.

For example, the relationship between ‘Drug’ and ‘Patient’ is depicted as ‘many to one,’ signifying that a patient may require multiple drugs. The arrow on one side of the line also denotes ‘partial participation,’ implying that not every ‘Drug’ entity may have a corresponding ‘Patient’ entity. Similarly, the connections for ‘Departments’ and ‘Doctors’ show a double line, indicating ‘total participation.’ This means that every department must have at least one doctor associated with it for it to be considered a valid department. The relationship between ‘Medical_Folder’ and ‘Patient’ also exhibits ‘total participation,’ implying that every patient must have a medical file associated with them.

Table creation in PostgreSQL

The next steps involve creating the necessary tables and inserting the data into them. For this project, the PostgreSQL DBMS (Database Management System) was utilized.

The initial step involved creating a database named “Hospital Management System.” Subsequently, tables were created within this database using the code shown in the below box.

CREATE TABLE Patient
 (patientAMKA BIGINT,
    userid VARCHAR(100),
    password VARCHAR(100),
    name VARCHAR(50),
    surname VARCHAR(50),
    gender VARCHAR(50),
 PRIMARY KEY (patientAMKA)
 );

CREATE TABLE Doctor
    (doctorAMKA BIGINT,
    username VARCHAR(100),
    password VARCHAR(100),
    name VARCHAR(50),
    surname VARCHAR(50),
    specialty INTEGER,
 PRIMARY KEY (doctorAMKA)
 );

CREATE TABLE Appointments
    (id INTEGER,
    t VARCHAR(150),
 patientAMKA BIGINT,
 doctorAMKA BIGINT,
    diagnosis VARCHAR(250),
 PRIMARY KEY (id),
 FOREIGN KEY (patientAMKA) REFERENCES Patient(patientAMKA),
 FOREIGN KEY (doctorAMKA) REFERENCES Doctor(doctorAMKA)
 );

CREATE TABLE Departments
    (id INTEGER,
    name VARCHAR(50),
 PRIMARY KEY (id)
 );

CREATE TABLE Drugs
    (id INTEGER,
    name VARCHAR(600),
    description VARCHAR(500),
 PRIMARY KEY (id)
 );

CREATE TABLE Medical_Folder
    (id INTEGER,
    patient BIGINT,
    cure VARCHAR(250),
 drug_id INTEGER,
 PRIMARY KEY (id),
 FOREIGN KEY (patient) REFERENCES Patient(patientAMKA)
 );

Each table in the database was designed to represent specific entities or data types required for the Hospital Management System. Once the tables were created, the relevant data was inserted into them to populate the database with essential information.

Throughout the process, queries were executed to ensure the proper creation of tables and successful data insertion.

As depicted in the screenshot above, the database has been meticulously created using appropriate CREATE TABLE commands. These commands facilitated the establishment of tables, including their properties, primary keys, and foreign keys.

For optimal data organization, columns have been designated as BIGINT for handling large amounts of data, VARCHAR for character combinations, and INTEGER for integer values, all in accordance with the data from the .csv file.

Relational Schema creation

At this stage of the project, the relational schema of the database has been successfully created.

The relational schema illustrates the connections between the Foreign Keys within the database, which greatly simplifies the ongoing work. You can refer to the image below, to visualize the relational schema and understand the relationships between different tables and their respective Foreign Keys.

The relational schema serves as a crucial roadmap for maintaining data integrity and ensuring that the database operates smoothly. By establishing these relationships, the database can effectively handle data interactions and queries

Inserting data to the tables

The next step in the project involves importing data into the Database from the .csv files using the COPY commands.

COPY Doctor(doctorAMKA, username, password, name, surname, specialty) 
FROM 'A:\Dataset\doctor.csv' WITH (FORMAT CSV, HEADER true, DELIMITER ',');

COPY Appointments(id, t, patientAMKA, doctorAMKA, diagnosis) 
FROM 'A:\Dataset\appointments.csv' WITH (FORMAT CSV, HEADER true, DELIMITER ',');

COPY Departments(id, name) 
FROM 'A:\Dataset\departments.csv' WITH (FORMAT CSV, HEADER true, DELIMITER ',');

COPY drugs(id, name, description) 
FROM 'A:\Dataset\drugs.csv' WITH (FORMAT CSV, HEADER true, DELIMITER ',');

COPY Medical_Folder(id, patient, cure, drug_id) 
FROM 'A:\Dataset\medical_folder.csv' WITH (FORMAT CSV, HEADER true, DELIMITER ',');

Initially, the patient data was inserted, followed by the insertion of data into all the other tables.

COPY Patient(patientAMKA, userid, password, name, surname, gender) 
FROM 'A:\Dataset\patient.csv' WITH (FORMAT CSV, HEADER true, DELIMITER ',');

Now, the tables have been successfully loaded with the appropriate data. This data import process is essential for populating the database with relevant information, allowing the Hospital Management System to function effectively.

Finding all of the drugs prescribed

To identify all prescribed drugs and present the quantity and name of each drug, a series of database operations were employed. The objective was to find drugs that have been prescribed, meaning they must be listed in the medical files of at least one patient.

To achieve this, the “medical_folder” table was cross-referenced with the “drugs” table since the former contains drug_id while the drug names are only available in the “drugs” table.

SELECT drugs.name, COUNT(medical_folder.drug_id) as Arithmos_Farmakwn
FROM medical_folder
JOIN drugs ON medical_folder.drug_id = drugs.id
GROUP BY drugs.name, medical_folder.drug_id;

The COUNT command was utilized to measure the quantities of prescribed drugs, and the JOIN operation was employed to associate the drug_id of the “medical_folder” table with the corresponding id in the “drugs” table.

Finally, the GROUP BY command organized the result based on drugs.name and medical_folder.drug.id, ensuring that lines with the same name and drug ID were consolidated into a cell along with the quantity they appeared in “Arithmos_Farmakwn” (Number_of_Drugs). The results are displayed in the accompanying screenshot.

The outcome showcases the names of 999 drugs and the number of times each drug has been prescribed.

Displaying the diagnoses of the last week

To display the diagnoses of the database’s last week (April 23 to May 1, 2017), the resulting output is presented below.

SELECT diagnosis, t
FROM appointments 
WHERE t >= '2017-04-23 23:59' AND t <= '2017-05-01 23:59' AND diagnosis IS NOT NULL
GROUP BY (t, diagnosis);

The query specifically filters diagnoses that are not empty (i.e., not null) from the “appointments” table within the date range of April 23, 2017, 23:59 to May 1, 2017, 23:59.

This period represents the last week of data in our database. Consequently, the query shows 827 diagnoses alongside the corresponding dates they were recorded.

Finding appointments of a doctor

The provided explanation highlights the use of the specific AMKA (Greek identification number) of Doctor “Carolyn Cook” (AMKA: 372301064587696) in the code to find her appointments.

SELECT appointments.id, doctor.name, doctor.surname, appointments.t
FROM appointments
JOIN doctor ON appointments.doctoramka = doctor.doctoramka
WHERE appointments.t BETWEEN '2017-04-01 0:00' AND '2017-05-01 23:59' AND doctor.doctoramka = '372301064587696' /* Giatros "X" */

To obtain the appointments of another doctor, let’s say “Doctor X,” you would replace “X” with the corresponding AMKA of that doctor. The code retrieves information such as appointment id, the doctor’s first and last name, and the date of the appointment.

To find the appointments for “Doctor X,” you would update the WHERE command in the code to include the specific AMKA of the desired doctor and also specify the date range, in this case, between April 1st and May 1st, 2017.

The code will then return rows that meet these conditions, displaying the appointments of “Doctor X” during that particular period.

The output, as shown in the accompanying screenshot, will demonstrate the number of appointments made by “Doctor X” during that specified month.The provided explanation highlights the use of the specific AMKA (Greek identification number) of Doctor “Carolyn Cook” (AMKA: 372301064587696) in the code to find her appointments.

The code will then return rows that meet these conditions, displaying the appointments of “Doctor X” during that particular period.

The output, as shown in the accompanying screenshot, will demonstrate the number of appointments made by “Doctor X” during that specified month.

Finding patients that have been examined by more than one doctors

To identify patients who have been examined by multiple doctors, we employ the SELECT command to retrieve the patients’ AMKAs from the ‘appointments’ table.

SELECT patientAMKA
FROM Appointments
GROUP BY patientAMKA
HAVING COUNT(DISTINCT doctorAMKA) > 1;

Subsequently, we categorize the results based on the patients’ AMKAs using the GROUP BY clause. By utilizing the HAVING COUNT clause, we count the occurrences of doctors’ AMKAs that exceed one for each patient’s AMKA. The resulting query output consists of a table containing 1000 results.

Which Department had the most patients?

To identify the department with the most patients, we employ the COUNT command to count the number of patients, as evident from the variable name.

Our query is based on the ‘Appointments’ table. By using JOIN commands, we associate the doctors’ AMKAs from the ‘Appointments’ table with the corresponding AMKAs in the ‘Doctor’ table.

SELECT Departments.Name, COUNT(*) as "Arithmos Asthenwn"
FROM Appointments
JOIN Doctor ON Appointments.doctorAMKA = Doctor.doctorAMKA
JOIN Departments ON Doctor.Specialty = Departments.id
GROUP BY Departments.Name
ORDER BY COUNT(*) DESC
LIMIT 1;

Additionally, we link the doctors’ specialties (Specialty) with their respective departments (departments.id).

Categorization is achieved using the GROUP command, organizing the results based on department names. To present the department with the highest patient count first, we utilize the ORDER BY … DESC command, sorting the result lines in descending order.

Ultimately, the LIMIT 1 statement limits the output to a single result, displaying the department with the highest patient count, which in this case is Neurology with 1453 patients.

Average number of patients per department

To determine the average number of patients with a specific diagnosis per department, we utilize the SELECT command to retrieve the specialty of doctors, designated as ‘Department’ when representing the nursing department.

SELECT Doctor.Specialty as Department, AVG(COUNT(Appointments.Diagnosis)) OVER (PARTITION BY Doctor.Specialty) AS Mesos_Oros
FROM Appointments
JOIN Doctor ON Appointments.DoctorAMKA = Doctor.DoctorAMKA
WHERE Appointments.Diagnosis IS NOT NULL
GROUP BY Doctor.Specialty;

Using the AVG command, we calculate the average number of diagnoses in each department, measured by COUNT, and designate it as ‘Mesos_Oros’ (Mean).The OVER command, in conjunction with the PARTITION BY clause, allows us to calculate separate sets of rows from the table, partitioned based on the ‘Doctor.specialty’ column rather than considering the entire table as a single set.To associate the ‘DoctorAMKA’ data from the ‘Appointments’ table with the corresponding data in the ‘Doctor’ table, we use the JOIN command.

Finally, the WHERE command filters only non-empty diagnoses (IS NOT NULL) from the ‘Appointments’ table. The results are then categorized using the GROUP BY command, based on the specialty of doctors, to ensure distinct results for each department.

From the results obtained in our database .csv file, we observe that segments 14 and 6 have the highest values, averaging 914 patient diagnoses. These segments correspond to the “Gynecology” and “Neurology” departments, respectively.

It is worth noting that the Neurology department having high values was somewhat anticipated, as per the information from question 2E, it was revealed to have the highest number of patient examinations.

In summary, our analysis indicates that the “Gynecology” and “Neurology” departments have the highest average number of patient diagnoses, with the latter department already having the most patient examinations overall.