It is about time I started reviewing my SQL and database knowledge. I started off by reviewing SQL queries as simple as selecting rows from a table, but as I was going through these queries, I felt like I was not touching upon everything important to relational databases. In the back of my head, I was thinking there are relationships, 1NF, 2NF, etc. … But I wasn’t sure until I came across a database schema problem today. Doh! I forgot what it takes to create a database schema!
What is a relational database? A relational database is defined as a database structured to recognize relations among stored items of information according to Google search. You can represent data of all sorts through a relational database, such as a grocery store’s inventory to a realtor company and their houses.
The way a relational database works is by storing information in tables, where each table has its own rows and columns. A row represents a record (or tuple), while a column represents a field (or attribute). I will provide an example of a basic table.
Now, after seeing this example, you might think a database looks like a spreadsheet. However, what makes a database relational is the fact that there are relationships between the tables. This enables a relational database to efficiently store large amounts of data, while retrieving selected data.
Now, what is SQL?
SQL (pronounced either as “sequel” or “ess que el”) stands for Structured Query Language. This language was developed to work with relational databases. There are many popular adaptations of SQL that are specific to certain relational database management systems (RDBMS), such as PostgreSQL, MySQL or mSQL. We will just be using standard SQL in this article.
Now let’s get to Relational Database Schema Design!
What is essential to a good database design?
- Reduces redundancy: divide information carefully among tables to eliminate data redundancy. Duplicate data wastes space and can lead to inconsistency.
- Provides access with information a user needs to join tables together. (Use good primary keys, while creating proper relationships between tables.)
- Ensures data accuracy and integrity.
- Accommodates your data processing and reporting needs.
You are the designer — make sure you make the right decisions by using these guidelines to create the best relational database you can.
- Define the purpose of the database — You want to find and gather requirements for this database. Determine its objective. You can help determine what kind of information you need to record into the database by sampling queries and what results you want from the query. From here, gather the information you need, divide them into individual tables. Determine what the columns (fields) of each table will be. Remember, do not repeat information.
- Figure out the Primary Key for each table — A primary key is a field that uniquely identifies a row. A primary key can be multiple fields if necessary, in which this is known as a composite key, while a primary key using only one field is known as a simple key. For the grocery list above, the Primary Key is the column grocery_id. I chose grocery_id as the primary key as it is unique to every grocery item, is of integer type, and is simple. In light of my decision, please remember these guidelines for choosing a primary key:
- Make sure each row has a unique key — you cannot have duplicate values for the primary key of separate rows.
- Must have a value! Can never be null.
- Keep it simple if you can.
- Use an integer type primary key.
3. Determine the relationships among the tables — if you only have one table, it is pretty useless to have a database — consider a spreadsheet instead. Now, if you have multiple tables, you must determine the relationship between tables. There are many types of relationships between tables: one-to-one, one-to-many, many-to-many. We will briefly describe these relationships below, with examples.
- One-to-one: In a one-to-one relationship, only one row of a table is linked to at most one row on the other table. In this case, a one-to-one relationship is useful to store data that is optional. For example, a grocery list item can map to a grocery item details table that provides more information on the item. If the grocery item does not have this data available, there won’t be an entry for that grocery item. Check out the tables below as an example. Both tables can be joined with their common column grocery_id, which happen to be the primary key of both tables.
- One-to-Many: In a one-to-many relationship, one row of one table can link to many rows in a table. A great example of a one-to-many relationship is an AuthorsList and BooksList. An author can have many books written, and in this case let’s keep it simple by imagining each book having only one author. The advantage of a one-to-many relationship is to store the frequently used information in the parent table (the table with “one”), while referencing this information many times in the child table (the table with “many”). In the child table, there should be a foreign key, which is the parent table’s primary key, that is used to reference back to the parent table. In the example below, the foreign key is the author_id.
- Many-to-Many: In a many-to-many relationship, one or more rows of one table can link to 0, 1 or many rows in the other table. To implement this relationship, we must use a mapping or intermediary or junction table.
We will use a grocery store as an example. In a grocery store, you have a list of products (grocery list of the store) called Products with primary key product_id, and each customer has their own order (or transaction) full of groceries, called Transactions with primary key transaction_id. A grocery can be purchased by 0, 1 or more customers, and a customer can purchase 0, 1 or more groceries. To map this, we use a junction table named ProductTransaction using both primary keys product_id and transaction_id to map a transaction (or customer) to a product, and map a product to a transaction.
4. Refine the Design — Ask yourself these questions: do you have enough columns to represent your data? Could you derive a column from a combination of columns? Are you entering duplicate information? Do you have any empty fields in individual records? Can a large table be split into two? There are more questions you may ask about your database schema, but to achieve a refined design, you must normalize your tables.
- First Normal Form (1NF): A table follows first normal form if for every row and column intersection (a cell in a table) in the table, there exists a single value and never a list of values. This is known as the atomic rule. Use the one-to-many relationship to follow 1NF.
- Second Normal Form (2NF): A table follows second normal form if it is 1NF and every non-key column is fully dependent on the primary key. This also applies when a primary key has multiple columns, every non-key column should still depend on the entire set and not part of it.
This example does not follow 2NF:
- Order ID (PK)
- Product ID (PK)
- Product Name
Since Product Name depends on Product ID but not Order ID.
- Third Normal Form (3NF): A table is 3NF if it is 2NF and the non-key columns are independent of each other.
This example does not follow 3NF:
- Grocery ID (PK)
Since Discount is dependent on Price, another non-key column.
Lastly, there are integrity rules to ensure integrity of your information:
- Entity Integrity Rule: The primary key cannot contain NULL! Otherwise, it cannot uniquely identify the row. This includes for multiple column’d primary keys as well, none of the columns can be NULL.
- Referential Integrity Rule: Each foreign key value must be matched to a primary key value in the table referenced (or parent table). In turn, this means that you can only insert a row with a foreign key in the child table only if that value (of the foreign key) exists in the parent table. If the value of the key changes in the parent table, all rows in the child table(s) should also change the value of the foreign key referring to the parent table. This can be handled by either not allowing change, cascading the change, or set the key value in the child tables to NULL.
- Business logic integrity: These are validation rules to ensure for correct values (5 digit zip codes, validating credit card number, etc).
There are many steps to creating a “good” database structurally. First, you need to understand the needs of your database. What information do you need to store? How can you split this information up into separate tables? What are the primary keys for each table? These are very important questions to ask before jumping into creating a database.
You use the information above to create your tables. The tables then need to form relationships with one another. If there are many-to-many relationships, you will need a junction table. Lastly, after establishing relationships, you need to normalize your tables. Make sure that all tables are 3NF, and follow integrity rules to ensure the safety of your data.
Voila, you now have yourself a good database schema design! Below are the materials I used to create this article.