
Relational Databases Technical Guide

Database Dynamics: Unveiling the Evolution, Concepts, and Optimization — Part 2

Lucas Lima de Oliveira
Dec 20, 2023 · 12 min read


Welcome back, dear readers, to the captivating journey of exploring the evolution of databases. In our previous blog post, “The Birth of Databases”, part 1 of our series Database Dynamics: Unveiling the Evolution, Concepts, and Optimization, we delved into the early stages of this fundamental technology. We witnessed how databases transformed the way we store, manage, and analyze data, becoming the backbone of countless applications and systems.

Now, in this sequel, we take a more technical turn and explore in depth some important concepts about relational databases. By understanding these concepts, you will gain the knowledge and tools to use the technology properly, following best practices. So fasten your seatbelts as we embark on this technical expedition, which will equip you to work with these databases and unlock their full potential.

To dive deeper into each topic and give our readers a more complete understanding of them, we decided to break this post into two parts. This first part discusses relational databases, while the second will cover non-relational databases.

Enhancing RDB Utilization

In today’s data-driven world, efficient and effective utilization of relational databases (RDBs) has become fundamental. As businesses and organizations generate vast amounts of data, the need to maximize the potential of RDBs has never been more crucial, and to achieve that it is essential to have the right resources at your disposal. From uncovering best practices and strategies to introducing must-have tools and techniques, this guide will equip you with the knowledge necessary to take advantage of the full potential of your RDB.

In Relational databases, the standard language is SQL (Structured Query Language), and as the basis of every database of this type, some important resources are common among most database management systems (DBMS), such as Microsoft SQL Server, MySQL, PostgreSQL, MariaDB, and Oracle. In the next sections, we’ll be talking about these important resources.

Indexes

Imagine you’re in a library trying to find a specific book. The library has thousands of books arranged on shelves by genre and author name. Without an index, you would have to search shelf by shelf and go through every book to find the one you’re looking for, which could be a time-consuming and frustrating task.

However, with the presence of a well-organized index, you can quickly locate the book you want. The library’s index could be a separate catalog that lists all the books along with their corresponding genres and authors. You pick up the index, go to the section corresponding to the genre of the book, and locate the author’s name. Under that author’s name, you find the information that points you directly to the shelf and section where the book is located, without wasting time and effort.

Figure: Index usage example on a library shelf (Source: Authors)

In this example, the index acts as a guide that helps you navigate through the massive collection of books efficiently. It narrows down your search by providing you with the essential information required to locate the desired book accurately.

Relational database indexes work similarly to that catalog. They create a separate data structure that allows the DBMS to quickly find specific records or ranges of records on a table based on the indexed values. Instead of scanning through the entire table, the system uses the index as a reference to locate the desired data, resulting in faster query execution and improved performance.

Indexes can significantly improve query performance, especially when dealing with large datasets or complex queries. However, it’s important to note that indexes also incur some overhead during data modification operations (such as inserts, updates, and deletes) as the index structures need to be updated as well. Therefore, it’s essential to carefully evaluate and select the columns that will benefit the most from indexing while considering the trade-offs between read and write performance.
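To make this concrete, here is a small runnable sketch using SQLite through Python’s `sqlite3` module. The `customers` table, its columns, and the index name are illustrative assumptions, not from the article. `EXPLAIN QUERY PLAN` shows the planner switching from a full table scan to an index search once the index exists:

```python
import sqlite3

# In-memory database with a hypothetical "customers" table (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO customers (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

# Without an index on "email", SQLite scans the whole table for this predicate.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE email = ?",
    ("user500@example.com",),
).fetchone()
print(plan[-1])  # e.g. "SCAN customers"

# After creating an index on the filtered column, the planner uses it instead.
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE email = ?",
    ("user500@example.com",),
).fetchone()
print(plan[-1])  # e.g. "SEARCH customers USING INDEX idx_customers_email (email=?)"
```

The exact plan wording varies by SQLite version, but the shift from “SCAN” to “SEARCH … USING INDEX” is the behavior the library analogy describes.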

Views

A VIEW is like a virtual table that is created based on the results of a specific SQL query. It’s not an actual table with physical data, but rather a saved query that can be used like a table in other queries.

Think of it as a way to save and reuse complex queries so that you don’t have to write them each time you need the same information. With a VIEW, you can create a customized and simplified view of your data, showing only the columns and rows that you need.

Suppose you need to frequently retrieve a report that displays the total quantity of each product ordered and the total revenue generated from each product. Without using a VIEW, you might have to write a complex query each time you need this information. Here’s an example of how the query might look:

SELECT
    products.product_name,
    SUM(order_items.quantity) AS total_quantity,
    SUM(order_items.price * order_items.quantity) AS total_revenue
FROM
    products
JOIN
    order_items ON products.product_id = order_items.product_id
GROUP BY
    products.product_name;

To simplify this complex query using a VIEW, you can create a VIEW called “product_sales_view” that encapsulates this logic. Here’s how the VIEW creation might look:

CREATE VIEW product_sales_view AS
SELECT
    products.product_name,
    SUM(order_items.quantity) AS total_quantity,
    SUM(order_items.price * order_items.quantity) AS total_revenue
FROM
    products
JOIN
    order_items ON products.product_id = order_items.product_id
GROUP BY
    products.product_name;

Now, whenever you need to retrieve the product sales information, you can simply use the VIEW in your query:

SELECT * FROM product_sales_view;

Using a VIEW in this scenario simplifies the process of retrieving the product sales report by eliminating the need to rewrite the complex query each time. It improves code readability, maintainability, and allows for easy data access and analysis. The best part is that whenever you make changes to the underlying tables, the VIEW automatically reflects those changes. So, it’s a convenient and dynamic way to interact with your data without duplicating or rewriting queries every time.
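A quick way to see that a view always reflects the current state of its underlying tables is to run the article’s view against SQLite via Python’s `sqlite3`. The schema below is a minimal assumption based on the columns used in the query; the values are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_items (product_id INTEGER, quantity INTEGER, price REAL);

CREATE VIEW product_sales_view AS
SELECT
    products.product_name,
    SUM(order_items.quantity) AS total_quantity,
    SUM(order_items.price * order_items.quantity) AS total_revenue
FROM products
JOIN order_items ON products.product_id = order_items.product_id
GROUP BY products.product_name;
""")

conn.execute("INSERT INTO products VALUES (1, 'Widget')")
conn.execute("INSERT INTO order_items VALUES (1, 2, 10.0)")
print(conn.execute("SELECT * FROM product_sales_view").fetchall())
# → [('Widget', 2, 20.0)]

# The view re-runs its query on every read, so new rows show up immediately:
conn.execute("INSERT INTO order_items VALUES (1, 3, 10.0)")
print(conn.execute("SELECT * FROM product_sales_view").fetchall())
# → [('Widget', 5, 50.0)]
```

No refresh step is needed: the second `SELECT` recomputes the aggregates over the updated `order_items` rows.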

Materialized Views (MV’s)

Materialized Views (MVs) are a variant of standard views in SQL databases. While standard views are virtual tables computed on the fly when queried, materialized views are precomputed and physically stored as real tables in the database. This means that by storing the result set as a table, materialized views eliminate the need to repeat complex computations, joins, or aggregations each time the view is queried. This precomputation saves processing time and resources.

In a nutshell, views are reusable query definitions that you can use as if they were tables. Materialized Views (MVs), on the other hand, are like “cached” tables: the result set is stored, so the query does not need to be recomputed every time.

One important characteristic to note, and perhaps a trade-off, is that the data stored in materialized views is not automatically updated when changes occur in the underlying tables. The materialized view needs to be refreshed, either manually or automatically at regular intervals, to synchronize it with the latest data in the database.

REFRESH MATERIALIZED VIEW product_sales_mv;

The command above refreshes the entire MV by re-executing the original query and replacing the existing data with the new result set, bringing the MV in line with the latest information in the underlying tables.

It’s important to consider these trade-offs when deciding whether to use Views or Materialized Views. The choice depends on factors such as query complexity, performance requirements, data freshness needs, storage constraints, and flexibility in handling data transformations or dynamic queries.
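SQLite, used for the runnable sketches here, has no native materialized views, but the pattern can be emulated with a regular table plus an explicit refresh step, which also makes the staleness trade-off visible. The table names follow the article’s example; the `refresh_product_sales_mv` helper is a hypothetical stand-in for `REFRESH MATERIALIZED VIEW`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_items (product_id INTEGER, quantity INTEGER, price REAL);
INSERT INTO products VALUES (1, 'Widget');
INSERT INTO order_items VALUES (1, 2, 10.0);
""")

def refresh_product_sales_mv(conn):
    """Recompute the aggregate and replace the stored result set,
    mimicking REFRESH MATERIALIZED VIEW."""
    conn.executescript("""
    DROP TABLE IF EXISTS product_sales_mv;
    CREATE TABLE product_sales_mv AS
    SELECT products.product_name,
           SUM(order_items.quantity) AS total_quantity,
           SUM(order_items.price * order_items.quantity) AS total_revenue
    FROM products
    JOIN order_items ON products.product_id = order_items.product_id
    GROUP BY products.product_name;
    """)

refresh_product_sales_mv(conn)
conn.execute("INSERT INTO order_items VALUES (1, 3, 10.0)")

# The stored result set is stale until the next refresh:
print(conn.execute("SELECT * FROM product_sales_mv").fetchall())
# → [('Widget', 2, 20.0)]

refresh_product_sales_mv(conn)
print(conn.execute("SELECT * FROM product_sales_mv").fetchall())
# → [('Widget', 5, 50.0)]
```

Unlike the plain view, the new `order_items` row only appears after the refresh, which is exactly the staleness trade-off described above.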

Table: Views and MV’s comparison (Source: Authors)

Procedures

Stored procedures in relational databases are like pre-defined sets of instructions that make executing commonly used or complex tasks easier. They are essentially a way to bundle multiple SQL statements together into a single unit that can be called whenever needed.

Stored procedures not only allow for the execution of pre-defined sets of SQL statements but can also process and transform data. This means you can create procedures that perform calculations, apply business logic, manipulate data, and generate derived results.

Think of it this way — instead of repeating the same SQL statements over and over again, you can create a stored procedure once and then simply call it whenever necessary. This not only reduces the amount of code you need to write but also improves performance.

For example, imagine you have a database table called “Customers” with columns for “Name”, “Email”, and “Phone Number”. Instead of writing and executing the same SQL statements to retrieve all customers with a specific email, you can create a stored procedure called “GetCustomersByEmail” which takes an email as an input parameter.

Here’s how you can define and use a stored procedure in a practical sense:

  1. Defining the stored procedure:

CREATE PROCEDURE GetCustomersByEmail
    @Email VARCHAR(100)
AS
BEGIN
    SELECT *
    FROM Customers
    WHERE Email = @Email
END

2. Calling the stored procedure:

EXEC GetCustomersByEmail 'example@email.com'

In this example, we defined a stored procedure called “GetCustomersByEmail” that selects all records from the “Customers” table where the email matches the input value. When we call this stored procedure and provide an email address, it will execute the SQL statements within it and return the corresponding customer records.

By utilizing stored procedures, you can improve performance, enhance code organization, promote security, and allow for reusability of code. They are valuable tools within relational databases that simplify complex tasks and make database interactions more efficient. They enable you to perform complex calculations, apply business logic, and manipulate data directly within the database, contributing to improved performance and efficient data processing.

Transactions

Transactions are an essential concept that ensure the atomicity, consistency, isolation, and durability (ACID) properties of multiple database operations. They provide a way to group multiple SQL statements into a logical unit, ensuring that either all the statements within the transaction are executed successfully, or none of them are executed at all.

In simpler terms, transactions allow you to perform multiple database operations as a single, indivisible unit. This helps maintain data integrity and provides a way to handle complex database interactions reliably.

For example, consider a scenario where you need to transfer funds between two bank accounts. A transaction can be used to ensure that the debit operation from one account and the corresponding credit operation to the other account both complete successfully, or none of them happen at all.

Here’s an example showcasing the functionality of transactions:

  1. Beginning a transaction:

BEGIN TRANSACTION

2. Performing database operations within the transaction:

UPDATE Account SET Balance = Balance - 100 WHERE AccountNumber = '12345678'
UPDATE Account SET Balance = Balance + 100 WHERE AccountNumber = '87654321'

3. Committing the transaction:

COMMIT TRANSACTION

In this example, the transaction is initiated with the BEGIN TRANSACTION statement. Within the transaction, two SQL statements are executed to update the balances of two different accounts. If both statements complete successfully, the transaction can be committed using the COMMIT TRANSACTION statement, ensuring that the changes are permanently saved in the database. However, if an error occurs during any of the statements, the transaction can be rolled back using the ROLLBACK TRANSACTION statement, undoing all the changes made within the transaction.

Transactions enable reliable and consistent database operations, even in the face of errors or failures. They provide a way to handle complex operations that involve multiple steps or interactions, ensuring that data remains in a valid state and maintaining the integrity of the database.

By using transactions effectively, you can guarantee the ACID properties of your database and handle data manipulations reliably, making them an indispensable tool in enterprise-grade relational database systems.
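The all-or-nothing behavior can be demonstrated end to end with SQLite via Python’s `sqlite3`, whose connection context manager commits on success and rolls back on an exception. The `account` table and the `CHECK (balance >= 0)` constraint (standing in for “insufficient funds”) are illustrative assumptions, not from the article:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts table; the CHECK constraint makes overdrafts fail.
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, "
             "balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("12345678", 50.0), ("87654321", 0.0)])
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE account SET balance = balance - 100 "
                     "WHERE account_number = '12345678'")  # 50 - 100 violates CHECK
        conn.execute("UPDATE account SET balance = balance + 100 "
                     "WHERE account_number = '87654321'")
except sqlite3.IntegrityError as exc:
    print("Transfer failed, rolled back:", exc)

# Neither update survives: both balances are unchanged.
print(conn.execute("SELECT balance FROM account ORDER BY account_number").fetchall())
# → [(50.0,), (0.0,)]
```

Because the debit fails, the whole unit is rolled back and no money is credited to the second account, which is exactly the atomicity guarantee described above.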

Triggers

Triggers are tools designed to automatically respond to certain database events, such as insertions, updates, or deletions, on specified tables. They allow you to define custom logic or actions that should be executed whenever these events occur.

Think of triggers as event-driven actions that are triggered by database operations. They provide a way to enforce data integrity, implement business rules, or automate certain processes without requiring explicit manual intervention.

For example, let’s consider a situation where you have a database table called “Orders” and you want to enforce a business rule that prevents an order with a quantity less than 1 from being inserted. You can create a trigger called “PreventNegativeQuantity” that checks the inserted or updated row and raises an error if the quantity is less than 1.

Here’s an example showcasing the functionality of triggers:

  1. Defining the trigger:

CREATE TRIGGER PreventNegativeQuantity
ON Orders
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (SELECT 1 FROM inserted WHERE Quantity < 1)
    BEGIN
        RAISERROR ('Quantity must be greater than or equal to 1.', 16, 1)
        ROLLBACK TRANSACTION
    END
END

2. Executing an operation that triggers the trigger:

INSERT INTO Orders (Product, Quantity, Price)
VALUES ('Item A', -2, 10.99)

In this case, the trigger “PreventNegativeQuantity” is defined to execute after an insert or update operation is performed on the “Orders” table. It checks if any inserted or updated rows have a quantity less than 1 and, if so, raises an error and rolls back the transaction.

Triggers provide a way to enforce complex business rules, maintain data consistency, and automate data-related actions within the database. By reacting to database events, triggers help ensure data integrity and streamline data processing workflows.

It’s important to note that triggers should be used judiciously, as they can introduce additional complexity and have an impact on database performance. However, when implemented effectively, triggers are a valuable tool to maintain data quality and enforce specific behaviors within the database.
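The trigger above is written in SQL Server’s T-SQL dialect. As a runnable sketch of the same rule, here is a rough SQLite analogue via Python’s `sqlite3`, using `RAISE(ABORT, ...)` inside a `BEFORE INSERT` trigger (the table layout is assumed from the article’s example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (product TEXT, quantity INTEGER, price REAL);

-- SQLite analogue of the article's T-SQL trigger: abort the statement
-- whenever a row with quantity < 1 is inserted.
CREATE TRIGGER prevent_negative_quantity
BEFORE INSERT ON orders
FOR EACH ROW
WHEN NEW.quantity < 1
BEGIN
    SELECT RAISE(ABORT, 'Quantity must be greater than or equal to 1.');
END;
""")

conn.execute("INSERT INTO orders VALUES ('Item A', 3, 10.99)")   # accepted
try:
    conn.execute("INSERT INTO orders VALUES ('Item B', -2, 10.99)")
except sqlite3.IntegrityError as exc:
    print("Rejected:", exc)

# Only the valid row made it into the table.
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone())
# → (1,)
```

The invalid insert is rejected automatically, with no application-side check needed, which is the event-driven enforcement the section describes.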

Good Practices

In order to maximize the efficiency and maintainability of relational databases, it’s important to follow some best practices. Some key aspects to consider regarding this topic are database normalization and balancing performance.

Normalization is the process of organizing data in a database to reduce redundancy and dependency. It involves breaking down data into smaller, logical tables and establishing relationships between them. There are six known Normal Forms (NF), but by adhering to normalization principles and reaching the widely-used third normal form (3NF), you can ensure data integrity and optimize storage efficiency. We won’t dive deeper into these normal forms, but you can read more about them here: Normal Forms in DBMS — GeeksforGeeks.

Balancing performance with all the features needed is another crucial consideration. While normalization enhances data integrity, it can impact performance due to increased join operations. To strike a balance, it’s essential to carefully denormalize selected areas of the database where performance gains matter more than strict normalization. This can involve creating redundant data or introducing calculated columns to eliminate the need for frequent complex queries. However, denormalization should be approached thoughtfully to keep the resulting redundancy under control and avoid data inconsistency.

Additionally, optimizing query performance plays a vital role in relational database management. This includes designing proper indexes, utilizing views and materialized views to simplify complex queries, and regularly monitoring query execution plans to identify bottlenecks and improve performance. It’s also advisable to employ query optimization techniques, such as avoiding unnecessary functions or expressions, using appropriate WHERE clauses, and leveraging query caching mechanisms. For large tables, it is also important to avoid unnecessary indexes, as they can add cost to insert/update/delete operations.

Maintaining a robust backup and recovery strategy is equally important. Regularly backing up data, testing restoration processes, and implementing disaster recovery plans are crucial steps to safeguard against data loss and maintain business continuity.

Moreover, we selected some other general tips considered as good practices to adopt in your daily work with relational databases:

  1. Use consistent and meaningful naming conventions for tables, columns, and constraints. This will make your code more readable and maintainable.
  2. Avoid using `SELECT *` in your queries, especially for large tables. Instead, explicitly list the columns you need. This improves performance and helps prevent unexpected errors when column definitions change.
  3. Use indexes appropriately to improve query performance. Identify commonly used columns in your queries and create indexes on them to speed up data retrieval. But pay attention to the balance with write performance, as indexes add a cost to insert and update operations.
  4. Regularly review and optimize your SQL queries and database schema. Monitor query performance, identify bottlenecks, and make necessary changes to improve efficiency.
  5. Document your SQL code, including its purpose, expected inputs, and output. This will help other developers understand and maintain your code in the future.
  6. Perform thorough testing on your SQL code before deploying it to production. Validate the output and handle edge cases to ensure correct behavior.
  7. Always backup your database before making structural changes (e.g., adding or modifying tables, indexes, and columns) to avoid data loss or corruption.

Remember, these practices can vary depending on the specific database system you’re using, so it’s always a good idea to consult the documentation and best practices guides provided by your database vendor.

Conclusion

Summing up, efficient use of relational databases hinges on understanding and implementing vital concepts like indexes, views, procedures, transactions, and triggers. Applying best practices is the key to improving performance, data integrity, and seamless data processing. The right balance between normalization and denormalization drives the optimal performance and data consistency.

The whole discussion underscores the significance of dynamic and effective management of relational databases. As we conclude, remember that mastering databases is an ongoing process driven by learning and experience. Up next, we move on to explore non-relational databases. Stay tuned and keep mastering the art of databasing!

Authors

You can contact us on LinkedIn or visit our profiles on Medium and GitHub.

References

  1. The Basics of Database Indexes For Relational Databases | by Jimmy Farillo | Medium
  2. Normal Forms in DBMS — GeeksforGeeks
  3. Explaining The Postgres Meme (avestura.dev)
