Database — Indexing, Transactions & Stored Procedures (Part 9)

Optimization, Working with sensitive data, & Re-usability.

Omar Elgabry

Published in

OmarElgabry's Blog

10 min readSep 15, 2016

Wish you already came along the last part Database — Structured Query Language (SQL) (Part 8)

An index is a data structure that optimize searching and accessing the data. It’s like an index at the back of a book. When your database start to grow, the performance will be a concern. Hence, getting directly to a specific row in a large table in the least possible time is a priority.

Indexing

One of the ways that will optimize your database searching and accessing is having indexes on the columns that you usually access the table using it.

What the DBMS will do when you ask for a specific row, it will go sequentially and check with every row; “Is this the row that I need?”, If yes return it, if no, keep searching till the end.

But, we have a better way to do that. An index, as we’ve mentioned, is a data structure, it won’t be obvious for you, but it’s stored inside the DBMS, most commonly as a B- tree.

By default, Most of the DBMS automatically create an index on primary and unique columns.

How Indexes Work?

Let’s say that you have an index for a primary key. This will create an ordered list of primary key values in a separate table, each entry has a pointer points to the relative value in the original table.

So, whenever you want to access a table using the primary key, it will use binary search algorithm (takes time of O(LogN)) to access the required value in the Index table, and then, go to the relative value in the original table.

And, definitely, you can create another index on another column, even if it’s a non-primary column, like first name, assuming that you usually access the table using that column.

The decision for choosing another column (besides the primary key) to be indexed-ed can be delayed until the database has been used for a while. This is because we want to know how users are really using our database, and what kind of queries they’re running rather than how we hoped or thought.

Composite Indexes

You can also create an index on a combination of columns, meaning if you often access the table using the first name, and last name, you can create an index on both, the first name and last name.

Now, the Index table will be sorted according to the first name, and for each value of the first name it will be sorted according to the last name.

When you access the data, It’s more efficient to specify the columns in the right order as in the index definition. So, here, it should be first name, then last name, and not the vice-versa.

Clustered & Non-Clustered Indexes (Optional)

Some DBMS has different implementation of indexes. They use the idea of clustered & non-clustered index.

— Clustered Index

Every table can have one and only one clustered index. The most common clustered index in any database table is the primary key column.

The database will then order the data in the table based on the clustered index; no Index table need to be created. The binary search algorithm will be used to get to the required data in the table (already ordered).

— Non-Clustered Indexes

If you found yourself often also accessing the data using another column, we can create a secondary index; a non-clustered index.

Let’s say we want to create a secondary index for the last name, while the clustered index is the employee id. It’s created in a separate table, it has two columns, one for the last name, and one for the corresponding employee id.

The created table is now sorted by last name, the way that we can’t actually do in the employee table, because we’re already sorted by the employee id.

Now it’s not as quick as using the clustered index. Why? We still need to read from the table created for secondary index then jump to the employee table to get to a specific employee. But, it’s much quicker than a full table scan.

A table can have only one clustered index, while it can have more than one non-clustered index.

Indexes Pitfalls

Indexes are a great way to optimize the performance, but you don’t have to put index on every column.

Yes, it’s much quicker than the sequential search, but, they must be maintained. If your column changes a lot in your original table, keep in mind that every update, or insert, or even delete has to reflect back on the Index table. In addition, since every index is a new table, it requires additional space.

So, the moral of the story, indexes will improve the performance, use them on the columns that you will access a lot.

Implementation

As we mentioned previously in the SQL tutorial, DDL operations manage table and index structure. That is, we can CREATE, ALTER, or DROP an index.

The syntax to create and delete indexes varies among different databases. Therefore, you need to check the syntax for creating indexes in your database. We’ll use MySQL to get an idea on how they can be created, and deleted.

— CREATE

There are two ways to create an index, either when creating a table, or using the CREATE INDEX statement.

-- Create an index in CREATE TABLE (duplicate values are allowed)
CREATE TABLE books (
    title VARCHAR(255),
    INDEX index_name (title)
);-- Create an index in CREATE TABLE (duplicate values aren't allowed)
CREATE TABLE books (
    title VARCHAR(255),
    UNIQUE INDEX index_name (title)
);-- Create an index (duplicate values are allowed)
CREATE INDEX index_name ON table_name (column_name);-- Create a unique index (duplicate values aren't allowed)
CREATE UNIQUE INDEX index_name ON table_name (column_name);

The CREATE INDEX statement is mapped to an ALTER TABLE statement to create indexes (see ALTER).

If you want to create an index on a combination of columns, you can list the column names within the parentheses, separated by commas.

CREATE INDEX index_name ON table_name (column_A, column_B, ...);

The index table will be sorted on column A, then with A, there would be the B’s in order as well. So, It’s more efficient to specify the columns in the order as in the index definition to speed up queries on the table.

— ALTER

The ALTER TABLE statement changes the structure of a table. One example is to create or destroy indexes.

-- Add an index to existing table
ALTER TABLE tbl_name ADD INDEX index_name (col_name);-- Add a unique (no duplicates) index to existing table
ALTER TABLE tbl_name ADD UNIQUE INDEX index_name (col_name);-- Delete an index in an existing table
ALTER TABLE tbl_name DROP INDEX index_name;

— DROP

The DROP INDEX statement deletes an index a table.

DROP INDEX index_name ON tbl_name;

The DROP INDEX statement is mapped to an ALTER TABLE statement to drop the index (see ALTER).

Transactions are very important when working with sensitive data and you want your data to be very accurate like transactions in banks, or booking a flight ticket systems.

Transactions

A transaction is a combination or a set of queries. There are 4 properties for any transaction that must be fulfilled, grouped together under the acronym ACID. So, a transaction must be Atomic, Consistent, Isolated, and Durable.

Atomic

It’ says that a transaction either gonna successfully executed or not at all, and so the database has to revert back to the original state in case of failure.

So, if a transaction has 20 queries, and you executed them all successfully except the last one, the database has to revert back to the original state; the state before executing the transaction.

This is because a transaction is a one unit, either successfully executed or not at all.

Consistent

A transaction must take the database from one valid state to another valid state. So, for example if you have a foreign key that references a primary key, and you accidentally deleted a primary key, this will violate the integrity constrains, because a foreign key can’t reference a value that doesn’t exist.

So, even if a transaction is successfully atomic, it still can’t result in a situation that violates any of the integrity rules defined in a database.

Isolation

Consider that we have two persons John and Peter, and they accessing the same bank account at the same time from different places. So, the first transaction by John will start by selecting the balance of account number 2100, and so the second by Peter.

And, because they want to withdraw $100, they will decrement 100$ from 1000$, and the balance will be updated to 900$. Now, the balance is 900$, although we withdraw 200$ (100$ for John, and 100$ for Peter). Obviously, this is not a good situation for the bank.

So, Isolation will isolate the data, and transactions won’t interfere with each other. And, if a transaction is trying to change a row, this row won’t be available for any other transaction unless the first transaction finishes.

Now, concurrency control in transactions is one of the areas where it’s implemented differently across database management systems, although the principles are the same. So, take a look at your chosen database management system when it comes time to do this.

Durable

Durability refers to the transaction being considered robust. It means if a transaction is successfully committed, then the transaction is guaranteed. So, if we changed the balance successfully, then the new balance will remain, even if the system crashes later or there was a power failure.

Many DBMSs write transactions into a transaction log. This will maintain the last state of the system.

Implementation

Transactions wrap DML operations. They are implemented differently across DBMS, although the principles are the same. So, check the options that you have in your DBMS, and how to navigate between them.

To give an overall idea of how they can work, we’ll use MySQL as an example. The format as the following:

We start a transaction by START TRANSACTION statement, then write our DML statements, finally, we end the transaction with COMMIT or ROLLBACK.

START TRANSACTION;
  INSERT INTO books (title, author, price) 
    VALUES ('Database Systems', 'Christopher', 300);
  UPDATE books SET price = 100 WHERE title LIKE '%Database%';
COMMIT;

A COMMIT means that the changes made in the transaction are permanent. A ROLLBACK statement, on the other hand, cancels all modifications made by the current transaction.

As you start to do more complex queries, you want to keep and reuse them. You can do this by creating a stored procedures.

Stored Procedures

We’ve also mentioned previously in the SQL tutorial that DDL operations manage stored procedures; we can CREATE, ALTER, or DROP a procedure.

A stored procedure is a block of SQL that can be written, given a name, and stored directly in the database, kind of like creating a function or a method in other programming languages, and then it can be executed multiple times.

Implementation

CREATE PROCEDURE FinanceEmployees() 
    SELECT * FROM Employee
    WHERE Department = 'Finance'
    ORDER BY LastName
;

We can call the procedure using the CALL keyword. That’s going to execute it and return the results as if we just typed the full SQL query.

CALL FinanceEmployees();

Instead of returning the employees in the finance department every time, we can send the department name as a parameters. Then, we call the stored procedure passing in whatever parameter we want.

CREATE PROCEDURE GetEmployees(IN dept VARCHAR(45)) 
    SELECT * FROM Employee
    WHERE Department = dept
    ORDER BY LastName
;CALL GetEmployees('Finance');

To delete a procedure, just use the DROP PROCEDURE statement.

DROP PROCEDURE IF EXISTS GetEmployees;

This is another area where the actual creation of a stored procedure is a little different across the DBMS, although there are core similarities.

SQL Injections

Using a stored procedure is also a level of protection against something called “SQL Injection Attack”.

SQL Injection is a technique where malicious users can intentionally inject SQL commands, usually, via web page input, which may result in revealing your database records, or modifying them, or even worse, deleting the entire table.

sql = "SELECT * FROM users WHERE name = '" + userName + "';"
Execute(sql);

What if a user came into your application and entered a'; DROP TABLE users WHERE '1' = '1 in the username filed?. Definitely, you don’t want to fall in this trap. A stored procedure instead uses parameters; no need to (and you shouldn’t) concatenate strings.

So, even if there are some other ways to avoid, or even totally prevent SQL injections, stored procedures are one way around them.

Stored Procedure Vs Stored Function

We’ve already introduced the stored function. So, you might be asking about the difference between them. Generally, here’s a list of the main differences.

Procedure allows SELECT as well as DML statement in it whereas function allows only SELECT statement in it.

The moral of the story use function to perform quick tasks that will be used inside SQL statements, and procedures for big tasks that’s independent on any context.

Wrapping Up

You should now work on optimizing your database by creating indexes on your columns you access a lot.

And if you are working with sensitive data, and you want to make use of transaction benefits, then wrap your queries with transaction statements.

Along the road, stored procedures can help you to wrap complex queries so you can keep and reuse them.

These areas are implemented differently across different DBMS. We didn’t dig deeper into SQL queries. We’ve just demonstrated the main principles. So, look closely to your chosen DBMS to see how it’s implemented.

Database — Indexing, Transactions & Stored Procedures (Part 9)

Optimization, Working with sensitive data, & Re-usability.

Indexing

How Indexes Work?

Composite Indexes

Clustered & Non-Clustered Indexes (Optional)

— Clustered Index

— Non-Clustered Indexes

Indexes Pitfalls

Implementation

— CREATE

— ALTER

— DROP

Transactions

Atomic

Consistent

Isolation

Durable

Implementation

Stored Procedures

Implementation

SQL Injections

Stored Procedure Vs Stored Function

Wrapping Up

Written by Omar Elgabry