This is a continued part of This Part One writing about Relational Databases Internals.
To start with how ACID is Implemented in SQL databases specially PostgreSQL, Let’s first define what is the database transaction?.
What is Database Transaction ?
- Is a sequence of one or more SQL Statement/s that is/are treated as a one unit.
- Starts with BEGIN and ends with COMMIT and it has a ROLLBACK.
- BEGIN to inform the database to start the transaction.
- COMMIT is just to execute/write the transaction into the Data File “File System”
- ROLLBACK for Rolling back the changes done in the COMMIT level.
- There is a file called “pg_log” which is used for marking any transaction as committed or aborted , This is useful later on this article.
- Ex:
BEGIN;
<SQL Statement/s …..>
COMMIT;
Important Questions :)
1- what will happen if two “or more” transactions are executed against the same data in a table at the same time ?
2- Is there any way to isolate transactions between each others in order to avoid any conflict in terms of reading and updating the same exact data at the same time ?
3- What will happen if the Database itself became down while a specific transaction did not completely finish its execution and it is just done with a part of a SQL statement, like insert or something similar?
4- How can we make sure that the data will become the same as the last save time once database is back from a down time ?
and more and more , Basically what makes sure that the database concurrency and possible failure will not affect data ? :)
Yeah and here we come to ACID and its ways to try to avoid any conflict and successful recover from any possible failure.
What is ACID ?
- It is an abbreviation taken from “Atomicity , Consistency, Isolation and Durability”.
1- Atomicity:
- A mechanism which allows the database to rollback all queries/statements within a failed transaction and clean all garbage that has been there.
- which means , making the whole transaction fail or succeed as a unit with no any partial fail or success.
2- Consistency:
- The data should be the same during any update.
- Ex: a banking app that will transfer an amount from account to other account, The total amount should be the same between the sender and receiver till the end of the transaction/
- Consistency means a balanced approach before and after the transaction is done.
3- Isolation:
- Database transactions should be totally isolated from other transactions that are executed at the same time.
- serializable means to allow multiple transactions which does not affect each others only.
- There are a lot of challenges in this point which we will come to it later here.
4- Durability:
- Means that if the database crashes and up again , the system should be restored with its proper successful state.
- The database should be up to date all the time.
Isolation is not free of charge :)
Isolated transactions means transactions should be “serializable” and executed one by one, BUT this does not mean blocking any other transaction until the first one ends, It affects the concurrent transactions to the same data only, This is perfect but it affects the performance for sure in case we have very large number of concurrently running transactions.
But how can we improve the cost of performance as a result of Isolation ?
Here we come to Isolation Levels :
- We will have to know about some issues that can happen based on the level of Isolation which are like:
1- Dirty Read => A transaction tries to read a data written by other transaction but NOT committed yet.
2- Non-Repeatable Read => When we have a read transaction which takes long time and other update/committed transaction done before the read one ends.
3- Phantom Read => When we read a committed value that is not being showed up in a where condition like (range of time query)
4- Serialization Anomaly => The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.
What are the Isolation Levels ?
1- Read Committed:
- Means to read ONLY committed changes done by any other transactions.
- In this level , we may deal with (Non-Repeatable Read , Phantom Read)
2- Read Uncommitted:
- Means reading uncommitted changes as well, In this case we will deal with all possible issues above :)
3- Repeatable Read:
- Means that the database will make sure that when reading any data , the data will remain unchanged while the read transaction is running,
- We may deal with ONLY Phantom Read issue.
4- Serialized Level:
- Transactions are done one by one in order which will avoid all the above issues but with low performance.