Internal of UPDATE in Postgres

Vishal Sharma
3 min readAug 20, 2023

--

As the first step in learning about database we learn about various SQL commands like SELECT, INSERT, UPDATE and DELETE. These commands are used majorly in any applications. But when we are beginner we don’t give much thought about what actually happens internally when these commands are ran by the engine.

Here in this article, I’ll explain the internals of UPDATE command in Postgres database engine and we will learn about the complexity.

Now you will be thinking what is special in UPDATE, the special part is - in case of MySQL we update the same old row with the new data but in case of PostgreSQL we create a new row.

Yes, you heard it right. The UPDATE command in Postgres is basically an INSERT + DELETE command in Postgres. In MySQL it will have an exclusive lock on the row and then update it.

Now the question is why does Postgres does this?

Postgres works on MVCC model (Multi Version Concurrency Control). Now in this model there can be multiple version of single object i.e. there can be multiple version of a single row.

Please read the above line one more time because that will help you understand the difference between two interchangeable and confusion term — row and tuple. Row is a term to state the row-data physically and the multiple version of this row is called tuples.

So, in an update command postgres engine marks the old row as dead and creates a whole new row. Now these two rows are tuples i.e. one is the new version of the row and the other is deprecated or older version of the row.

Now there is another question, why do Postgres keeps multiple version of the rows? Why there is a need to create a new row rather than updating the older one?

The answer is the architecture MVCC. See in relation databases we have a concept called ACID (Atomicity, Consistency, Isolation and Durability). Making multiple rows is for keeping Isolation among the various Transactions going on in the database.

In Postgres, each transaction has a transactionId which is somewhat related to the time and each row has two hidden rows by default xmin and xmax , these two rows save the transactionId of the transaction which created and updated them respectively.

So, let’s suppose a transaction T1 started at time t1. So all the rows should be visible in the same state to the transaction T1 during the course of the transaction irrespective of any other transaction in the database. Now at time t2 (where t2 > t1) another transaction T2 started. This transaction update an older row but transaction T1 is not yet completed. If the transaction T2 update the older row directly then isolation won’t be maintained for Transaction T1. Hence Postgres marks the old row as dead but does not remove it so that other transactions can access the older data and creates a new row.

Now you will be wondering that there will lot of dead tuples created during the course of time but don’t worry postgres makers have thought about this too. There is an autovacuum process which keeps on running and remove these dead tuples once all the transactions which were started before the update of this row has finished.

I hope you would have learn something new about Postgres through this article. Thank you for your time :)

--

--

Vishal Sharma

Software Engineer @ Juspay | Ex-SDE Intern @ PINDEXX and GoFloaters | B Tech @ IIIT T