Migrating from Oracle to Postgres? Be aware of FILLFACTOR and Indexing implications
We know how useful Indexes are, to speed up SQLs in relational databases. Though indexes add some overhead on DMLs processing because with any DML on a table, all the corresponding Indexes need to be synched with the heap table.
Postgres has become quite a popular open source database and many organizations are migrating their workload from Oracle to Postgres. One of the major differences between these two RDBMS is multi version concurrency control (MVCC) implementation. Let’s understand that.
How UPDATE query works in Oracle
In Oracle, when an user runs an UPDATE query to update a column, the old image of that row stores in global UNDO segment which is a dedicated undo space, managed separately. In this scenario because Oracle changes the row in place and hence row location (known as rowid in Oracle) does not get change, hence there is no need to change corresponding index entry (unless UPADTE query modifying the data of an Index column itself).
How UPDATE query works in PostgreSQL
In PostgreSQL, when an user runs an UPDATE query to update a column, it marks the old row as dead tuple/row and creates a new version of the row with UPDATED column data(out of place change). Here because the row’s physical location (known as ctid in Postgres) gets changed and it seems with every UPDATED row, corresponding index entries will also need to change. That looks like a huge disadvantage in Postgres if all the Index entries need to change with every UPDATED row.
Wait! Postgres developer community came up with a nice optimization in this process, called HOT (Heap Only Tuples) where even with ctid changes for a row, it does not need to change the Index entry and it puts a locator inside table block which points to the new row location within the same block. HOT also does a block level mini VACUUM at some point (when space pressure comes on the block) to clean up the dead tuples.
With HOT feature in Postgres, it is comparable with Oracle and any UPDATED row will change the Index entry only when UPDATE query is changing a column data which is part of any Index.
Next Question: What if there is no free space in the current block to accommodate the new UPDATED row?
In this scenario Oracle and Postgres behave in different ways.
In Oracle, it updates the row in place so a block can be full only if new UPDATED row size is bigger than the old row size. In this case that row will be migrated to a new block and a pointer to the new row id will be there in the current block.
- Pros — No need to change the Index entry as Index entry will still points to the old block.
- Cons — Index scan will become slower because Index entry still points the row location to the old block and then old block will point the row location to the new block and fetching a row is then two-step process and hence slower.
In Postgres, when a new UPDATED row does not find space in the current block, it will get stored in a new block with free space. Here instead of putting a locator in the old block, Postgres chooses to change all the corresponding Index entries.
- Pros — There is no impact on Index scan as Index entry for the row contains the actual ctid of the row.
- Cons — Because all the Index entries need to change, that cause the following problems.
- Higher latency of UPDATE statement because every row UPDATE needs to change table entry as well as all the Index entries.
- Long running VACUUM as vacuum needs to clean index dead tuples along with the table.
- Write amplification and as a result more data to be sent to all the Standby/Read replicas.
Summary: By default Postgres table gets created with FILLFACTOR 100 and due to its above explained MVCC architecture, with UPDATE statement the new row has a more tendency to go to new block resulting in all the INDEX updates as well. To prevent these INDEX updates, we need to reduce the FILLFACTOR value for a table which gets high UPDATEs and also get rid of any unwanted INDEX or INDEX column (in case any column in these INDEXes is getting UPDATED).
Let’s do a more deep dive on Postgres FILLFACTOR in the next post.