How Grindr Uses HOT (Heap-Only Tuple) Updates
With a staggering 13 million monthly active users, Grindr pulsates with a constant stream of chats, images, and even videos flowing between members of our global user base. As the number of users has grown over the years, so has the need to update the app's tech infrastructure to keep it running while scaling for product updates and releases. In 2021, I spearheaded the implementation of our standardized service stack at Grindr, a pivotal move that involved integrating Postgres to fortify our data engineering capabilities. Since we recently completed our full migration to Postgres in 2023 (read about that and other accomplishments here!), I thought it would be worthwhile to talk a bit more about some of the mechanisms that make this our database system of choice.
There are many articles on the internet about how Postgres uses multiversion concurrency control to allow multiple transactions to access rows (tuples, in Postgres parlance) concurrently without requiring locks. One of the drawbacks often mentioned is that an update results in a new row being written and the old row being marked for deletion, and this is frequently cited as a performance cost.
However, that's not the whole story. While the cost is real, there is a technique that can mitigate part (or sometimes most) of the impact. This technique is known as heap-only tuple (HOT) updates.
Page Layout
In Postgres, a table (heap) or an index is stored as an array of fixed-size pages (8 kB by default). Each page contains items: rows in the case of a table and index entries in the case of indices.
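If you want to check the page size of your own cluster, a stock build reports 8192 bytes (the value is a compile-time constant, so it may differ if Postgres was built with a non-default block size):

SHOW block_size;
-- block_size
-- ------------
--  8192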
In a table page, each row has a page number and an offset in that page. Postgres calls this the ctid. Here is an example of data on disk, with the first column being the ctid.
It’s always possible to fetch the ctid for a given row by referencing the special column ctid in your query. For example, select ctid, * from people would return the rows in the graphic above.
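For a hypothetical people table with name and city columns (mirroring the graphic), the output might look something like the following; the exact ctid values depend on how the rows are physically laid out:

SELECT ctid, * FROM people;
--  ctid  | name  | city
-- (0,1)  | Alice | Chicago
-- (0,2)  | Bob   | Chicago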
An index page consists of entries that map values to a ctid. Therefore, to find rows quickly for a given value, Postgres can search the index, find the containing pages and offsets, and load only those pages. That's the basic premise of why indices are fast.
Note that when a row that an index points to is fetched, the entire page is loaded into memory. This is an important point for what follows.
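As a quick illustration of that premise, assuming the hypothetical people table from above with an index on name, the planner can satisfy an equality lookup by reading only the relevant pages (on a table large enough that it prefers the index over a sequential scan); the index name below is the auto-generated default:

CREATE INDEX ON people (name);

EXPLAIN SELECT * FROM people WHERE name = 'Bob';
-- Index Scan using people_name_idx on people
--   Index Cond: (name = 'Bob'::text)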
Regular (Non-HOT) Updates
Remember that when we update a row, a new row is written: Postgres marks the old row as deleted and adds the new row. (It keeps track of which row is the current version using a hidden xmax column, but that's out of scope for this blog.)
Consider updating a row by changing the city from Chicago to Austin. The previous row was on page 0, but the new row has been written to page 3.
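Sticking with the hypothetical people table, the update and its effect on the ctid might look like this (the page and offset numbers are illustrative, matching the scenario above):

UPDATE people SET city = 'Austin' WHERE name = 'Bob';

SELECT ctid, name, city FROM people WHERE name = 'Bob';
--  ctid  | name | city
-- (3,1)  | Bob  | Austin    -- the row has moved from page 0 to page 3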
This is the key part: since the new row lives at a new location, all indices pointing to this row also have to be updated. If we have an index on name, that means Postgres has to update that index as well as write the actual row. If we have a table with many columns and many indices, we may suffer multiple IO operations for a single changed row.
HOT Updates
Do you recall that we said earlier that when loading a page that an index points to, we load the entire page into memory? This opens up an interesting idea. If we could write the new row in the same page as the original, and keep a pointer from the old row to the new row, then the index wouldn't need to change as long as the indexed value did not change either. This is because Postgres would load both the original row and the new row in the same IO page load operation. Then, once the page is in memory, it could simply find the old row and follow the pointer to the new row.
This is the basic idea behind a Heap-Only Tuple update, so called because only the heap (table) tuples (rows) are updated and the indices are left intact. This can massively reduce the number of IO operations required when non-indexed fields are updated.
So if we return to the previous example of updating Chicago to Austin, and instead write the new row to the same page, we can simply add a pointer to the new offset.
Now the index, which still points to (0, 2), can be left untouched.
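To see the difference, here is roughly how the same experiment would play out when the page still has free room (for example, on a fresh copy of the people table); the offsets are illustrative:

UPDATE people SET city = 'Austin' WHERE name = 'Bob';

SELECT ctid, name, city FROM people WHERE name = 'Bob';
--  ctid  | name | city
-- (0,3)  | Bob  | Austin    -- new offset, but still page 0; the index entry pointing at (0, 2) is untouched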
Note that if any indexed value changes, then all indexes must be updated, not just the indexes which contain changed values.
Tuning HOT Updates
In order for Postgres to be able to write updated rows alongside the original rows, there needs to be free space in the page. If our page is entirely full, then Postgres has no choice but to write to a new page.
Therefore, if we are designing a schema to deliberately take advantage of HOT updates, we should instruct Postgres to leave some free space in each page before starting a new one. This setting is called the table's fill factor, and it can be changed at any time (once applied, newly created pages will respect the new value).
We can set the fill factor using the following SQL:
ALTER TABLE foo SET (fillfactor = 75);
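The fill factor can also be set when a table is first created; the columns here are just placeholders:

CREATE TABLE foo (
    id BIGSERIAL PRIMARY KEY,
    payload TEXT
) WITH (fillfactor = 75);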
There is, of course, a trade-off. The lower the fill factor, the more space will be available for updated rows, but the overall size of your table will increase. The right value will vary depending on how frequently you expect your rows to change and how often you want vacuuming to kick in.
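To check what storage options (including fillfactor) are currently set on a table, you can inspect pg_class:

SELECT relname, reloptions
FROM pg_class
WHERE relname = 'foo';
-- relname | reloptions
-- foo     | {fillfactor=75}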
Examining the Data
Postgres provides a way to see tuple states and whether HOT updates are being applied by querying the pg_stat_user_tables view.
Let’s create a table, define an index and add some data:
CREATE TABLE characters (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
title TEXT NOT NULL
);
CREATE INDEX ON characters(name);
INSERT INTO characters ("name", "title")
VALUES ('Picard', 'Captain'),
('Riker', 'Commander');
Now if we query pg_stat_user_tables, we will see that n_live_tup is set to 2 and n_dead_tup is set to 0.
SELECT *
FROM pg_stat_user_tables
WHERE relname = 'characters';
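Since the view has many columns, it can be easier to select just the counters we care about (the statistics are reported with a small delay, so the numbers may take a moment to appear):

SELECT n_live_tup, n_dead_tup, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'characters';
-- n_live_tup | n_dead_tup | n_tup_upd | n_tup_hot_upd
--          2 |          0 |         0 |             0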
This is as expected because we inserted two rows and have not deleted or updated any. Now, let's give Picard a less formal name.
UPDATE characters
SET name='Jean Luc'
WHERE name = 'Picard';
If we now query pg_stat_user_tables again, we will see that n_dead_tup is now 1 (the original row was marked as deleted), n_live_tup is still 2 (because we still have two visible rows), and n_tup_upd is 1 (because we updated 1 row).
The column n_tup_hot_upd is 0 because this update changed an indexed value, which means the index had to be updated as well.
Now, let’s give Riker a promotion.
UPDATE characters
SET title='Captain'
WHERE name = 'Riker';
Querying pg_stat_user_tables a final time, we will see that n_dead_tup is now 2 (two rows have been marked as deleted), n_live_tup is still 2 (still two visible rows), n_tup_upd is 2 (because we have now updated 2 rows), and finally n_tup_hot_upd is 1, because this last update required no index changes.
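Using the narrower query from before, the counters should now look like this:

SELECT n_live_tup, n_dead_tup, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'characters';
-- n_live_tup | n_dead_tup | n_tup_upd | n_tup_hot_upd
--          2 |          2 |         2 |             1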
Wrap-Up
As a practical example of where we use HOT updates at Grindr, they are used extensively in our next-generation chat platform. For example, we have a table that contains metadata for each conversation between two profiles. In this table we store sender, recipient, last_timestamp, and unread_message_count, among other fields.
Each time a message is sent, the unread_message_count is incremented, and each time a conversation is viewed, it is reset to zero. This is a lot of write activity as conversations are happening constantly. By not indexing last_timestamp and unread_message_count, we never have to update the indexes. Therefore, once a conversation row is created, the indexes are locked in.
This gives us super fast inserts, whilst still allowing us to query tens of billions of rows to bring back a given user’s inbox.
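As an illustration only (this is not our actual schema; the column names are borrowed from the description above, and the types, key, and fill factor value are assumptions), the shape of such a table might look like this:

CREATE TABLE conversation_metadata (
    sender               BIGINT NOT NULL,
    recipient            BIGINT NOT NULL,
    last_timestamp       TIMESTAMPTZ NOT NULL DEFAULT now(),
    unread_message_count INT NOT NULL DEFAULT 0,
    PRIMARY KEY (sender, recipient)
) WITH (fillfactor = 80);

-- Only sender and recipient are indexed (via the primary key), so updates to the
-- frequently changing columns never touch an index and can stay HOT:
UPDATE conversation_metadata
SET unread_message_count = unread_message_count + 1,
    last_timestamp       = now()
WHERE sender = 123 AND recipient = 456;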
Reminder — we are hiring! If you are interested in working with a team of brilliant Data Engineers in our Chicago office, check out our open roles here.