Scaling PostgreSQL: Check your Vacuum!
In Gett, we’ve developed a service that relies heavily on inserts. We are talking about approximately 5,000 requests per minute where each row contains about 40KB of raw data, which adds up to a whopping 30GB of data inserted daily.
As our relational insertion numbers grew larger by the day, we stumbled upon several bottlenecks related to PostgresSQL, and how it deals with that kind of scale. This blog post will walk you through our optimization journey: our thought process, investigation, and various approaches we’ve experimented with.
As a developer using Postgres, you might sometimes face cases where you have to constantly use
DELETE queries in your system. In those cases, you might want to consider using partitions, since you can easily drop a huge chunk of data with a single operation.
Our system saves data for auditing purposes. This data is used for a short period of time, so we chose to use daily partitions and a cron job that constantly runs and drops the oldest partitions.
At the time, we were using Postgres 9.6, so we implemented the partition using callbacks. You might want to check the following blog post for a full partitioning solution using Postgres 9.6.
Great! we had tables with daily partitions and we thought we had figured it all out. The service response time was low, at around 20ms per transaction, and everything was running smoothly.
Surprisingly, though, after about 2 weeks in production, the response time suddenly and sporadically escalated to around 200ms. At first we thought it was an issue with the data itself, but we then started investigating further using some queries.
First, we wanted to check if Auto Vacuum is running on our Postgres:
In addition, we wanted to look for blocked sessions to know which process creates locks on tables:
Going deeper into auto vacuum
When we executed these two queries, we discovered that Postgres’ auto vacuum process that runs on dead rows in the background causes a severe performance drop for the entire database.
Basically, whenever we updated or deleted a row from a Postgres table, the row was simply marked as deleted, but it wasn’t actually deleted.
In case of an update, Postgres creates a new row and marks the old one for deletion. The physical deletion of the row is done by the auto vacuum process, which runs whenever it reaches some dead-rows-per-table threshold. You can run the following query to find the number of dead rows per table:
At this point, it was obvious partitions alone aren’t going to cut it.
Moving on to the next idea…
With this new information, it became clear to us that we must minimize the number of updates and deletions in our system to maximize Postgres’ performance, so we decided to go with an event store approach.
In this approach, records in your tables are immutable. If you need to change the state of a record, you’ll simply insert a new one instead. By using the inserted record’s date field, you can know which record is the latest, virtually eliminating any record updates in your system, which theoretically means the vacuum process shouldn’t run. For any deletions, you can simply drop the related partitions instead of deleting the actual records directly.
Like the previous solution, this idea was running just fine for a couple of weeks and everything looked great. Unfortunately, about 3 weeks into this variation, we suddenly started experiencing the same issue as before. A sudden, unexpected, performance degradation.
With that, it was time to try some more drastic measures.
Moving to Postgres 10
As you can guess by now, we have yet to solve the auto vacuum issue with the solutions suggested so far. The vacuum process continues to run despite the fact we didn’t perform any updates or deletions in our system.
We later realized an interesting fact about how partitions are implemented when using ORMs with Postgres 9.6. They use a callback function, which basically means the record is first inserted to a parent table, since ORMs rely on PostgreSQL
RETURNING statement for every database action, which can only be achieved by first inserting the row to a parent table. At this point, the record is delegated to the child table and removed from the parent. In essence, every new record creates a dead row in its parent’s table which, as we know by now, triggers the vacuum process.
And there it was — we finally realized our problem. The row is first inserted to a parent table, and is then left as a dead row, causing the vacuum process to perform its cleaning work.
Two approaches to solve this come to mind:
- You can move away from using ORMs and use raw queries instead, by directly going to the child table, but you’ll still get stuck with the callbacks mechanism.
- You can use a native solution with virtual tables using the
PARTITION BYkeyword that was introduced in Postgres 10. for example: you have a
measurementtable and you want create partitions by a date field:
As we were already using the hacky callback mechanism and realized how aggressively it slowed down the database’s performance (~5ms per transaction), it only made sense to go with the native solution that gives us the best of both worlds.Check out the full documentation on the native partition solution on Postgres’ official documentation.
We’ve come a long way with our service since we started to work on it and we had finally thought that we got rid of auto vacuum process 👏.
Our service was supposed to store data for some predefined period of time, but, again, as luck would have it, the vacuum process started running again — at a much lower threshold than we expected, and much earlier than we expected! 😱
After almost giving up on working with Postgres and researching different database solutions, we got back to Postgres’ documentation regarding vacuum and found the missing puzzle piece 🤗.
According to the documentation of Postgres’ vacuum cleaner, the vacuum can run for 2 reasons. The first reason was already covered in this post, while the second one is a little bit more interesting.
Without going into too much detail, Postgres has an anti-wraparound mechanism (Check out this blog post for more details). Postgres attaches a special id, called
XID, to every transaction as part of the MVCC architecture. This is Postgres’ way to know which transaction is first in line.
In 32-bit systems, though, the maximum number of transactions is around 4 billion (2 billion for past transactions and 2 billions for future transactions) for each table. The way Postgres manages this limit is by running auto vacuum on the table with the highest transaction id (the aforementioned
XID), where it also marks all rows in that table as frozen rows. This means that these rows’ transactions are all in the past and no longer have an active transaction id, pseudo-“freeing” these transactions ids for reuse later on. This lets Postgres keep the fixed amount of live transaction.
The parameter indicating the maximum amount of non-frozen transactions in Postgres is called
autovacuum_freeze_max_age. By default, it’s set to 200M transactions.
This is where we’ve hit our ah ha moment. 💡
In our specific case, we’ve reached about 200M transactions relatively quickly on our first table. As expected, the auto vacuum started running on this table, which explained the slowness of the system.
In our case we measured and reached the conclusion a 500M transaction limit should suffice, since we have automatic processes to purge these tables. Thus, we’ve simply increased
autovacuum_freeze_max_age to 500M transactions, a safe limit for the specific problematic table, ensuring the anti-wraparound mechanism will never be triggered, and keeping our system fast and sane. Eureka !
We’ve come a long way with our service and learned a lot about Postgres’ internal mechanisms. We can finally (hopefully) say - “WE FIGURED IT OUT!!”.
I hope you’ve enjoyed this blog post! If you have any questions or want to learn more about what we do, feel free to leave a comment below.