Reaching the Max Limit for Ids in Postgres

Ann Molly Paul
Coffee Meets Bagel Engineering
4 min read · Jan 24, 2018

Databases and their limitations can be the source of interesting engineering challenges. One such challenge we recently faced as a team was dealing with the limitation of the int datatype on an auto-incrementing primary key for a fast-growing table in Postgres.

In this post, I’ll discuss this issue in detail and share learnings from 4 different strategies we drafted and tested before rectifying this in production.

The maximum number of available ids given the int constraint is 2147483647, which is approximately 2.1B records. The growth rate of this table was on the order of 80M+ records per month, which gave us approximately 3 to 4 months of runway before catastrophic failure. The irony of this situation is that it indicates steady, healthy growth of our company.

To estimate the monthly growth of ids in the table, we used the following query.

SELECT COUNT(t.id), DATE_TRUNC( 'month', t.time_stamp ) 
FROM big_table AS t
GROUP BY DATE_TRUNC( 'month', t.time_stamp )
ORDER BY DATE_TRUNC( 'month', t.time_stamp );

The implications of reaching this hard limit in Postgres are:

  • Inserts would no longer be permitted (which is the default behavior in Postgres).
ERROR:  nextval: reached maximum value of sequence "big_table_id_seq" (2147483647)
  • If id wraparound (cycling) had been set up on the sequence, ids could silently roll over, which could massively corrupt data and the foreign key relationships that depend on this table.

Neither scenario is ideal.

The main recommendation here is to have a system that monitors the growth of tables, and to set up alerts that trigger well in advance.
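
For example, a periodic check like the one below (a minimal sketch; the sequence name is the one from our table, and the threshold and alerting wiring are left to whatever monitoring system you use) reports how much of the int range the sequence has already consumed.

SELECT last_value,
       ROUND( 100.0 * last_value / 2147483647, 2 ) AS pct_of_int_max
FROM big_table_id_seq;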

The solution is a migration that converts the id column of the table from int to bigint. Sounds simple, but not quite.

In the next few sections, I’ll share details of 4 different strategies that could be used to solve this, how we tested/timed them and why we chose Strategy 4.

Size of the dataset used: 1540336116 rows (~1.5 billion)

Strategy 1

Change the datatype by using ALTER COLUMN.
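
In its simplest form, this is a one-line statement (using the same table and column names as above):

ALTER TABLE big_table ALTER COLUMN id TYPE bigint;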

Risks:

  • Acquires an ACCESS EXCLUSIVE lock, which can be problematic for big tables.
  • Causes a full table rewrite, in which Postgres rewrites every single row of the table; this is a very expensive operation in terms of both disk I/O and wall-clock time.

To perform this, we would require a maintenance window where writes to the DB are stopped.

Steps:

  • Identify all Foreign Key constraints to the id column of big_table (1s; one way to do this is sketched after this list)
  • Drop all Foreign Key constraints that reference id (1s)
  • Drop the Primary Key constraint on id (1s)
  • Run ALTER TABLE to change the datatype from int to bigint. (20 hours and counting; this query was aborted.)
  • Add back Primary Key constraint to id (1s)
  • Add back Foreign Key constraints to reference id (1s)
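
For the first step, one way to find every Foreign Key constraint that references big_table is to query the system catalogs. This is a sketch of our own, not necessarily the exact query we ran:

SELECT conrelid::regclass AS referencing_table,
       conname            AS constraint_name
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'big_table'::regclass;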

Strategy 2

Create a new table with the same schema, but with id of type bigint, and use INSERT INTO to copy the data over.

Risks:

  • Bloating the disk usage on the database server.
  • Bloating the CPU and memory usage on the database server during the SELECT operation.

To perform this, we would require a maintenance window where writes to the DB are stopped.

Steps:

  • Create a table with the same schema as big_table except for id. (< 1s)
CREATE TABLE big_table2 ( LIKE big_table INCLUDING ALL );
ALTER TABLE big_table2 ALTER COLUMN id TYPE bigint;
  • Drop all Foreign Key constraints that reference id in the new table. (1s)
  • Load the data from big_table into big_table2. (10 hours and counting; this query was aborted.)
INSERT INTO big_table2 SELECT * FROM big_table;
  • Add back Foreign Key constraints to reference id. (1s)
  • Rename table big_table2 to big_table. (2s; a sketch of this swap follows.)
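
The final rename can be done in a single transaction during the maintenance window. A rough sketch (keeping the old table around under a new name is our choice here, not a requirement):

BEGIN;
ALTER TABLE big_table RENAME TO big_table_old;
ALTER TABLE big_table2 RENAME TO big_table;
COMMIT;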

Strategy 3

This strategy is very similar to Strategy 2, except that instead of copying the data in one SQL query, we slowly copy chunks of records over a longer period of time.
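
Each chunk of the copy is an ordinary INSERT ... SELECT bounded by id. Roughly, it would look like the sketch below, where :last_copied_id is a placeholder the job tracks between batches (the batch size of 100000 matches the steps that follow):

INSERT INTO big_table2
SELECT *
FROM big_table
WHERE id > :last_copied_id
ORDER BY id
LIMIT 100000;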

Risks:

  • Manually syncing updates on records that were already migrated.
  • Manual validation to ensure data integrity.
  • Possibility of data corruption due to race conditions.

To perform this, we would not require a maintenance window, but it is an engineering-heavy, high-risk method.

Steps:

  • Create a table with the same schema as big_table except for id. (< 1s)
  • Create a job that copies over records 100000 at a time. (10–12 days at best)
  • Create a post-save and post-update DB hook that re-writes the data in the new table to keep updates in sync. (10–12 days at best)
  • Switch reads to the new table. Code change and deployment required.
  • Switch writes to the new table. Code change and deployment required.

Strategy 4

Add a new column, id_bigint, of type bigint to the table. Copy the id values over to id_bigint and then rename the new column to id (after renaming the existing id to id_old).
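
Concretely, the new column and one chunk of the backfill look roughly like this (the id range 1 to 100000 is illustrative; the real script walks the whole table in such chunks):

ALTER TABLE big_table ADD COLUMN id_bigint bigint;

UPDATE big_table
SET id_bigint = id
WHERE id BETWEEN 1 AND 100000;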

Risks:

  • None.

To perform this, we would require a maintenance window where writes to the DB are stopped.

Steps:

  • Add a new nullable column id_bigint. (1s)
  • Run a script that updates id_bigint with values from id in chunks of 100000. (3–4 days at best)
  • Once in maintenance mode, drop the primary key constraint, foreign key constraints, and the sequence on id. (4s; the swap is sketched after this list.)
  • Rename id to id_old and id_bigint to id. (1s)
  • Add back primary key constraint. (40 minutes)
  • Add back foreign key constraints. (45 minutes)
  • Drop not null constraint on id_old (the previous primary key). (1s)
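
For reference, the column swap during the maintenance window looks roughly like the sketch below. The constraint name is an assumption (the Postgres default naming), and the foreign key drops/re-adds and re-pointing the sequence default at the new id are omitted for brevity; verify the real names with \d big_table before running anything like this.

BEGIN;
ALTER TABLE big_table DROP CONSTRAINT big_table_pkey;
ALTER TABLE big_table ALTER COLUMN id DROP DEFAULT;   -- detach the old sequence
ALTER TABLE big_table RENAME COLUMN id TO id_old;
ALTER TABLE big_table RENAME COLUMN id_bigint TO id;
ALTER TABLE big_table ADD PRIMARY KEY ( id );
ALTER TABLE big_table ALTER COLUMN id_old DROP NOT NULL;
COMMIT;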

After weighing the risks and time involved, we found Strategy 4 to be the clear winner.

In conclusion, monitoring your databases is crucial. If you are closing in on a limit like this, I hope this post gives you a clear idea of the different approaches to evaluate.

In my next blog post, I’ll go over the details of Strategy 4: the SQL queries involved, sample validation tests, the backup plan that was prepared, and one issue that was encountered and how to avoid or resolve it.
