Postgres Randomized Primary Keys

There are times in your life when you might be asked to randomize the primary keys of a table so that they are numeric, unique, but non-sequential. If you been assigned this task, then this post is for you ❤.

Numeric, Unique, but Non-Sequential?

That’s correct. While typical primary keys are just simple sequences of integers (1,2,3,4,5,6,…), this default is problematic because it:

  1. Hints at the number of records in your table
  2. Makes it easy for users to guess random IDs

Instead, we might prefer a randomized sequence of integers that obfuscates our primary keys without forcing ourselves to use nasty old UUIDs. To achieve this, we are going to use a lightweight Feistel cipher and some handy Postgres functions.

Enter the Feistel Cipher

Feistel ciphers are a type of block cipher that are symmetric and can be written in such a way that they:

“produce an integer output that is uniquely associated to its integer input (by a mathematical permutation), but looks random at the same time, with zero collision”
- Postgres Wiki

This is really the secret sauce that ensures we have random looking numbers without the risk of generating duplicate values. The Postgres wiki has a great implementation of this function that I’ve embedded below with a slight modification (explained further down).

What About My Existing Records?

Most of the information you’ll find online about this topic does not take existing records into consideration. If you already have a bunch of records that were generated using sequential primary keys, you would be at risk of generating a duplicate ID, potentially making it impossible for users to generate new records.

To account for this, I’ve slightly modified the custom function outlined in the Postgres wiki to allow for a minimum value – meaning we can have a function with a signature like bounded_pseudo_encrypt(id, max, min)`. If you already have 500 records, just set the min to 500 and your random IDs will never overlap with your existing records.

The Postgres Custom Functions

Before we can use our handy random ID generating cipher, we need to register it as a function within Postgres. In this case we are actually going to register two separate functions. One that generates the “random” ID, and the other that ensures we respect the min/max bounds. These functions are written using SQL Procedural Language (PL/pgSQL) which you can read more about here.

Function 1 — Pseudo Encrypt (24 Bit)

Note: The 24 bit limit means the highest value we can have for an ID is 16,777,215. There are versions of this code that allow for higher, but I’ll let you find those yourself.

CREATE FUNCTION pseudo_encrypt_24(VALUE int) returns int AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= (VALUE >> 12) & (4096-1);
r1:= VALUE & (4096-1);
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * (4096-1))::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1 << 12) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;

Function 2 — Bounded Pseudo Encrypt

CREATE FUNCTION bounded_pseudo_encrypt(VALUE int, MAX int, MIN int) returns int AS $$
BEGIN
LOOP
VALUE := pseudo_encrypt_24(VALUE);
EXIT WHEN VALUE <= MAX AND VALUE >= MIN;
END LOOP;
RETURN VALUE;
END
$$ LANGUAGE plpgsql strict immutable;

To register these functions with Postgres simply run them like you would run a normal query. If you need to modify them, you can drop and recreate them by running:

DROP FUNCTION bounded_pseudo_encrypt(VALUE int, MAX int, MIN int);
DROP FUNCTION pseudo_encrypt_24(VALUE int);

Ensuring They Work

With our functions registered, we can easily test them by generating a sequence and passing it through our shiny new functions.

-- Returns 50 rows
SELECT n, bounded_pseudo_encrypt(n, 50000, 10000) FROM generate_series(1,50) n;
-- Returns a single row
select bounded_pseudo_encrypt(1, 16777215, 2);

The results of the first query should look something like this:

Notice how the bounded_pseudo_encrypt values are respectful of our max (50k) and min (10k) values. Running this query again should result in the exact same results, ensuring we never have any duplicate values.

Updating Your Primary Key Column

The final thing we need to do is alter our table so that the primary key column uses our custom functions to generate IDs. This is actually pretty easy — you just need to run the following query:

WARNING: ALWAYS BACKUP YOUR DATABASE BEFORE DOING ANYTHING CRAZY.

-- Assuming you already have 1000 existing records and want the max (16,777,215) number of IDs...
ALTER TABLE your_table ALTER COLUMN id SET DEFAULT bounded_pseudo_encrypt(nextval('your_table_your_table_id_seq')::int, 16777215, 1000);

That’s All Folks!