Postgres Randomized Primary Keys
There are times in your life when you might be asked to randomize the primary keys of a table so that they are numeric, unique, but non-sequential. If you been assigned this task, then this post is for you ❤.
Numeric, Unique, but Non-Sequential?
That’s correct. While typical primary keys are just simple sequences of integers (1,2,3,4,5,6,…), this default is problematic because it:
- Hints at the number of records in your table
- Makes it easy for users to guess random IDs
Instead, we might prefer a randomized sequence of integers that obfuscates our primary keys without forcing ourselves to use nasty old UUIDs. To achieve this, we are going to use a lightweight Feistel cipher and some handy Postgres functions.
Enter the Feistel Cipher
Feistel ciphers are a type of block cipher that are symmetric and can be written in such a way that they:
“produce an integer output that is uniquely associated to its integer input (by a mathematical permutation), but looks random at the same time, with zero collision”
- Postgres Wiki
This is really the secret sauce that ensures we have random looking numbers without the risk of generating duplicate values. The Postgres wiki has a great implementation of this function that I’ve embedded below with a slight modification (explained further down).
What About My Existing Records?
Most of the information you’ll find online about this topic does not take existing records into consideration. If you already have a bunch of records that were generated using sequential primary keys, you would be at risk of generating a duplicate ID, potentially making it impossible for users to generate new records.
To account for this, I’ve slightly modified the custom function outlined in the Postgres wiki to allow for a minimum value – meaning we can have a function with a signature like
bounded_pseudo_encrypt(id, max, min)`. If you already have 500 records, just set the
min to 500 and your random IDs will never overlap with your existing records.
The Postgres Custom Functions
Before we can use our handy random ID generating cipher, we need to register it as a function within Postgres. In this case we are actually going to register two separate functions. One that generates the “random” ID, and the other that ensures we respect the min/max bounds. These functions are written using SQL Procedural Language (PL/pgSQL) which you can read more about here.
Function 1 — Pseudo Encrypt (24 Bit)
Note: The 24 bit limit means the highest value we can have for an ID is 16,777,215. There are versions of this code that allow for higher, but I’ll let you find those yourself.
CREATE FUNCTION pseudo_encrypt_24(VALUE int) returns int AS $$
l1:= (VALUE >> 12) & (4096-1);
r1:= VALUE & (4096-1);
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * (4096-1))::int;
l1 := l2;
r1 := r2;
i := i + 1;
RETURN ((l1 << 12) + r1);
$$ LANGUAGE plpgsql strict immutable;
Function 2 — Bounded Pseudo Encrypt
CREATE FUNCTION bounded_pseudo_encrypt(VALUE int, MAX int, MIN int) returns int AS $$
VALUE := pseudo_encrypt_24(VALUE);
EXIT WHEN VALUE <= MAX AND VALUE >= MIN;
$$ LANGUAGE plpgsql strict immutable;
To register these functions with Postgres simply run them like you would run a normal query. If you need to modify them, you can drop and recreate them by running:
DROP FUNCTION bounded_pseudo_encrypt(VALUE int, MAX int, MIN int);
DROP FUNCTION pseudo_encrypt_24(VALUE int);
Ensuring They Work
With our functions registered, we can easily test them by generating a sequence and passing it through our shiny new functions.
-- Returns 50 rows
SELECT n, bounded_pseudo_encrypt(n, 50000, 10000) FROM generate_series(1,50) n;
-- Returns a single row
select bounded_pseudo_encrypt(1, 16777215, 2);
The results of the first query should look something like this:
Updating Your Primary Key Column
The final thing we need to do is alter our table so that the primary key column uses our custom functions to generate IDs. This is actually pretty easy — you just need to run the following query:
WARNING: ALWAYS BACKUP YOUR DATABASE BEFORE DOING ANYTHING CRAZY.
-- Assuming you already have 1000 existing records and want the max (16,777,215) number of IDs...
ALTER TABLE your_table ALTER COLUMN id SET DEFAULT bounded_pseudo_encrypt(nextval('your_table_your_table_id_seq')::int, 16777215, 1000);