Deleting duplicates in a postgres table

Occasionally we’ve forgotten to add a natural unique key to some of our tables. That combined with funky asynchronous javascript interfaces meant we ended up with duplicates in our database tables.

In order to solve the issue it is necessary to:

  • Delete duplicates (and leaving only one entry, generally the most recent one).
  • Add the natural key to the table

I’ve recently found a simple query that identifies the row to delete

SELECT id FROM (SELECT id, row_number() over (partition BY col_1, col_2, ... ORDER BY created_at DESC) AS rnum FROM quotes_Site.campaign_notes) t WHERE t.rnum > 1

In the example I’m returning the least recent ones, and the duplicates are identified by col1, col2, etc.

Combining that with a DELETE statement is straightforward.

Enjoy!


Originally published at underthehood.carwow.co.uk on April 24, 2015.

Interested in making an Impact? Join the carwow-team!
Feeling social? Connect with us on Twitter and LinkedIn :-)