Relational Databases — Know your Primary Keys!

Li Xie
Published in The Startup
5 min read, Oct 16, 2019

Primary Keys?

When designing a new relational database, you will face many crucial life-or-death choices along the way. Selecting a whole bunch of good Primary Keys will be a big part of this process, so you might as well think about doing it right the first time around. Here we will use a very simple example to look at the pros and cons of different types of Primary Keys, and how they can affect the design of your relational database.

In case you’re sat there wondering “How can I select good Primary Keys if I don’t know exactly what they are?”: a Primary Key is a column, or a set of columns, whose values uniquely identify each record in a table. For example, imagine you are now Instagram’s Database Expert (congrats, by the way). You will know that each of your 1 billion+ monthly active users can be uniquely identified by their Username.

Natural or Surrogate?

Primary Keys can be categorized into Natural Keys and Surrogate Keys.

Natural Keys have business meaning, and we make use of them in the real world every day to get by. For example, if you wanted to log into your Instagram account, you would use your username. This helps Instagram direct you to your own unique special space — so you can check how many Likes your latest selfie got overnight, and despair wondering why your previous selfie with your best friend got way more.

As the Instagram Database Expert, you might have designed your database to use a Natural Key here: the username column of the users table is the Primary Key. The email column could probably also have served as a unique identifier, so it can be considered an alternate key.

Fig 1: users table with a Natural Key as the Primary Key
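To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The columns other than username and email, and all of the sample values, are assumptions for illustration only and are not taken from the figure. It shows the database rejecting a second row with the same Natural Key, and treating email as an alternate key through a UNIQUE constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        username  TEXT PRIMARY KEY NOT NULL,  -- natural key
        email     TEXT NOT NULL UNIQUE,       -- alternate key
        full_name TEXT                        -- hypothetical extra column
    )
    """
)
# Hypothetical sample row.
conn.execute(
    "INSERT INTO users VALUES ('selfie_fan', 'fan@example.com', 'Selfie Fan')"
)


def try_insert(username, email):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO users (username, email) VALUES (?, ?)", (username, email)
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_username_ok = try_insert("selfie_fan", "other@example.com")
duplicate_email_ok = try_insert("someone_else", "fan@example.com")
new_user_ok = try_insert("best_friend", "friend@example.com")
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Only the last insert succeeds here: the first reuses an existing username and the second reuses an existing email.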

A Surrogate Key is a value, usually without business meaning, generated with the sole purpose of acting as a unique identifier. Imagine you are the 1,345,222nd user to ever create an Instagram account. Even if Instagram glamorously awarded you a Username of 1345222, you probably wouldn’t feel very special. The Intern who wrote Instagram’s very own ‘I Forgot my Username’ page could at least feel proud of how many people were making use of his work.

Well, if you, as Instagram’s Database Expert, decided to model the users table with a Surrogate Key, then this may well be what happens behind the scenes in your database. Surrogate Keys are very often auto-incrementing integers. The number doesn’t mean anything special; it exists purely for unique identification.

Fig 2: users table with a Surrogate Key as the Primary Key
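A rough sketch of the same table with a Surrogate Key, again using sqlite3. The column name user_id and the sample rows are assumptions for illustration. The database hands out the key values itself, so the application never has to choose them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        username TEXT NOT NULL UNIQUE,
        email    TEXT NOT NULL UNIQUE
    )
    """
)

# Hypothetical sample rows; note that no user_id is supplied.
new_users = [
    ("selfie_fan", "fan@example.com"),
    ("best_friend", "friend@example.com"),
]
generated_ids = []
for username, email in new_users:
    cur = conn.execute(
        "INSERT INTO users (username, email) VALUES (?, ?)", (username, email)
    )
    generated_ids.append(cur.lastrowid)  # key assigned by the database

# generated_ids is now [1, 2]
```

Note that username and email still carry UNIQUE constraints in this sketch. The Surrogate Key identifies the row, but it does not by itself stop two accounts from claiming the same username.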

Natural vs Surrogate — Pros and Cons

Now you, as Instagram’s Database Expert, will know that the topic of Natural Keys vs Surrogate Keys has been debated fiercely for years with no real consensus on which is better. To continue the debate, here are some Pros and Cons for each.

Natural Key Pros

  • The naming of the key has business meaning, so it is easier to work with. Maybe it’s easier to communicate using Usernames and Vehicle Registration Numbers, rather than asking your poor Intern to get all IDs of a different ID, whilst making sure the ID of the other ID is unique, before returning the Username of the correct ID.
  • There’s no need to create extra columns since the data already exists, so your database design might be leaner and you save disk space by not adding a Surrogate Key column. Is it necessary to have a like_id in your likes table if a Composite Key of the post_id and user_id columns combined can already do the job?

Natural Key Cons

  • A redesign might be needed if there is a change in business requirements. For example, what would happen if Facebook and Instagram suddenly decided that everyone’s Instagram username must now also be their Facebook username? Every table that references the username would have to change along with it.
  • Natural Keys (often Strings) are typically larger than Surrogate Keys (Integers), so joins between two tables on a Natural Key will usually take more time.
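The first con above can be sketched as follows. This is an illustration under assumptions: the posts table, its caption column, and the sample usernames are hypothetical. When the username is the Primary Key, a rename has to be copied into every row that references it. Here that is delegated to ON UPDATE CASCADE, which works but still rewrites the child rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE users (
        username TEXT PRIMARY KEY NOT NULL
    );
    CREATE TABLE posts (
        post_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL
                 REFERENCES users(username) ON UPDATE CASCADE,
        caption  TEXT
    );
    INSERT INTO users VALUES ('selfie_fan');
    INSERT INTO posts (username, caption) VALUES ('selfie_fan', 'Morning selfie');
    INSERT INTO posts (username, caption) VALUES ('selfie_fan', 'With my best friend');
    """
)

# Renaming the user changes the primary key value itself...
conn.execute(
    "UPDATE users SET username = 'fb_selfie_fan' WHERE username = 'selfie_fan'"
)

# ...and the cascade rewrites every post that pointed at the old value.
posts_under_new_name = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE username = 'fb_selfie_fan'"
).fetchone()[0]
posts_under_old_name = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE username = 'selfie_fan'"
).fetchone()[0]
```

With a Surrogate Key, the same rename would touch a single row in users, because the posts would reference an ID that never changes.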

Surrogate Key Pros

  • A Surrogate Key is generated by the database itself, so it is guaranteed to be unique. If you wanted to create a whole bunch of new Instagram accounts for Facebook users, the Primary Key could be generated automatically as the data is inserted. It is therefore less affected by changes in business requirements.
  • Very easy to define. There’s no need to argue with your Intern down the local pub whether username or email makes a better Primary Key.

Surrogate Key Cons

  • The extra columns in your tables increase storage and IO requirements, and can affect performance, especially when dealing with inserts and deletes.
  • It is a lot more difficult to look at Surrogate Keys and immediately determine anything meaningful about the record. For example, if you’re trying something new, it is a lot more difficult to tell if things are working as intended when there are lots of these flying around. You would have to make more joins to relate the surrogate keys with meaningful values.
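The readability con is easy to see in a small sketch. The schema and sample rows below are assumptions for illustration, using the like_id, post_id and user_id names mentioned earlier. Reading the likes table on its own yields only integers, and it takes three joins to turn them into something a human can check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id  INTEGER PRIMARY KEY,
                        username TEXT NOT NULL UNIQUE);
    CREATE TABLE posts (post_id  INTEGER PRIMARY KEY,
                        user_id  INTEGER NOT NULL REFERENCES users(user_id),
                        caption  TEXT);
    CREATE TABLE likes (like_id  INTEGER PRIMARY KEY,
                        post_id  INTEGER NOT NULL REFERENCES posts(post_id),
                        user_id  INTEGER NOT NULL REFERENCES users(user_id));
    INSERT INTO users (user_id, username) VALUES (1, 'selfie_fan'), (2, 'best_friend');
    INSERT INTO posts (post_id, user_id, caption) VALUES (10, 1, 'Morning selfie');
    INSERT INTO likes (like_id, post_id, user_id) VALUES (100, 10, 2);
    """
)

# The raw row is just three numbers: [(100, 10, 2)]
raw_likes = conn.execute("SELECT like_id, post_id, user_id FROM likes").fetchall()

# Resolving who liked whose post needs a join per surrogate key.
readable_likes = conn.execute(
    """
    SELECT liker.username, author.username, posts.caption
    FROM likes
    JOIN users AS liker  ON liker.user_id  = likes.user_id
    JOIN posts           ON posts.post_id  = likes.post_id
    JOIN users AS author ON author.user_id = posts.user_id
    """
).fetchall()
# [('best_friend', 'selfie_fan', 'Morning selfie')]
```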

What might this look like? A minimal example

Below are two very minimal Entity Relationship Diagrams (ERDs) for Instagram.

Both essentially represent the same thing: the relationship between Users, Posts, and Likes on these Posts.

Fig 3: Instagram ERD with all Primary Keys as Surrogate Keys

In Figure 3, every table has a Surrogate Key as its Primary Key.

Fig 4: Instagram ERD with a variation of Primary Keys

In Figure 4, the users table makes use of username as a Natural Key. The likes table uses a Composite Key made up of the post_id and username columns. Combined, they are unique, since a User cannot like the same post more than once (unless they unlike it first). The posts table again has a Surrogate Key, since no combination of its other current attributes is guaranteed to be unique.
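One possible reading of the Figure 4 design, sketched with sqlite3. Only the username and post_id key columns come from the description above; the caption column, the helper functions like and unlike, and the sample rows are hypothetical. The Composite Key on likes is what enforces the "one like per user per post" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE users (username TEXT PRIMARY KEY NOT NULL);   -- natural key
    CREATE TABLE posts (post_id  INTEGER PRIMARY KEY,          -- surrogate key
                        username TEXT NOT NULL REFERENCES users(username),
                        caption  TEXT);
    CREATE TABLE likes (post_id  INTEGER NOT NULL REFERENCES posts(post_id),
                        username TEXT    NOT NULL REFERENCES users(username),
                        PRIMARY KEY (post_id, username));      -- composite key
    INSERT INTO users VALUES ('selfie_fan'), ('best_friend');
    INSERT INTO posts (post_id, username, caption)
        VALUES (1, 'selfie_fan', 'Morning selfie');
    """
)


def like(post_id, username):
    """Record a like; return False if this user already likes this post."""
    try:
        conn.execute(
            "INSERT INTO likes (post_id, username) VALUES (?, ?)",
            (post_id, username),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def unlike(post_id, username):
    """Remove a like, if present."""
    conn.execute(
        "DELETE FROM likes WHERE post_id = ? AND username = ?",
        (post_id, username),
    )


first_like = like(1, "best_friend")         # accepted
second_like = like(1, "best_friend")        # rejected by the composite key
unlike(1, "best_friend")
like_after_unlike = like(1, "best_friend")  # accepted again
```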

There is nothing disastrously wrong with either design so far. But over time, your Relational Database Design will become more complex as it juggles more data and accommodates more features. So it might not be a terrible idea to become better acquainted with Primary Keys. Or just blame the Intern!

Li Xie

Software Engineer & Aspiring Bot, based in London.