A Better Way to Protect Your IDs

Your web application uses strong authentication, and every resource checks whether your user is actually authorized to access it. So why bother if the user actually knows the internal IDs of the models she is accessing?

Issue #1: Leak of Business Intelligence Data

 https://www.myshop.com/account/orders?orderid=7865

From this you can probably estimate how many orders they have processed. But it gets worse. If you place another order, let's say 14 days later, and it gets the ID 7921, you can deduce that they receive about 4 orders a day. This is business intelligence data you may not want your competition to know (see also this article for a more thorough discussion of this issue).
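The estimate needs only two observed IDs and the time between them. It assumes the shop hands out order IDs sequentially and without gaps:

```latex
\text{orders per day} \approx \frac{7921 - 7865}{14\ \text{days}} = \frac{56}{14} = 4
```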

Issue #2: Brute Force Guessing of IDs

https://www.myphotos.com/p/share?id=10989927

Now this id might be totally random, or it might follow a sequence. An attacker could easily check a reasonable interval from e.g. 10989900 to 10990000 and see if any of these links work.
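Here is a minimal Java sketch that shows how cheap this is. The `linkWorks` predicate is a hypothetical stand-in for requesting the share URL and checking whether a photo comes back. No real HTTP request is made, and the set of existing IDs is made up for illustration.

```java
import java.util.Set;
import java.util.function.LongPredicate;

class IdEnumerationSketch {
    // Counts how many IDs in [from, to] the (hypothetical) server accepts.
    static long countValid(long from, long to, LongPredicate linkWorks) {
        long hits = 0;
        for (long id = from; id <= to; id++) {
            if (linkWorks.test(id)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up IDs standing in for photos that actually exist on the server.
        Set<Long> existing = Set.of(10989927L, 10989931L, 10989960L);
        long candidates = 10990000L - 10989900L + 1; // 101 requests in total
        long found = countValid(10989900L, 10990000L, id -> existing.contains(id));
        System.out.println(candidates + " requests, " + found + " valid links");
    }
}
```

Scanning the whole interval takes only 101 requests. A far wider range is just as easy to script.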

Issue #3: Leak of Personal Information Through IDs

What can be done?

Solution #1: Using UUIDs

Example UUID

A random (version 4) UUID contains 122 random bits. That range is so huge that you can pick any such number at random and have a nearly 100% chance of it being unique in your context. In fact, you are probably the first person ever to generate that exact number. Note, though, that the UUID format itself gives no guarantee whatsoever that the value is truly random, although most implementations do generate random ones. See this post for more info on the issue.

It is now practically infeasible for an attacker to guess your IDs (although guessing is still theoretically possible). If the UUIDs are truly random, there is also no sequence to observe. This fixes Issue #1 and #2.
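In Java, for example, `java.util.UUID.randomUUID()` produces such a version 4 UUID using a cryptographically strong random number generator. This short sketch prints the version and variant fields. Those fields are the 6 fixed bits that leave 122 of the 128 bits random.

```java
import java.util.UUID;

class UuidSketch {
    public static void main(String[] args) {
        // randomUUID() returns a version 4 (random) UUID.
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        // 128 bits minus 4 version bits and 2 variant bits = 122 random bits.
        System.out.println(a + " version=" + a.version() + " variant=" + a.variant());
        System.out.println("same as second UUID? " + a.equals(b));
    }
}
```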

The downside of UUIDs is that they may be slower or more expensive to index in your database if you use them as the primary key. If you use them as a correlation ID instead, adding a new column can be a hassle. Read here for a more in-depth discussion of UUIDs in databases.

Also, you still expose an internal ID to the public, and if it is used as the primary key, it cannot be changed. The need to change it might not arise often, but it does happen. You may also still be prone to Issue #3.

Solution #2: Mask your IDs

Before publishing your IDs, encode or encrypt them so that the underlying value is incomprehensible to any client that does not know the secret key.
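Here is a minimal sketch of the idea. It assumes a 64-bit ID and a secret AES key that only your server knows. The ID is padded into a single 16-byte block, encrypted with AES and Base64url-encoded. This is not ID-Mask's actual scheme, only an illustration of "encrypt before publishing". `AesIdMaskSketch`, `mask` and `unmask` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

class AesIdMaskSketch {
    private final SecretKey key; // hypothetical server-side secret

    AesIdMaskSketch(SecretKey key) {
        this.key = key;
    }

    // Deterministic masking: 8 zero bytes followed by the 64-bit ID form one
    // 16-byte block; a single block in ECB mode is just the raw AES permutation.
    String mask(long id) throws GeneralSecurityException {
        byte[] block = ByteBuffer.allocate(16).putLong(8, id).array();
        Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] encrypted = cipher.doFinal(block);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(encrypted);
    }

    // Reverses mask(): decode, decrypt, read the ID from the last 8 bytes.
    long unmask(String masked) throws GeneralSecurityException {
        byte[] encrypted = Base64.getUrlDecoder().decode(masked);
        Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key);
        byte[] block = cipher.doFinal(encrypted);
        return ByteBuffer.wrap(block).getLong(8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        AesIdMaskSketch masker = new AesIdMaskSketch(generator.generateKey());
        String masked = masker.mask(7865L);
        System.out.println(masked + " -> " + masker.unmask(masked));
    }
}
```

This output is deterministic: the same ID always yields the same string. It also has no forgery protection, because any 16-byte value decrypts to some ID, so an attacker can still submit arbitrary strings.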

In my opinion HashIds, a popular library for this approach, has, among others, two main issues:

  • It only supports integer types and is restricted by limitations inherited from the original JavaScript implementation (e.g. only positive integers up to 2⁵³).
  • It offers no real security. It is more of a home-brew keyed encoding scheme with no forgery protection, which means an attacker can still easily brute-force IDs without understanding them.
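The 2⁵³ ceiling comes from JavaScript itself. Its numbers are IEEE 754 doubles with a 53-bit significand, so above 2⁵³ not every integer can be represented exactly. A Java `double` shows the same effect:

```java
class DoublePrecisionLimit {
    public static void main(String[] args) {
        // A double has a 53-bit significand, just like a JavaScript number.
        long limit = 1L << 53; // 2^53 = 9007199254740992
        double next = (double) (limit + 1); // 2^53 + 1 has no exact double
        System.out.println((long) next);            // prints 9007199254740992
        System.out.println(next == (double) limit); // prints true
        // A Java long, by contrast, goes up to 2^63 - 1.
        System.out.println(Long.MAX_VALUE);         // prints 9223372036854775807
    }
}
```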

Improved ID Protection: ID-Mask

  1. Support of all types usually used for IDs
  2. Strong cryptography with forgery protection
  3. Optional randomized IDs

Note that with this approach (as with HashIds) there is no possibility of collisions. Unlike with a hash, no compression happens, so every masked value maps back to exactly one ID.

Full Type-Support for IDs

  • 64-bit integers (often called long)
  • UUIDs (which are essentially 128-bit numbers)
  • Arbitrary precision integers (called BigInteger in Java)

If we restrict arbitrary-precision integers to around 128 bits, all of these fall into two basic ID types: 64-bit and 128-bit IDs. All of these data types (and some more exotic types for specific use cases) are supported by the library.
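This sketch illustrates the grouping but does not reflect the library's internals. Each supported type is normalized to a fixed-width byte array: 8 bytes for a `long`, and 16 bytes for a UUID or for a `BigInteger` that fits into 128 bits (two's complement). The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

class IdToBytes {
    // 64-bit IDs: a long is exactly 8 bytes (big-endian).
    static byte[] fromLong(long id) {
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    // 128-bit IDs: a UUID is two longs, i.e. exactly 16 bytes.
    static byte[] fromUuid(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Arbitrary-precision IDs, restricted to 128 bits: sign-extend the minimal
    // two's-complement form to a fixed width of 16 bytes.
    static byte[] fromBigInteger(BigInteger id) {
        byte[] raw = id.toByteArray();
        if (raw.length > 16) {
            throw new IllegalArgumentException("ID does not fit into 128 bits");
        }
        byte[] out = new byte[16];
        byte fill = (byte) (id.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, 16 - raw.length, fill);
        System.arraycopy(raw, 0, out, 16 - raw.length, raw.length);
        return out;
    }

    // Reads a two's-complement byte array back into a BigInteger.
    static BigInteger toBigInteger(byte[] bytes) {
        return new BigInteger(bytes);
    }

    public static void main(String[] args) {
        System.out.println(fromLong(7865L).length);             // 8
        System.out.println(fromUuid(UUID.randomUUID()).length); // 16
        BigInteger big = BigInteger.ONE.shiftLeft(100).negate();
        System.out.println(toBigInteger(fromBigInteger(big)).equals(big)); // true
    }
}
```

Once every ID is 8 or 16 bytes wide, a single masking scheme per width covers all ID types.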

Strong Cryptography with Forgery Protection

In addition to solving the main Issue #1, these properties also protect against the attack described in Issue #2: brute forcing. With a so-called authentication tag (i.e. the HMAC) attached to the ID, it is now extremely unlikely that an attacker can generate a valid ID.
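The tag itself is a standard construction. This minimal sketch assumes HMAC-SHA256 truncated to 16 bytes and a separate MAC key. The truncation length is my assumption and not necessarily ID-Mask's. The tag is computed over the encrypted ID and appended to it. Before unmasking, the tag is recomputed and compared in constant time. `HmacTagSketch`, `seal` and `open` are hypothetical names.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class HmacTagSketch {
    static final int TAG_LENGTH = 16; // assumed truncation of the 32-byte HMAC

    private final SecretKeySpec macKey;

    HmacTagSketch(byte[] macKeyBytes) {
        this.macKey = new SecretKeySpec(macKeyBytes, "HmacSHA256");
    }

    private byte[] tag(byte[] payload) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        return Arrays.copyOf(mac.doFinal(payload), TAG_LENGTH);
    }

    // Appends the authentication tag to the (already encrypted) ID.
    byte[] seal(byte[] payload) throws GeneralSecurityException {
        byte[] out = Arrays.copyOf(payload, payload.length + TAG_LENGTH);
        System.arraycopy(tag(payload), 0, out, payload.length, TAG_LENGTH);
        return out;
    }

    // Returns the payload only if the tag matches; otherwise throws.
    byte[] open(byte[] sealed) throws GeneralSecurityException {
        if (sealed.length < TAG_LENGTH) {
            throw new GeneralSecurityException("input too short");
        }
        int split = sealed.length - TAG_LENGTH;
        byte[] payload = Arrays.copyOfRange(sealed, 0, split);
        byte[] received = Arrays.copyOfRange(sealed, split, sealed.length);
        if (!MessageDigest.isEqual(tag(payload), received)) { // constant-time
            throw new GeneralSecurityException("invalid authentication tag");
        }
        return payload;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] macKey = new byte[32];
        new SecureRandom().nextBytes(macKey);
        HmacTagSketch sketch = new HmacTagSketch(macKey);
        byte[] encryptedId = {12, 34, 56, 78, 90, 11, 22, 33}; // placeholder ciphertext
        byte[] sealed = sketch.seal(encryptedId);
        System.out.println(Arrays.equals(sketch.open(sealed), encryptedId)); // true
        sealed[0] ^= 1; // simulate a forged or altered ID
        try {
            sketch.open(sealed);
        } catch (GeneralSecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a 16-byte tag, a random guess passes the check with a probability of roughly 2⁻¹²⁸. At that rate, brute forcing is pointless.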

Support for Randomized IDs

An example would be shareable links. In the photo-sharing scenario from above, the link would no longer contain the actual value and would simply look like this:

https://www.myphotos.com/p/share?id=U1P72UtA6uS6ddMcTmzdKJg

Generating two more masked IDs from the same ID results in unrelated-looking output:

https://www.myphotos.com/p/share?id=aGjTc5AQQlWl8REodDmAM1c
https://www.myphotos.com/p/share?id=bGx6LykZ0N_B2WpoT-1XbHg

This method solves the problem described in Issue #3 (leak of personal information). You generate randomized IDs for, e.g., your user_id. These IDs do not match the ones used on, e.g., your main site, so they cannot be used to find context information there. You are, however, still able to map the IDs back to the original users.
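Randomization only requires fresh randomness to go into every masking operation. This minimal sketch swaps in AES-GCM from the JDK, which is not necessarily what ID-Mask uses. Each call generates a random 12-byte nonce and stores it in front of the ciphertext. GCM's built-in tag also provides forgery protection. `RandomizedIdSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class RandomizedIdSketch {
    private static final int IV_LENGTH = 12; // 96-bit nonce, the usual GCM size
    private static final int TAG_BITS = 128; // GCM authentication tag length
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    RandomizedIdSketch(SecretKey key) {
        this.key = key;
    }

    // Output layout: nonce || encrypted ID || tag, Base64url without padding.
    String mask(long id) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        random.nextBytes(iv); // fresh nonce, so every call gives different output
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] sealed = cipher.doFinal(ByteBuffer.allocate(8).putLong(id).array());
        byte[] out = ByteBuffer.allocate(IV_LENGTH + sealed.length).put(iv).put(sealed).array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    long unmask(String masked) throws GeneralSecurityException {
        byte[] in = Base64.getUrlDecoder().decode(masked);
        if (in.length <= IV_LENGTH) {
            throw new GeneralSecurityException("input too short");
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_LENGTH));
        // doFinal verifies the tag and throws AEADBadTagException on tampering.
        byte[] plain = cipher.doFinal(in, IV_LENGTH, in.length - IV_LENGTH);
        return ByteBuffer.wrap(plain).getLong();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        RandomizedIdSketch sketch = new RandomizedIdSketch(generator.generateKey());
        String first = sketch.mask(10989927L);
        String second = sketch.mask(10989927L);
        System.out.println(first);
        System.out.println(second); // differs from first
        System.out.println(sketch.unmask(first) + " " + sketch.unmask(second));
    }
}
```

The price is length. Nonce, encrypted ID and tag add up to 36 bytes, i.e. 48 Base64url characters, which is noticeably longer than the IDs shown above. With random 96-bit nonces, a single key should also not be used for more than about 2³² masking operations.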

Adapt the Encoding to Your Needs

Example using a 64-bit ID:

with optional formatting for better readability:

Formatted ID
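Formatting is purely cosmetic and can be undone before unmasking. This sketch shows one possible format: groups of four characters separated by a dot. The group size and separator are my choices for illustration. The separator must not be part of the encoding alphabet, which rules out `-` and `_` for Base64url.

```java
class IdFormatting {
    // Hypothetical readability format: insert a separator every groupSize chars.
    static String format(String masked, int groupSize, char separator) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < masked.length(); i++) {
            if (i > 0 && i % groupSize == 0) {
                out.append(separator);
            }
            out.append(masked.charAt(i));
        }
        return out.toString();
    }

    // Strips the separators again; only safe if the separator never occurs
    // in the encoding alphabet itself.
    static String unformat(String formatted, char separator) {
        return formatted.replace(String.valueOf(separator), "");
    }

    public static void main(String[] args) {
        String masked = "U1P72UtA6uS6ddMcTmzdKJg"; // masked ID from the photo example
        String pretty = format(masked, 4, '.');
        System.out.println(pretty);                           // U1P7.2UtA.6uS6.ddMc.Tmzd.KJg
        System.out.println(unformat(pretty, '.').equals(masked)); // true
    }
}
```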

To avoid the problem of randomly occurring (English) words in the masked IDs, which could create embarrassing URLs like

https://www.myportfolio.com/p?id=SH4RTM4N

a Base32 dialect was added with a custom alphabet that contains no vowels or other problematic letters and numbers. IDs in this encoding could look like this:

Encoding optimized to not contain words
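Here is a sketch of such an encoding. It uses a hypothetical 32-character alphabet, not the one ID-Mask ships. The alphabet contains no vowels in either case. It also leaves out the easily confused 0, O, 1, I and l, and the digits 3 and 4, which are often read as E and A. `VowelFreeBase32` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

class VowelFreeBase32 {
    // Hypothetical alphabet (NOT ID-Mask's): no vowels, no 0/O/1/I/l, no 3/4;
    // Y and L are left out as well. Exactly 32 distinct symbols.
    static final String ALPHABET = "256789BCDFGHJKMNPQRSTVWXZbcdfghk";

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0;   // not-yet-emitted bits live in the low end
        int bitCount = 0; // number of pending low bits in buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitCount += 8;
            while (bitCount >= 5) {
                out.append(ALPHABET.charAt((buffer >> (bitCount - 5)) & 0x1F));
                bitCount -= 5;
            }
        }
        if (bitCount > 0) {
            // Pad the last partial group with zero bits on the right.
            out.append(ALPHABET.charAt((buffer << (5 - bitCount)) & 0x1F));
        }
        return out.toString();
    }

    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0;
        int bitCount = 0;
        for (char c : text.toCharArray()) {
            int value = ALPHABET.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("not in alphabet: " + c);
            }
            buffer = (buffer << 5) | value;
            bitCount += 5;
            if (bitCount >= 8) {
                out.write((buffer >> (bitCount - 8)) & 0xFF);
                bitCount -= 8;
            }
        }
        return out.toByteArray(); // leftover (< 8) bits are padding
    }

    public static void main(String[] args) {
        byte[] id = ByteBuffer.allocate(8).putLong(7865L).array();
        String encoded = encode(id);
        System.out.println(encoded + " (" + encoded.length() + " chars)");
    }
}
```

An 8-byte ID becomes 13 characters, since 64 bits in 5-bit groups round up to 13. A single case offers only 21 consonants and 10 digits, so a 32-symbol alphabet without vowels has to mix upper and lower case.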

And More

Code Example

For more, see the README of the GitHub project.

tl;dr

Software Engineer currently working through the stack: JS frontend, Android Mobile, Java Backend. Security is my passion.
