Prevent Business Intelligence Leaks by Using UUIDs Instead of Database IDs on URLs and in APIs
Let’s examine an innocuous looking URL that you may see when you are looking at your order history on some e-commerce site:
Speaking strictly from an application security perspective, there is no problem here. As long you’ve done your job on the back end checking that the order’s user matches the logged in user, there is no security risk. You can’t look at other users’ orders, and they can’t look at yours.
The problem is that this one URL gives a surprising amount of insight into the underlying data, and therefore the business. Let’s see how.
Leak #1: Overall Business Data Size
If this is my (recent) order, there is a good chance that there are only about six thousand orders in that business’s system across all users. That’s a very sensitive piece of data that the business does not want to share.
Leak #2: Overall Business Data Velocity
If I worked for a competitor and was doing competitive research on this business, I would create a couple low value orders roughly an hour (or some other fixed time) apart. Let’s say the details pages of the two orders have the following URLs:
This shows me that the business is doing roughly 20 orders per hour. Again, very sensitive competitive information no business wants so easily determinable.
Why This is So Common
Certainly not every online business exposes data in this way. But this pattern is common enough that it warrants some attention. In fact, every ‘your first store’ or similar web programming language tutorial leads the learning developer directly down this path. You’ve probably already guessed it: relational databases and generated auto-incrementing sequences.
Relational databases and relational data still power the vast majority of the transactional web and will continue to do so for the foreseeable future. Auto-incrementing numeric sequences as primary keys are a very standard, often default pattern for a relational database table. It’s easy to see why: the database does it for you and lookups are lightening fast. The first row gets an id of ‘1’, the second an id of ‘2’, and so on.
Every online business generally starts with a single relational database at the core of its architecture. Even as they grow and potentially introduce distributed components and NoSQL datastores and caches to meet load demands, it is likely that this core e-commerce data (orders, items, users) will remain relational. The data is relational and the data architecture itself using numeric sequences and relations is not wrong.
So what to do? Luckily there is a good solution that doesn’t require data re-architecture or moving away from auto-incrementing sequences.
The Solution: UUIDs to the Rescue
Changing active data is hard, but adding a new column to a table in a relational database should be doable without downtime even for a large table under load. Of course you should check the documentation for your particular DB and version to add and populate the column in such a way as to minimize production impact.
Before change, a excerpt of data from our example data architecture could be as follows. There will naturally be many other columns in these tables (they have been omitted for diagram readability).
The column we want to add (to any table that will be addressed via the client side) is a UUID.
A UUID is a universally unique alphanumeric identifier that says nothing about the underlying data. While any long unguessable alphanumeric string will work, UUIDs are a great choice as their generation is natively supported in most modern programming languages and even in databases themselves. This makes populating the column very easy and architecturally flexible as it can be done in code, or right in the database automatically on new inserts, depending on your needs.
This example data illustrates that a UUID is not needed on every table, only those that are referenced from the client side (we are assuming here that OrderItems are shown in context with an order and not individually). It is completely fine (and more efficient) for the back end to continue to use the numeric primary/foreign key relationship to get order item information for an order, or look up details for an order item directly, as those ids are never shown to the user.
Update your Client Side Driven Lookups
Once you have UUIDs on the relevant tables (don’t forget to add appropriate indexes to these tables), it is a matter of combing through your front end for anywhere an ‘id’ is exposed and updating it to UUID, and updating the back-end lookup code appropriately to expect and use UUID for database lookups.
Here is how our two order history URLs look now, giving away nothing:
Note: it is often necessary to have the back end continue to support old ‘id’ based URL patterns as well, as we’ll see in the next section: links in the wild.
Remembering Links in the Wild — User Reports and Emails
It is tempting to remove back-end support completely for ‘id’ coming in as a parameter. Before you do however, consider if there are any links in the wild that point back to these locations. Things like generated user reports and sent transactional emails that point back to order history are prime candidates; the links are persisted and if you remove support for the old ‘id’ pattern, they will break. There is no business risk to allowing the back end to continue to support the old patterns where needed.
Dealing With Your API
Uh oh! You have a public facing API that exposes a sensitive database id in its request or response data. This is a bit more of a challenge as you can’t just remove it and replace it like you did on the front end without risk of breaking your API integrators’ build-ins. The best strategy here is to work with the appropriate internal stakeholders as quickly as possible to come up with a deprecation plan. It will generally involve most or all of these steps; the details and timelines are very business dependant depending on the adoption and importance of the given API.
- Add ‘UUID’ alongside every place the ‘id’ is exposed in the request / response with a minor version release. Adding things to an API should be a non breaking change.
- Communicate with your API users about the deprecation of ‘id’ and the timelines for the breaking change. Encourage them to start using ‘UUID’.
- Add as much logging as you can to determine the use of the offending endpoints (especially if ‘id’ can be a request parameter)
- Remove ‘id’ through an appropriate breaking change update (such as a major version update)
- (Finally) end of life the version containing ‘id’ as per your deprecation plan communications.
It is worth the time it takes to look through your online offering for this pattern. It doesn’t just apply to URLs; it applies to the entire client side (hidden form data, cookie values, APIs as discussed, etc.) — URLs are just by far the most visible place. And while we used e-commerce data as an example, it applies to any database object for which count or velocity is business sensitive.
Don’t be lulled into a false sense of security if your sequences are not auto-incrementing sequentially or if your data is partitioned. While the risk is lessened, as long as there is any predictable pattern in your exposed ids, some information is being leaked and the pattern can be interpreted.
If you find some, don’t panic; assess the business risk internally, determine the size and scope of the change, and prioritize. Happy building!