UUID Specifications

Burak Bozacı
Jul 2, 2023

Imagine you are writing license validation software, or you want to stop attackers from enumerating your end users. UUIDs (Universally Unique Identifiers) can help in many scenarios like these. Let's go deeper.

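A UUID (Universally Unique Identifier) is a 128-bit identifier designed to be unique without any central authority handing out values. UUIDs are often used as primary keys or unique IDs. Depending on the version, they are derived from a timestamp, a name, or random bits, and the chance of two of them colliding is negligible in practice. UUIDs are specified in RFC (Request for Comments) 4122.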

USE CASES

1. License Verification Keys: You can use unique keys to verify a software license. Generating a unique UUID for each license and using it during the licensing process ensures that every issued license key is unique and simplifies license verification. Simple example:

```python
import uuid
import hashlib
from datetime import datetime


def get_machine_id():
    """Read the machine's unique ID (present on systemd-based Linux systems)."""
    try:
        with open('/etc/machine-id', 'r') as f:
            return f.read().strip()
    except OSError:
        return None


def generate_license_key(name, expiration_date):
    machine_id = get_machine_id()
    if not machine_id:
        return None
    # Bind the license to the licensee, the expiration date and this machine
    license_data = f"{name}|{expiration_date}|{machine_id}"
    license_key = hashlib.sha256(license_data.encode()).hexdigest()
    # Append a random UUID so every issued key string is unique
    uuid_key = str(uuid.uuid4())
    return f"{license_key}-{uuid_key}"


def verify_license_key(license_key, name, expiration_date):
    machine_id = get_machine_id()
    if not machine_id or not license_key:
        return False
    license_data = f"{name}|{expiration_date}|{machine_id}"
    # A SHA-256 hex digest has no hyphens, so it is everything before the first '-'
    license_key_without_uuid = license_key.split('-')[0]
    expected_license_key = hashlib.sha256(license_data.encode()).hexdigest()
    # Note: the expiration date is hashed but not compared with today's date,
    # and no secret is involved, so anyone who knows this scheme can compute keys
    return license_key_without_uuid == expected_license_key


name = "Burak Bozaci"
expiration_date = datetime(1920, 4, 23)

license_key = generate_license_key(name, expiration_date)
print("License Key:", license_key)

# License validation
is_valid = verify_license_key(license_key, name, expiration_date)
print("Is valid?", is_valid)
```

2. User IDs: You can use a UUID to uniquely identify each user. For example, you can assign a UUID to every user when they register in a web application and use it to identify them afterwards. This is useful for tracking users, authentication, and correlating data. You can also use UUIDs instead of sequential IDs in API route declarations.

If I had used sequential IDs instead of UUIDs, any authenticated user could fetch other users' information with a loop like the one below and build a data pool for themselves.

```bash
#!/bin/bash

BASE_URL="http://localhost:8000/user/"
TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InN0cmluZyJ9.o-KAnoknzLaN3OZjv1zhFSkhazxNSeeJeku-Xzn5Oow"

for ((i=1; i<=250; i++))
do
    URL="${BASE_URL}${i}"
    curl -X GET "$URL" -H "accept: application/json" -H "Authorization: Bearer $TOKEN"
done
```
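For contrast, here is a minimal in-memory sketch of keying users by UUID. The `users` dictionary and the `create_user` and `get_user` functions are hypothetical stand-ins for a real user store and route handler. Because the identifiers are random rather than consecutive, a loop over 1..250 like the one above finds nothing. UUIDs only make identifiers hard to guess, so the API should still check that the caller is allowed to see the requested record.

```python
import uuid
from typing import Dict, Optional

# Hypothetical in-memory user store keyed by random (version 4) UUIDs
users: Dict[str, dict] = {}


def create_user(username: str) -> str:
    user_id = str(uuid.uuid4())
    users[user_id] = {"id": user_id, "username": username}
    return user_id


def get_user(user_id: str) -> Optional[dict]:
    # Stand-in for the handler behind a route like GET /user/{user_id}
    return users.get(user_id)


alice_id = create_user("alice")
create_user("bob")

# Sequential guesses "1".."250" no longer match any user
hits = [i for i in range(1, 251) if get_user(str(i)) is not None]
print(len(hits))                        # 0
print(get_user(alice_id)["username"])   # alice
```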

3. Database Primary Keys: You can use UUIDs to uniquely identify records in a database. Unlike auto-incrementing numeric keys, UUIDs can be generated independently on any server without coordination, so data can be replicated or merged across different servers or databases without key conflicts.
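A minimal sketch of why this helps. It assumes two hypothetical database nodes, modeled by the illustrative `Node` class, that insert rows independently and later merge their tables. With per-node auto-increment counters, both nodes hand out keys 1, 2, 3, so the merge silently loses rows. UUID keys generated on each node merge cleanly.

```python
import itertools
import uuid


class Node:
    """Hypothetical database node that assigns primary keys on its own."""

    def __init__(self, use_uuid):
        self.use_uuid = use_uuid
        self.counter = itertools.count(1)  # per-node auto-increment sequence
        self.rows = {}

    def insert(self, data):
        key = uuid.uuid4() if self.use_uuid else next(self.counter)
        self.rows[key] = data
        return key


def merge(a, b):
    """Combine both nodes' rows; a key present on both keeps only b's row."""
    merged = dict(a.rows)
    merged.update(b.rows)
    return merged


merged_sizes = {}
for use_uuid in (False, True):
    a, b = Node(use_uuid), Node(use_uuid)
    for i in range(3):
        a.insert(f"a-{i}")
        b.insert(f"b-{i}")
    merged_sizes[use_uuid] = len(merge(a, b))

print(merged_sizes)  # {False: 3, True: 6}
```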

Randomness and Uniqueness

  • UUIDs are 128-bit values, usually written as 32 hexadecimal digits split by hyphens into five groups (8-4-4-4-12, 36 characters in total). They can also be written without hyphens or any other separator.
  • UUIDs are designed to be unique worldwide without any central coordination. The probability of generating the same random UUID twice is so low that it is negligible in practice.
  • Some UUID versions include a timestamp. This indicates when the UUID was generated and is useful in scenarios that require time-based sorting or a time-related ID.
  • Some UUID versions are derived from a name or text together with a namespace. These versions always produce the same UUID for the same name and namespace.
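Python's standard `uuid` module is a quick way to see these representations. This sketch generates a random UUID and prints its hyphenated form, its compact hex form, the group lengths, and its size in bytes. Both textual forms parse back to the same value.

```python
import uuid

u = uuid.uuid4()

canonical = str(u)            # 36 chars: 32 hex digits in 8-4-4-4-12 groups
compact = u.hex               # the same 32 hex digits, no separators
groups = canonical.split("-")

print(canonical)
print(compact)
print([len(g) for g in groups])  # [8, 4, 4, 4, 12]
print(len(u.bytes))              # 16 (bytes, i.e. 128 bits)

# Both textual forms parse back to the same 128-bit value
print(uuid.UUID(compact) == uuid.UUID(canonical) == u)  # True
```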

Versions

There are several UUID versions. I'll keep this part short because the information is widely available.

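UUIDv1 is based on a timestamp and the MAC address. Its main purpose is to guarantee uniqueness while recording when the UUID was created, which also allows time-based sorting once the timestamp is extracted. It contains a 60-bit timestamp, a 14-bit clock sequence, and a 48-bit MAC address (the node ID).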
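Python's standard `uuid` module exposes these v1 fields directly. The sketch below decodes the 60-bit timestamp back into a date. That timestamp counts 100-nanosecond intervals since 15 October 1582, the start of the Gregorian calendar. If the MAC address cannot be read, Python substitutes a random 48-bit node value, so the node printed here may not be your real MAC.

```python
import uuid
from datetime import datetime, timedelta, timezone

# v1 timestamps count 100 ns intervals since 1582-10-15 00:00 UTC
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

u = uuid.uuid1()

# 100 ns ticks -> microseconds, then offset from the Gregorian epoch
created_at = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

print(u)
print("version:", u.version)
print("timestamp (60 bits):", u.time)
print("clock sequence (14 bits):", u.clock_seq)
print("node (48 bits):", f"{u.node:012x}")
print("created at:", created_at.isoformat())
```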

UUIDv2 — Reserved for DCE (Distributed Computing Environment) Security, with embedded POSIX UIDs. It is rarely used and not widely supported.

UUID v3 & v5 — Created by hashing a namespace ID together with a name (any text-based name). The same name and namespace ID always generate the same UUID. The difference between v3 and v5 is the hash function: version 3 uses MD5 and version 5 uses SHA-1. RFC 4122 prefers SHA-1 when backward compatibility is not a concern.
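To show how name-based UUIDs are built, this sketch reconstructs both versions by hand with `hashlib` and checks them against Python's built-in `uuid.uuid3` and `uuid.uuid5`. `name_based_uuid` is an illustrative helper, not a library function. `uuid.NAMESPACE_DNS` is one of the predefined namespace IDs from RFC 4122.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Hash namespace + name, then stamp the version and variant bits (v3 or v5)."""
    data = namespace.bytes + name.encode("utf-8")
    if version == 3:
        digest = hashlib.md5(data).digest()
    elif version == 5:
        digest = hashlib.sha1(data).digest()
    else:
        raise ValueError("name-based UUIDs are version 3 or 5")
    b = bytearray(digest[:16])              # SHA-1 yields 20 bytes; keep the first 16
    b[6] = (b[6] & 0x0F) | (version << 4)   # version number in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80             # RFC 4122 variant bits (binary 10) in byte 8
    return uuid.UUID(bytes=bytes(b))


ns = uuid.NAMESPACE_DNS
for v, builtin in ((3, uuid.uuid3), (5, uuid.uuid5)):
    ours = name_based_uuid(ns, "python.org", v)
    # Same inputs always give the same UUID, matching the standard library
    print(v, ours, ours == builtin(ns, "python.org"))
```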

UUIDv4 — Generated from random numbers. This is the most commonly used and most widely supported UUID version. It contains 122 random bits; the remaining 6 bits encode the version (4 bits) and the variant (2 bits).
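The bit layout becomes concrete if you build a version-4 UUID by hand from 16 random bytes. This mirrors what CPython's `uuid.uuid4()` does. `random_uuid4` and `FIXED_BITS_MASK` are illustrative names, not part of any library. The sketch also confirms that only 6 of the 128 bits are fixed.

```python
import os
import uuid
from typing import Optional

# The 6 fixed bits: the version nibble (byte 6) and the 2 variant bits (byte 8)
FIXED_BITS_MASK = (0xF << 76) | (0x3 << 62)


def random_uuid4(random_bytes: Optional[bytes] = None) -> uuid.UUID:
    """Build a v4 UUID: 128 random bits, then overwrite 4 version + 2 variant bits."""
    b = bytearray(random_bytes if random_bytes is not None else os.urandom(16))
    b[6] = (b[6] & 0x0F) | 0x40   # version 4
    b[8] = (b[8] & 0x3F) | 0x80   # RFC 4122 variant (binary 10)
    return uuid.UUID(bytes=bytes(b))


raw = os.urandom(16)
u = random_uuid4(raw)
print(u, u.version, u.variant == uuid.RFC_4122)

# Every bit outside the 6 fixed bits comes straight from the random input
print((u.int & ~FIXED_BITS_MASK) == (int.from_bytes(raw, "big") & ~FIXED_BITS_MASK))  # True
print(128 - bin(FIXED_BITS_MASK).count("1"))  # 122
```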

Forcing UUID Randomness

Collisions are extremely unlikely for standard version-1 and version-2 UUIDs, because they combine a timestamp with the unique MAC address of a network card. Version-4 UUIDs, by contrast, can collide even when implemented correctly, although the chance is small enough to be ignored in practice. This probability can be quantified precisely.

The probability is astronomically small. The technique used to estimate such collision probabilities is called a birthday attack, and it takes its name from the birthday paradox. The birthday paradox asks how likely it is that, in a group of n randomly selected people, at least two share the same birthday.

The probability that at least two people share a birthday rises quickly as the group grows: with just 23 people it already exceeds 50%.
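The exact probability comes from computing the chance that all n birthdays are distinct and subtracting it from 1. This sketch assumes 365 equally likely days and ignores leap years; `birthday_collision_probability` is an illustrative name.

```python
def birthday_collision_probability(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # the (k+1)-th person must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (10, 22, 23, 50, 70):
    print(n, f"{birthday_collision_probability(n):.1%}")
```

The same reasoning applies to UUIDs if you replace the 365 days with the 2^122 possible random version-4 values.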

UUID Collisions

For example, the number of random version-4 UUIDs that must be generated to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:

$$n \approx \frac{1}{2} + \sqrt{\frac{1}{4} + 2 \ln 2 \cdot 2^{122}} \approx 2.71 \times 10^{18}$$

This number is equivalent to generating 1 billion UUIDs per second for about 86 years. A file containing this many UUIDs, at 16 bytes per UUID, would be about 43 exabytes.
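The 50% figure and the quantities derived from it can be reproduced in a few lines. This sketch assumes 365.25-day years and decimal exabytes (10^18 bytes).

```python
import math

UUID4_SPACE = 2 ** 122  # number of distinct random version-4 UUIDs

# Number of v4 UUIDs needed for a 50% chance of at least one collision
n_50 = 0.5 + math.sqrt(0.25 + 2 * math.log(2) * UUID4_SPACE)

seconds_per_year = 365.25 * 24 * 3600
years_at_1e9_per_second = n_50 / 1e9 / seconds_per_year
exabytes = n_50 * 16 / 1e18  # 16 bytes per UUID, 1 EB = 10**18 bytes

print(f"{n_50:.3e} UUIDs")
print(f"{years_at_1e9_per_second:.1f} years at 1 billion UUIDs per second")
print(f"{exabytes:.1f} exabytes")
```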

The smallest number of version-4 UUIDs that must be generated for the probability of finding a collision to be p is approximated by the formula:

$$n(p) \approx \sqrt{2 \cdot 2^{122} \cdot \ln\frac{1}{1-p}}$$

Thus, the probability of finding a duplicate among 103 trillion version-4 UUIDs is about one in a billion.
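A sketch of the general approximation and its inverse, p(n) ≈ 1 − exp(−n² / (2 · 2^122)), which lets you check the one-in-a-billion figure in both directions. The function names are illustrative. `math.log1p` and `math.expm1` keep the arithmetic accurate for very small p.

```python
import math

UUID4_SPACE = 2 ** 122


def uuids_for_collision_probability(p):
    """Approximate count of random v4 UUIDs giving collision probability p."""
    # ln(1 / (1 - p)) == -log1p(-p), which stays accurate for tiny p
    return math.sqrt(-2 * UUID4_SPACE * math.log1p(-p))


def collision_probability(n):
    """Inverse approximation: p(n) = 1 - exp(-n^2 / (2 * 2^122))."""
    return -math.expm1(-n * n / (2 * UUID4_SPACE))


n = uuids_for_collision_probability(1e-9)
print(f"{n:.3e} UUIDs for a one-in-a-billion chance")
print(f"{collision_probability(n):.3e} probability at that count")
```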
