Airframe ULID: Lexicographically Sortable Identifiers for Scala and Scala.js

Taro L. Saito
Airframe
Published in
5 min readApr 19, 2021

What is ULID?

ULID (Universally Unique Lexicographically Sortable Identifier) is a case-insensitive 26-character string consisting of 48-bit timestamp (10 characters) and 80-bit randomness (16 characters):

 01AN4Z07BY      79KA1307SR9X4MV3|----------|    |----------------|
Timestamp Randomness
48bits 80bits

We can use ULID as identifiers for labeling records while maintaining their time-series order with millisecond resolution. You can use ULID for providing chronologically sortable IDs in distributed systems. ULID doesn’t guarantee the ordering if ULIDs are generated within the same millisecond from multiple machines, but if you have a single coordinator, it’s possible to generate monotonically increasing ULIDs, which will be explained later.

ULID was born to overcome several shortcomings of UUID (v1-v4), whose string representation requires a bit long 32 characters for representing the same 128-bit values (e.g., 2a74dd3d-1abd-45a4-a147-a288a612fcd), and using UUID as identifiers just provides random distribution of records without any context like UNIX timestamp.

ULID provides both of the context (UNIX time) and randomness to IDs with URL-safe characters encoded with Crockford’s Base32, which excludes letters like I, L, O, and U to avoid confusion for human eyes. With ULIDs, you can perform time-range queries over records by using regular string range queries. And also, generating ULIDs can be faster than generating UUIDs and many implementations are already available for the most of the programming languages.

So, why not using ULIDs?

Airframe ULID: The Fastest ULID Generators for Scala

We have released Airframe ULID (airframe-ulid) as a standalone library (i.e., no dependency other than Scala) for generating and parsing ULIDs. Airframe ULID is available since Airframe version 21.4.0:

airframe-ulid can be used both for Scala and Scala.js. Scala.js support is useful when developing web applications. We often use ULIDs for assigning IDs to case class objects, e.g., case class Record(id: ULID, param:String, …), and pass these objects between server and clients (browser) with Airframe RPC.

Usage

build.sbt

libraryDependencies += "org.wvlet.airframe" %% "airframe-ulid" % "21.4.0"// For Scala.js
libraryDependencies += "org.wvlet.airframe" %%% "airframe-ulid" % "21.4.0"

(Note: check the latest version at the website)

ULID can be generated with ULID.newULID method:

import wvlet.airframe.ulid.ULID// Generate a new ULID
val ulid: ULID = ULID.newULID
// ULID.toString produces the String representation of ULID
println(ulid) // 01F3J0G1M4WQRBHGZ6HCF6JA0K
println(ulid.epochMillis) // 1618733434500
// Parse the string representation of ULIDs
val ulid2 = ULID.fromString("01F3J0G1MD4QX7Z5QB148QNH4R")
ulid2.epochMillis // 1618733434509

Monotonically Increasing ULIDs

ULID.newULID will produce monotonically increasing ULIDs in a thread-safe manner. If ULIDs are generated within the same millisecond, airframe-ulid will increment the random part of ULID by 1 to enforce the ordering:

01F3HZ9V4BHVHJMMETE0MFBQKH
01F3HZ9V4BHVHJMMETE0MFBQKJ // <- random part was incremented by 1
01F3HZ9V4BHVHJMMETE0MFBQKK // <- random part was incremented by 1
01F3HZ9V4BHVHJMMETE0MFBQKM // <- random part was incremented by 1
01F3HZ9V4BHVHJMMETE0MFBQKN // <- random part was incremented by 1
01F3HZ9V4C8SWD21A4SCM4NMD8 // <- millisecond is changed
...

Empirically speaking, generating monotonically increasing ULIDs will reduce the probability of having ULID conflicts compared to using completely random ULIDs, according to this report. This report also showed that the chance of a ULID collision over the next 100 years is merely 0.000126%. So, in terms of risk management, we don’t need to worry too much about ULID conflicts.

If you use multiple machines, there is no guarantee that generated ULIDs will be monotonically increasing values. If you need a strict ordering, we recommend having a single coordinator to generate ULIDs. In airframe-ulid, if the 80-bit space for the randomness is exhausted (it’s quite rare!, though), it will add an artificial delay of 1 millisecond to use the next millisecond value of ULID.

Serialization

ULID can be serialized and deserialized with airframe-codec:

case class Transaction(id:ULID, status:String)val codec = MessageCodec.of[Transaction]val json = codec.toJson(Transaction(id = ULID.newULID, status = "committed")) 
// {"id":"01F3HZ9V454BC466FPDSBTZ64G", "status":"committed"}
val txx = codec.fromJson(json)
// Transaction(01F3HZ9V454BC466FPDSBTZ64G,committed)

Airframe RPC internally uses this capability of airframe-codec to pass ULIDs data between servers and clients in JSON or MessagePack format.

Performance

airframe-ulid can produce 5 million ULIDs / sec. on my MacBook Pro (2018 model). As of April 2021, airframe-ulid is the fastest ULID generator in Scala:

Comparison targets:

Here is the benchmark code using JMH.

If we use Apple Silicon (M1), the performance difference is much more impressive. Airframe ULID can produce 10 million ULIDs / sec., which is 3.4x faster than generating UUIDs:

Here are several keys we found for optimizing the performance of the ULID generator:

  • Reducing the number of random value generation is the major performance factor. SecureRandom of Java is quite slow. For generating 80-bit random values, using SecureRandom.nextBytes(Array[Byte](10)) was found to be faster than calling SecureRandom.nextInt 3 times, which produces 32 x 3 = 96 bit values and wastes 16 bits. If we use the default non-secure Random (e.g., java.util.Random), however, there is no such performance difference between Random.nextBytes(Array[Byte]) or calling Random.nextInt 3 times.
  • Generating monotonically increasing ULIDs by remembering the previously generated ULID also worked well for reducing the random value generation. We just need to increment the randomness part of ULID by 1 within the same millisecond.
  • Using bit-shift operators as much as possible for encoding/decoding Crockford’s Base32 strings is important. It seems Apple Silicon’s cache, which is embedded in the same chip, has significant advantage for such operations. Both of airframe-ulid and Chartwork’s scala-ulid use the same SecureRandom algorithm, so the only difference should be Base32 encoding part. You can find such techniques in Hacker’s Delight:

Based on these observations, in Airframe ULID, we decided to use SecureRandom.getInstanceStrong() as a random value generator by default, which often uses NativePRNGNonBlocking as a primary choice. We know it’s slow, but this slowness can be compensated by the fast increment operation.

ULID.nextULID of Airframe ULID will provide monotonically increasing ULIDs with the best available randomness characteristic in the current JVM. Users don’t need to care about such internals and the choice of random generators. The default behavior will provide the safest and fastest results.

Special thanks to Junichi Kato for finding the performance issue of an early version of airframe-ulid (A blog post written in Japanese).

Link: GitHub source code of airframe-ulid

--

--

Taro L. Saito
Airframe

Ph.D., researcher and software engineer, pursuing database technologies for everyone http://xerial.org/leo