Advanced Typescript: Tagged Types for Fewer Bugs and Better Security

Ethan Resnick
11 min read · Jan 14, 2024

What Problem do Tagged Types Solve?

A key selling point of statically-typed languages, like Typescript, is that the compiler will immediately alert you if you try to use one kind of thing where a different kind is needed. Pass a number to a function that takes a string: error! Ditto if you pass a User to a function that expects a Vehicle. Detecting these mismatches is how statically-typed languages catch bugs early.

However, trouble arises when things that shouldn’t be interchangeable nevertheless have the same type. Consider usernames, passwords, email addresses, customer ids, URLs, credit card numbers, and JWTs. By default, all these would have type string. However, not every function that accepts some strings should be able to accept every kind of string. For example, you might want:

  • A logging function that can only be called with strings that are known to be free of sensitive information/PII;
  • A function that sends an email, but can only be called with an email address that’s been verified already;
  • A function that bans a user, but can only be called with user ids (not accidentally with other strings or id types).

Beyond strings or numbers, you might want to have a function that accepts certain objects that come from user input, but only after their contents have been validated.

With that in mind, the key idea behind tagged types is simple: if we can force Typescript to see each of these things (that should not be interchangeable) as a distinct type, then Typescript’s normal type checking rules that prevent us from passing a number where a string is required will also prevent us from accidentally passing (say) a ProductId where a CustomerId is required.

More generally, the goal is to take values that happen to have the same or compatible representations at runtime (e.g., they’re all strings or all objects with the same properties) and make Typescript see them as different types. These new types will encode more precise information about what sort of thing we’re dealing with, and allow TS to catch mistaken uses of these values.

“Tagged types” let us do that.¹

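Note that an alternative to creating distinct types is to re-check values at runtime wherever they’re used, but that leaves a lot to be desired: the checks are easy to forget, and their results aren’t recorded anywhere in the types. See the excellent “Parse, don’t validate”.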

How Tagged Types are Implemented, Roughly

Here’s something that won’t work:

type Url = string;
type Password = string;

function hashPassword(password: Password): string {
  return sha256(password);
}

function isFirstPartyUrl(url: Url): boolean {
  return new URL(url).hostname === 'mycompany.com';
}

With the above code, the compiler will happily allow you to treat a URL as a password, like this:

const someUrl: Url = "https://example.com";
hashPassword(someUrl); // no errors!

The reason is that the type keyword just declares an “alias”, so the Url and Password types are simply alternate names for the string type. Therefore, hashPassword actually accepts a string, and someUrl is a string, so the code compiles.

The deeper issue, though, is that Typescript has a structural type system, which means that it compares types based on their “shape”; if two types have the same shape, Typescript will always treat them as interchangeable, regardless of how they’re declared/named.²

This structural approach is often desirable, but it really complicates the case at hand. Here, the issue is precisely that we have multiple values whose shapes are naturally the same (e.g., they’re all strings), yet we want TS to distinguish between them.

Let’s build the solution to this problem up step-by-step. To start, let’s imagine an example system where users can provide arbitrary “documents” for the system to store. The type for this might look like:

type Document = { [key: string]: number | string | null | boolean }

Now, let’s assume that, in order to be valid, a document can’t have more than 500 keys. Unfortunately, Typescript’s types don’t let us represent this kind of restriction, so the Typescript type for a ValidatedDocument would look like:

type ValidatedDocument = { [key: string]: number | string | null | boolean }

Note that this is identical to the type declared for Document, so we can’t write functions that only accept validated documents; the structural nature of TS means that every Document is a ValidatedDocument, and vice-versa, even though they have different names.
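To make that concrete, here is a minimal sketch. The names Doc, ValidatedDoc, and storeDocument are hypothetical, chosen for illustration (Doc stands in for Document so the snippet can’t clash with the DOM’s global Document type):

```typescript
// Two aliases with identical shapes, as in the text above.
type Doc = { [key: string]: number | string | null | boolean };
type ValidatedDoc = { [key: string]: number | string | null | boolean };

// Hypothetical function that is *meant* to accept only validated documents.
function storeDocument(doc: ValidatedDoc): number {
  return Object.keys(doc).length;
}

// Never validated, and declared with the "unvalidated" alias.
const unchecked: Doc = { title: "draft", pages: 3 };

// Compiles without complaint: to TS, Doc and ValidatedDoc are the same type.
const storedKeyCount = storeDocument(unchecked);
```

Nothing in the types records whether validation ever happened, which is exactly the gap the rest of this section closes.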

However, we could make a function, validateDocument, that adds an extra, specially-named property to the document once it’s been validated. Then, we can redefine the ValidatedDocument type to require that this new property be present for a document to count as validated:

type ValidatedDocument =
  { [k: string]: number | string | null | boolean } & { __isValidated__: true }

function validateDocument(doc: Document): ValidatedDocument | Error {
  if (Object.keys(doc).length > 500) {
    return new Error("Document exceeds legal number of keys");
  } else {
    return { ...doc, __isValidated__: true }
  }
}

This approach works: unvalidated documents won’t be accepted by functions that require a ValidatedDocument, because Typescript will complain that they’re missing the __isValidated__ property.
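As a rough sketch of what that rejection looks like, assume a hypothetical consumer called storeDocument (again using Doc and ValidatedDoc as stand-in names to avoid the DOM’s global Document type):

```typescript
type Doc = { [key: string]: number | string | null | boolean };
type ValidatedDoc = Doc & { __isValidated__: true };

function validateDoc(doc: Doc): ValidatedDoc | Error {
  if (Object.keys(doc).length > 500) {
    return new Error("Document exceeds legal number of keys");
  }
  // The marker property really is added at runtime in this version.
  return { ...doc, __isValidated__: true };
}

// Hypothetical consumer that insists on a validated document.
function storeDocument(doc: ValidatedDoc): number {
  return Object.keys(doc).length;
}

const unchecked: Doc = { title: "draft" };

// @ts-expect-error: '__isValidated__' is missing in type 'Doc'
storeDocument(unchecked);

// The only way to get a ValidatedDoc is to go through validateDoc.
const checkedDoc = validateDoc(unchecked);
const storedKeys = checkedDoc instanceof Error ? -1 : storeDocument(checkedDoc);
```

The `@ts-expect-error` comment marks the line that the compiler would otherwise reject.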

However, this approach comes with some performance overhead, as we have to modify or clone every document in order to add this extra key to it. So, instead of actually adding an extra key to the Document after it passes validation, we can just lie to Typescript and pretend we added that key:

function validateDocument(doc: Document): ValidatedDocument | Error {
  if (Object.keys(doc).length > 500) {
    return new Error("Document exceeds legal number of keys");
  } else {
    // this cast, like all TS casts, has no effect at runtime.
    // we’re simply pretending to TS that `doc` already has
    // __isValidated__: true
    return doc as ValidatedDocument;
  }
}

In this way, the Document and ValidatedDocument types still have different structures (namely, ValidatedDocument still requires the __isValidated__ property), so Typescript won’t let us interchange them. But now Typescript thinks that the return value of validateDocument has this extra property, even though we didn’t actually add it at runtime.

Lying to Typescript like this might seem a little dubious. Sure, we get a performance benefit, but that gain is probably negligible in most cases. Moreover, it could open us up to bugs, because Typescript won’t complain if we try to read the __isValidated__ property on a ValidatedDocument, as TS thinks it exists, even though, at runtime, it really doesn’t.
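A small sketch of that mismatch, using the cast-based validator (Doc and ValidatedDoc are stand-in names, used so the snippet doesn’t collide with the DOM’s global Document type):

```typescript
type Doc = { [key: string]: number | string | null | boolean };
type ValidatedDoc = Doc & { __isValidated__: true };

function validateDoc(doc: Doc): ValidatedDoc | Error {
  if (Object.keys(doc).length > 500) {
    return new Error("Document exceeds legal number of keys");
  }
  // Cast only: nothing is added to the object at runtime.
  return doc as ValidatedDoc;
}

const result = validateDoc({ title: "draft" });

// TS types `result.__isValidated__` as `true`, so this read compiles,
// but the property was never actually set on the object.
const flagAtRuntime: unknown =
  result instanceof Error ? null : result.__isValidated__;
```

Here flagAtRuntime ends up undefined, so any code that branches on the marker property would silently misbehave.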

Unfortunately, this lie is unavoidable when the values we want Typescript to differentiate are primitives, not objects, because there is simply no way to attach an extra property to a primitive value at runtime in Javascript.³
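This limitation is easy to observe directly. The sketch below uses a hypothetical helper, tryToTagPrimitive, and needs `as any` only because Typescript already rejects the assignment at compile time:

```typescript
// Hypothetical helper: tries to attach a marker property to a string.
function tryToTagPrimitive(value: string): { threw: boolean; tag: unknown } {
  "use strict"; // make the failed assignment throw instead of being ignored
  let threw = false;
  try {
    (value as any).isUrl = true;
  } catch {
    threw = true; // strict mode raises a TypeError here
  }
  // Whether or not the assignment threw, the property is not there afterwards.
  return { threw, tag: (value as any).isUrl };
}

const attempt = tryToTagPrimitive("https://example.com");
```

In strict mode the assignment throws, and in either mode reading the property back yields undefined.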

Concretely, to make string types that TS won’t consider interchangeable, we have to again make the types structurally distinct, like so:

type Url = string & { __isUrl__: true }
type Password = string & { __isPassword__: true }

declare function printUrl(url: Url): void;

// cast to pretend to TS that this string has __isPassword__: true
const password: Password = "super-duper-secret" as Password;

printUrl(password); // This now fails, as hoped!
// Error: Argument of type 'Password' is not assignable to parameter of type 'Url'.

With that code, we’ve told Typescript that Password strings will have one extra property, whereas Url strings will have a differently-named extra property. Because these properties have different names, Password- and Url-typed values are not interchangeable.

These types should seem very odd, though. As mentioned above, individual primitive values in Javascript can’t have distinct properties, so talking about some strings having a certain property while others have a different property is nonsensical. Further, since no strings have an __isUrl__ property, Typescript should realize that it’s impossible to have a value that satisfies string & { __isUrl__: true }, and should simplify that type to never.⁴

However, the Typescript team realized that the ability to make primitive types that aren’t interchangeable with one another is so useful that it justifies having “hacks” in the compiler that prevent Typescript from simplifying types like string & { __isUrl__: true } to never; instead, TS preserves this structure (acting as though such a value could exist) and uses it to reject code like the above that tries to use a Password as a Url.

Refining the Implementation

The code above was an implementation sketch, meant to show the basic mechanisms for getting Typescript to create distinct, non-interchangeable types that can be used for values that Typescript would otherwise treat as interchangeable. However, libraries that offer “production-grade” implementations of this approach, in the form of “tagged types”, usually include a few refinements.

The first is to replace the ‘magic key’ that makes the types distinguishable from one another with a symbol-named key. Every symbol is globally unique, so this change makes property naming conflicts impossible. It also ensures that the fake, not-actually-existent-at-runtime property won’t show up in the autocomplete dropdown.⁵ With this change, Url and Password would instead be defined as:

const isUrl = Symbol();
type Url = string & { [isUrl]: true } // __isUrl__ became [isUrl]

const isPassword = Symbol();
type Password = string & { [isPassword]: true } // ditto for __isPassword__

The second refinement is to simplify things so that we only need one symbol, which can be shared across all these types:

declare const tags: unique symbol;

type Url = string & { [tags]: { 'Url': void } }
type Password = string & { [tags]: { 'Password': void } }

Now, the property that’s used to make the types distinguishable is named by the tags symbol. As before, this property doesn’t actually exist at runtime, but that doesn’t matter: the Url and Password types still have distinct ‘shapes’, because they have different required keys ('Url' and 'Password', respectively) within the object that, as far as TS is concerned, must exist at their [tags] key.

The keys inside this object type at the [tags] key are known as the type’s “tags”, hence the name “tagged types”. So, we could say that Url is a string type with a 'Url' tag, whereas Password is a string type with a 'Password' tag.

One important thing to note is that the same type can have multiple tags. For example, we can define:

type PII = string & { [tags]: { 'PII': void } }
type Email = string & { [tags]: { 'PII': void, 'Email': void } }

// allowed; email has the 'PII' tag
const x: PII = "hello@example.com" as Email;

// not allowed; PII is missing the required 'Email' tag
const y: Email = "hello@example.com" as PII;

The Email type has two tags ('PII' and 'Email'), so its values can be assigned to variables and passed to functions that accept PII (i.e., that require a string with the 'PII' tag).
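The same one-way compatibility applies to function arguments. In this sketch, redact and emailDomain are hypothetical helpers invented for illustration:

```typescript
declare const tags: unique symbol;

type PII = string & { [tags]: { PII: void } };
type Email = string & { [tags]: { PII: void; Email: void } };

// Hypothetical helper: accepts any string carrying the 'PII' tag.
function redact(value: PII): string {
  return "*".repeat(value.length);
}

// Hypothetical helper: requires the 'Email' tag specifically.
function emailDomain(address: Email): string {
  return address.slice(address.indexOf("@") + 1);
}

const address = "hello@example.com" as Email;
const note = "some sensitive note" as PII;

// OK: Email carries the 'PII' tag, so it can be used wherever PII is expected.
const masked = redact(address);
const domain = emailDomain(address);

// @ts-expect-error: PII is missing the required 'Email' tag
emailDomain(note);
```

Adding tags makes a type more specific, so values can flow from the more-tagged type to the less-tagged one, but not back.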

The final refinement is to define a helper type that makes it much more ergonomic to declare tagged types in various configurations:

type Tagged<BaseType, Tag extends PropertyKey> =
  BaseType & { [tags]: { [K in Tag]: void } };

// A simple tagged type, that results in
// `string & { [tags]: { PII: void } }`, exactly like the PII type above.
type PII = Tagged<string, 'PII'>

// An example of adding a new tag ('Email') to an existing tagged type.
// Equivalent to Tagged<Tagged<string, 'PII'>, 'Email'>.
// Results in the same type as Email above.
type Email = Tagged<PII, 'Email'>

// Creating a tagged type with multiple tags
type UserEmail = Tagged<string, 'Email' | 'UserId' | 'PII'>

// Creating a _generic_ tagged type that will tag
// an arbitrary type T as Loggable
type Loggable<T> = Tagged<T, 'Loggable'>

// The original ValidatedDocument type, revisited.
// Results in `Document & { [tags]: { ValidatedDocument: void } }`
type ValidatedDocument = Tagged<Document, 'ValidatedDocument'>
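As a quick sketch of how types built with this helper behave, the snippet below repeats the helper so it stands alone, and assumes a hypothetical formatLogLine function that only accepts strings tagged as Loggable:

```typescript
declare const tags: unique symbol;

type Tagged<BaseType, Tag extends PropertyKey> =
  BaseType & { [tags]: { [K in Tag]: void } };

type PII = Tagged<string, "PII">;
type Email = Tagged<PII, "Email">;
type Loggable<T> = Tagged<T, "Loggable">;

// Hypothetical logger that refuses strings not marked as safe to log.
function formatLogLine(message: Loggable<string>): string {
  return `[info] ${message}`;
}

const safeMessage = "server started" as Loggable<string>;
const line = formatLogLine(safeMessage);

const userEmail = "hello@example.com" as Email;

// OK: Tagged<PII, 'Email'> keeps the 'PII' tag from its base type.
const asPii: PII = userEmail;

// @ts-expect-error: Email has no 'Loggable' tag
formatLogLine(userEmail);
```

Nesting Tagged accumulates tags, which is why an Email is still usable as PII here while remaining unusable as a Loggable string.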

Using Tagged Types Effectively

When using tagged types, remember that each tag represents a promise about the tagged values. This promise should guarantee more than would already be implied from the value’s untagged type. The type Tagged<string, 'UserId'>, for example, is (presumably) promising that values of that type will be user ids, as opposed to any old strings.

While the Tagged helper makes it fairly easy to introduce tagged types, you’ll have to decide which tags are worth introducing in your particular application, and what their promises mean.

Add tags that would prevent major bugs first. For example, the ValidatedDocument tagged type would likely be worth it, as it could prevent security issues. Similarly, an application that occasionally deals with money might want types like type Cents = Tagged<number, 'Cents'> and type Dollars = Tagged<number, 'Dollars'>, to make sure that numbers can’t get passed around without considering their units. However, an application whose whole job is to deal with money would probably forgo those tagged types, in favor of a solution that more comprehensively handles conversions between units.
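Here is a minimal sketch of the Cents/Dollars idea. The dollarsToCents and chargeCard functions are hypothetical, and the rounding is deliberately naive:

```typescript
declare const tags: unique symbol;

type Tagged<BaseType, Tag extends PropertyKey> =
  BaseType & { [tags]: { [K in Tag]: void } };

type Cents = Tagged<number, "Cents">;
type Dollars = Tagged<number, "Dollars">;

// Hypothetical conversion: one of the few places allowed to cast to Cents.
function dollarsToCents(amount: Dollars): Cents {
  return Math.round(amount * 100) as Cents;
}

// Hypothetical payment call that must receive an amount in cents.
function chargeCard(amount: Cents): string {
  return `charged ${amount} cents`;
}

const price = 19.99 as Dollars;
const receipt = chargeCard(dollarsToCents(price));

// @ts-expect-error: Dollars is missing the 'Cents' tag
chargeCard(price);
```

Tagged numbers still behave as numbers in arithmetic, so the tags add unit checking at function boundaries without changing how the values are computed.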

The biggest thing to remember is this: if a value’s type has a given tag, then the value must actually uphold the promise made by that tag. For example, if you cast req.body.email as Email without actually validating that the value is an email, you’ve totally defeated the point of the system. You could then pass req.body.email to functions that are relying on receiving an email address, when req.body.email might not be one.

Therefore, there should be very few places in your code that produce values with a given tagged type, and those places should only be ones where you’re confident that the promise of the tag is upheld — likely because the code just checked it. The validateDocument function is a good example: it can safely cast to and return a ValidatedDocument because it just did the validation that that type promises.
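One way to keep those places few is to funnel every conversion through a single checking function. The sketch below uses a type predicate so that no cast is needed at all; isEmail, parseEmail, and sendWelcomeEmail are hypothetical names, and the regular expression is a deliberately simplistic stand-in for real email validation:

```typescript
declare const tags: unique symbol;

type Tagged<BaseType, Tag extends PropertyKey> =
  BaseType & { [tags]: { [K in Tag]: void } };

type Email = Tagged<string, "Email">;

// The only place that is allowed to vouch for the 'Email' tag.
// Illustrative check only; real email validation is more involved.
function isEmail(value: string): value is Email {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(value);
}

function parseEmail(value: string): Email | Error {
  // Inside the true branch, TS has narrowed `value` to Email.
  return isEmail(value) ? value : new Error(`Not an email address: ${value}`);
}

// Hypothetical consumer that relies on the tag's promise.
function sendWelcomeEmail(to: Email): string {
  return `queued welcome email to ${to}`;
}

const input: string = "hello@example.com";
const parsed = parseEmail(input);
const outcome =
  parsed instanceof Error ? parsed.message : sendWelcomeEmail(parsed);
```

Callers holding a plain string have to go through parseEmail (or isEmail) before they can call sendWelcomeEmail, so the check can’t be skipped by accident.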

Lastly, note that tagged types are not a way of hiding a value’s runtime type, which is often a goal of similar features in other languages. For example, if a module’s public API returns a Tagged<string, 'AccountId'>, that type is a subtype of string; code that gets access to a value of that type can treat it as a string, and the module that returned the account id can’t change its runtime representation (say, to store the account id in a plain object) without potentially breaking this downstream code. That’s the primary difference between tagged types in Typescript and Flow’s opaque types. However, somewhat similar hiding can be achieved in Typescript by defining the type as Tagged<unknown, 'AccountId'>. Unfortunately, though, code that receives a value of this type can still try to narrow it based on control flow analysis, or pass it to some code that branches on typeof (like a serializer), in which case changing the underlying implementation could still be breaking.
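A rough sketch of the Tagged<unknown, 'AccountId'> variant follows. The createAccountId and accountIdToString functions are hypothetical module internals; note that they have to cast through unknown, because the public type no longer mentions string at all:

```typescript
declare const tags: unique symbol;

type Tagged<BaseType, Tag extends PropertyKey> =
  BaseType & { [tags]: { [K in Tag]: void } };

// The public type says nothing about the runtime representation.
type AccountId = Tagged<unknown, "AccountId">;

// Hypothetical module internals: the only code that knows ids are strings.
function createAccountId(raw: string): AccountId {
  return raw as unknown as AccountId;
}

function accountIdToString(id: AccountId): string {
  return id as unknown as string;
}

const id = createAccountId("acct_123");

// @ts-expect-error: 'toUpperCase' does not exist on AccountId
id.toUpperCase();

// The caveat from the text: runtime checks still see the real representation.
const leaksRepresentation = typeof id === "string";
```

The compiler stops consumers from calling string methods, but leaksRepresentation is still true at runtime, which is why this hiding is only partial.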

Next Steps with Tagged Types

If you want to push the safety offered by tagged types even further, take a look at part two: Tagged Types Improved with Type-Level Metadata. And then seriously, read “Parse, don’t validate”.

Footnotes

[1]: In other languages, similar functionality is called newtypes (e.g., in Haskell, Rust, and Python) or opaque types (e.g., in Flow and Scala). Refinement types are also related, as are, vaguely, units of measure.

[2]: This might change eventually. See, e.g., issue 4895.

[3]: Primitives in JS sometimes appear to have properties (that’s why, e.g., someString.replace(...) works), but these properties come via the corresponding prototype, and are the same on all primitives of the same type; individual primitive values can’t actually have their own properties. For example, if you run const x = "hello"; x.newProp = true, you’ll get an error in strict mode. In sloppy mode, you won’t get an error, but, if you try to access x.newProp, it’ll be missing and you’ll get undefined.

[4]: Technically, and related to the footnote above, a string could satisfy this type if the extra __isUrl__ property were added to String.prototype. But, then all strings would have this property, so it would no longer work as a way to distinguish different types of strings from one another.

[5]: There are some downsides to using a symbol, though. Specifically, if your application uses a library to create tagged types, and the symbol is defined as part of that library, and multiple copies of the library end up installed in your project (which often happens due to npm’s package resolution algorithm), then the tagged types created by the different copies of the library won’t interoperate with each other. Nor will utility functions defined by the library that do things like read a given type’s tags.
