Sharing data types on a multi-language project

Edaqa Mortoray
Uncountable Engineering
9 min read · Aug 14, 2023


At Uncountable, we use a common client-server architecture: Python on the backend and JavaScript on the front-end. As the project grew, we moved away from dynamic types, introducing MyPy and TypeScript. But we hit a problem: we had to specify our types multiple times. This was both annoying and a source of mismatched types. To solve this, I wrote a tool that lets us share types between the two languages.

To get a feel for the tool and understand the challenges, let's look at a few areas:

  • Basic Syntax: What it looks like and how it works with Python and TypeScript
  • Nulls: How to deal with missing, undefined, and optional values
  • Dates: An example of serializing non-native types in JSON

Basic Syntax: type_spec

The initial goal of type_spec was to write types once and use them in multiple locations. Over time this grew to include self-documenting types, dynamic form generation, and boilerplate reduction. But let’s look at the base case here, to see where it started.

This is an example of some types defined in type_spec. I'm making up types that can hopefully be understood without context. I write the following in a file called product.yaml.

CurrencyValue:
  type: Object
  properties:
    amount:
      type: Decimal
    currency:
      type: String

Widget:
  type: Object
  properties:
    title:
      type: String
    product_code:
      type: Integer
    description:
      type: String
    cost:
      type: CurrencyValue
    expiration?:
      type: Date
    feature_tags:
      type: List<String>
      default: []

I've defined two types. Widget uses a CurrencyValue for its cost property. This example introduces many of the basics of type_spec: defining types, using fundamental types and collections, and specifying defaults and optional values. I'll get back to these, but first let's look at what type_spec actually does with this definition.

For Python, the tool currently emits the below dataclasses in the product.py file. I say “currently”, as we continue to adapt this to Python’s maturing type system.

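from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Optional

@dataclass
class CurrencyValue:
    amount: Decimal
    currency: str

@dataclass
class Widget:
    title: str
    product_code: int
    description: str
    cost: CurrencyValue
    expiration: Optional[date] = None
    feature_tags: list[str] = field(default_factory=list)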

For TypeScript, we get these types emitted to the product.ts file:

interface CurrencyValue {
  amount: string
  currency: string
}

interface Widget {
  title: string
  description: string
  productCode: number
  cost: CurrencyValue
  expiration?: string | null
  featureTags?: string[]
}

A few things differ between the Python and TypeScript versions:

  • Names use camelCase instead of snake_case, to match each language's conventions: productCode vs product_code.
  • Both Decimal and Date have become strings. We substitute strings where TypeScript doesn't have the precise types we want to model.
  • The expiration property became both an optional (?) field and nullable. We don't actually want this, but Python poses a limitation here.

Serialization

We need a bit more than the types to be useful. We need a serializer that converts the dataclasses into JSON for sending to the front-end, and parses incoming JSON back into dataclasses.

type_spec does all serialization work on the Python side. The JSON it emits to the front-end matches the TypeScript interfaces, so the client can use it directly. However, the JSON is limited to a reduced set of types. A decimal value, for example, is represented as a string. Our control library is designed to work with these limited types. For example, our DateSelector takes in and emits an ISO string.
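The following is a minimal sketch of the outgoing direction. It assumes a hand-written recursive converter: dataclass fields become camelCase keys, Decimal becomes its exact decimal text, and dates become ISO strings. The helper names to_json_value and to_camel_case are hypothetical and are not type_spec's API. The real serializer is generated and also handles the special cases described below.

```python
import json
from dataclasses import dataclass, field, fields, is_dataclass
from datetime import date
from decimal import Decimal
from typing import Any, Optional


# Hypothetical helper; not type_spec's generated code.
def to_camel_case(name: str) -> str:
    # product_code -> productCode
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


# Hypothetical helper: maps Python values onto the reduced JSON type set.
def to_json_value(value: Any) -> Any:
    if is_dataclass(value) and not isinstance(value, type):
        return {
            to_camel_case(f.name): to_json_value(getattr(value, f.name))
            for f in fields(value)
        }
    if isinstance(value, Decimal):
        # Exact decimal text such as "12.50", never a float.
        return str(value)
    if isinstance(value, date):
        # datetime subclasses date, so this covers both; isoformat() gives ISO strings.
        return value.isoformat()
    if isinstance(value, list):
        return [to_json_value(item) for item in value]
    return value  # str, int, bool and None pass through unchanged


@dataclass
class CurrencyValue:
    amount: Decimal
    currency: str


@dataclass
class Widget:
    title: str
    product_code: int
    description: str
    cost: CurrencyValue
    expiration: Optional[date] = None
    feature_tags: list[str] = field(default_factory=list)


widget = Widget("Sprocket", 42, "A small sprocket", CurrencyValue(Decimal("12.50"), "USD"))
print(json.dumps(to_json_value(widget)))
```

Note that the unset expiration still appears in the output as null. This is the same behavior the article discusses under optional values.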

Our serializer also deals with a lot of special cases. In existing code there may be inconsistent names, like a snake_case value reaching the front-end. We also have some implied optionals, defaults, and some Union types expecting strict ordered evaluation. The clean examples I showed above avoid these special cases, but it is important to us that we can migrate to the tool without having to change code. We will eventually change the code, but we'd prefer not to do it all at once.

Optional Values, Question-Mark

Optional values gave us some trouble. We need to represent both a value that exists but is null, and a value that doesn't exist at all. In JavaScript these are simply null and undefined, but Python's dataclasses have no concept of "undefined".

First, the easy case: a value that can be null is defined using an Optional wrapper.

User:
  type: Object
  properties:
    handle:
      type: String
    name:
      type: Optional<String>
    email:
      type: Optional<String>

In the User type, the name can be a string or a null value. This is mapped to str | None in Python and string | null in TypeScript. This is a decent approach for modeling data. Code that uses the User object doesn't care about the difference between a null or undefined value; it can easily use the above data type.

But API endpoints are different. Consider a list_users API that has several options for filtering the users. We don't want the caller to have to specify every optional argument. That's both inconvenient and not future-compatible, because you couldn't add new arguments to the API without changing all call-sites. Therefore, we adopted TypeScript's question-mark notation to mean a value doesn't have to be specified.

ListUserArguments:
  type: Object
  properties:
    handle_regex?:
      type: String
    with_name?:
      type: Boolean
    sort?:
      type: Boolean
    offset?:
      type: Integer
    limit?:
      type: Integer

An endpoint taking ListUserArguments can be called with any combination of those arguments, or with none at all. This feels natural to anyone familiar with REST APIs.

Handling these in Python is the same as the “Optional” case. For example, the field handle_regex is emitted as type str | None.

In TypeScript you might expect string | undefined, which matches TypeScript's own question-mark notation. But we actually emit string | undefined | null, and this is a pain point in the system.

For incoming API arguments we wouldn't need the null part. But what about when we send an object from the server back to the front-end? Python doesn't support undefined, so the field may contain a null. Unfortunately, when we wrote this, our serializer could not remove fields from the output objects, so it put null into the output instead. Therefore, we need to support null on the front-end as well.
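The nulls appear because every field is emitted. Here is a small illustration that assumes each "?" property becomes an Optional field defaulting to None. It uses the standard library's dataclasses.asdict as a stand-in for the real serializer and skips the camelCase renaming.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


# Assumed Python shape of the "?" properties: Optional with a None default.
@dataclass
class ListUserArguments:
    handle_regex: Optional[str] = None
    with_name: Optional[bool] = None
    sort: Optional[bool] = None
    offset: Optional[int] = None
    limit: Optional[int] = None


args = ListUserArguments(limit=10)

# A serializer that emits every field produces explicit nulls...
print(json.dumps(asdict(args)))
# {"handle_regex": null, "with_name": null, "sort": null, "offset": null, "limit": 10}

# ...whereas one that can drop fields would leave the unset ones undefined.
print(json.dumps({k: v for k, v in asdict(args).items() if v is not None}))
# {"limit": 10}
```

The second form also drops values that were deliberately set to None. That is one reason changing this behavior is risky when some callers do care about the difference.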

This is something I want to fix. Changing serialization is hard, though, because something may depend on some oddly specific behavior. Most places won't care about null versus undefined, but some do.

Non-extant, missing values

What happens if you have an API that needs to distinguish between a null and an undefined value? For example, we could have an API that modifies a user by changing the user's name or email. Below, I've tried to encode this in two different ways.

ModifyUserArguments:
  type: Object
  properties:
    user_handle:
      type: String
    name?:
      type: String
    email:
      type: Optional<String>

From the front-end, it's clear to the caller that if they don't wish to modify the name, they simply leave it out of the arguments. But what about email? Does specifying null mean they don't wish to modify it, or that they wish to set it to null? Python makes this worse. It has only None, so we can't tell whether the caller omitted the value or specified null.

For cases where we need to distinguish between null and undefined, this isn’t going to work. While these are the outlier cases, used in only a fraction of APIs, we do need to support them. For this we introduced a formal extant property.

ModifyUserArguments:
  type: Object
  properties:
    user_handle:
      type: String
    name:
      type: Optional<String>
      extant: missing
    email:
      type: Optional<String>
      extant: missing

extant has three values:

  • required: This is the implied value if there is no extant specified and no question mark. It means this property is required.
  • optional: This is specified by the question-mark operator. It means the field may be omitted. In that case Python interprets it as None, and the front-end should interpret it as null | undefined.
  • missing: This is the only value that must be written out explicitly, as extant: missing. It means the property may be absent, and the difference between absent and null must be tracked.

As I mentioned before, Python's dataclasses don't support the concept of undefined values. To support them, I added a placeholder type called MissingSentryType. In Python, the name property of ModifyUserArguments is encoded as name: str | None | MissingSentryType. This gives the API endpoint a unique value it can use to detect whether the caller provided the value. It's a bit annoying to work with, especially as MyPy doesn't understand what's going on, but it works, and we see no alternative.
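Here is a sketch of how such a sentinel can work. Only the name MissingSentryType comes from the text. The singleton implementation, the MISSING constant and the parse_modify_user helper are hypothetical, and the incoming keys are shown in snake_case for simplicity.

```python
from dataclasses import dataclass
from typing import Any, Final, Union


class MissingSentryType:
    """Placeholder meaning "the caller did not send this property at all"."""

    _instance = None

    def __new__(cls) -> "MissingSentryType":
        # One shared instance, so endpoints can test with `is MISSING`.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "MISSING"


MISSING: Final = MissingSentryType()  # hypothetical module-level constant


@dataclass
class ModifyUserArguments:
    user_handle: str
    name: Union[str, None, MissingSentryType] = MISSING
    email: Union[str, None, MissingSentryType] = MISSING


# Hypothetical parser: an absent key stays MISSING, an explicit null becomes None.
def parse_modify_user(payload: dict[str, Any]) -> ModifyUserArguments:
    return ModifyUserArguments(
        user_handle=payload["user_handle"],
        name=payload.get("name", MISSING),
        email=payload.get("email", MISSING),
    )


args = parse_modify_user({"user_handle": "edaqa", "email": None})
if args.name is MISSING:
    print("name: leave unchanged")
if args.email is None:
    print("email: clear it")
```

Every consumer has to handle the third case. MyPy only narrows the type after an explicit `is MISSING` or isinstance check, which is part of why this is awkward to work with.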

Date and Decimal

Python supports both Date and Decimal, but JavaScript supports neither. JavaScript's Date object is really a date-time, and there is no plain calendar-date type. When we serialize either of these values for the front-end, they end up as strings.

We could parse the strings on the front-end using a library like Decimal.js or moment. But that would require a parsing layer on the front-end, and every request would have to go through it. We find it easier to leave the strings as-is, pass them around safely, and parse them only in controls that need the semantic information.

Decimal

Since we build a scientific application, precision is important to us. If the user enters a value like 0.123, then we want to preserve that exact value. Python and our backing database, Postgres, are both comfortable working with decimal values and exact representations.

JavaScript, however, is not, because its numbers are floating point. JSON can express decimal values, but JavaScript parses them into floats. Thus we can't rely on JSON numbers, and we emit decimals as strings instead.
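Python's own json module shows the same effect. By default it parses JSON numbers into binary floats, just as JavaScript's JSON.parse does. Its parse_float parameter keeps them exact, and emitting a string avoids the question entirely. This illustrates the precision problem. It is not code from type_spec.

```python
import json
from decimal import Decimal

payload = '{"amount": 0.1, "fee": 0.2}'

# Default parsing produces floats, like JSON.parse in JavaScript.
as_float = json.loads(payload)
print(as_float["amount"] + as_float["fee"])  # 0.30000000000000004

# Parsing the same text as Decimal keeps the digits exactly.
as_decimal = json.loads(payload, parse_float=Decimal)
print(as_decimal["amount"] + as_decimal["fee"])  # 0.3

# Sending the value as a string: every reader sees the same digits.
print(json.dumps({"amount": str(Decimal("0.123"))}))  # {"amount": "0.123"}
```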

The front-end rarely performs operations on decimal values, so this system works well. The front-end gets a string from the backend and puts it, as-is, into an edit box. The user edits the string, and the front-end sends it as-is to the backend. Using a regex, our front-end controls can still check that the value is valid without converting it to a number.
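A regex for a plain decimal string might look like the following. The post doesn't show the pattern its controls use, so PLAIN_DECIMAL and is_plain_decimal are hypothetical. The same pattern works in JavaScript with ^ and $ anchors in place of fullmatch.

```python
import re

# Hypothetical pattern: optional sign, digits, optional fraction.
# No exponent and no digit grouping, so "1,000" and "1e3" are rejected.
PLAIN_DECIMAL = re.compile(r"-?(?:\d+(?:\.\d*)?|\.\d+)")


def is_plain_decimal(text: str) -> bool:
    # fullmatch also rejects trailing junk such as "1.5abc".
    return PLAIN_DECIMAL.fullmatch(text) is not None


for candidate in ["0.123", "-42", ".5", "1,000", "1e3", "3 234 932.12"]:
    print(candidate, is_plain_decimal(candidate))
```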

You might reasonably be wondering what happens if the user formats their decimals, such as 1,000 or 3 234 932.12. Those are both natural ways to enter numbers, and this does indeed complicate things. We have string-based decimal normalization, but that's another story. Suffice it to say, only a plain decimal string reaches the API endpoints.

Dates and non-native types

Date is similar, since JavaScript does not have a plain date type. Perhaps more importantly, when we emit values for the front-end, we don't expect it to do any parsing at all. For example, even though JavaScript does have a date-time type, we emit datetime values as strings, and they remain strings. The caller of the endpoint has to parse them into a Date manually if they want one.

We rely entirely on JSON for the front-end representation. Consider this type_spec definition.

AuditLogEntry:
  type: Object
  properties:
    value:
      type: Decimal
    timestamp:
      type: DateTime
    clearing_day:
      type: Date

While the emitted Python code retains all these high-level types, the emitted TypeScript type uses strings.

interface AuditLogEntry {
  value: string
  timestamp: string
  clearingDay: string
}

In each case we use a normalized string format, such as ISO formats for dates and times. As mentioned before, we then build controls around these string normalized formats. The date controls do end up parsing the values, but most of the UI can safely pass around the string value.
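Normalized ISO strings round-trip losslessly on the Python side. The sketch below converts an AuditLogEntry to its wire form and back. It assumes the ISO formats produced by Python's isoformat() methods. to_wire and from_wire are hypothetical helpers, not type_spec output.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from decimal import Decimal


@dataclass
class AuditLogEntry:
    value: Decimal
    timestamp: datetime
    clearing_day: date


# Hypothetical helper: every field becomes a normalized string, matching the TypeScript interface.
def to_wire(entry: AuditLogEntry) -> dict[str, str]:
    return {
        "value": str(entry.value),
        "timestamp": entry.timestamp.isoformat(),
        "clearingDay": entry.clearing_day.isoformat(),
    }


# Hypothetical inverse: the strings parse back into equal Python values.
def from_wire(data: dict[str, str]) -> AuditLogEntry:
    return AuditLogEntry(
        value=Decimal(data["value"]),
        timestamp=datetime.fromisoformat(data["timestamp"]),
        clearing_day=date.fromisoformat(data["clearingDay"]),
    )


entry = AuditLogEntry(
    value=Decimal("1000.10"),
    timestamp=datetime(2023, 8, 14, 9, 30, tzinfo=timezone.utc),
    clearing_day=date(2023, 8, 15),
)
wire = to_wire(entry)
print(wire)
# {'value': '1000.10', 'timestamp': '2023-08-14T09:30:00+00:00', 'clearingDay': '2023-08-15'}
print(from_wire(wire) == entry)  # True
```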

Stricter Types

I am considering introducing stricter types here. TypeScript doesn't officially support them, but with some trickery we can build strong string-derived types, often called branded types. This would let me define a DateString type. Such a type functions as a string, but a plain string cannot implicitly convert to it. I could then define specific functions that convert known arguments into the strict format.
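Python has a rough analog: typing.NewType. This is a swapped-in Python technique, not the TypeScript approach the post describes. Under MyPy, a NewType over str can be used wherever a str is expected. A plain str is rejected where the new type is required, and only designated functions produce it. DateString comes from the text; to_date_string and schedule_clearing are hypothetical.

```python
from datetime import date
from typing import NewType

# At runtime a DateString is just a str; MyPy treats it as a distinct subtype.
DateString = NewType("DateString", str)


# Hypothetical converter: the sanctioned way to create a DateString, from a real date.
def to_date_string(value: date) -> DateString:
    return DateString(value.isoformat())


# Hypothetical consumer that only accepts the strict type.
def schedule_clearing(day: DateString) -> str:
    return f"clearing on {day}"


ok = schedule_clearing(to_date_string(date(2023, 8, 15)))
print(ok)  # clearing on 2023-08-15

# schedule_clearing("15/08/2023")  # MyPy rejects this: str is not DateString.
```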

I’ve done this in a few places in our code, where it’s very helpful to distinguish between different identifiers. It’s difficult to introduce stricter types into an existing code-base though, thus I’ve been holding off implementing more in type_spec.

And More

That's a quick look at type_spec, our answer to sharing types between Python and JavaScript. We've extended the tool a lot. We emit OpenAPI docs from the type_spec definitions, and it generates our Flask routing tables. It supports shared constant values as well as enums, and I recently added a "complete" flag to ensure an enum mapping covers all possible keys. We also now use runtime type information in TypeScript to generate forms dynamically.
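The post doesn't say whether the "complete" check runs at generation time or in the generated code. The sketch below assumes a runtime check to show the idea: every enum member must have an entry, so adding a member without updating the mapping fails loudly. WidgetStatus and require_complete are hypothetical.

```python
from enum import Enum
from typing import Mapping, TypeVar

E = TypeVar("E", bound=Enum)
V = TypeVar("V")


class WidgetStatus(Enum):  # hypothetical enum for illustration
    DRAFT = "draft"
    ACTIVE = "active"
    RETIRED = "retired"


# Hypothetical check: raise if any enum member lacks a mapping entry.
def require_complete(enum_type: type[E], mapping: Mapping[E, V]) -> Mapping[E, V]:
    missing = [member.name for member in enum_type if member not in mapping]
    if missing:
        raise ValueError(f"{enum_type.__name__} mapping is missing: {', '.join(missing)}")
    return mapping


STATUS_LABELS = require_complete(
    WidgetStatus,
    {
        WidgetStatus.DRAFT: "Draft",
        WidgetStatus.ACTIVE: "Active",
        WidgetStatus.RETIRED: "Retired",
    },
)
```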

When we first created type_spec, we evaluated some other options. We considered OpenAPI itself, and you'll notice some fragments are reminiscent of that format. But it's a limited format that isn't friendly to human developers. Protobuf is another library for cross-language serialization, but it wouldn't have offered us the flexibility to add all the other features we now have. Plus, it was a binary serialization format when I last used it.

type_spec continues to evolve as we find more uses for it. Some of these are big features, some of them are plugging gaps in the code. Every step gives us a bit more type safety or reduces our coding effort.


Stars and salad make me happy. I’ve been writing, coding, and thinking too much for over 20 years now.