Native objects versus transport objects

Published in Tock, Nov 26, 2019

[A few months ago, I had a discussion with a coworker about a problem they were having with how Tock frontend JavaScript code represented objects it received from the backend. Inspired by that discussion, I wrote up this document for the Tock engineering team. Since there’s nothing especially Tock-specific about it, I thought I’d share it more widely as well.]

TL;DR

Transport objects are good for representing data in transport, but they generally aren’t very good at representing native program data. For objects where the value of a native representation is high, it makes sense to define a separate native object and an explicit translation function from the transport object to the native object.

The problem

A normal thing to want to do in a program is to read a value from some external source, such as a server or a disk, and then work with it as a normal program value. For instance, at Tock, we talk a lot about bookings, and it makes sense for the JavaScript client to fetch bookings from a server and also do lots of local manipulation.

Any time we want to send data over the wire, we have to put it into some kind of transport format; that is, we’ve got to have some scheme for representing the object we want to send as a (preferably short) self-contained bit sequence — e.g. a defined JSON or S-expression format, XML, whatever. At Tock, we use protocol buffers.

Along with that wire format, we also need some way to access those bits programmatically in every program that’s going to read them. The standard solution to this problem, and the one used by protobuf, is to generate code in any language you want that reads the bit sequence into what I’ll call a transport object, that is, a normal value in your programming language that reflects one-for-one the data that was sent over the wire.

At some point, though, we’re going to want to use the information contained in that transport object for some core business purpose. For instance, at Tock we send bookings over the wire from the server to the client, which then wants to interpret them, manipulate them, make decisions based on them, etc. In other words, we want what I’ll call a native object; that is, whatever value we use within the parts of our program that aren’t concerned with transporting data.

So how should we get a native object?

The most obvious answer is to just use the transport object directly. For instance, at Tock, the only value we really use to represent a booking is the transport object itself. This is a tempting option because the protocol buffer compiler has already generated all the code for you, so there’s no extra work involved, and your transport object often looks similar to the object you’d write by hand. Much of the time this choice is fine.

But it’s not fine all the time. This isn’t just because people sometimes define protocol buffers badly. There’s a fundamental tension between what makes a good transport object and what makes a good native object, and that tension means that using a transport object as a native object is sometimes a bad choice.

What makes a good transport object?

The goal of any transport object — JSON, XML, proto, whatever — is to faithfully represent the data that was actually transported while exposing its underlying structure as naturally as possible in the host language. To do that, the transport object needs to do a few things.

First, it needs to have a good representation of every concept that the format exposes to the host language. For instance, protocol buffers expose the concept of records with multiple fields, so we’d expect a good, native representation of that concept in every language they support (e.g. as objects in JavaScript and Java). In contrast, since protos don’t expose the concept of unordered sets, their transport objects don’t have any good representation for them. Similarly, the transport format itself (as opposed to the transport object that represents it in some language) should only let you define constructs that are feasible to map into a wide variety of languages.

Second, it needs to reflect the fact that the transport format can evolve over time. If the definition of a message can change, then in any system where multiple programs talk to each other — either because they’re running on different nodes in the same network, or even just a single program that reads data that it has saved in a previous run — you might be in a situation where one of them knows about the new version and one only knows about the old version.

Of course a transport object isn’t going to magically make its surrounding program understand and use new fields in a good way, but it would be good from a deployment perspective if at least:

An old client could parse a new message, leaving out anything it didn’t understand rather than crashing
A new client could parse an old message, clearly indicating a missing field rather than crashing
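To make those two properties a bit more concrete, here is a minimal TypeScript sketch that uses JSON as a stand-in wire format. The `BookingMessage` shape, its field names, and `decodeBookingMessage` are all hypothetical; this is not Tock’s schema, and real protobuf tooling would generate the equivalent code for you.

```typescript
// Hypothetical transport shape; every field is optional because some version
// of the sender, past or future, might not set it.
interface BookingMessage {
  id?: string;
  partySize?: number;
  notes?: string; // imagine this field was added in a later version
}

// Stand-in for generated decoding code, using JSON as the wire format.
function decodeBookingMessage(wire: string): BookingMessage {
  const raw: unknown = JSON.parse(wire);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("expected a JSON object");
  }
  const fields = raw as Record<string, unknown>;
  const message: BookingMessage = {};

  // Copy only the fields this version knows about; anything else is skipped.
  const id = fields["id"];
  if (typeof id === "string") message.id = id;
  const partySize = fields["partySize"];
  if (typeof partySize === "number") message.partySize = partySize;
  const notes = fields["notes"];
  if (typeof notes === "string") message.notes = notes;
  return message;
}

// A message from a newer sender: the unknown "loyaltyTier" field is ignored.
const fromNewerSender = decodeBookingMessage(
  '{"id": "b1", "partySize": 4, "notes": "window seat", "loyaltyTier": "gold"}'
);

// A message from an older sender: "notes" is simply absent (undefined).
const fromOlderSender = decodeBookingMessage('{"id": "b2", "partySize": 2}');
```

Neither call crashes: the newer message loses only the field this code has never heard of, and the older one makes the missing field visible as `undefined` in the type.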

Putting these two requirements together, transport objects tend to:

Be simple tree structures: for instance, protos and JSON both allow little more than strings, numbers, and booleans as base values, and only associations and lists to build larger messages from smaller ones.
Have what I call an “over-time” view of the world: a field is optional if future or past versions of the message being sent might not have that field, even if that field will always be present in the system as it exists today.
Grow additively: because it’s easier to add new information to an existing message than to restructure it, messages tend to evolve by having new fields added on rather than having all their fields refactored. This allows for a smoother upgrade path.

What makes a good native object?

Native objects have very different constraints. A native object, since it isn’t involved in transport, doesn’t need to be designed around transport problems. For instance:

There’s no need to restrict yourself to concepts that map into many languages. It’s much more important to select a representation that fully captures the domain concept you’re trying to represent and that is idiomatic in the language you’re working with.
You only need to support your actual program right now — you’re free to structure the value in a way that suits your program’s particular needs rather than trying to be a generically-useful value.
You can have a point-in-time view of the world — you never need to worry about your code today working with code from yesterday that wasn’t recompiled, or code tomorrow that hasn’t been written yet.
Similarly, you can refactor as much as you want — you’re going to be deploying your entire code change as a single unit, so you can change it as much as you want as long as you make sure your program still works after the change.

Some specific things you might want for native objects that don’t make sense for transport objects:

Having a type declaration that captures precisely what values make sense in your domain and that makes impossible states unrepresentable
Using data structures that fit the domain, such as maps and sets
Having domain-appropriate methods
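Here is a rough sketch of what those three things might look like together in TypeScript. `Booking`, `BookingStatus`, and `isActive` are names I made up for illustration; they aren’t taken from Tock’s code.

```typescript
// Hypothetical native types for illustration only.

// A discriminated union: a cancellation reason exists only on cancelled
// bookings, so "confirmed, but with a cancellation reason" cannot be built.
type BookingStatus =
  | { kind: "confirmed" }
  | { kind: "cancelled"; reason: string };

class Booking {
  readonly id: string;
  readonly partySize: number;
  readonly status: BookingStatus;
  // A set, not a list: tags have no order and no duplicates.
  readonly tags: ReadonlySet<string>;

  constructor(
    id: string,
    partySize: number,
    status: BookingStatus,
    tags: Iterable<string>
  ) {
    this.id = id;
    this.partySize = partySize;
    this.status = status;
    this.tags = new Set(tags);
  }

  // A domain-appropriate method that lives next to the data it describes.
  isActive(): boolean {
    return this.status.kind === "confirmed";
  }
}

// The duplicate tag collapses, because the field really is a set.
const booking = new Booking("b1", 4, { kind: "confirmed" }, ["vip", "vip"]);
```

None of this would be a sensible thing to ask of a transport format: the union and the set have no direct wire equivalent, and the method only makes sense inside this one program.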

These aren’t the same!

The things that make sense for transport objects don’t make sense for native objects and vice versa. So how should we deal with that?

Option 1: Just use the transport object and deal with it

This is a pretty standard choice.

Advantages:

No extra code to write upfront! You needed a transport object anyway.

Disadvantages:

The transport object’s types won’t capture your point-in-time understanding of the domain object. The most obvious example of this is that the types will treat everything as optional even though that’s probably not what you want.
Related to that, but more subtle: You end up scattering knowledge and assumptions about your transport format all over your program. To understand why, suppose your program uses an optional phone number, which your message represents as two fields: area code and number.

Logically, if area code is set then number must also be set and vice versa. But the transport object won’t guarantee that. So what you’re likely to do is, at the point where you actually read the phone number, check that both fields exist and signal an error otherwise. But what if there are two places where you read phone numbers in the code? That logic is likely to get duplicated. Even worse, imagine that the program originally only had one field for number; the area code field was added later and isn’t always set. Now everywhere that accesses phone numbers needs to know this rule! You could write a library to capture the logic, but everyone still has to know that they need to call that library.
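A small TypeScript sketch of how that duplication tends to look in practice. The `ContactMessage` shape and both functions are hypothetical, written only to illustrate the phone number scenario.

```typescript
// Hypothetical transport shape: both fields are optional, and nothing in the
// type ties them together.
interface ContactMessage {
  areaCode?: string;
  number?: string;
}

// Call site 1: formatting for display.
function formatPhone(contact: ContactMessage): string {
  if (contact.areaCode === undefined && contact.number === undefined) {
    return "(no phone)";
  }
  if (contact.areaCode === undefined || contact.number === undefined) {
    throw new Error("phone number is only partly set");
  }
  return `(${contact.areaCode}) ${contact.number}`;
}

// Call site 2: building a tel: link. The same rule has to be restated here,
// and nothing forces the two copies to stay in sync.
function telLink(contact: ContactMessage): string | null {
  if (contact.areaCode === undefined && contact.number === undefined) {
    return null;
  }
  if (contact.areaCode === undefined || contact.number === undefined) {
    throw new Error("phone number is only partly set");
  }
  return `tel:${contact.areaCode}${contact.number}`;
}
```

If the rule changes (say, a missing area code becomes legal for old messages), every one of these call sites has to be found and updated.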

Option 2: Map from transport object to native object

In this option, you write a separate, native object to represent your domain concept, and you write an explicit translation function from the transport object to the native one (and vice versa if your program needs to send values as well as receiving them). In your program as a whole, only use the native object except in the parts of the code that explicitly deal with transport.

Advantages:

Your main code can use objects that you control entirely.
All the domain knowledge of how to understand transport objects is encapsulated in the translation function; you never have to think about it anywhere else. For instance, in the phone number example, the code that figures out what fields to read is always written exactly once, in a single easy-to-find place (the decoder function), and all other code can rely on a phone number without having to know anything about its proto representation.

Disadvantages:

It’s not free! You have to write a bunch of code.
Many concepts will map directly, which makes this code tedious to write.
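For the phone number example, the mapping might look something like the following TypeScript sketch. All the names here (`GuestMessage`, `Guest`, `PhoneNumber`, `decodeGuest`, `encodeGuest`) are hypothetical, and throwing on a half-set number is just one possible policy.

```typescript
// Hypothetical transport shape: two independent optional fields.
interface GuestMessage {
  areaCode?: string;
  number?: string;
}

// Hypothetical native types: a phone number is either wholly present or
// wholly absent, so the half-set state cannot be represented.
interface PhoneNumber {
  areaCode: string;
  number: string;
}
interface Guest {
  phone: PhoneNumber | null;
}

// The only place that knows how phone numbers are laid out in the message.
function decodeGuest(message: GuestMessage): Guest {
  const area = message.areaCode;
  const local = message.number;
  if (area === undefined && local === undefined) {
    return { phone: null };
  }
  if (area === undefined || local === undefined) {
    throw new Error("phone number is only partly set");
  }
  return { phone: { areaCode: area, number: local } };
}

// The reverse direction, for programs that also send values.
function encodeGuest(guest: Guest): GuestMessage {
  return guest.phone === null
    ? {}
    : { areaCode: guest.phone.areaCode, number: guest.phone.number };
}

// Code elsewhere works with the native type and needs no field checks.
function formatGuestPhone(guest: Guest): string {
  return guest.phone === null
    ? "(no phone)"
    : `(${guest.phone.areaCode}) ${guest.phone.number}`;
}
```

Most of this is the tedious one-to-one copying mentioned above; the payoff is that the "both or neither" rule lives in `decodeGuest` and nowhere else.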

Recommendations

In my opinion, the right thing to do depends on the complexity of the message. For messages that require a lot of interpretation or that represent especially complicated domain ideas, you’re likely to get a lot of benefit from a single translation function that knows how to interpret messages and a native object that has a proper type, so you’re better off doing the mapping. For smaller, simpler messages, mapping is probably overkill and you’re better off just using the transport object directly.

Written by Jacob Matthews