Types vs Schemas: Data as First-Class Citizens

Daniel Tan
Published in GlassBlade
Jun 6, 2020

“A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it.”

Excerpt From: Frank Herbert. “Dune”.

Hindsight is 20/20, unless used in the present.

A common problem software engineers face during development is changing requirements. Something the product manager assumes to be a small change might turn out to be a hairy mess that the programmers have to defend themselves against. Often this is because the programmers made assumptions about the program that turned out to be wrong as time passed.

Take this piece of code:

public Car(
    Wheels wheels,
    Engine engine,
    Seats seats,
    Window window,
    Door door,
    AirConditioning airConditioning,
    AirBags airBags,
    Radio radio
)

This is a good example of the kind of constructor we often encounter in typed languages with positional arguments.

So what if we want to make a Car without a radio? We set it to null, and hilarity ensues: we end up with null checks that can't tell you whether the radio is null because it was broken, or because the car simply never came with one.

Types are a man-made construct that we make up to convince ourselves that the real world is not chaotic and ever-changing. Types wrap data to give an illusion of order. When the requirements change, that illusion shatters and conflicts ensue between the product owner and the engineering team.

Types miss the whole point of information processing: they tell you what something looks like but never give you any information about what's actually inside. Typed languages teach programmers to treat entire pieces of data like black boxes. It's like using a third-party library, which is all nice and dandy until one day it exhibits weird behaviours that you're forced to deal with by prying open the black box. Any successful language that deals primarily with the real world provides an escape hatch for dealing with dynamic data.

Schemas, on the other hand, are made to define our assumptions about the part of the underlying data we're interested in dealing with. They accept that data is transparent and that we should not be afraid of dealing with data directly. We should embrace it and treat it as part of our program, instead of trying to shoehorn it into our mental model of what the data should look like.

Take, for example, this snippet of Extensible Data Notation (EDN):

{:type :car
 :wheels []
 :engine :toyota-engine
 :seats :toyota-seats}

This is data. It can be made to fit a specific schema, and it fixes the earlier problem with positional arguments: you can use nil to denote that the radio is broken, while keeping not having a radio as, well, not having a radio (the key is simply absent).
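To make that concrete, here is a sketch of the two cases in EDN (the :radio key is my own illustrative addition, not part of the map above):

```clojure
;; radio present but broken: the key exists, its value is nil
{:type   :car
 :engine :toyota-engine
 :radio  nil}

;; no radio at all: the key is simply absent
{:type   :car
 :engine :toyota-engine}
```

A type system that forces both cases into a single nullable Radio field erases exactly this distinction.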

; pseudo Clojure code to validate that seats
; are available, i.e. the key exists and is not nil

(defn seats? [data]
  (schema/validate data {:seats (comp not nil?)}))
As you can see, this gives us a much more flexible way to validate the inputs and outputs of a system, one that also checks the correctness of the data's contents, and only the parts we're interested in. This is the power of treating data as a first-class citizen instead of hiding it behind types.
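For readers who want something runnable, a minimal version of the same check can be written with core Clojure alone, no schema library required (seats? here is my own sketch, not the pseudocode above verbatim):

```clojure
;; seats? succeeds only when the :seats key exists AND its value is non-nil,
;; preserving the distinction between "broken" (nil) and "absent" (no key)
(defn seats? [data]
  (and (contains? data :seats)
       (some? (:seats data))))

(seats? {:seats :toyota-seats}) ; => true
(seats? {:seats nil})           ; => false (seats present but broken)
(seats? {})                     ; => false (no seats at all)
```

In a real project you would likely reach for a library such as clojure.spec or Schema, which let you compose such predicates into full schemas.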

As such, languages that interface directly with real-world user input are often dynamic languages like Python, R, JavaScript and Clojure, or languages with supposedly "weaker" type systems like Java and C#, rather than languages with "advanced" type systems like Haskell.

Of course, languages like Haskell can do this as well. Haskell's aeson library, for example, models arbitrary JSON with an algebraic data type (Value). But at some point you're just using trickery to convince the type system that free-form data is an acceptable type!

It might be fun for some people to bend their input data to fit the type system, but it is simply more productive to bend the schema to fit your input data.

I try to update weekly. This article came up while I was coding in Clojure/Script and wondering about dynamic vs static type systems.
