Demystifying Type Systems

Ihor Morenets
Wix Engineering
Published in
12 min readAug 19, 2020
Antique pantheon in Athens, Greece
Photo by Spencer Davis on Unsplash

Type systems nurture the interest of many, but few dig under the surface. You may try hard, but still you will face a lack of reliable and informative sources and an abundance of contradictory statements.

In this article, I’ve compiled facts and myths about type systems, and I encourage you to read further to discover what is what.

We will also cover some neat tricks which can be used to amplify strengths and mitigate weaknesses of different tools.

Prerequisites

Most importantly, what is a type? A common belief (especially in the functional programming world) is that type simply a collection of values.

So a Boolean has two possible values, while an Integer and a String potentially have infinite values.

type Bool = True | False; type Int = … | -1 | 0 | 1 | 2 | … ; type String = “” | “a” | “ab” | …

A class is no different — its set of values is the combination of its own and ancestors’ properties’ value sets:

A type system is what defines and manages types and operations on them. To some extent, it ensures type safety.

Types such as String, Bool and others are connected with operators and functions defined on them

And type safety is the ability of a type system to prevent type errors, which is applying an operation unsupported by a type to its instance.

For example, in some languages, you can perform operations such as accessing non-existent object properties or dividing numbers by strings, which is not type-safe and may lead to bugs:

Every somewhat usable programming language has a type system, even some Assembly versions. To demonstrate the main principles and ideas, I will use a variety of languages:

Logos of languages used in the article for examples

Don’t worry if you are not familiar with some of them, especially with the last three, as these are functional programming ones. I’ll do my best to explain the main ideas.

The last thing worth mentioning before we proceed to the classification is the prevalent myth that:

Type systems can be binary classified

So people say that a language is either statically or dynamically typed, either strong or weak, etc. That’s a very dangerous myth that gives birth to all the holy wars around type systems.

Throughout the article, you will see many examples of this being false.
Generally, modern languages have so many features that it’s tough to classify them with discrete measures — it’s more about what a language supports.

Static vs. Dynamic

This classification is the most popular one and might seem easiest, but it has many misconceptions, for example:

Compiled languages have static type systems and interpreted — dynamic ones

This myth can be dispelled with just two points, which might come as a bit unexpected:

  1. There is no such thing as a static or a dynamic type system.
  2. And there is no such thing as a compiled or an interpreted language.

A programming language is just a specification, a grammar describing syntax bound by semantics and a type system. Any language can be compiled or interpreted.

For example, Haskell and C++ — languages with powerful “static” type systems — have both compilers and interpreters. And JavaScript — originally an interpreted language — is often compiled just-in-time, for instance, with V8’s optimizing compiler TurboFan.

So it’s just a matter of implementation when to check for type errors: statically before the execution or dynamically in the middle of it. Some tools can use type information for optimization, as most compilers do, but it’s not about the language — it’s about which tools you use.

What I’m trying to say is that static type checking is mostly just another static analysis layer that can be added to or removed from any language. For example, you can establish this static type analysis layer with the help of dedicated tools in languages without this layer:

Python + MyPy:

PHP + Hack:

JavaScript + TypeScript/Flow:

But what if we want our existing static type checking tool to give us some freedom? Some languages, such as TypeScript and C#, have this built-in — a type with all possible values:

This way, your type analysis tool (e.g., a compiler) won’t care about how the value is used, but in runtime, the program might crash or produce an error.

You might wonder, “Why would I ever want to do that?” But you’ve probably heard that static type checking makes it easier to maintain a system, while the absence of it speeds up the writing and iteration on ideas.

So, what’s wrong with quickly hacking together some stuff to just see if it works and then add reliable types while it’s still fresh in mind?

Meet Gradual typing!

In languages that support such almighty types, taking the best of both worlds, you can start developing in a dynamic style to iterate and try things quickly and then add all needed static checking later, when the developed module is stable enough:

But be careful with this “later” — make sure it’s right after you’ve settled on an implementation and not just “later,” probably never.

Speaking of the advantages of having only dynamic type checks, here is another one — the ease of writing generic functions. You can express anything with no static type checking holding you back. This doesn’t necessarily mean that this “anything” will even work, though.

Look at this JavaScript example:

We can use any value we like here, as long as it satisfies certain requirements, specified implicitly as property accesses and method calls, or get a run-time error.

This mechanism is called Duck typing, and it applies the Duck test:

Yellow rubber duck

If it looks like a duck,
swims like a duck,
and quacks like a duck,
then it probably is a duck.

In our case, if an argument has the needed properties, it fits, and this is only checked at runtime. So if we never access the missing properties we don’t have to implement them:

To achieve a similar result in a type-safe way, some languages, such as C#, have generics:

They apply the Duck test as well, but in a bit differently — generics demand all potentially reachable members to be defined.

You can see that we also have to specify a bounding type IMaybeLong for our generic type T. An object passed to the function has to be of a class implementing this interface. So overall, generics are safer than duck typing.

There is also something in between in C++ — templates:

You don’t have to specify a bounding type, but a type checker would still yell at you if a passed type didn’t have all potentially reachable members. Their idea is to gather all the possible types a template can get and then generate an according function or a class for each type. So in this example, we would get two different functions.

I won’t say that you can express anything with generics or templates, but if you really need to implement something and just can’t, you might be doing something too complicated or are not using the best tool for your problem.

Power

Actually, the ability of a language or a type system to express complex concepts and patterns is called power or expressiveness, and different languages have different power levels.

For example, in Golang, a rather popular modern language, there are no generics, which are present in many mainstream languages such as Java, Swift, and others.

Whereas TypeScript has these and also supports union and intersection types, which are very powerful concepts:

Union means that its instance is either of the options, so either a string or a number in this example.

Intersection says that an instance has to have properties of both types — IWalking and IFlying.

Besides, there are even more powerful but less known languages, such as Idris. It has types as first-class objects so that they can be used as any other values, and also support Dependent types, where a type is dependent on its value:

Let me take you through this example.

Function isEmpty, returns whether a list is empty (duh!) and function head returns a list’s first element, but only if it is not empty. The part in curly braces is a predicate, and we statically require it to be false. So a list passed to this function has to be checked for being not empty.

The most important part is that it can be enforced statically, before the execution. And even when the value comes from a dynamic context, such as user input, it has to be checked with isEmpty before being passed to this function.

Explicit vs. Implicit

Another important yet simple classification which people often overlook is explicit and implicit type systems.

Explicit means that a developer has to specify the types of variables, arguments, return values, etc.

Implicit means that you don’t have to or even can’t specify the types, and the type system will infer them for you.

There is a widespread myth around this:

To have your code statically type-checked, you have to specify the types

First of all, even if a language doesn’t imply using a static type analysis tool by design, it still can be checked with no changes if it’s possible to infer the types.

For example, if your JavaScript uses a browser API with defined types, it’s easy to infer the types at least for some code, which is a common feature in modern editors:

And more sophisticated type systems and tools such as the Haskell ones allow whole functions to be implicitly typed, as long as their type resolution is not ambiguous:

Our sum function was inferred to take two Numbers and return a Number. It also works quite well with custom types, such as Language, because their constructors have to be unique.

It’s less code for sure, but doing it this way has a few disadvantages:

  • You have to read the whole function to understand what it does, instead of just looking at a signature.
  • The type inference gives us the most generic type, so if we wanted our sum function to operate only on Integers and not also on Floats (which Number includes), we would have to specify the function types ourselves.

Anyway, type inference is supported in most modern languages, and even relatively old ones such as C#, Java, and C++ got support for it.

But how do we get the most out of it?

I like and recommend to specify the types for:

  • Functions and methods arguments;
  • Functions and methods return values;
  • Classes properties.

The types of simple variables can be easily inferred because they are always assigned a value of a specific type, be it a literal or a function call result. This approach allows both to keep code concise and easily statically type-checked:

Some people also like not to specify their return types, which is even more concise, but it might result in a nasty bug in a completely different part of the codebase.

Let’s remove the return type of afterTax. Then, say, refactoring happened, and we’ve accidentally removed the parsing from the function, but now we have no explicit type to guard us against this error:

As you can see, this is especially dangerous if implicit type conversions are allowed in the language.

Weak vs. Strong

Speaking of the implicit conversions, it’s time to dive into the type system’s strength

And this notation is a complete mess!

Various sources give entirely different definitions of a strong and weak language. Some talk about type conversions, others about memory safety, and some even mistake it for the static and dynamic type checking. And, let’s be honest, who wants to use a weak language or a type system?

Of course, no one, and in some more progressive articles and papers, you can find a plea not to use this classification.

Norman Ramsey a  Stack Overflow expert, says “Every time I see a question about ‘strong’ and ‘weak’ typing I kill a kitten”

Shortly I will show you why it’s so bad, while also breaking a few common myths.

But first, let me introduce a better notation — type-safety, which we’ve already seen at the beginning of the article. Implicit conversions are not considered type-safe, so you can say that a specific language feature or an implementation is more type-safe than the other.

With this in mind, we could say that, for example, these languages have few implicit conversions:

  • C++
  • C#
  • Haskell

While these have many:

  • JavaScript
  • PHP

Can’t we just say that the first camp is type-safe and the second is not?

Remember the very first myth — nothing is binary in this topic. Our presumably safe languages are very different.

C++ has many conversions between primitive types, and also implicit constructors and conversion operators:

C# allows casting objects, as well as many other mainstream languages:

Haskell doesn’t have any of these, but at least it allows treating an Integer as a Float:

But OCaml doesn’t allow mixing integers and floating-point numbers. The operator + is defined only for integers, so you have to use a special operator +. for floats or explicitly convert between them:

The somewhat unsafe languages are different as well, for example, in handling operations on strings and numbers:

PHP demonstrates a less type-safe behavior allowing to divide a number by a part of a string, while JavaScript requires the whole string to be parsable.

But in another example, it’s the opposite because PHP warns about indices out of bounds and undefined property accesses, and JavaScript just returns undefined:

All you can do here is just say that a language is mostly type-safe and then enumerate all its quirks you have in mind to explain what you mean. So, please, only use the type-safety notation when discussing some specific parts of a language or it’s usage. And never use the strength notation.

The vaguer the topic — the more there are fallacies. A common one is that:

Languages that are statically-checked are more type-safe than the ones with only dynamic checks

But it’s not true because these classes are not connected in any way, and the best mythbusters here are C++ and Python.

The former has some fantastic compilers which have required static type checks, but we know already how unsafe it can be. And Python is frequently only dynamically type-checked, yet it has strict rules on type mismatches and is acknowledged as a rather type-safe language:

Nominal vs. Structural

The last major classes of type systems I’d like to cover today are the nominal and structural ones.

In short, if the types are considered equivalent just by having the same subset of values (their structure is equal), it’s structural typing.

If a type can be equivalent only to itself and its descendants — this is nominal or nominative typing:

Imagine a typical situation in a mainstream language, such as C# — you have two types with exactly the same properties and methods, thus structurally equivalent:

Also, you have a function operating on one of them, and you want to reuse it for the second type, but it’s not so easy:

There are numerous solutions to this problem: inheritance, overloading, etc.

However, we face this issue because C# supports only nominal typing, and this wouldn’t be a problem in a language supporting structural typing, like TypeScript. Because our types are structurally equivalent, they can be used interchangeably:

Structural typing in some form is also supported in Go, Scala, and OCaml, to name a few.

However, sometimes you really want your type not to be equivalent to anything else. For example, when working with user input, you might want to sanitize it. To use your types effectively, you can create different types for sanitized and non-sanitized input:

Then we could create a function pipeline requiring the input to be sanitized before persisting it:

Alas, this isn’t so simple with structural typing. The types are still structurally equivalent and can be used interchangeably:

We can emulate nominal typing by adding a tag to distinguish them:

We used TypeScript literal types, which mean that the property can have only this value. Now our pipeline works as expected and we can’t mismatch the input types.

Summary

Wrapping up, I present you the final myth:

Every type system has its place and uses

You’ve probably noticed that we’ve talked about both the strengths and weaknesses of different type systems and paradigms. That’s because I believe that all the type systems, languages, and other tools we’ve covered today have their design goals and purpose.

For instance:

  • C++ was designed for incredible performance, sacrificing many useful type checks;
  • Go was supposed to be simple and fast, not overloaded with features;
  • OCaml’s strictness might seem tedious, but it sure prevents a lot of bugs.

That’s why I propose not to argue about which type system is better, but be more pragmatic: pick a tool that makes your life easier and your work more productive and use it to its maximum.

For example, when working on a proof of concept, you might want to neglect the type-safety in favor of development speed. But, when working on a long-term project, static type analysis and overall type-safety can save you from many bugs and much pain.

With that said, I assume the final myth Confirmed.

Thank you for reading, I hope this post was interesting and insightful to you!
Share your thoughts and questions in the comments or contact me directly by mail, Twitter, or any other means.
Also, follow me on Medium and GitHub for more engineering insights.

--

--