Type inference for beginners — Part 2 (Polymorphism)

In the last part, we implemented type inference for simply typed lambda calculus. In this part, we’ll extend our type system with parametric polymorphism or generics.

Note: There were some mistakes in the last part so check out the edit notes at the bottom.

Motivation

There are some functions that need their argument to be of a specific type. For example, a function “add” can add two Int values. It wouldn’t make sense to add two Bool values. However, there can be some functions which don’t care about the actual type of their argument. Their body may not enforce any constraints on the type of their arguments. They are known as polymorphic functions. The simplest example of a polymorphic function is the identity function which simply returns its argument as it is.

  • (x) => x

The so, this function should work for Int, Bool, or any other type. Its return type is the same as the type of its argument. In our current system, such a function is not expressible as we need to assign a concrete type to the function.

Universal quantification

Polymorphic types are expressible through the means of universal quantification. The type of our identity function from the previous section can be written as

  • forall A. A -> A

This says that for all types “A”, the type of this function is A -> A. Such types, with forall quantifiers (also called universal quantifiers, universal because they quantify over all types) may be “instantiated” when they are used with a concrete type. Type instantiation is similar to calling a function. Just like when you call a function (x) => x with 1, we replace the instances of x in the body with 1 and remove the parameter, instantiating a quantified type is equivalent to replacing the instances of the “type parameter” in its body with the “type argument” and removing the forall clause. For example, if you call identity with an Int argument, all instances of “A” should be replaced with “Int”, giving the type

  • Int -> Int

Same for other types.

Quantified types are sometimes called type schemes.

There are other kinds of quantification as well like bounded quantification and existential quantification. Bounded quantification is useful when we need to apply other constraints on the generic type variables. This is commonly seen in interface types. We’ll look at bounded quantification later when we implement interfaces.

Where to allow quantification?

If you have problems understanding this section, don’t worry. Just skim through it and move on.

The language designer can choose where to allow quantifiers in the type system. This can range from highly restricted, only allowing quantifiers at the top level, to completely unrestricted, allowing quantifiers to appear anywhere a type can. The depth at which a quantifier can appear is commonly known as “Rank” of the type. For example, under Rank 0 types are simply non polymorphic types we implemented in the last part. Rank 1 types mean a quantifier can only appear at the top level. So, under Rank 1 restriction, the following type is valid

  • forall A. A -> A

Whereas this is invalid

  • forall A. A -> (forall B. B -> B)

In this specific case, we can simply move the inner quantifier to the top level without changing the type so, in Haskell, this actually counts as a Rank-1 class as its equivalent to

  • forall A, B. A -> B

Moving the quantifier outside is not possible in all the cases, for example,

  • forall B. (forall A. A) -> B

Can’t be converted to Rank-1 so to represent this type in Haskell, you would need to turn on Rank-2 or Rank-N compiler extension.

Classical Hindley-Milner type system allows Rank 1 types. This is because of the fact that in an arbitrary rank system, type inference is undecidable, i.e. it is impossible to infer types of some well typed expressions without some type annotations. Also, arbitrary rank type inference is harder to implement.

For now, we’ll implement Rank-1 types where the forall quantifier can appear only on certain places.

Let expressions

To ease implementation, we won’t allow function arguments to have polymorphic types. We’ll add another type of binding which allows polymorphic types to be assigned to variables. These are let bindings. This type of polymorphism restricted to let bindings is also known as “let-polymorphism”.

A simple let expression can be

  • let x = 3 in plus(x)(1)

Here, variable x is being bound to the value 3 in the body (plus(x)(1)) of the let. Don’t confuse this with let/var statements in other languages. This is an expression which evaluates its body. This expression will return 4 when evaluated. In statement oriented languages, let statements don’t have a body. They just bind a variable to an expression’s value from the next statement to the rest of the block.

So, let expressions will be used to assign polymorphic types to variables. For example, our identity function can be defined and used as

  • let id = ((x) => x) in id(1)

The id binding will be visible in the body of the let.

Type checking let bindings

Let’s go through the process of type checking this snippet to informally to get an intuition for the process.

  • let id = ((x) => x) in id(1)

We start with the let declaration. We start by taking the RHS of the assignment ((x) => x) and calling infer on it. As we know from the previous part, the inferred type of the RHS would be (T0 -> T0) where T0 is a newly generated type variable.

This will take us back to the call to infer for the let expression. We have the type of the RHS of the let binding (T0 -> T0) and our environment is empty.

Now, we can see intuitively that (T0 -> T0) should be polymorphic. So we can generalise it to a polymorphic type (forall T0. T0 -> T0).

Notice that we can’t just quantify over all the type variables. If a variable occurs in free the environment, that means it could be bound outside the scope of the current let binding. Occurring free means not bound by a quantifier. For example, A is free in the type (A -> A), but it’s not free in (forall A. A -> A).

So we have to check the environment for free occurrences of the type variables in the type we’re generalising. If they don’t have free occurrences in the environment, we can quantify over them. In this case, the environment was empty so, we can generalise over T0.

So, back to the inference. Now, we have found the most general type for the RHS of the let, we can add a binding to the environment (id: forall T0. T0 -> T0).

Now, we can infer the type of the let expression’s body (id(1)). We start by trying to infer the type of variable “id”. We look it up in the environment. We find (forall T0. T0 -> T0). Now, infer function needs to return a regular type. We have a polymorphic type. So we need to “instantiate” the polymorphic type by removing the quantifier. We can do this by taking the quantifier’s name, generating a new type variable for it and replacing its instances in the quantified type with the new generated type variable. So, this would give us (T1 -> T1). We still don’t know what T1 actually is. We created a new type variable because once we infer the actual type of T0 (Int in this case), we don’t want T0 to be Int right? Otherwise, other calls to id will require their argument to be of type Int. So we generate a new variable. Anyway, call to infer on the variable id returns (T1 -> T1). Now, we infer the type of the argument which will give us Int along with the substitution (T1: Int) as usual. So, the type of (id(1)) will be Int, and the type of the entire let expression will be the same.

That was a lot of hand waving, so lets look at the actual code.

The code

First, we add a new type for let expressions.

A let node has a name, RHS expression, which is bound to the name, and a body.

Next, we define the type for Forall quantified types.

It can quantify over multiple type variables (quantifiers).

Note that we don’t extend our Type union to include the Forall type. This is because we want to restrict the places where a forall can occur. It shouldn’t be allowed at all places where a Type can occur.

The environment can now contain types as well as Forall. So we update the Env type to reflect that.

We need to add a function to apply a substitution to a Forall type.

As you can see, we don’t replace the types that are quantified over by the forall, because the forall bindings “shadow” the previous bindings with the same name. This is similar to how function bindings work. A the function parameter variable hides variables with the same name from outer scopes. So we clone the given substitution and delete the quantified names from it, then apply the substitution to the type of the forall. Note the use of object spread syntax here. { …subst } is equivalent to writing Object.assign({}, subst).

We need to change the signature of our addToContext function to accept Forall along with Type.

Next, we need to add some functions for getting free type variables from different types. Remember free vars are the type variables that aren’t bound by a quantifier. So, for Forall types, free variables are the free variables of its body minus the quantifiers. Here are those functions.

Instantiation

We need a function to instantiate Forall types. As I said earlier, it involves generating new type variables for each quantified variable and substituting them in the body of forall and returning it.

Inference for vars

Now, since our environment can contain forall as well as regular types, we need to update our inference for variable expressions. If the variable name is bound to a type, then it is the same as before, however, if its a Forall, we need to instantiate it and then return it.

Note that I’ve separated out a new function inferVar that handles inference for vars from the main infer function. The same for other node types. The switch case in infer just matches over the node type and calls the appropriate specific infer function (inferVar, inferCall, etc). Just to make the code a little bit cleaner.

Generalisation

Let bindings can bind polymorphic values so we can generalise the RHS type of the let binding to a forall type.

Here’s the generalize function

It takes the free vars in the given type and puts them to the left hand side of a forall. Note that we first check if the type variable is not free in the environment. We can’t generalize free vars in the environment because they may be bound to specific types later.

Infering Let expressions

Now we move on to inference for let expressions.

So we infer the RHS of let, generalize the type and add the generalized type to the context environment. Then infer the type of the let body under the new environment. Then we return the body’s inferred type and the composed substitutions from previous steps.

Now, we can check our inference for polymorphic types.

Here are a few helper functions.

And here are some tests.

Conclusion

In this part, we added a form polymorphism (let-polymorphism) to our language. This makes our type system a lot more expressive. Again, play around with it. Here’s a very simple exercise. Add a `throw` function to the initial environment. It can take one parameter of any type and returns a value that can be used in any expression regardless of what type of expression is expected there. Think of the type of throw. Hint: It’s polymorphic obviously.

In the next part, we’ll continue our journey further and add records to our language. They will allow us to represent key value pairs in a concise manner. They will be statically typed of course. We will also have a discussion about nominal and structural typing.

That’s all for now folks. Have a nice day.

Like what you read? Give Dhruv Rajvanshi a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.