Sequences and Collections in Swift are an incredibly powerful way to build complex functionality from composable, functional pieces. I try to use them as much as I can; but they do have an interesting limitation I ran into when building machine learning algorithms in Swift.
Here’s some code:
The last line doesn’t compile. Swift complains that “value of type ‘[Int]’ has no member ‘addToStart” . But I have a LinkedList, so where did that [Int] (array of Integers) come from?
What’s happening is that that map (and filter, and others in the standard Sequence definition) return Arrays. They always return arrays: no matter what collection type (like LinkedList) I make my Sequence out of, I will always get back an Array when I carry out any of these “sequence”-type operations. That sounds… odd. Was this a mistake in the standard library?
What’s going on?
I’m going to explore the answer to this question, and show a way to actually implement what I am trying to do in a highly generic, and extremely powerful, way.
But the answer itself is surprisingly deep, so I will follow up with a second blog post on some of the fascinating mathematics behind the implementation, and look how we can go further than this problem to build even more powerful and generic tools.
In this post
What we’ll do is superpower our Sequences with an extension which allows us to extend the sequence operations like map, filter and friends so they can apply across any type, including the built-in ones. These will really come into their own when combined with the tools in the next post.
Using those, we can write things like the below, where the logic is the same for each sort of container (Array, Linked List and Tree) :
Take a look at the Playground, where we add a few more functions and operators and re-write the above in the continuation-passing style like below, which I think looks really nice! (There are a few other experiments and challenges on the Playground, so try it out)
There’s also a github project which is essentially the same as the Playground but more suited to seeing how the code works ‘in practice’.
The Sequence Protocol
First though, let’s look at why Sequence is returning an Array for my LinkedList in the first place. After all, I can write my own lmap function in LinkedList which does return a linked list :
So can I write a generic version for Sequence? For reasons which will become clear, I’m going to put it in a protocol Functor with a new function fmap:
But.. what goes in the ??? I’d like to be able to say something like “The original kind of sequence, like Array or LinkedList, but now containing type S rather than type Element”. Maybe Sequence<S> would do that?
Unfortunately: it doesn’t ! For starters, protocols can’t take type parameters (<S>), so there’s no way of telling Swift that I want a sequence of a particular type S.
Secondly, actually Sequence isn’t quite right here— I don’t just want any old sequence, I want the specific kind of sequence we started with (Array, LinkedList, etc). So maybe something like Self<S> … but again, that doesn’t work either.
This is the reason why the standard library always returns Arrays for maps (and filters, and flatmaps, and others) : there’s no way to express in Swift that a function fmap returns the right kind of sequence (like LinkedList) containing the right type of element (like S).
Is this the end of the story ? Maybe I simply can’t create my fmap function!
No it’s not. Let’s go a bit deeper.
Higher Kinded Types
A few programming languages, including Scala and Haskell, allow for the creation of what is known as Higher Kinded Types (HKTs). These would allow me to write something like (in made-up Swift syntax)
In other words, to actually do what I want and return “the original kind of sequence, like Array or LinkedList, but now containing type S rather than type Element”.
Higher kinded types allow us to generalize over type constructors. This isn’t possible in Swift: see the code below, where we have a generic type called AClass with a type parameter A. We can certainly create another type called GenericType which uses AClass within it, and pass in A as a type parameter to use in AClass’ type constructor. But we can’t decide that instead of hard-coding AClass, also pass that in as a generic type:
Higher Kinded Types in Swift
So, Swift doesn’t support Higher Kinded Types natively. But a recent eye-opening GitHub project by Matthew Johnson showed how, with careful use of some boilerplate code, the existing powerful type and protocol extension features of Swift can be extended to mimic higher kinded type functionality.
I am not going to go into the specifics of the implementation in this post, but basically it uses 3 concepts. It’s not necessary to understand them in detail, but we will see them referred to below:
- The Constructible protocol, which are those whose type constructors we want to use generically.
- A Tag for each constructible type¹: a ‘shadow type’ which Swift uses to actually store the instance being created. ArrayTag means Swift can create Arrays for higher-kinded types, LinkedListTag creates LinkedLists, etc.
- A struct called Construct<C, T>. This means “construct a value of generic type C, with a type parameter of T.”, like writing C<T>() in Swift.
With this generic type, I can rewrite my fmap protocol requirement like the below.
You’ll see I now have a FunctorTag because I want my “functor” to work over constructible types (ie, my Functor is a higher-kinded type)
I also added a typealias F and made fmap static, which helps to illuminate what fmap is actually doing:
- if we have a transform from A’s to B’s,
- and we have a “functor” F with a type parameter <A> (like Array<A> or LinkedList<A>), written Construct<F,A> above,
- then fmap is converting those F<A> into F<B> (written Construct<F,B>)
That’s similar to what regular map does, but map for any “sequence” S<A> will always return Array<B>, not S<B>.
If I write these out in a “pretend” higher-kinded-type-style that we might (one day?) get in Swift, by removing the Construct<..>, it’s easier to compare the two:
So FunctorTag is a really a way of representing a type which supports a more more generalized map, and in fact for our purposes that’s what a Functor means : a “type which can be mapped over [, and returns the same type]”. A Functor is a “higher kinded type” as it allows us to generalize over the F<A> type constructor. We’ll talk more about Functors in the next post.
Using our Functor
So, how do we actually use our new FunctorTag?
We want to minimize the boilerplate — those Constructible types, Construct<C,T> structs and Tags look very heavy!
At this point I turned to a fantastic tool called Sourcery by Krzysztof Zabłocki, which automates the creation of boilerplate code. If you look at the Xcode project file on GitHub, you’ll see how I actually set it up; and it makes a huge difference from a usability perspective.
The Constructible Protocol
Thanks to the power of Sourcery and Swift’s protocol conformance capabilities, in order to turn a generic type² into a Higher Kinded Type, just make it conform to the Constructible protocol. That will ask you to implement a single item: a typealias for the type variable used by the protocol³.
Constructible means we can use these as type constructors with higher kinded types like Functor. With Sourcery, adding that protocol conformance will automagically create a number of things for you:
- Tag types for use with Construct;
- Conformance to the internal _TypeConstructor protocol;
- toType and Type.lower methods to get the original type back.
- a ‘continuation-passing-style’ >>>¬ operator which also lowers back to the original type, we’ll see more of this style in the next post (¬ is ALT-L)
There’s also a postfix operator ^ (pronounced ‘lift’) which you can put after a value of your constructible type, to turn it into a Construct type. For example:
Being able to lift and lower types back to where they came from is all very well, but not particularly useful! Let’s get back to the point of this: implementing the FunctorTag protocol. Recall what that looked like:
So, let’s make Array into a functor over which we can use fmap.
To do that in this framework, we actually need to make ArrayTag (which Sourcery creates for us if we make Array conform to Constructible) conform to FunctorTag. We can do this by using the regular map function in Array and lifting and lowering:
Let’s do the same for the LinkedList type, using the lmap function we defined just for LinkedLists :
And finally, if I have a Tree type with a similar tmap function:
It won’t escape your notice that the implementation of fmap looks very similar in all cases, but that’s because I’m using map, lmap and tmap helper functions, in fact even with the help of Sourcery we can’t create a ‘generic’ implementation here; nor could you create a protocol like Functor which asks you to create fmap implementations for your own types. (Which is really the point of this whole exercise)
What did we achieve?
We got to this point:
fmap is a protocol function which means it can be used with any “higher kinded” type implementing FunctorTag. FunctorTag represents something that doesn't exist in Swift: “types which can be mapped over and return the same type”. So all we need to know is that we have a type that can be lifted (^) to FunctorTag and we know we can use fmap on it: we don’t need to know the details of the type itself (Array, LinkedList, etc).
This is incredibly powerful in itself, but the higher-kinded-type concepts we have looked at here open up a huge number of additional capabilities.
In the next post, we’ll expand on what we have seen here, and look at providing another higher-kinded type in addition to Functors — Applicatives.
¹The Tag is actually the secret sauce that allows this to work — it internally type-erases (using the Any type) the original TypeParameter passed in, and allows that Any type to be converted back to the original TypeParameter without ever allowing the outside world to know about the type erasure. I’ll be fairly loose in this blog about the difference between the original constructible type C and the tagged type CTag, because we can lift and lower between them, but internally it is very important!
²Only higher-kinded types with single parameters C₁<A> are supported; multiple parameters like C₂<A,B> etc are not. That’s not a technical limitation, but the sourcery scripts and the Construct type would need to be extended to accommodate.
³Sourcery currently cannot auto-discover generic type variables.