Machine Learning in Swift: The Matrix

Jonathan @ Strictly Swift
9 min read · Oct 1, 2017


(Header image: Pixabay/Creative Commons)

Inspired by Apple’s Core ML and Michael Nielsen’s fantastic online book on machine learning, I thought I’d try and understand the basics of the algorithms, and explore Swift 4, by rewriting the Python code from Nielsen’s book into Swift.

We’ll tackle the machine learning algorithms in a further blog post, but the most immediate problem I ran into was matrix manipulation in Swift. Python has the numpy library, and although there are some good numpy-like libraries for Swift on GitHub, nothing quite met what I wanted. Specifically, I was looking for something expressive, functional (in the best sense of the word) and highly performant. However, as I have other ideas for where I want to use this library, I also wanted something extremely generic and reusable (in the best sense of the word).

The Journey 🚲

This post aims to show how I wrote a highly generic and well-performing matrix maths library, which is also expressive and clear to use in complex algorithms. We’ll start with high-school maths, dip into algebra and group theory, tour via Swift’s generic programming and custom operators, and end up with a matrix-maths domain-specific language using the high-speed BLAS and Accelerate libraries.

At the end we’ll be able to write something like
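
// A taste of where we are heading (variable names here are illustrative):
let answer = ((m₁ ● m₂) + m₃).fast()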

over large matrices of Doubles and get a fast calculation, but also be able to run that same calculation on more exotic mathematical objects. Swift supports this kind of code extremely well: it would be very difficult to achieve in most other programming languages.

Starting at the very beginning 🛴

What is a matrix anyway? At school we learn it is a grid of numbers with a fixed number of rows and columns.

There are various operations you can do on these and you may recall the tricky matrix multiplication operation (the “dot product”).
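
For instance (a small worked example), each entry of the product combines one row of the first matrix with one column of the second:

row (1 2) ● column (5 7) → 1·5 + 2·7 = 19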

It is the ability of matrices to carry out many operations at once that makes them so powerful in areas like 3D graphics (generally 3×3 or 4×4 matrices representing spatial manipulations) or machine learning (generally very large matrices representing neural networks).

Matrices and Algebra 💍

Matrices can be defined more generically than just on numbers. If you think about it (and perhaps work through a few examples) the main thing that you need to have in order to be able to operate on a matrix is a type which allows you to “add” and “multiply” things of that type together.

Numbers (integers, floating point numbers like Doubles) all obviously fall into that category. But there are other objects on which we can define “add”- and “multiply”-type operations: Booleans (TRUE/FALSE), for example, or even more exotic objects.

For example, for Booleans, you can define “add” to be the “OR” operator; and “multiply” to be the “AND” operator, so we get something like:

(b₁*b₂) + b₃ ≣ (b₁ AND b₂) OR b₃

so with b₁ = TRUE, b₂ = FALSE and b₃ = TRUE we get the answer ‘TRUE’. We can use Boolean matrices to store “reachability” relationships, for example: put TRUE at row i, column j if there is a direct route from place i to place j.

In fact, the exact operations that a type T needs in order to be used with a matrix are those defined by a mathematical object called a Semiring. These have addition and multiplication operations (with features like associativity, distributivity and commutativity that work in familiar ways), together with a zero object and a one object. In Swift it might look like this:
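
// A sketch of a Semiring protocol; the exact names (Semiring, zero, one)
// are assumptions for illustration rather than the library's final API.
protocol Semiring {
    static func + (lhs: Self, rhs: Self) -> Self
    static func * (lhs: Self, rhs: Self) -> Self
    static var zero: Self { get }
    static var one: Self { get }
}

// Booleans form a semiring: "add" is OR, "multiply" is AND.
extension Bool: Semiring {
    static func + (lhs: Bool, rhs: Bool) -> Bool { return lhs || rhs }
    static func * (lhs: Bool, rhs: Bool) -> Bool { return lhs && rhs }
    static var zero: Bool { return false }
    static var one: Bool { return true }
}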

For much more on these, you should read the excellent FewButRipe blog. Note that you don’t get negation or subtraction; for that you need a Ring.

Make me one with everything 🍕

In the interests of being as general as possible, I defined my Matrix as a generic type (“structure” or “struct”) in Swift over a non-specific type T. Swift then allows the generic type to be extended for specific protocols, so I could say that as long as T is a Semiring, you would get for free the ability to add and ‘dot product’ matrices made of type T. And if T is a Ring, you’d also get the ability to subtract matrices too.
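
Here is a sketch of the shape of this; the storage layout and the exact method bodies are my own assumptions rather than the library’s API, and it builds on the Semiring sketch above:

struct Matrix<T> {
    let rows: Int
    let columns: Int
    var elements: [T]   // row-major storage

    subscript(row: Int, column: Int) -> T {
        get { return elements[row * columns + column] }
        set { elements[row * columns + column] = newValue }
    }
}

infix operator ●: MultiplicationPrecedence

// Whenever T is a Semiring, addition and the dot product come for free.
extension Matrix where T: Semiring {
    static func + (lhs: Matrix, rhs: Matrix) -> Matrix {
        precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns)
        let sums = zip(lhs.elements, rhs.elements).map { $0.0 + $0.1 }
        return Matrix(rows: lhs.rows, columns: lhs.columns, elements: sums)
    }

    static func ● (lhs: Matrix, rhs: Matrix) -> Matrix {
        precondition(lhs.columns == rhs.rows)
        var result = Matrix(rows: lhs.rows, columns: rhs.columns,
                            elements: Array(repeating: T.zero,
                                            count: lhs.rows * rhs.columns))
        for i in 0..<lhs.rows {
            for j in 0..<rhs.columns {
                for k in 0..<lhs.columns {
                    result[i, j] = result[i, j] + lhs[i, k] * rhs[k, j]
                }
            }
        }
        return result
    }
}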

Machine Learning, Learning 🦉

Once I had the above, I could translate the original code into a Swift version. Thanks to Swift’s ability to define custom operators, I could have code that looked like this:
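
// A sketch of one step of the translated network code, using the custom
// ● (dot product) and + operators on matrices; the surrounding code is omitted.
let z = (weight ● input) + bias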

And this worked very nicely on matrices weight, input and bias, and looked great.

But.

It was very slow. The machine learning example from Nielsen’s book creates a matrix of 784 rows × 30 columns: large for school mathematics, but not very large in comparison to real use cases. And each of the 30 runs of the machine learning network could take 150 seconds or more, totaling well over an hour of runtime. This made learning about the machine learning algorithms (the original point of the exercise) rather frustrating!

How could we do better?

BLAS-ing off with the Accelerate library 🛩

The Basic Linear Algebra Subprograms (BLAS) specification is a widely used framework for high-speed computation of various matrix-related calculations. There are variants for many languages, and although there’s no Swift-native version, Swift can call out directly to C functions, so we can use the BLAS functions that Apple provides as part of its Accelerate library.

There are also specializations, for example to do ( m₁ ● m₂ ) + m₃ directly as one operation — very handy for the machine learning requirement.

These run quickly: my tests show that the BLAS ( m₁ ● m₂ ) + m₃ calculation is about 7 times faster than the “naïve” version (on a 2015 MacBook). They make heavy use of the CPU’s vector-processing hardware to run in a highly parallel fashion. Certainly ideal for my use case. The problem is that they don’t look great:
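
import Accelerate

// A sketch of calling BLAS directly to compute C <- (A ● B) + C. The
// flattened row-major arrays are an assumption about storage; the point
// is how much plumbing surrounds a single multiply-and-add.
let a: [Double] = [1, 2, 3, 4]   // 2x2 matrix A, row-major
let b: [Double] = [5, 6, 7, 8]   // 2x2 matrix B, row-major
var c: [Double] = [1, 1, 1, 1]   // 2x2 matrix C, overwritten with the result

cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            2, 2, 2,             // M, N, K
            1.0, a, 2,           // alpha, A, lda
            b, 2,                // B, ldb
            1.0, &c, 2)          // beta, C, ldc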

So how could we maintain the clarity of the original ( m₁ ● m₂ ) + m₃ call, but use BLAS? The challenge here is that although it’s quite straightforward to tell Swift to interpret (m₁ ● m₂) or (m₃ + m₄) on their own as matrix calculations, here we have more complicated combinations of operators: to get the full benefit of cblas_dgemm, we need to be able to give it two matrices to multiply and a matrix to add, all in one go.

Swift as a Foreign Language 🌎

Swift has other tricks up its sleeve, namely algebraic “enumerated” types, sophisticated pattern matching and type inference. We can put these together into what is known as an embedded domain-specific language (eDSL) and use that to represent our matrix calculations.

All that sounds rather complex, so let’s break it down.

An algebraic type (called an enumerated type with associated values in Swift) is a way we can build our own mini-language (a domain-specific language embedded in our code) by making our own user-defined type. For this example, we create a type called MatrixCalc to represent matrix calculations.

A matrix calculation can be made up of:

  • A matrix itself
  • The ‘transpose’ of a matrix (i.e., a matrix where all rows become columns and vice versa)
  • Two matrices added together
  • Two matrices “dot product” multiplied together
  • (etc)

Note also that we can define this recursively, and this gives a hint as to how we will represent more complicated matrix calculations using MatrixCalc. So we can say that a MatrixCalc is:

  • Again, just a matrix itself
  • The transpose of a MatrixCalc
  • The result of adding two MatrixCalcs together
  • The result of “dot product” multiplication of two MatrixCalcs.
  • (etc)

We can translate this directly into Swift code:
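
// A sketch of the MatrixCalc type; the case names follow the
// .m / .transpose / .add / .dot spelling used below.
indirect enum MatrixCalc {
    case m(Matrix<Double>)              // just a matrix itself
    case transpose(MatrixCalc)          // rows and columns swapped
    case add(MatrixCalc, MatrixCalc)    // two calculations added together
    case dot(MatrixCalc, MatrixCalc)    // dot product of two calculations
}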

This means that ( m₁ ● m₂ ) + m₃ can be represented in our DSL as:

.add( .dot( .m(m₁), .m(m₂)), .m(m₃))

But to be clear: we have not changed our machine learning code at all; this is our matrix library’s internal representation only. By defining new custom operators for ● and + which only work on Matrix<Double>, the type inference in the Swift compiler is smart enough to recognize that it should turn ( m₁ ● m₂ ) + m₃ into .add( .dot( .m(m₁), .m(m₂)), .m(m₃)) and not try to actually do the calculation.

That’s important — we need the compiler to “wait” before it does the calculation, so that we can make sure we have all of the right pieces available and so we can call the most efficient BLAS function.
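
As a rough sketch (the exact overloads in the library may differ), the operators for Matrix<Double> can simply build MatrixCalc values instead of computing anything:

// Build the DSL rather than calculating immediately.
func ● (lhs: Matrix<Double>, rhs: Matrix<Double>) -> MatrixCalc {
    return .dot(.m(lhs), .m(rhs))
}

func + (lhs: MatrixCalc, rhs: Matrix<Double>) -> MatrixCalc {
    return .add(lhs, .m(rhs))
}

With overloads like these in place, ( m₁ ● m₂ ) + m₃ type-checks as a MatrixCalc value, and no arithmetic has happened yet.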

5, 4, 3, 2, 1… BLAS<T>-off 🚀

You may have appreciated, however, that it’s all very well creating a DSL like .add( .dot( .m(m₁), .m(m₂)), .m(m₃)), but the machine-learning code expects to get an actual matrix result. There are a few ways to make this happen, but because I wanted it to be explicit that the code is using the BLAS library, I decided to require a small change to the machine-learning code to make it obvious that I’m using some high-speed algorithms.

The only addition needed to the code is to add .fast() after the calculation, to tell the compiler to “stop waiting” and actually do the sums. Like so:

let machineLearningAnswer = ( ( m₁ ● m₂ ) + m₃ ).fast()

Now I see the pattern

I made this work using Swift’s pattern matching capability. This allows us to ‘take apart’ values of an enumerated (“algebraic”) type by matching parts of them against a list of possibilities.

So, for example, a case pattern along the lines of the sketch below will match the DSL .m(m₁), and Swift will unpack m₁ into the matrix variable:
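
// (a fragment of the switch inside fast(), shown in full below)
case .m(let matrix):
    // `matrix` is now the unwrapped Matrix<Double>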

Or this one, which unpacks a .dot DSL into two matrices a and b to be dot-product multiplied:
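
case .dot(.m(let a), .m(let b)):
    // a and b are ready to be dot-product multiplied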

And in order to handle the ( ( m₁ ● m₂ ) + m₃ ) case, we can match like this:
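
case .add(.dot(.m(let a), .m(let b)), .m(let c)):
    // (a ● b) + c: exactly the shape one BLAS call can handle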

Armed with this, we can write the fast() function, which uses pattern matching to find these special cases and call the BLAS functions. Like this:
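
import Accelerate

// A sketch of fast(), pattern-matching for the shapes we have specialised
// BLAS routines for. The blas(_:dot:plus:) helper is a stand-in for the
// library's wrapper around cblas_dgemm; its name and signature are assumptions.
extension MatrixCalc {
    func fast() -> Matrix<Double> {
        switch self {
        case .m(let matrix):
            return matrix                        // nothing left to compute
        case .add(.dot(.m(let a), .m(let b)), .m(let c)):
            return blas(a, dot: b, plus: c)      // one cblas_dgemm call
        case .dot(.m(let a), .m(let b)):
            return blas(a, dot: b, plus: nil)    // cblas_dgemm with beta = 0
        default:
            // remaining cases (transpose, nested expressions, ...) would be
            // evaluated recursively; they are omitted from this sketch
            fatalError("case not covered in this sketch")
        }
    }
}

// Stand-in wrapper: returns (a ● b) + c, or just a ● b when c is nil.
func blas(_ a: Matrix<Double>, dot b: Matrix<Double>,
          plus c: Matrix<Double>?) -> Matrix<Double> {
    precondition(a.columns == b.rows)
    var result = c ?? Matrix(rows: a.rows, columns: b.columns,
                             elements: Array(repeating: 0.0,
                                             count: a.rows * b.columns))
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                Int32(a.rows), Int32(b.columns), Int32(a.columns),
                1.0, a.elements, Int32(a.columns),
                b.elements, Int32(b.columns),
                c == nil ? 0.0 : 1.0, &result.elements, Int32(b.columns))
    return result
}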

Using pattern-matching to call a “blas” function (which internally calls cblas_dgemm)

The full code for the matrix library is on GitHub (as a work in progress).

Fast Enough… for now 🏃‍♀️

The ultimate result of this (plus a few other similar tweaks for fast() based on looking at the CPU profiler) was that each run of the machine learning algorithm now only takes around 15 seconds, which is a 10x improvement in performance. The whole training algorithm therefore runs in under 7 minutes, which is certainly fast enough for what I need for my current analysis.

Not only does the resulting code perform well: by using Swift’s custom operators and extensible generic types, I was able to strip away the boilerplate from the machine-learning code and end up with something that reflects the underlying algorithms much more clearly.

I was impressed at Swift’s ability to allow this type of software development — to build a matrix library that satisfies use cases from the super-generic to the super-quick, without sacrificing clarity.

— September 2017

Next article: Superpowered Sequences
