# Matrix math in Rust using OpenCL

I am mostly a lurker in the rust community[1], but from my lurking I think I have seen people say we still need a good matrix math library in Rust. I find myself needing a good matrix math library too. I want to break into the world of deep learning, and I want to implement my stuff in Rust, because I like Rust. I have also been wanting to play with OpenCL for a while now.

So yesterday morning I skipped my first 2 classes and began hacking on a basic matrix math library. When I say ‘hacking’, I mean the really hacky kind. I was making new files, bouncing between them, leaving unfinished lines, and changing class/variable/argument names left and right[2]. By the end of the day things were starting to settle down, though. I’ll reflect on the stages the mess went through.

I’m no mathematician, so I’m not sure how pretty or intuitive my API will be. But hopefully I can get some input on how to smooth out the edges.

## Stage 1: Setup

Early in the morning, I ran `cargo new --name matrix matrix-rs/`. I added rust-opencl as a dependency in `Cargo.toml`. I copied the example `demo.ocl` file over and pasted the example code into the `it_works` test case you get with a new crate. Progress! Sorta.

The first thing I did was create a `Matrix` struct to work with `CLBuffer<T>` from the `opencl` crate. This soon got renamed to `ClMatrix`.

## Stage 2: N-Dimensional

I thought it’d be cool to allow N-dimensional matrices, so I started setting up the machinery for them. I created a `Matrix` struct to store and operate on a matrix on the CPU side and a `ClMatrix` struct for the OpenCL side.

```rust
pub struct Matrix<T> {
    dims: Vec<usize>,
    dim_steps: Vec<usize>,
    buffer: Vec<T>,
}
```

- **dims**: the shape of the matrix, which can have an arbitrary number of dimensions.
- **dim_steps**: the ‘volume’ of 1 unit in each dimension.
- **buffer**: the matrix contents.

`dims` was a parameter to the constructors and was used to calculate `dim_steps`:

```rust
let mut dim_steps = vec![0; dims.len()];
dim_steps[dims.len()-1] = 1;
for i in 1..dims.len() {
    let cur_index = dims.len()-i-1;
    // each step is the step to its right times the dimension to its right
    dim_steps[cur_index] = dims[cur_index+1]*dim_steps[cur_index+1];
}
```

`dim_steps` could then be used to access elements:

```rust
pub fn get<'a>(&'a self, coords: &Vec<usize>) -> &'a T {
    let index: usize = coords.iter()
                             .zip(self.dim_steps.iter())
                             .map(|(c, s)| (*c)*(*s))
                             .sum();
    &self.buffer[index]
}
```

`ClMatrix` was similar, except it did not need `dim_steps`, since it does not provide access to individual elements.

```rust
pub struct ClMatrix<T> {
    dims: Vec<usize>,
    buffer: CLBuffer<T>,
}
```

Creating this struct was easy. I just needed to calculate the buffer size.

```rust
pub fn new(ctx: &Context,
           dims: Vec<usize>,
           mode: ClMatrixMode) -> ClMatrix<T> {
    let buf_size = dims.iter().fold(1, |a, b| a*b);
    let cl_mem_mode =
        match mode {
            ClMatrixMode::In => { opencl::cl::CL_MEM_READ_ONLY },
            ClMatrixMode::Out => { opencl::cl::CL_MEM_WRITE_ONLY },
            ClMatrixMode::Mut => { opencl::cl::CL_MEM_READ_WRITE },
        };
    ClMatrix {
        dims: dims,
        buffer: ctx.ctx.create_buffer(buf_size, cl_mem_mode),
    }
}
```

I implemented a constructor to create a `ClMatrix` from a `Matrix`, a wrapper on top of `opencl`’s `Event` that allowed me to download a `ClMatrix` into a `Matrix`, and a simple add operation.

```rust
pub struct Event(opencl::hl::Event);

impl Event {
    pub fn get<T: Num>(&self,
                       ctx: &Context,
                       cl_matrix: &ClMatrix<T>) -> Matrix<T> {
        let vec = ctx.queue.get(&cl_matrix.buffer, &self.0);
        Matrix::from_vec(cl_matrix.dims.clone(), vec)
    }
}
```

Adding two `ClMatrix`es:

```rust
pub fn add(&self,
           ctx: &Context,
           other: &ClMatrix<T>,
           output: &ClMatrix<T>) -> Event {
    let kernel = ctx.program.create_kernel("vector_add");
    kernel.set_arg(0, &self.buffer);
    kernel.set_arg(1, &other.buffer);
    kernel.set_arg(2, &output.buffer);
    let event = ctx.queue.enqueue_async_kernel(&kernel,
                                               self.buffer.len(),
                                               None, ());
    Event(event)
}
```

The OpenCL kernel code, torn straight from the rust-opencl demo:

```c
__kernel void vector_add(__global const long *A,
                         __global const long *B,
                         __global long *C) {
    int i = get_global_id(0);
    C[i] = A[i] + B[i];
}
```

I then realized that I didn’t really know how to do operations like multiplication and dot product for high dimensional matrices. Also, I only needed operations on 2D matrices for now. I looked at Theano’s API, and they only seem to support matrices up to 4D. They also have a separate type for each dimension. I decided to follow their lead. It was here I also realized that ‘matrix’ refers to 2D arrays, while ‘array’ is for higher dimensions. So I decided to just start with 2D arrays — matrices.

## Stage 3: 2-Dimensional

I removed the N-dimensional machinery and replaced `dims: Vec<usize>` with `rows: usize` and `columns: usize`. This made creating and accessing buffers much simpler.

```rust
pub struct Matrix<T: Num> {
    rows: usize,
    columns: usize,
    buffer: Vec<T>,
}
```

And a constructor:

```rust
pub fn new(rows: usize, columns: usize, initial: T) -> Matrix<T> {
    Matrix {
        rows: rows,
        columns: columns,
        buffer: vec![initial; rows*columns],
    }
}
```

And accessing an element:

```rust
pub fn get(&self, row: usize, column: usize) -> &T {
    &self.buffer[row*self.columns + column]
}
```

Much more compact :)

## Next steps

Right now, all the library can do is add matrices with elements of type `usize`. I need to generalize it to handle all of the types OpenCL supports. This is where the `Num` trait you may have noticed above will come in. I think it’ll have a method returning the name of the equivalent OpenCL type, and I’ll use that along with a kernel naming scheme to select the proper kernel for each `ClMatrix<T: Num>` type.

After that, I’ll implement some cooler matrix operations! Like transpose, multiply, and dot. During this time I’ll also be looking for ways to improve the API. For instance, right now the user has to allocate the output `ClMatrix` before performing an operation. It’d be nice if the operations allocated the matrix themselves so the user wouldn’t have to think about the output shape for things like dot and multiply.

[1] I usually only go on IRC if I have a quick question. Maybe that’s bad.

[2] Maybe this should be ‘up and down’ since source files are arranged vertically. But that doesn’t sound as good.