An `einsum` use-case

Shannon Quinn
Published in The Quarks
Jun 8, 2018

Recently I’ve been working with some of my students on a paper for the SciPy Conference related to our OrNet project. Part of this involved designing our own Gaussian Mixture Model (GMM), a fantastic and extraordinarily flexible method for describing the underlying distributions of your data.

Without going into too much detail (I want to focus on the concept of an Einstein Summation, or einsum, rather than the mechanics of GMMs), solving for the parameters of a GMM requires repeated applications of two discrete steps: expectation, where our current estimate of the model is evaluated against the data, and maximization, where we update the parameters of the model so it fits the data a little bit better. With the many Gaussians that make up a standard GMM, and with high-dimensional data, this E/M repetition gets very computationally expensive: it involves a lot of vector and matrix multiplications and summations.
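To make the shape of that computation concrete, here is a minimal, deliberately naive sketch of one E/M pass for a GMM. The names (X, weights, mus, sigmas) and the use of scipy.stats for the densities are mine for illustration, not our actual OrNet code:

import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, mus, sigmas):
    # E-step: responsibility of each of the K components for each of the N points.
    dens = np.array([w * multivariate_normal.pdf(X, mean=m, cov=s)
                     for w, m, s in zip(weights, mus, sigmas)]).T
    r = dens / dens.sum(axis=1, keepdims=True)

    # M-step: refit each component to its responsibility-weighted data.
    for k in range(len(weights)):
        rk = r[:, k]
        weights[k] = rk.mean()
        mus[k] = rk @ X / rk.sum()
        diff = X - mus[k]
        sigmas[k] = sum(ri * np.outer(d, d) for ri, d in zip(rk, diff)) / rk.sum()
    return weights, mus, sigmas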

Enter the Einstein Summation.

It’s a super-fancy term for a cool (but admittedly tricky-to-grasp) shorthand notation for dealing with multiplications and summations of vectors, matrices, and tensors. As a Python user, I used the numpy.einsum implementation. Problem is, the documentation really doesn’t do the function justice.
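If you haven't seen the notation before, the core idea is that you label each axis of each operand with a letter, repeat a letter for axes that should be multiplied together, and any letter missing from the output gets summed over. A few standalone NumPy examples (made-up arrays, nothing project-specific):

import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
v = np.random.rand(4)

np.einsum('ij->ji', A)        # transpose of A
np.einsum('ij,jk->ik', A, B)  # matrix product: the shared index j is summed over
np.einsum('ij,j->i', A, v)    # matrix-vector product
np.einsum('i,j->ij', v, v)    # outer product (no summation at all)
np.einsum('ij->', A)          # sum of every element of A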

In our case, while implementing the GMM, there were two particular chunks of code where our program was spending most of its time.

The first was in computing the numerator of the n-dimensional Gaussian likelihood function. This consists of a vector-matrix multiplication followed by a dot product: the part of the Gaussian where the mean is subtracted off each data point x, the result is multiplied against the precision matrix, and then multiplied against the mean-subtracted point once more.

The implementation for this looked something like:

p = np.array([(x - mu).dot(inv).dot(x - mu) for x in X])

The second was in recomputing the estimate of the covariance matrix for each Gaussian component. This consists of a vector outer product for each data point x (with the mean subtracted off), followed by a summation of all those products.

The implementation looked like this:

s = np.zeros((X.shape[1], X.shape[1]))  # accumulate the weighted outer products
for x, ri in zip(X - mu_next, r):
    s += np.outer(x, x) * ri

(ri was a normalizing constant.)

All told, our entire optimization procedure for learning the GMM required about 1 second of computation for each x. That gets really expensive, really fast.

I wish I could give you a prescribed series of steps I followed that led me to einsum, but after using Python at some level for the past decade, all I can really say is that I have at least a little bit of instinct when it comes to the API: a feeling that certain things, certain concepts, should already be implemented. That's what I felt here: that a series of vector-matrix multiplications and additions should exist, somewhere, in NumPy. Multiple Google searches to that effect turned up einsum documentation and examples over and over, so I took a closer look.

These two online resources (particularly the second one) I found extremely instructive and helpful:

I ended up replacing that first computation, the numerator of the Gaussian probability density function, with this:

p = np.einsum('ni,ij,nj->n', X - mu, inv, X - mu)

This was absolutely the more complicated of the two cases, and as far as I could tell there was no online reference for this specific case. But it was a basic extension of some of the examples from the provided links: the operands on the left and right are the same array, so they share the row index n, but each needs its own column index (i and j) so the precision matrix can couple every pair of components; all I really needed to figure out was the right combination of indices for the middle operand. That took some trial-and-error, considering my first instinct was ii, given the symmetric nature of the precision matrix.
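A quick way to convince yourself that the loop and the einsum compute the same thing (random shapes and data here, purely for illustration):

import numpy as np

N, d = 100, 5
X = np.random.rand(N, d)
mu = X.mean(axis=0)
inv = np.linalg.inv(np.cov(X, rowvar=False))  # stand-in for the precision matrix

p_loop = np.array([(x - mu).dot(inv).dot(x - mu) for x in X])
p_einsum = np.einsum('ni,ij,nj->n', X - mu, inv, X - mu)
assert np.allclose(p_loop, p_einsum)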

The second computation, re-estimating the covariance matrix for each Gaussian component, was more verbose but also more conceptually straightforward:

sigma_next = np.einsum("ni,nj->ij", (X - mu_next) * r[:, np.newaxis], X - mu_next) / N

The indices to this einsum are fairly standard fare for matrix multiplication; the only wrinkle was the inclusion of the r coefficients.
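The same kind of sanity check works for the covariance update, again with made-up data and an arbitrary set of r weights:

import numpy as np

N, d = 100, 5
X = np.random.rand(N, d)
r = np.random.rand(N)
mu_next = (r[:, np.newaxis] * X).sum(axis=0) / r.sum()

# the original loop
s = np.zeros((d, d))
for x, ri in zip(X - mu_next, r):
    s += np.outer(x, x) * ri

# the einsum version
sigma_next = np.einsum("ni,nj->ij", (X - mu_next) * r[:, np.newaxis], X - mu_next) / N
assert np.allclose(s / N, sigma_next)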

And the upshot, ladies and gentlemen, was this:

  • The first case produced a speed-up of 40x (300ms per iteration to 7ms)
  • The second case produced a speed-up of 110x (847ms per iteration to 7.5ms)
  • Our full GMM training procedure went from ~1s per x to ~0.01s per x, two full orders of magnitude faster.

So the next time you find yourself doing repeated vector-matrix multiplications and summations, have a look at einsum. It won't save your life, but it just might make it significantly easier.
