Self-Organizing Maps with Fast.ai — Step 1: Implementing a SOM with PyTorch

Riccardo Sayn
Kirey Group
Jul 15, 2020 · 4 min read

This is the first part of the Self-Organizing Maps with fast.ai article series.

All the code has been published in this repository and this PyPI library.

Overview

We will start with a brief overview of what Self-Organizing Maps are and why they're useful, and then go through the steps to implement a fast SOM model with batch training and customizable distance and neighborhood functions.

What are Self-Organizing Maps?

Self-Organizing Maps are neural networks that have the unique feature of distributing their weights (or “codebook”) in a topological representation.

Mandatory Self-Organizing Map picture

If a SOM is trained on a dataset with n features, each element of its codebook will have n features as well.

Why are SOMs useful?

  • SOMs model a topological representation of the data, which means they can map relationships between high-dimensional elements onto a low-dimensional grid
  • SOM weights tend to reproduce actual training data points, which means we could use a trained map’s codebook as an input for another model

By combining the two points above, we could visualize how the predictions of a model trained on SOM weights (or on the same data as the SOM) are mapped over a 2D space.

In the following image, the value of the codebook element in position [14, 14] (bottom-right corner) is the RGB representation of the green color, and is topologically close to other shades of green:

Codebook visualization of a SOM trained on RGB colors dataset

Implementing the SOM

Tensor expansion

As mentioned above, the two main features we aim to achieve are batch training and customizable distance/neighborhood functions.

We need to abstract the tensor reshape operations (caused by batching) away from those user-defined functions, to make sure that our model is easily customizable.

For this reason, we will implement a neat little function that will do the shape-expansion for us by leveraging PyTorch’s broadcasting feature:

Tensor expansion
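
A minimal sketch of such a helper, assuming the batch has shape [B, N] and the flattened codebook has shape [M, N] (the name expand_to_pairs is just for illustration):

```python
from torch import Tensor


def expand_to_pairs(x: Tensor, w: Tensor) -> tuple:
    """Reshape a batch `x` ([B, N]) and a flattened codebook `w` ([M, N])
    so that any element-wise operation between them broadcasts to [B, M, N]."""
    # [B, N] -> [B, 1, N]: one "row" per batch element
    # [M, N] -> [1, M, N]: one "column" per codebook element
    return x.unsqueeze(1), w.unsqueeze(0)
```

PyTorch's broadcasting then aligns the two shapes automatically, so the distance and neighborhood functions never have to know about batching.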

Defining our class

The only required parameter for creating the SOM is the map size. We will use it to initialize the codebook weights (randomly, as a start).

We are also initializing a Tensor containing the 2D indices of each codebook element of the map, for later use.

SOM class with constructor and weights/indices initialization
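
A sketch of the constructor, assuming the map size is passed as a (rows, cols, n_features) tuple and that the alpha and sigma hyperparameters (used later in the backward step) get reasonable defaults:

```python
import torch
import torch.nn as nn


class Som(nn.Module):
    """Sketch of a Self-Organizing Map module (names and defaults are illustrative)."""

    def __init__(self, size: tuple, alpha: float = 0.3, sigma: float = None) -> None:
        super().__init__()
        self.size = size                                # e.g. (15, 15, 3)
        self.alpha = alpha                              # learning rate
        self.sigma = sigma or max(size[:-1]) / 2.0      # neighborhood radius
        # Codebook weights, randomly initialized: one n-dimensional vector per map cell
        self.weights = torch.randn(*size)
        # 2D map indices of each codebook element, flattened to [rows*cols, 2]
        self.indices = torch.cartesian_prod(
            torch.arange(size[0]), torch.arange(size[1])
        ).float()
```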

The training algorithm

The SOM training process can be split into two main steps:

  1. Finding the codebook element that is most similar to the input data point (also known as Best Matching Unit, BMU)
  2. Updating the BMU and all of its neighboring units, bringing them “closer” to the data point.

Since PyTorch models usually have a forward() method, we will use that for step 1 and add a backward() method for step 2.

The forward step

Finding the BMU requires two steps: calculating the batch-to-codebook distances and finding the codebook element with the minimum distance for each batch item.

We choose the p-distance as our batch-to-codebook distance function. By default we use p=2, which gives us the Euclidean distance:

P-Distance
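
A minimal sketch of a p-distance computed over the feature axis (the function name pdist is illustrative; it operates on the expanded tensors produced by the helper above):

```python
from torch import Tensor


def pdist(a: Tensor, b: Tensor, p: int = 2) -> Tensor:
    """p-norm distance along the last (feature) dimension.

    With the expanded shapes [B, 1, N] and [1, M, N], the result is [B, M]."""
    return (a - b).abs().pow(p).sum(dim=-1).pow(1.0 / p)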

As for finding the BMUs, we will use PyTorch’s argmin:
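
A possible sketch, written as a method of the Som class above (the method name find_bmus is an assumption of this sketch):

```python
# Continuing the Som class sketched above
def find_bmus(self, distances: Tensor) -> Tensor:
    """Return the 2D map index of the Best Matching Unit for each batch element.

    `distances` has shape [B, rows*cols]; the result has shape [B, 2]."""
    cols = self.size[1]
    # Flat index of the closest codebook element for each batch item
    flat_idx = distances.argmin(dim=-1)
    # Turn the flat 1D indices into (row, col) pairs on the map
    return torch.stack([flat_idx // cols, flat_idx % cols], dim=-1)
```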

The last line’s torch.stack call turns the argmin output indices from flat 1D to 2D.

Let’s now wire everything together into our forward() method:

The complete forward step
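
Putting the pieces together, the forward step might look roughly like this, reusing the expansion helper and the p-distance sketched above:

```python
# Continuing the Som class sketched above
def forward(self, x: Tensor) -> Tensor:
    """Forward step: compute all batch-to-codebook distances and return the BMUs."""
    # Flatten the codebook from [rows, cols, N] to [rows*cols, N]
    w = self.weights.view(-1, self.size[-1])
    # Expand so that the distance broadcasts over every (batch, codebook) pair
    xb, wb = expand_to_pairs(x, w)
    # [B, rows*cols] distance matrix, then BMU lookup
    return self.find_bmus(pdist(xb, wb))
```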

The backward step

The backward step is a little more complicated, and it is different from the usual neural network back-propagation. For this reason, we are not using PyTorch’s autograd feature.

Also, we are ignoring hyperparameter scaling for the time being, as we will implement it by using Fastai Callbacks in another article of this series.

The SOM weight update rule is the following:

Wv(s + 1) = Wv(s) + θ(u, v, s) · α(s) · (D(t) − Wv(s))

where

  • α(s) is the learning rate at epoch s
  • Wv(s) is the value of codebook element v at epoch s
  • D(t) is the input data point (record) at iteration t
  • θ(u, v, s) is the neighborhood multiplier for codebook element v and BMU u at epoch s. It depends on the sigma hyperparameter that we defined in our constructor.

It looks hard, but let’s split it into pieces:

  • α(s) can be treated as a constant value (we defined it in the constructor!)
  • (D(t) - Wv(s)) is the difference between one item in our batch and the codebook weight v
  • θ(u, v, s) is our neighborhood function. It is usually implemented as the Gaussian of the index-difference between codebook indices and BMU indices, as follows:
Your friendly neighborhood functions
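
A sketch of a Gaussian neighborhood, implemented as a method of the Som class above (the method name and tensor shapes are assumptions of this sketch):

```python
# Continuing the Som class sketched above
def neighborhood(self, bmus: Tensor) -> Tensor:
    """Gaussian neighborhood multiplier θ for every codebook element.

    `bmus` has shape [B, 2]; the result has shape [B, rows*cols]."""
    # Squared map-space distance between each BMU and every codebook position
    diff = self.indices.unsqueeze(0) - bmus.unsqueeze(1).float()   # [B, rows*cols, 2]
    sq_dist = diff.pow(2).sum(dim=-1)                              # [B, rows*cols]
    # Gaussian decay controlled by the sigma hyperparameter
    return torch.exp(-sq_dist / (2 * self.sigma ** 2))
```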

Having implemented our neighborhood function, we can finally write the backward function:

Self-organization step (the backward pass)
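
Below is a sketch of how the update rule above could be translated into a batched backward method (again, shapes and names are assumptions):

```python
# Continuing the Som class sketched above
def backward(self, x: Tensor, bmus: Tensor) -> None:
    """Backward step: pull the BMU and its neighbors closer to each batch element."""
    batch_size = x.shape[0]
    # Flat view of the codebook: [rows*cols, N]
    w = self.weights.view(-1, self.size[-1])
    # (D(t) - Wv(s)) for every (batch, codebook) pair: [B, rows*cols, N]
    xb, wb = expand_to_pairs(x, w)
    diff = xb - wb
    # θ(u, v, s): [B, rows*cols] -> [B, rows*cols, 1] so it broadcasts over features
    theta = self.neighborhood(bmus).unsqueeze(-1)
    # α(s) * θ * (D(t) - Wv(s)), summed over the batch and scaled by batch size
    delta = (self.alpha * theta * diff).sum(dim=0) / batch_size
    self.weights += delta.view(*self.size)
```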

Note that we divide by batch size to reduce the update magnitude and avoid exploding weights in the first iterations.

The training loop

Now that we completed the SOM, let’s write a simple training loop to test it:
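
A minimal loop under the assumptions above, wrapped in a function so it can be refactored later (the dataset here is just random data; in a real use case you would pass your own tensor):

```python
def train_som(som: Som, data: Tensor, epochs: int = 100, batch_size: int = 32) -> None:
    """Simple SOM training loop: forward pass to find BMUs, backward pass to update weights."""
    for epoch in range(epochs):
        # Shuffle the data at each epoch
        perm = torch.randperm(data.shape[0])
        for i in range(0, data.shape[0], batch_size):
            batch = data[perm[i:i + batch_size]]
            bmus = som(batch)           # step 1: find the Best Matching Units
            som.backward(batch, bmus)   # step 2: update the BMUs and their neighbors


# Example usage on a random 3-feature dataset (e.g. RGB colors)
som = Som((15, 15, 3))
train_som(som, torch.rand(1000, 3), epochs=50)
```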

In the next article of the series, we will refactor this function and let Fastai’s Learner class do the heavy lifting for us.

Steps

  • Step 1: Implementing a SOM with PyTorch
  • Step 2: Training the SOM Module with a Fast.ai Learner (with loss functions) (22/07/20)
  • Step 3: Updating SOM hyperparameters with Fast.ai Callbacks (29/07/20)
  • Step 4: Handling unsupervised data with Fast.ai DataBunch (05/08/20)
  • Step 5: Interpretation on a trained Self-Organizing Map (12/08/20)
