Self-Organizing Maps with fast.ai — Step 3: Updating SOM hyperparameters with Fast.ai Callbacks

Riccardo Sayn
Kirey Group
Jul 31, 2020

This is the third part of the Self-Organizing Maps with fast.ai article series.

All the code has been published in this repository and this PyPi library.

Overview

In this article, we will look at how SOM hyperparameters work in general, and then write 3 different training strategies for our model.

SOM Hyperparameters

As you might remember from the first article, the weight update formula for Self-Organizing Maps is the following:

SOM weight update formula
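
In symbols: if W_v(s) is the codebook element at position v, u is the Best Matching Unit for the input D(t), and s is the current epoch, the update rule reads:

W_v(s + 1) = W_v(s) + θ(u, v, s) · α(s) · (D(t) − W_v(s))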

Both α(s) and θ(u, v, s) depend on the current epoch s, which means their values are updated as training goes on.

As such, the two hyperparameters are the learning rate α and the neighbourhood function radius σ.

Also, we can notice how the weight update for an element of the codebook depends on two main components:

  • the distance on the map between the codebook element and the Best Matching Unit, captured by the neighbourhood function θ(u, v, s)
  • the difference between the codebook element and the data point (in this sense, we are updating the weight to make it more similar to the data point)
SOM training visualization (source: https://en.wikipedia.org/wiki/File:TrainSOM.gif)

The SOM training process can be divided in two phases:

  • A self-organization phase, where elements of the map are sorted based on their relative distance
  • A fine-tuning phase, where elements are moved closer to the data points that they represent
Self-Organization phase

These two phases are handled via hyperparameter scaling. During the first phase, the neighbourhood function has a large radius and the learning rate is at its maximum: we update large areas around the BMU, moving many elements at once. During the second phase, the radius is small (often 1) and so is the learning rate, so we nudge individual codebook elements closer to the data points they represent.

SOM Training strategy 1: Linear Decay

A first hyperparameter update method is to linearly decrease both α and σ based on the current epoch.

With Fast.ai, we can implement such a strategy very easily:

SOM linear decay trainer
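
The full implementation is in the repository linked above; as a rough idea, a linear-decay trainer could look something like the sketch below (names such as LinearDecaySomTrainer and the som.alpha / som.sigma attributes are illustrative assumptions, not the library's exact API):

```python
# Sketch only, not the exact library API: a trainer that linearly scales
# the learning rate (alpha) and the radius (sigma) towards zero.
class LinearDecaySomTrainer:
    def __init__(self, som, alpha_max: float, sigma_max: float):
        self.som, self.alpha_max, self.sigma_max = som, alpha_max, sigma_max

    def on_train_begin(self, n_epochs: int, **kwargs):
        self.n_epochs = n_epochs

    def on_epoch_begin(self, epoch: int, **kwargs):
        # Fraction of training still to go, in (0, 1]
        remaining = 1.0 - epoch / self.n_epochs
        self.som.alpha = self.alpha_max * remaining
        self.som.sigma = self.sigma_max * remaining
```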

To avoid mixing up different scaling strategies, we defined a base SomTrainer class. Let’s now pass it to the Learner by adding a trainer argument:

SomLearner (now with 100% more SomTrainers)
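
Conceptually, the base class only defines the hooks that each strategy implements, and the learner forwards training events to whichever trainer it was given. A hedged sketch (the real SomLearner signature may differ):

```python
# Sketch only: a base class for scaling strategies; the LinearDecaySomTrainer
# above would subclass it. Signatures are assumptions, not the library's API.
class SomTrainer:
    "Decides how the SOM's alpha and sigma evolve during training."
    def on_train_begin(self, n_epochs: int, **kwargs): pass
    def on_epoch_begin(self, epoch: int, **kwargs): pass

# Illustrative usage:
# learn = SomLearner(data, model=som, trainer=LinearDecaySomTrainer)
# learn.fit(100)
```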

Now we can check if the performance of our model is improving.

Loss without decay (left) vs. loss with decay (right)

We are definitely going in the right direction! Maybe we could try out different strategies to see if we can make our SOM converge faster.

SOM Training strategy 2: Two-Phase scaling

While doing some research on SOM implementations, I stumbled upon SOMPY, a NumPy-based library that runs on CPU. It has an interesting hyperparameter scaling strategy, which I added as a callback in order to run some benchmarks.

In this section we will rewrite that same strategy, to show how easy it can be to radically change the behavior of a model.

SOMPY explicitly divides the training process into the two phases we described above, defining specific initial and final neighbourhood radii for each.

Let’s try and implement something similar:
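
Below is a rough sketch of what such a two-phase trainer could look like; the 50/50 split and the multipliers are placeholder values chosen for illustration, not SOMPY's actual numbers:

```python
# Sketch only: a two-phase trainer in the spirit of SOMPY's schedule.
# The phase split and the multipliers are illustrative "magic numbers".
class TwoPhaseSomTrainer(SomTrainer):
    def __init__(self, som, alpha_max: float, sigma_max: float):
        self.som, self.alpha_max, self.sigma_max = som, alpha_max, sigma_max

    def on_train_begin(self, n_epochs: int, **kwargs):
        self.phase1 = max(1, int(n_epochs * 0.5))
        self.phase2 = max(1, n_epochs - self.phase1)

    def on_epoch_begin(self, epoch: int, **kwargs):
        if epoch < self.phase1:
            # Self-organization: high LR, radius shrinking from max to max/4
            progress = epoch / self.phase1
            self.som.alpha = self.alpha_max * 0.5
            self.som.sigma = self.sigma_max * (1 - 0.75 * progress)
        else:
            # Fine-tuning: low LR, radius shrinking from max/4 down to 1
            progress = (epoch - self.phase1) / self.phase2
            self.som.alpha = self.alpha_max * 0.05
            self.som.sigma = max(1.0, self.sigma_max * 0.25 * (1 - progress))
```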

The scaling multipliers are magic numbers, but this is just an example of how you could write different training strategies for your model.

Having completed another SomTrainer, it’s time for a new loss plot:

Two phase scaling (on a random dataset)

SOM Training strategy 3: An Experiment

Following the idea from strategy #2, I tried a three-phase SomTrainer:

  • Phase 1 (15%): maximum fixed radius, maximum fixed learning rate
  • Phase 2 (50%): linear decaying radius from max to 1, learning rate halved
  • Phase 3 (35%): radius fixed to 1, learning rate 1/6th of maximum value

Percentages are relative to the number of epochs used for training; a rough sketch of this schedule follows below.
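
A hedged sketch of this schedule, following the percentages above (again, class names and attributes are illustrative assumptions, not the published library's API):

```python
# Sketch only: the three-phase schedule described in the list above.
class ExperimentalSomTrainer(SomTrainer):
    def __init__(self, som, alpha_max: float, sigma_max: float):
        self.som, self.alpha_max, self.sigma_max = som, alpha_max, sigma_max

    def on_train_begin(self, n_epochs: int, **kwargs):
        self.p1 = int(n_epochs * 0.15)           # phase 1: 15%
        self.p2 = int(n_epochs * 0.50)           # phase 2: 50%
        self.p3 = n_epochs - self.p1 - self.p2   # phase 3: remaining ~35%

    def on_epoch_begin(self, epoch: int, **kwargs):
        if epoch < self.p1:
            # Phase 1: fixed maximum radius and learning rate
            self.som.alpha = self.alpha_max
            self.som.sigma = self.sigma_max
        elif epoch < self.p1 + self.p2:
            # Phase 2: radius decays linearly from max to 1, LR halved
            progress = (epoch - self.p1) / max(1, self.p2)
            self.som.alpha = self.alpha_max / 2
            self.som.sigma = self.sigma_max + (1.0 - self.sigma_max) * progress
        else:
            # Phase 3: radius fixed to 1, LR at 1/6 of its maximum
            self.som.alpha = self.alpha_max / 6
            self.som.sigma = 1.0
```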

It seems to converge really well, and in my tests this strategy tolerates higher learning rates on most datasets than the other two approaches.

This is the loss plot of a 100-epoch training run on a random dataset with LR 0.6:

Experimental scaling (on a random dataset)

Comparing strategies & final thoughts

Loss plots for strategies #1, #2, #3, on a random dataset with learning rate = 0.6

The plots above were made on random data and are just there as an example, but you can clearly see the impact of the different scaling strategies.

We only worked with hyperparameters in this article, but perhaps you could define your own strategy and switch the neighbourhood function between phases, or even change the map size! I really enjoyed working on this part of the project, since it goes along with the idea of Fast.ai being about sensible defaults and easy customization.

In the next article we will work on the UnsupervisedDataBunch that we created during step 2 (remember?) by adding various preprocessing operations to make our model’s job easier.

We will also train our SOM on actual datasets and leverage Fast.ai’s Data Block API to use different data sources with our model.
