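The spatial transformer network (STN) is a differentiable module that can be inserted into a convolutional neural network. The default choice is to place it right after the input layer, so that it transforms each image before the main classifier sees it (in our case, the classifier is IDSIA). Because the module is differentiable, the classifier's loss can be backpropagated through it, and the STN learns the transformation matrix theta that minimizes that loss.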
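To make the role of theta more concrete, here is a minimal sketch in plain Python of what the transformation step does once theta is known. It assumes theta is a 2x3 affine matrix acting on coordinates normalized to [-1, 1], and it samples the input with bilinear interpolation. The function names (`affine_grid`, `bilinear_sample`, `spatial_transform`) are hypothetical and chosen for illustration. In a real STN, theta would be predicted per image by a small localization network and trained by backpropagation; here it is simply fixed by hand.

```python
import math


def affine_grid(theta, height, width):
    """For every output pixel, compute the source location (x, y) in [-1, 1].

    theta is assumed to be a 2x3 affine matrix given as nested lists.
    """
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            src_x = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            src_y = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((src_x, src_y))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2D image (list of rows) at the normalized locations in grid."""
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Locations outside the image contribute zero (zero padding).
        if 0 <= r < height and 0 <= c < width:
            return image[r][c]
        return 0.0

    output = []
    for grid_row in grid:
        out_row = []
        for src_x, src_y in grid_row:
            # Convert normalized coordinates back to pixel coordinates.
            px = (src_x + 1.0) * (width - 1) / 2.0
            py = (src_y + 1.0) * (height - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            wx, wy = px - x0, py - y0
            # Weighted average of the four neighbouring pixels.
            value = (
                (1 - wx) * (1 - wy) * pixel(y0, x0)
                + wx * (1 - wy) * pixel(y0, x0 + 1)
                + (1 - wx) * wy * pixel(y0 + 1, x0)
                + wx * wy * pixel(y0 + 1, x0 + 1)
            )
            out_row.append(value)
        output.append(out_row)
    return output


def spatial_transform(image, theta):
    """Apply the affine transformation theta to a single-channel image."""
    grid = affine_grid(theta, len(image), len(image[0]))
    return bilinear_sample(image, grid)


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    flip = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # mirrors the image horizontally
    print(spatial_transform(image, identity))
    print(spatial_transform(image, flip))
```

With the identity matrix the image passes through unchanged, which is why STNs are commonly initialized so that theta starts at the identity. Every operation in the sampling step is a sum or product of the inputs and theta, which is what lets the classifier's gradient reach theta during training.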