Spatial Transformer Networks with TensorFlow

Kevin Nguyen
Published in Wonks This Way
3 min read · Jan 29, 2017

Images come in many shapes, sizes, rotations, colors, and scales. This freedom is why taking photos is so intertwined in our daily lives, feels natural during a celebration, and is cherished during intimate moments with family and friends. For those very reasons, understanding images from a machine’s point of view is problematic.

Mo Images, Mo Problems

Typical problems with images for image classification — Image from CS231n Karpathy (2016)

Image classification is a hard problem. Dealing with different viewpoints, scale variation, deformation, and background clutter is a handful. Playing with different data augmentation tasks to remedy those challenges can eat up most of our time, resources, and patience. A recent innovation developed by Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu aims to make it easier to deal with pesky images.

Spatial Transformers

The Spatial Transformer Network [1] allows the spatial manipulation of data within the network.

Spatial Transformers (ST) explicitly allow the handling of image variability to live within neural network architectures. An ST can be inserted into existing convolutional architectures because it is differentiable. This opens the opportunity for neural networks to actively transform feature maps, conditioned on the feature map itself, without any extra training supervision or modification to the optimization process [1].
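To make the idea concrete, here is a minimal sketch of the two core pieces of an ST (grid generation and bilinear sampling, per Jaderberg et al. [1]), written in plain NumPy rather than TensorFlow for readability. It assumes a single-channel image and a 2×3 affine matrix `theta`; the function names are illustrative, not from any library.

```python
import numpy as np

def affine_grid(theta, height, width):
    """Map each output pixel to a normalized source coordinate in [-1, 1]."""
    ys, xs = np.meshgrid(
        np.linspace(-1, 1, height), np.linspace(-1, 1, width), indexing="ij")
    # Homogeneous target coordinates (x_t, y_t, 1) for every output pixel.
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])
    src = theta @ grid  # 2 x (H*W) source coordinates
    return src[0].reshape(height, width), src[1].reshape(height, width)

def bilinear_sample(image, xs, ys):
    """Sample the image at fractional source coords with bilinear weights."""
    h, w = image.shape
    # Convert normalized coords back to pixel coords.
    x = (xs + 1) * (w - 1) / 2
    y = (ys + 1) * (h - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    dx, dy = x - x0, y - y0
    top = image[y0, x0] * (1 - dx) + image[y0, x0 + 1] * dx
    bot = image[y0 + 1, x0] * (1 - dx) + image[y0 + 1, x0 + 1] * dx
    return top * (1 - dy) + bot * dy

# An identity theta should reproduce the input image exactly.
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
img = np.arange(16, dtype=float).reshape(4, 4)
out = bilinear_sample(img, *affine_grid(identity, 4, 4))
```

Because every step here is a smooth function of `theta` and the image, gradients flow through the sampler — which is exactly why the module can be dropped into a network and trained end to end.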

Plain English

ST is another Lego in a bucket of differentiable Legos. When dealing with images, ST lets a neural network negotiate on its own terms how spatially invariant it needs to be to the input data. This is game-changing because responsibility moves from person to machine. At the end of the day, deep learning is all about empowering machines to think for themselves.

Image source: Technology Review

Goal

ST has the potential to replace image pre-processing tasks. Reducing the need for handcrafted features, in turn, leads to better end-to-end deep learning architectures.

Removing the training wheels and letting it ride.

This post is the beginning of a series of posts exploring the use of ST with Convolutional Neural Networks. I’ll start by working through examples in TensorFlow Models.

https://gist.github.com/kvn219/b42d382a06eff3254bf00e780e9b8e0f

Math Behind Spatial Transformers

Curious about the math behind ST? Check out Victor Campos’s breakdown on YouTube, as he does an excellent job explaining the high and low-level details of ST.
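For a quick taste of that math, the two equations at the heart of the module (from the paper [1]) are the affine mapping from target to source coordinates and the bilinear sampling kernel:

```latex
\begin{pmatrix} x_i^{s} \\ y_i^{s} \end{pmatrix}
  = A_\theta \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}
\qquad
V_i = \sum_{n}\sum_{m} U_{nm}\,
      \max(0,\, 1 - |x_i^{s} - m|)\,
      \max(0,\, 1 - |y_i^{s} - n|)
```

Here $A_\theta$ is the 2×3 affine matrix predicted by the localization network, $U$ is the input feature map, and $V_i$ is the sampled output value; both expressions are (sub-)differentiable, which is what lets gradients flow back to $\theta$.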

Resources

[1] Jaderberg, Max, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. “Spatial Transformer Networks.” Advances in Neural Information Processing Systems. 2015.

[2] TensorFlow Models

[3] David Dao, who ported a version of ST to TensorFlow.
