Reconstruction Of Images Using RBM

Introduction

A Restricted Boltzmann Machine (RBM) is a two-layer neural network: the first layer is called the visible layer and the second is called the hidden layer. RBMs are considered shallow neural networks because they are only two layers deep. The model was introduced by Paul Smolensky in 1986 under the name Harmonium and was popularized in the mid-2000s by Geoffrey Hinton, who proposed Contrastive Divergence (CD) as a method to train it.

RBM can be used for dimensionality reduction, feature extraction, and collaborative filtering.

How it works

Let’s say that we provide an image as input to an RBM. The pixels are processed by the input layer, which is also known as the visible layer.

RBMs learn patterns and extract important features in data by reconstructing the input. So, the learning process consists of several forward and backward passes, where the RBM tries to reconstruct the input data.

The weights of the neural net are adjusted so that the RBM can find relationships among the input features and determine which features are relevant.

After training is complete, the net is able to reconstruct the input based on what it learned. Note that the reconstruction is only an approximation of the original input, shaped by the features the RBM has learned.

Understanding the steps involved

Three steps are repeated over and over during training:

  • Forward Pass
  • Backward Pass
  • Compare: At the visible layer, the reconstruction is compared against the original input to determine the quality of the result.
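The compare step can be sketched as a simple error measure between the original pixels and the reconstruction. The article does not name a metric, so the mean squared error used here, along with the function name `reconstruction_error` and the toy vectors, is an assumption for illustration only.

```python
def reconstruction_error(v_original, v_reconstructed):
    """Mean squared difference between two equal-length pixel vectors.

    Assumption: MSE is used as the quality measure; other choices
    (for example cross-entropy) are equally possible.
    """
    if len(v_original) != len(v_reconstructed):
        raise ValueError("vectors must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(v_original, v_reconstructed)]
    return sum(squared) / len(squared)


# Toy 4-pixel example (made-up values).
original = [1.0, 0.0, 1.0, 1.0]
reconstruction = [0.5, 0.0, 1.0, 0.5]
error = reconstruction_error(original, reconstruction)
print(error)
```

A lower value means the reconstruction is closer to the input; a perfect reconstruction gives zero.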

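Forward Pass: The information at the visible units (v0) is passed via the weights (W) and hidden biases (c) to the hidden units (h0). Each hidden unit fires or not stochastically, with a probability given by the sigmoid function σ:

$$p(h_{0,j} = 1 \mid v_0) = \sigma\Big(c_j + \sum_i v_{0,i} W_{ij}\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$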
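As a rough illustration of the forward pass, the sketch below computes the firing probability of each hidden unit and then samples a 0/1 state from it. It uses plain Python lists to stay dependency-free (a numerical library would be the usual choice for real images). The names `forward_pass` and `sigmoid`, the weight layout (visible x hidden), and all numbers are assumptions, not taken from the accompanying notebook.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward_pass(v, W, c, rng):
    """Return (firing probabilities, sampled 0/1 states) of the hidden units.

    v: visible vector, length n_visible
    W: weights as nested lists, shape n_visible x n_hidden (assumed layout)
    c: hidden biases, length n_hidden
    """
    probs = []
    for j in range(len(c)):
        activation = c[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        probs.append(sigmoid(activation))
    # Each hidden unit fires with its own probability.
    states = [1 if rng.random() < p else 0 for p in probs]
    return probs, states


rng = random.Random(0)                       # fixed seed for repeatability
v0 = [1, 0, 1]                               # toy 3-pixel "image"
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]   # made-up weights
c = [0.0, 0.1]                               # made-up hidden biases
h0_prob, h0 = forward_pass(v0, W, c, rng)
print(h0_prob, h0)
```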

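Backward Pass: The hidden representation (h0) is then passed back to the visible units through the same weights W, but with a different bias b, where it is used to reconstruct the input. Again, the reconstruction is sampled, using the sigmoid σ:

$$p(v_{1,i} = 1 \mid h_0) = \sigma\Big(b_i + \sum_j W_{ij} h_{0,j}\Big)$$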
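The backward pass mirrors the forward pass: the same weight matrix is read in the other direction and the visible bias b is used. The sketch below is a minimal illustration under the same assumptions as before (plain Python lists, weights stored as visible x hidden, hypothetical function names, made-up numbers).

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def backward_pass(h, W, b, rng):
    """Return (probabilities, sampled 0/1 states) of the visible units.

    h: hidden vector, length n_hidden
    W: the same weights used in the forward pass, shape n_visible x n_hidden
    b: visible biases, length n_visible
    """
    probs = []
    for i in range(len(b)):
        activation = b[i] + sum(W[i][j] * h[j] for j in range(len(h)))
        probs.append(sigmoid(activation))
    # Sample the reconstructed pixels from their probabilities.
    states = [1 if rng.random() < p else 0 for p in probs]
    return probs, states


rng = random.Random(0)
h0 = [1, 0]                                  # a hidden sample (made up)
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]   # made-up weights
b = [0.0, 0.0, 0.0]                          # made-up visible biases
v1_prob, v1 = backward_pass(h0, W, b, rng)
print(v1_prob, v1)
```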

These two passes are repeated for k steps or until convergence is reached. In practice, k = 1 has been shown to work surprisingly well, so we will keep k = 1.
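Repeating the two passes k times amounts to a short chain of alternating samples. The sketch below shows one way this could look, assuming binary units and a visible x hidden weight layout; `gibbs_chain` and its helpers are hypothetical names, and the numbers are made up.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_hidden(v, W, c, rng):
    """Forward pass: sample 0/1 hidden states given the visible vector."""
    return [
        1 if rng.random() < sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) else 0
        for j in range(len(c))
    ]


def sample_visible(h, W, b, rng):
    """Backward pass: sample 0/1 visible states given the hidden vector."""
    return [
        1 if rng.random() < sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h)))) else 0
        for i in range(len(b))
    ]


def gibbs_chain(v0, W, b, c, k, rng):
    """Run k forward/backward passes starting from v0.

    Returns the reconstruction after k steps and the hidden sample
    that produced it.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    v = list(v0)
    for _ in range(k):
        h = sample_hidden(v, W, c, rng)
        v = sample_visible(h, W, b, rng)
    return v, h


rng = random.Random(0)
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]   # made-up weights
b = [0.0, 0.0, 0.0]
c = [0.0, 0.1]
v1, h0 = gibbs_chain([1, 0, 1], W, b, c, k=1, rng=rng)   # k = 1 as in the text
print(v1, h0)
```

With k = 1 the chain is a single forward pass followed by a single backward pass.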

Each joint configuration of the visible vector v and the hidden vector h has an energy. The energy function E(v, h) of an RBM is defined as:

$$E(v, h) = -b^{\top} v - c^{\top} h - v^{\top} W h$$

where W represents the weights connecting hidden and visible units and b, c are the offsets of the visible and hidden layers respectively.
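To make the energy concrete, the sketch below evaluates the standard RBM energy, minus the visible bias term, minus the hidden bias term, minus the visible-weight-hidden interaction term, for one toy configuration. The function name `energy`, the visible x hidden weight layout, and the numbers are assumptions for illustration.

```python
def energy(v, h, W, b, c):
    """Energy of a joint configuration: -b.v - c.h - v.W.h

    v: visible vector, h: hidden vector
    W: weights, shape n_visible x n_hidden (assumed layout)
    b: visible offsets, c: hidden offsets
    """
    visible_term = sum(b[i] * v[i] for i in range(len(v)))
    hidden_term = sum(c[j] * h[j] for j in range(len(h)))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -visible_term - hidden_term - interaction


# Made-up toy values.
v = [1, 0, 1]
h = [1, 0]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.1, 0.2, 0.3]
c = [0.0, 0.1]
e = energy(v, h, W, b, c)
print(e)
```

Configurations with lower energy are assigned higher probability by the model.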

Also associated with each visible vector v is a free energy F(v): the energy that a single configuration would need to have in order to have the same probability as all of the configurations that contain v:

$$F(v) = -b^{\top} v - \sum_j \log\Big(1 + e^{\,c_j + \sum_i v_i W_{ij}}\Big)$$
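One way to check the free energy definition is to compare the closed form for binary hidden units against a brute-force sum over every hidden configuration. The sketch below does that for a tiny model; it assumes binary hidden units and a visible x hidden weight layout, and the names `free_energy`, `free_energy_brute_force`, and `softplus` are hypothetical.

```python
import itertools
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def energy(v, h, W, b, c):
    """E(v, h) = -b.v - c.h - v.W.h"""
    return -(
        sum(b[i] * v[i] for i in range(len(v)))
        + sum(c[j] * h[j] for j in range(len(h)))
        + sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    )


def free_energy(v, W, b, c):
    """Closed form: -b.v - sum_j log(1 + exp(c_j + sum_i v_i W_ij))."""
    visible_term = sum(b[i] * v[i] for i in range(len(v)))
    hidden_term = sum(
        softplus(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    )
    return -visible_term - hidden_term


def free_energy_brute_force(v, W, b, c):
    """-log of the summed exp(-E(v, h)) over all binary hidden vectors h."""
    total = sum(
        math.exp(-energy(v, list(h), W, b, c))
        for h in itertools.product([0, 1], repeat=len(c))
    )
    return -math.log(total)


# Made-up toy values.
v = [1, 0, 1]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.1, 0.2, 0.3]
c = [0.0, 0.1]
f_closed = free_energy(v, W, b, c)
f_brute = free_energy_brute_force(v, W, b, c)
print(f_closed, f_brute)
```

The brute-force version is only feasible for a handful of hidden units, since the number of configurations doubles with each unit; the closed form avoids that cost.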

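Using the Contrastive Divergence objective function, that is, the difference between the mean free energy F of the original inputs and that of their reconstructions,

$$\text{mean}\big(F(v_{\text{original}})\big) - \text{mean}\big(F(v_{\text{reconstructed}})\big),$$

the change in weights is given by:

$$\Delta W = \eta \left( \langle v h^{\top} \rangle_{\text{input}} - \langle v h^{\top} \rangle_{\text{reconstructed}} \right)$$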

Here, η is the learning rate. Similar expressions exist for the biases b and c.
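Putting the pieces together, a single CD-1 update can be sketched as follows. This is a minimal illustration, not the code from the accompanying notebook: it processes one sample at a time, uses probabilities rather than sampled states for the reconstruction and for the update statistics (a common practical choice, assumed here), and the names `cd1_deltas` and `cd1_step`, the learning rate, and the toy data are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, W, c):
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]


def visible_probs(h, W, b):
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(b))]


def cd1_deltas(v0, h0, v1, h1, lr):
    """Return (dW, db, dc) = lr * (input statistics - reconstruction statistics)."""
    dW = [[lr * (v0[i] * h0[j] - v1[i] * h1[j]) for j in range(len(h0))]
          for i in range(len(v0))]
    db = [lr * (v0[i] - v1[i]) for i in range(len(v0))]
    dc = [lr * (h0[j] - h1[j]) for j in range(len(h0))]
    return dW, db, dc


def cd1_step(v0, W, b, c, lr, rng):
    """Apply one CD-1 update in place; return the reconstruction probabilities."""
    h0 = hidden_probs(v0, W, c)                             # forward pass
    h0_sample = [1 if rng.random() < p else 0 for p in h0]  # stochastic firing
    v1 = visible_probs(h0_sample, W, b)                     # backward pass
    h1 = hidden_probs(v1, W, c)                             # hidden response to reconstruction
    dW, db, dc = cd1_deltas(v0, h0, v1, h1, lr)
    for i in range(len(b)):
        b[i] += db[i]
        for j in range(len(c)):
            W[i][j] += dW[i][j]
    for j in range(len(c)):
        c[j] += dc[j]
    return v1


rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.gauss(0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1, 1, 0, 0], [0, 0, 1, 1]]   # two made-up 4-pixel patterns

for epoch in range(200):
    for v0 in data:
        cd1_step(v0, W, b, c, lr=0.1, rng=rng)

print(visible_probs(hidden_probs(data[0], W, c), W, b))
```

The bias updates follow the same pattern as the weight update: the visible bias moves toward the input and away from the reconstruction, and the hidden bias does the same for the hidden activations.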

Math lovers can find a good tutorial here.

The accompanying Jupyter notebook for this post can be found here.

Conclusion

Because it can reconstruct images, an RBM can be used to generate more data from existing data. RBMs are also used for collaborative filtering, feature learning, regression, classification, and topic modeling. They can be trained in either a supervised or an unsupervised way, depending on the task.

I hope this article helped you get a basic understanding of how a Restricted Boltzmann Machine (RBM) works for image reconstruction, dimensionality reduction, and feature extraction.