Restricted Boltzmann Machines — Part 9

Rakesh4real

Published in

Fnplus Club

4 min readJul 29, 2019

RESTRICTED BOLTZMANN MACHINES

One of simplest neural nets
It has two layers — i. Visible ii. Hidden

Note: Neurons in same layer cannot communicate with each other directly. Hence termed “Restricted”. This architecture is common in modern nets but unusual back then and didn’t exist in earlier boltzmann machines

RBMs are termed ‘Boltzmann’ because of use of distinguishing Boltzmann Distribution Function (Used as sampling function)

Trained by,

i. Forward pass
ii. Backward pass — inputs reconstructed

The above steps are performed iteratively until weights and biases reduce error significant.

RBM Backward Pass

We share weights for forward and backward pass
But use two sets of biases
i. Hidden Bias — Used in forward pass
ii. Visible Bias — Used in backward pass
Main difference is that we read outputs at lower level during backward pass; Opposed to taking them the other side as in modern nets

It is important to note the result that RBMs work well and good on MNIST dataset but wierd results are obtained when used in recommendations systems because of SPARSE DATA (very sparse in most of the cases)

Dealing with sparse data in RBMs ?

MAIN IDEA:

Use each individual user in our training data as a set of inputs into our RBM to help train it i.e. process each user as a part of batch during training.

Note:
i. We are trying to learn weights and biases to reconstruct ratings for user-movie pairs we don’t know yet
ii. Our visible nodes aren’t just sample nodes taking in a simple input. Ratings are categorical data.
So, we will treat each individual ratings as FIVE NODES
eg. Five star = [0|0|0|0|1] data structure

Train RBM → get Weigts/Biases → Reconstruct user ratings
For a new user, run it again with known ratings of the new user.
Sparse data causes huge number of problems

If we are training RBM on every possible combination of users of movies, most of the data will be missing. Because, most of the movies have not been rated at all by a specific user. We want to predict rating for every movie though so, we need to leave space for all of them.

If, N movies → (N*5) Visible nodes

for any given user most of them are undefined or empty.

Solution

Exclude any missing rating from processsing while training RBM (See partial weights in figure)

The above solution is a tricky thing to do as most of frameworks like tensorflow assume you want to process everything in parallel all the time.

They aren’t really built for sparse data
BUT THERE ARE WAYS TO TRICK IT

(Note the missing weights in figure) As we are training our RBM with given user’s known ratings, we only attempt to learn weights and biases used for the movies user actually rated.

As we iterate through training on all other users we fill in all other weights and biases as we go.

Tensorflow actually has SPARSE TENSOR which you can use. There are also other frameworks such as DSSTNE for Sparse Data.
RBMs will probably become thing of the past as with modern nets, we can treat sparse data as complete data.

How to best train an RBM with huge amounts of sparse data?

Gradient descent needs very efficient expectation function to optimize on; And for Recommendation systems, this function is called “CONTRASTIVE DIVERGENCE”.
Contrastive Divergence samples probability distribution during training using “GIBBS SAMPLER”

Solution:
We only train it on the ratings that actually exist but reuse resulting weights and biases across other users to fill in missing ratings we want to get.

Note: In the paper mentioned above, RMSE was used to measure RBM perfomance.

Restricted Boltzmann Machines — Part 9

Table of Contents :

Netflix Prize:

RESTRICTED BOLTZMANN MACHINES

RBM Backward Pass

Dealing with sparse data in RBMs ?

Solution

How to best train an RBM with huge amounts of sparse data?

NEXT >>>

Written by Rakesh4real