Complete Guide to Building an AutoEncoder in PyTorch and Keras
This article is a continuation of my previous article, a complete guide to building a CNN using PyTorch and Keras.
Loading standard or custom datasets is already covered in the complete guide to CNNs using PyTorch and Keras, so we can start with a short introduction to AutoEncoders and then implement one.
AutoEncoders
An AutoEncoder is a neural network that learns to encode data with minimal loss of information.
There are many variants of the network above. Some of them are:
Sparse AutoEncoder
This auto-encoder reduces overfitting by regularizing the activations of the hidden nodes, so that only a few hidden units are active for any given input.
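As a sketch of one common choice (the function name and weight are illustrative, not from the article), the regularizer can be an L1 penalty on the hidden activations, added to the reconstruction loss so that most activations are pushed toward zero:

```python
import numpy as np

def sparsity_penalty(activations, weight):
    """L1 penalty on hidden activations; a larger weight gives sparser codes."""
    return weight * np.abs(activations).sum()

# toy hidden activations: a batch of 2 examples, 4 hidden units each
h = np.array([[0.0, 0.9, 0.0, 0.1],
              [0.2, 0.0, 0.0, 0.8]])
penalty = sparsity_penalty(h, weight=0.01)  # added to the reconstruction loss
print(penalty)
```

The total training loss would then be reconstruction loss plus this penalty.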
Denoising AutoEncoder
This auto-encoder is trained on inputs corrupted with noise, while the reconstruction target stays clean. At evaluation time it can therefore remove noise from its input.
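A minimal numpy sketch of the corruption step (the names and noise level are illustrative): noise is added to the input, while the clean input remains the reconstruction target:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.3):
    """Add Gaussian noise; the network is trained to undo this corruption."""
    return x + noise_std * rng.standard_normal(x.shape)

x_clean = np.ones((2, 4))   # stand-in for a batch of inputs
x_noisy = corrupt(x_clean)
# training pair: input = x_noisy, reconstruction target = x_clean
```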
Variational AutoEncoder
This is a kind of deep generative neural network. A major challenge with auto-encoders is that they only try to minimise the reconstruction error and never bother about the underlying latent representation.
A good latent representation should be meaningful so that it can be used in generative neural networks such as GANs. Meaningful here refers to arrangement: data points from the same class are grouped closer together, and data points from different classes a little farther apart.
This kind of latent representation can be achieved by changing the structure of the neural network as follows:
Unlike the remaining auto-encoders, we generate a latent distribution with a mean and a standard deviation instead of a single latent vector. We then sample from this latent distribution to reconstruct the input.
The two important things about the variational auto-encoder are:
While sampling, we need to handle the randomness of the node using the re-parametrization trick, since the randomness of the node would otherwise stop backpropagation.
N(μ, 𝛔) ≈ μ + 𝛔 · N(0, 1)
This re-parametrization trick does not change the distribution; it only re-arranges the parameters so that gradients can flow through μ and 𝛔 during backpropagation.
The variational auto-encoder regularizes the cost function using the following equation:
Regularized cost function = Loss + KL(N(μ, 𝛔), N(0, 1))
This forces the latent distribution to follow the standard normal distribution, which extends its usage in deep generative models.
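A quick numerical check of the trick (pure numpy, purely illustrative): drawing ε from N(0, 1) and computing μ + 𝛔·ε produces samples with mean ≈ μ and standard deviation ≈ 𝛔, while all the randomness lives in ε, outside the learnable parameters:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 2.0, 0.5

eps = rng.standard_normal(100_000)  # epsilon ~ N(0, 1): fixed noise source
z = mu + sigma * eps                # deterministic transform of mu and sigma

print(z.mean(), z.std())  # approximately 2.0 and 0.5
```

Because `z` is a deterministic function of `mu` and `sigma` given ε, gradients with respect to both parameters are well defined.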
You can read more about VAEs in this article and about the various types of auto-encoders here. We will implement a VAE in this article.
Implementation
Any auto-encoder comprises two networks, an encoder and a decoder. As previously said, a VAE also uses a regularized cost function.
Encoder
The encoder takes the input and returns the mean and standard deviation of a latent distribution.
#Pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x, h1, h2, z):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(x, h1)
        self.fc2 = nn.Linear(h1, h2)
        self.fc_mean = nn.Linear(h2, z)
        self.fc_sd = nn.Linear(h2, z)

    def encoder(self, x):
        h1 = F.relu(self.fc1(x))
        h2 = F.relu(self.fc2(h1))
        return self.fc_mean(h2), self.fc_sd(h2)  # mu, log_var

#Keras
from keras.layers import Input, Dense

x = Input(batch_shape=(batch_size, original_dim))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_sigma = Dense(latent_dim)(h)
Sampling
From the mean and standard deviation obtained from the encoder, we generate the input to the decoder by sampling. The re-parametrization trick mentioned above comes into the picture here.
#Pytorch
def sampling(self, mu, log_var):
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return eps.mul(std).add_(mu)

#Keras
def sampling(args):
    z_mean, z_log_sigma = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim),
                              mean=0., stddev=epsilon_std)
    # treat z_log_sigma as log variance, consistent with the loss below
    return z_mean + K.exp(z_log_sigma / 2) * epsilon
Decoder
The decoder takes the output of the sampling function and tries to reconstruct the original input.
#Pytorch
class VAE(nn.Module):
    def __init__(self, x, h1, h2, z):
        super(VAE, self).__init__()
        # encoder
        self.fc1 = nn.Linear(x, h1)
        self.fc2 = nn.Linear(h1, h2)
        self.fc_mean = nn.Linear(h2, z)
        self.fc_sd = nn.Linear(h2, z)
        # decoder
        self.fc4 = nn.Linear(z, h2)
        self.fc5 = nn.Linear(h2, h1)
        self.fc6 = nn.Linear(h1, x)

    def decoder(self, z):
        h1 = F.relu(self.fc4(z))
        h2 = F.relu(self.fc5(h1))
        return torch.sigmoid(self.fc6(h2))

#Keras
decoder_h = Dense(intermediate_dim, activation='relu')
decoder_mean = Dense(original_dim, activation='sigmoid')
h_decoded = decoder_h(z)
x_decoded_mean = decoder_mean(h_decoded)
Loss Function
As previously mentioned, the VAE uses a regularized loss function.
The KL divergence of a Gaussian with mean μᵢ and standard deviation 𝛔ᵢ from the standard normal distribution, KL(N(μᵢ, 𝛔ᵢ), N(0, 1)), has the closed form
−½ Σᵢ (1 + log 𝛔ᵢ² − μᵢ² − 𝛔ᵢ²)
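As a small numerical check (the function name is illustrative, not from the article): the KL term vanishes when μ = 0 and log 𝛔² = 0, i.e. when the latent distribution already is N(0, 1), and grows as the mean moves away from zero:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma) || N(0, 1)), summed over latent dims."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

kl0 = kl_to_standard_normal(np.zeros(2), np.zeros(2))            # zero: already N(0, 1)
kl1 = kl_to_standard_normal(np.array([1.0, 0.0]), np.zeros(2))   # 0.5
print(kl0, kl1)
```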
#Pytorch
def loss_function(reconstructed_x, x, mu, log_var):
    loss = F.binary_cross_entropy(reconstructed_x, x.view(-1, 784),
                                  reduction='sum')
    regularized_term = -0.5 * torch.sum(1 + log_var - mu.pow(2) -
                                        log_var.exp())
    return loss + regularized_term

#Keras
def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = -0.5 * K.mean(1 + z_log_sigma - K.square(z_mean) -
                            K.exp(z_log_sigma), axis=-1)
    return xent_loss + kl_loss
Flow of data
Data flows from the encoder, through sampling, and then to the decoder.
#Pytorch
def forward(self, x):
    mu, log_var = self.encoder(x.view(-1, 784))
    z = self.sampling(mu, log_var)
    return self.decoder(z), mu, log_var
In Keras, there is no need for a forward function; data flows in the order you modelled the network. The sampling function is attached with a Lambda layer, for example z = Lambda(sampling)([z_mean, z_log_sigma]).
Compiling the network with the loss function:
#Pytorch
vae = VAE(x=784, h1=512, h2=256, z=2)
# optimizer, e.g. torch.optim.Adam(vae.parameters())
optimizer.zero_grad()
reconstructed_x, mu, log_var = vae(data)
loss = loss_function(reconstructed_x, data, mu, log_var)
loss.backward()
optimizer.step()

#Keras
vae = Model(x, x_decoded_mean)
vae.compile(optimizer='rmsprop', loss=vae_loss)
We will cover the implementation of a GAN in PyTorch and Keras in the next article.
Thanks for reading:))