
Autoencoders - Denoising Understanding!

6 min read · Sep 25, 2019


Classifying digits by training a model on the MNIST dataset is a really fun thing to do with the frameworks available, and putting it into production would be great.

Code: https://github.com/parmarsuraj99/Autoencoders

We know that neural networks can be seen as "universal function approximators", meaning we can learn a mapping from inputs to their correct labels. This is called the supervised learning approach.

What if we don't have labels and are left with images only? What can we do with them? This is where it gets interesting. We can train a network to improve the resolution of an image, de-noise it, or even generate new samples, and, in a way, compress the input into the lower dimensions of the network's hidden layer.

Autoencoders


“An autoencoder is a neural network that is trained to attempt to copy its input to its output.” -Deep Learning Book

It has a hidden layer h that learns a representation of the input. It can be viewed as a two-part network: an encoder, h=f(x), and a decoder, r=g(h). The goal is to learn g(f(x))=x. But we don't want it to simply copy the input to an exact output; that would be of no use! Instead, we want it to approximate the output: to learn to represent the input in a way from which it can generate an approximate reconstruction. That is, to learn an intermediate representation of the input.

Traditionally, autoencoders have been used for dimensionality reduction, much like PCA, but with slight changes to the architecture they are now at the forefront of generative modeling.

A representational architecture of an autoencoder

Here we can see that we first learn to map the input to a bottleneck, and then the bottleneck to the output. This is an end-to-end process.

Encoder: Takes an input x (this can be an image, word embeddings, or voice data) and produces an output h. For instance, think of an image of dimensions 32x32x1 (HxWxC) being reduced to a 3x1 output. Think of this as a compression tool like 7zip.

Decoder: Takes an input h (the dense representation) and produces an output ~x. For example, a 3x1 vector as input produces a 32x32x1 image which resembles x. It is like recovering the original data from a zip file.
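To make this two-part view concrete, here is a minimal Keras sketch using the shapes from the example above (a 32x32x1 input squeezed through a 3-dimensional code); the layer sizes are illustrative, not tuned:

from keras.layers import Input, Dense, Flatten, Reshape
from keras.models import Model

# Encoder h = f(x): 32x32x1 image -> 3-dimensional code
inp = Input(shape=(32, 32, 1))
h = Dense(3, activation='relu')(Flatten()(inp))

# Decoder r = g(h): 3-dimensional code -> 32x32x1 reconstruction
r = Dense(32 * 32, activation='sigmoid')(h)
out = Reshape((32, 32, 1))(r)

toy_ae = Model(inp, out)  # trained so that g(f(x)) approximates x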

So, what types of data can we use, and what are the applications?

  1. We have seen that the encoder produces low-dimensional data. This can be used for dimensionality reduction, much like Principal Component Analysis (PCA), given that the data comes from the same domain (similar data).
  2. It approximates the input, so we can map a noisy image to a de-noised image, thus working as a de-noiser!
  3. Since the dimensions have been reduced, we can find similar images faster than with full-size images, much like semantic hashing (see the sketch after this list).
  4. Autoencoders trained in an adversarial manner can be used for generative purposes.
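As a sketch of point 3 (not from the original post): once we have a trained encoder, like the encoder models built below, similar images can be found by nearest-neighbor search on the latent codes, here using scikit-learn:

from sklearn.neighbors import NearestNeighbors

# Assumes a trained `encoder` model and image arrays x_train / x_test
codes = encoder.predict(x_train)                 # low-dimensional latent codes
nn = NearestNeighbors(n_neighbors=5).fit(codes)

query_code = encoder.predict(x_test[:1])         # encode a query image
dist, idx = nn.kneighbors(query_code)            # indices of the 5 most similar training images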
Simple autoencoder illustration

The ideal autoencoder model balances the following:

  • Sensitive enough to the inputs to accurately build a reconstruction.
  • Insensitive enough to the inputs that the model doesn’t simply memorize or overfit the training data.

A Deep Autoencoder

We shouldn't limit ourselves to using only one hidden layer. Here is a deep, fully connected network that takes flattened MNIST images and processes them.

We will use the MNIST dataset here: 28x28 grayscale images of handwritten digits, which makes it easy to build and understand a simple autoencoder network.

import numpy as np
from keras.datasets import mnist

# Load MNIST, scale pixels to [0, 1], and flatten 28x28 images to 784-vectors
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(len(x_train), -1)
x_test = x_test.reshape(len(x_test), -1)

Loading the images, then flattening and normalizing them for the deep network.

from keras.layers import Input, Dense
from keras.models import Model

encoding_dim = 32
input_img = Input(shape=(784,))
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(encoding_dim, activation='relu')(encoded)
decoded = Dense(128, activation='relu')(encoded)
decoded = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoded)
encoder = Model(input_img, encoded)

# Stand-alone decoder: reuse the last two layers of the autoencoder
encoded_input = Input(shape=(encoding_dim,))
decode_layer1 = autoencoder.layers[-2]
decode_layer2 = autoencoder.layers[-1]
decoder = Model(encoded_input, decode_layer2(decode_layer1(encoded_input)))

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # common choice for pixels in [0, 1]
hist = autoencoder.fit(x_train, x_train, epochs=10, validation_data=(x_test, x_test))

A simple fully connected network takes the 784 flattened pixels and flows them through 128 neurons and then 32 neurons. This 32-neuron activation is our latent space representation, which is then sequentially decoded by 128- and 784-neuron layers. Finally, we reshape the output back to 28x28 images.

Here we have tried to force the encoder to compress the image in such a way that the decoder can reconstruct it from this encoded representation (the latent space).

Model Summary

Although the loss is decreasing, one thing to note here is that the reconstruction is lossy.

Visualization. Left: Original image, Middle: features visualized, Right: Reconstructed
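A sketch of how such a visualization can be produced with matplotlib (here as three rows: originals, latent codes, reconstructions), assuming the autoencoder and encoder models trained above:

import matplotlib.pyplot as plt

decoded_imgs = autoencoder.predict(x_test)  # reconstructions
codes = encoder.predict(x_test)             # 32-dimensional latent features

n = 5
plt.figure(figsize=(9, 6))
for i in range(n):
    plt.subplot(3, n, i + 1)                 # row 1: original images
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    plt.subplot(3, n, i + 1 + n)             # row 2: 32-dim codes as 4x8 grids
    plt.imshow(codes[i].reshape(4, 8), cmap='gray')
    plt.axis('off')
    plt.subplot(3, n, i + 1 + 2 * n)         # row 3: reconstructions
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()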

Better results by using Convolutional Neural Nets

CNNs have proven to be very good with image data, so instead of flattening the images we can feed them directly to convolutional encoders and decoders. We can expect to see better results by using them.

latent space autoencoders

The idea here is that ConvNets can learn a better latent-space representation of the input, since their filters extract richer features.

from keras.layers import Input, Conv2D, UpSampling2D
from keras.models import Model

img_width, img_height = 28, 28  # MNIST image size

def CNN_AE():
    input_img = Input(shape=(img_width, img_height, 1))

    # Encoding network: 28x28 -> 14x14 -> 7x7 -> 4x4 feature maps
    x = Conv2D(16, (3, 3), activation='relu', padding='same', strides=2)(input_img)
    x = Conv2D(32, (3, 3), activation='relu', padding='same', strides=2)(x)
    encoded = Conv2D(32, (2, 2), activation='relu', padding='same', strides=2)(x)

    # Decoding network: upsample back to 28x28
    x = Conv2D(32, (2, 2), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu')(x)  # valid padding trims 16x16 -> 14x14
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

    encoder = Model(input_img, encoded)
    return Model(input_img, decoded), encoder
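To train it, the images keep their 2D shape plus a channel axis instead of being flattened. A minimal training sketch, assuming the MNIST arrays loaded earlier; the optimizer and loss are common choices, not necessarily those used in the original notebook:

# Restore the image shape expected by Conv2D: (N, 28, 28, 1)
x_train_cnn = x_train.reshape(-1, img_width, img_height, 1)
x_test_cnn = x_test.reshape(-1, img_width, img_height, 1)

model_cnn, encoder_cnn = CNN_AE()
model_cnn.compile(optimizer='adam', loss='binary_crossentropy')
model_cnn.fit(x_train_cnn, x_train_cnn, epochs=10, validation_data=(x_test_cnn, x_test_cnn))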

The full code is in the GitHub repo; you can open it in Colab in the browser.

Results of CNN AE with latent space representations

We can clearly see that the reconstruction results are better than those of the previous deep AE. We can also see the latent space representations, which are compressed representations of the input.

Every image passes through the same encoder function and is compressed, which leads to a much smaller representation. We can then perform clustering on it with less compute.
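As a sketch of that idea (not from the original post), scikit-learn's KMeans can be run directly on the latent codes:

from sklearn.cluster import KMeans

latent = encoder_cnn.predict(x_train_cnn)   # (N, 4, 4, 32) encoded feature maps
latent = latent.reshape(len(latent), -1)    # flatten each code to a 512-vector

kmeans = KMeans(n_clusters=10).fit(latent)  # roughly one cluster per digit
print(kmeans.labels_[:20])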

A fun application — image denoising

Since an AE can learn to represent images in a latent space and reconstruct from it, it can also learn to remove noise from images. For example, if we train it to map noisy images to clean images, it may learn to ignore the noise and reconstruct the underlying image.

noise_factor = 0.5

# Add Gaussian noise, then clip back to the valid [0, 1] pixel range
x_train_noisy = x_train_cnn + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train_cnn.shape)
x_test_noisy = x_test_cnn + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test_cnn.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

# Train to map noisy inputs to their clean originals
cnn_hist = model_cnn.fit(x_train_noisy, x_train_cnn, validation_data=(x_test_noisy, x_test_cnn))
Noisy image reconstruction

In the notebook, I have transferred the learning from the previous example, that is, reused the same model so it can learn faster. We can observe the change in representations in the plots above. This is just one application of autoencoders.

It seems to work pretty well. If you scale this process to a bigger ConvNet, you can start building document denoising or audio denoising models.

Sequence to Sequence

If your inputs are sequences, rather than vectors or 2D images, then you may want to use, as encoder and decoder, a type of model that can capture temporal structure, such as an LSTM. To build an LSTM-based autoencoder, first use an LSTM encoder to turn your input sequences into a single vector that contains information about the entire sequence, then repeat this vector n times (where n is the number of timesteps in the output sequence), and run an LSTM decoder to turn this constant sequence into the target sequence.
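A minimal sketch of that recipe in Keras; the dimensions are placeholders, not tied to any particular dataset:

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

timesteps = 20   # placeholder sequence length
input_dim = 50   # placeholder feature size per timestep
latent_dim = 32  # placeholder size of the sequence summary

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)                          # whole sequence -> single vector
decoded = RepeatVector(timesteps)(encoded)                  # repeat it n times
decoded = LSTM(input_dim, return_sequences=True)(decoded)   # constant sequence -> target sequence

sequence_autoencoder = Model(inputs, decoded)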

Variational AutoEncoders

Instead of letting the neural net decide freely how to represent the input in the latent space, we put constraints on it, limiting the latent space representations so we can reconstruct from very few parameters. We will explore that later.

References

https://www.deeplearningbook.org/contents/autoencoders.html

https://github.com/parmarsuraj99/Autoencoders
