Demystifying Neural Networks: Anomaly Detection with AutoEncoder

Dagang Wei
5 min read · Jan 29, 2024


Anomaly Detection

This article is part of the series Demystifying Neural Networks.

Introduction

Anomaly detection is a crucial task in various industries, from fraud detection in finance to fault detection in manufacturing. With the advancement of artificial intelligence, AutoEncoder Neural Networks have emerged as a powerful tool for this purpose. This blog post aims to demystify the concept of AutoEncoders and illustrate their application in anomaly detection, specifically using a Keras example with the MNIST dataset.

What is an AutoEncoder?

An AutoEncoder is a type of neural network used for unsupervised learning. Its primary function is to learn a compressed representation of input data. An AutoEncoder consists of two main parts: the encoder and the decoder.

  • Encoder: This part of the network compresses the input into a latent-space representation, i.e., a compact encoding of the data in a lower-dimensional space.
  • Decoder: The decoder part aims to reconstruct the input data from the encoded representation. It tries to generate an output that is as close as possible to the original input.

The key idea is that AutoEncoders are trained to minimize the reconstruction error, which makes them effective at learning the distribution of the input data.
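
To make the encoder/decoder split concrete, here is a minimal sketch using the Keras functional API. The 784-dimensional input and 32-dimensional bottleneck are illustrative choices, and the variable names are mine rather than part of the example later in this post:

import keras

# A minimal encoder/decoder sketch; the dimensions are illustrative
inputs = keras.Input(shape=(784,))
latent = keras.layers.Dense(32, activation='relu')(inputs)        # encoder: compress
outputs = keras.layers.Dense(784, activation='sigmoid')(latent)   # decoder: reconstruct

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, latent)  # the encoder can also be used on its own

# Training minimizes the reconstruction error between input and output
autoencoder.compile(optimizer='adam', loss='mse')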

AutoEncoders for Anomaly Detection

In the context of anomaly detection, AutoEncoders are particularly useful. They are trained on normal data to learn the representation of the normal state. During inference, if an input significantly deviates from this learned representation, the AutoEncoder will likely reconstruct it poorly. This poor reconstruction is a signal of an anomaly.

How It Works

  1. Training: The AutoEncoder is trained exclusively on normal data. The training process involves adjusting the weights to minimize the reconstruction error.
  2. Inference: During inference, we feed new data to the AutoEncoder. If the data is normal, the AutoEncoder reconstructs it with minimal error. However, if the data is anomalous, the reconstruction error will be significantly higher.
  3. Thresholding: We set a threshold for the reconstruction error. If the error surpasses this threshold, the data point is flagged as an anomaly, as in the sketch below.
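
Here is a rough sketch of the inference and thresholding steps, assuming model is an already trained AutoEncoder and threshold has already been chosen (one way to choose it is shown after the example below):

import numpy as np

# Conceptual sketch only: `model` is assumed to be a trained AutoEncoder
# and `threshold` an error value chosen from normal data beforehand.
def is_anomaly(x, model, threshold):
    batch = np.asarray([x])                          # single sample -> batch of 1
    reconstruction = model.predict(batch)
    error = np.mean(np.abs(batch - reconstruction))  # per-sample reconstruction error
    return error > threshold                         # flag if the error is too large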

Example: Detecting Anomalies with the MNIST Dataset

Here is an example of anomaly detection using the MNIST dataset. In this example, we’ll consider all 10 digits (0–9) as normal data. For anomalous data, we’ll create a synthetic image that doesn’t resemble a digit.

Here’s how you can do it:

  • Load and Preprocess the MNIST Data: We use all the digits as normal data.
  • Create Anomalous Data: Generate a synthetic image that doesn’t look like a digit. This could be a random noise image or an image with arbitrary shapes.
  • Build and Train an AutoEncoder: The AutoEncoder will be trained on the normal MNIST data.
  • Evaluate the Model: We’ll calculate the reconstruction loss for both the normal MNIST data and the synthetic anomalous image. The expectation is that the reconstruction error will be significantly higher for the anomalous image.

The code is available in this Colab notebook.

import numpy as np
import matplotlib.pyplot as plt
import keras

# Load the MNIST dataset
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()

# Normalize and reshape the data
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((-1, 28 * 28))
x_test = x_test.reshape((-1, 28 * 28))

# Create a synthetic anomalous image
anomalous_image = np.random.rand(28 * 28)

# Build the AutoEncoder model
model = keras.Sequential([
    # Encoder: reduce dimensionality, learning the most important features
    keras.layers.Dense(128, activation='relu', input_shape=(x_train.shape[1],)),  # 784 -> 128
    keras.layers.Dense(64, activation='relu'),   # 128 -> 64
    keras.layers.Dense(32, activation='relu'),   # 64 -> 32, the bottleneck layer

    # Decoder: reconstruct the image from the compressed representation
    keras.layers.Dense(64, activation='relu'),   # 32 -> 64
    keras.layers.Dense(128, activation='relu'),  # 64 -> 128
    keras.layers.Dense(x_train.shape[1], activation='sigmoid')  # 128 -> 784, original image size
])

model.compile(optimizer='adam', loss='mse')

# Train the model
history = model.fit(x_train, x_train, epochs=20, batch_size=256, validation_data=(x_test, x_test))

# Function to calculate reconstruction loss
def calculate_reconstruction_loss(data, model):
    reconstructions = model.predict(data)
    # Per-sample mean absolute error; the model is trained with MSE, but any
    # per-sample error works for ranking reconstructions
    reconstruction_errors = np.mean(np.abs(data - reconstructions), axis=1)
    return reconstruction_errors

# Evaluate the model
reconstruction_loss_normal = calculate_reconstruction_loss(x_test, model)
reconstruction_loss_anomalous = calculate_reconstruction_loss(np.array([anomalous_image]), model)

# Print average reconstruction loss
print(f"Average Reconstruction Loss for Normal Data: {np.mean(reconstruction_loss_normal)}")
print(f"Reconstruction Loss for Anomalous Data: {reconstruction_loss_anomalous[0]}")

# Visualization of reconstruction error distribution
plt.figure(figsize=(6, 4))
plt.hist(reconstruction_loss_normal, bins=50, alpha=0.6, color='g', label='Normal')
plt.axvline(x=reconstruction_loss_anomalous[0], color='r', linestyle='dashed', linewidth=2, label='Anomalous')
plt.title('Reconstruction Error Distribution')
plt.xlabel('Reconstruction Error')
plt.ylabel('Frequency')
plt.legend()
plt.show()

Output:

Average Reconstruction Loss for Normal Data: 0.034472864121198654
Reconstruction Loss for Anomalous Data: 0.4479920103051486
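
Continuing the example, one way to turn these errors into anomaly decisions is to derive a threshold from the error distribution on normal test data. The 99th percentile used here is just one possible choice, not part of the original example:

# Derive a threshold from the normal-data reconstruction errors
# (the 99th percentile is an illustrative choice)
threshold = np.percentile(reconstruction_loss_normal, 99)

print(f"Threshold: {threshold:.4f}")
print(f"Anomalous image flagged: {reconstruction_loss_anomalous[0] > threshold}")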

Why use Multiple Layers in the Encoder and Decoder?

Using multiple layers in both the encoder and decoder parts of an AutoEncoder, rather than a single layer for each, offers several advantages, especially in terms of the network’s ability to learn complex data representations:

1. Increased Complexity and Non-linearity: Multiple layers allow the network to learn more complex and non-linear representations of the data. Each layer can capture different levels of abstraction. A single layer, especially when dealing with high-dimensional data like images, might be too simplistic to capture the underlying structure effectively.

2. Hierarchical Feature Learning: In deep networks, lower layers often learn to recognize simple patterns, like edges in images, while deeper layers combine these simple patterns to recognize more complex features. This hierarchical learning process is more powerful and can lead to better performance in tasks like reconstruction, which is central to AutoEncoders.

3. Dimensionality Reduction: A gradual reduction of dimensionality (as opposed to a single drastic reduction) can help in preserving important information through the layers. In a single-layer scenario, reducing dimensions too aggressively in one step might result in significant loss of information, making it difficult for the decoder to reconstruct the data accurately.

4. Robust Feature Extraction: Multiple layers can help in extracting more robust and discriminative features from the data. This is particularly important in anomaly detection, as the model needs to learn features that are representative of normal data so that anomalies, which do not conform to these learned features, can be detected through higher reconstruction errors.

5. Flexibility and Fine-tuning: A multi-layered architecture offers more flexibility in designing the network. It allows for fine-tuning the capacity of the model by adjusting the number of layers and the number of neurons in each layer. This way, the model can be better adapted to the specificities of the dataset it is trained on.

In summary, multiple layers in an AutoEncoder enhance its ability to learn more sophisticated, hierarchical representations of data, leading to better performance in tasks such as anomaly detection. However, it’s also important to balance the model’s complexity with the available data and computational resources, as overly complex models can lead to overfitting, especially with limited data.
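
One concrete way to act on the flexibility point above (item 5) is to make the layer sizes a parameter of a small builder function. This is a sketch under my own naming, and the default hidden_dims are example values to tune per dataset:

import keras

def build_autoencoder(input_dim, hidden_dims=(128, 64, 32)):
    # Sketch of a configurable AutoEncoder; hidden_dims are example values
    layers = [keras.Input(shape=(input_dim,))]
    # Encoder: progressively reduce dimensionality down to the bottleneck
    for dim in hidden_dims:
        layers.append(keras.layers.Dense(dim, activation='relu'))
    # Decoder: mirror the encoder back up to the original dimension
    for dim in reversed(hidden_dims[:-1]):
        layers.append(keras.layers.Dense(dim, activation='relu'))
    layers.append(keras.layers.Dense(input_dim, activation='sigmoid'))
    model = keras.Sequential(layers)
    model.compile(optimizer='adam', loss='mse')
    return model

# Example: a deeper variant for the same 784-dimensional MNIST input
# deeper_model = build_autoencoder(784, hidden_dims=(256, 128, 64, 32))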

Conclusion

AutoEncoders, with their ability to learn data representations and reconstruct inputs, are exceptionally well suited to anomaly detection tasks. By training on normal data and using the reconstruction error as a signal, AutoEncoders can effectively identify anomalies in new data. The Keras example with MNIST data illustrates how this concept can be implemented in a practical scenario. As with any machine learning model, the effectiveness of an AutoEncoder in anomaly detection depends on factors like the quality of the data, the architecture of the model, and the specific nature of the anomalies being detected.
