KaNaung

A basic example of deep learning: a Fully Connected Neural Network (also known as a Multi-Layer Perceptron, or MLP) classifying the **MNIST dataset**, which contains 28x28 grayscale images of handwritten digits (0–9).

### Overview of the Code:
1. Data Preparation: We will load and preprocess the MNIST dataset.
2. Model Building: We will create a simple neural network using Keras and TensorFlow.
3. Training: We will train the model on the dataset.
4. Evaluation: After training, we will evaluate the model’s accuracy.
5. Prediction: We will test the model by predicting some example images.

### Code for Deep Learning with Explanation

```python
# Step 1: Import necessary libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Step 2: Load and preprocess the MNIST dataset
# MNIST contains 60,000 training images and 10,000 test images of handwritten digits.
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize the pixel values to the range [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# Convert labels to one-hot encoding
y_train = to_categorical(y_train, 10)  # 10 classes (0-9)
y_test = to_categorical(y_test, 10)

# Step 3: Define the model architecture
model = Sequential([
    Input(shape=(28, 28)),           # Input layer matching the 28x28 grayscale images
    Flatten(),                       # Flatten the 2D image into a 1D vector of 784 values
    Dense(128, activation='relu'),   # Fully connected hidden layer with 128 neurons
    Dense(64, activation='relu'),    # Another hidden layer with 64 neurons
    Dense(10, activation='softmax')  # Output layer: 10 neurons (one per class) with softmax
])

# Step 4: Compile the model
model.compile(optimizer='adam',                  # Optimizer
              loss='categorical_crossentropy',   # Loss function for multi-class classification
              metrics=['accuracy'])              # Evaluation metric

# Step 5: Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=2)

# Step 6: Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=2)
print(f"Test accuracy: {accuracy:.2f}")

# Step 7: Make predictions
predictions = model.predict(X_test)              # One probability vector per test image
predicted_classes = predictions.argmax(axis=1)   # Index of the highest probability = predicted digit
true_classes = y_test.argmax(axis=1)             # Recover integer labels from the one-hot vectors

# Step 8: Display the first ten test images along with their predicted and true labels
plt.figure(figsize=(10, 5))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test[i], cmap='gray')
    plt.title(f"Pred: {predicted_classes[i]}, True: {true_classes[i]}")
    plt.axis('off')
plt.show()
```

### Code Explanation:

1. Step 1: Importing Libraries
— TensorFlow and Keras: TensorFlow is the deep learning framework, and Keras is its high-level API. We will use `Sequential`, a linear stack of layers.
— Matplotlib: Used to visualize the predictions.

2. Step 2: Data Preparation
— We load the **MNIST** dataset using `mnist.load_data()` which provides 60,000 training images and 10,000 test images.
— Normalization: We scale the pixel values from `[0, 255]` down to `[0, 1]`, which helps training converge faster and more stably.
— One-Hot Encoding: The labels (digits from 0–9) are converted into one-hot encoded vectors using `to_categorical()`.
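For instance, `to_categorical()` turns the integer label 3 into a 10-element vector with a 1 at index 3 (a minimal illustration):

```python
from tensorflow.keras.utils import to_categorical

# The digit 3 becomes a one-hot vector: all zeros except a 1 at index 3
print(to_categorical(3, 10))
# [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```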

3. Step 3: Model Building
— We define a **Sequential** model that consists of layers stacked sequentially.
— Flatten Layer: Converts the 28x28 image matrices into 1D vectors of size 784, which can be fed into the dense layers.
— Dense Layers: The first dense layer has 128 neurons with ReLU (Rectified Linear Unit) activation, which helps the network learn complex patterns. Another dense layer with 64 neurons follows. Finally, the output layer has 10 neurons (one for each class), and it uses the **softmax** activation function to output probabilities for each class.
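The layer sizes above fully determine the number of trainable weights: each dense layer has inputs × neurons weight values plus one bias per neuron. A quick sanity check (`model.summary()` reports the same totals):

```python
# Parameter counts implied by the architecture above
flatten_out = 28 * 28             # 784 values after Flatten
p1 = flatten_out * 128 + 128      # first hidden layer  -> 100,480 parameters
p2 = 128 * 64 + 64                # second hidden layer ->   8,256 parameters
p3 = 64 * 10 + 10                 # output layer        ->     650 parameters
print(p1 + p2 + p3)               # 109,386 trainable parameters in total
```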

4. Step 4: Model Compilation
— Adam Optimizer: An optimization algorithm that adapts the learning rate for each parameter during training, which typically speeds up convergence.
— Categorical Cross-Entropy Loss: Used for multi-class classification problems. It calculates the loss between the predicted probability distribution and the true one-hot encoded label.
— Accuracy Metric: We track accuracy during training and evaluation.
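As a rough sketch of what this loss computes for one example (written out with NumPy purely for illustration): because the label is one-hot, only the log-probability of the true class contributes.

```python
import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)  # one-hot label for digit 3
y_pred = np.array([0.01, 0.01, 0.01, 0.91, 0.01, 0.01,
                   0.01, 0.01, 0.01, 0.01])                     # softmax output, sums to 1

# Categorical cross-entropy: -sum(y_true * log(y_pred))
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # ~0.094, i.e. -log(0.91): a confident correct prediction gives a small loss
```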

5. Step 5: Model Training
— The model is trained for 10 epochs, meaning it iterates 10 times over the training data. The 20% validation split holds out 12,000 of the 60,000 training images so that performance on unseen data can be monitored during training.
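A simple way to check for overfitting is to plot the per-epoch metrics that `fit()` records in the `history` object (a minimal sketch, reusing `history` and the Matplotlib import from the script above):

```python
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```

If the training curve keeps climbing while the validation curve flattens or drops, the model is starting to memorize the training data.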

6. Step 6: Model Evaluation
— After training, we evaluate the model on the test data using the `evaluate()` method. This returns the test loss and accuracy.
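If you prefer named results over positional ones, `evaluate()` can also return a dictionary keyed by metric name (available in TensorFlow 2.2 and later); the numbers shown are only indicative:

```python
results = model.evaluate(X_test, y_test, verbose=0, return_dict=True)
print(results)  # e.g. {'loss': 0.08, 'accuracy': 0.97} -- exact values vary from run to run
```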

7. Step 7: Predictions
— We make predictions using the test dataset. The model outputs probabilities, and we use `argmax()` to determine the predicted class (the one with the highest probability).
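For a single image the same idea looks like this; note that `predict()` expects a batch dimension, so we slice with `X_test[:1]` rather than indexing with `X_test[0]` (the printed digit depends on the trained model):

```python
single = model.predict(X_test[:1])  # shape (1, 10): one probability vector
print(single.argmax(axis=1))        # e.g. [7] -- the index with the highest probability
```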

8. Step 8: Visualization
— We visualize the first ten test images together with their true and predicted labels. Matplotlib is used to plot the grayscale images.

### Key Concepts in Deep Learning:
- **Fully Connected Layers**: Layers where every neuron is connected to every neuron in the previous layer.
- **Activation Functions**: Non-linear functions (here, ReLU and softmax) applied to each layer's output; without them, a stack of dense layers could only represent a linear mapping.
- **Backpropagation**: The process of adjusting the weights of the network based on the gradient of the loss with respect to each weight, computed from the output layer backward through the network.
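To make this concrete, here is a minimal hand-written gradient-descent step using `tf.GradientTape`; Keras performs the equivalent update for every weight in the network when you call `fit()` (the one-weight model here is purely illustrative):

```python
import tensorflow as tf

w = tf.Variable(2.0)               # a single trainable weight
x, y_true = 3.0, 12.0              # one training example: ideally w * x == y_true

with tf.GradientTape() as tape:
    y_pred = w * x
    loss = (y_pred - y_true) ** 2  # squared-error loss for this example

grad = tape.gradient(loss, w)      # dloss/dw = 2 * (y_pred - y_true) * x = -36.0
w.assign_sub(0.01 * grad)          # gradient-descent update with learning rate 0.01
print(w.numpy())                   # 2.36 -- the weight moved toward y_true / x = 4
```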