Neural Network Optimization (Part 1) — Differential Evolution Algorithm

Explained and Implemented in Python

6 min readApr 8, 2024

Introduction

Optimization stands as the cornerstone of advancements in neural networks, often being the difference between a model that performs adequately and one that excels. In this article, we delve into the practical application of the Differential Evolution (DE) algorithm — a member of the evolutionary algorithm family — for optimizing neural networks.

DE is known for its simplicity and efficiency in handling non-linear, non-differentiable, and multi-modal optimization problems.

Background

The Breast Cancer Wisconsin (Diagnostic) dataset, our experimental playground, presents a binary classification problem distinguishing between benign and malignant tumors. Typically, a neural network would be trained using standard backpropagation techniques, with its architecture predetermined by the practitioner’s experience or experimental tuning. Here, we challenge this norm by introducing the DE algorithm to optimize the architecture dynamically.

Dataset Manipulation and Preliminary Training

We began by encoding our target variables and intentionally complicating our dataset to mimic real-world imperfections. Following the division into training and test subsets, we introduced noise and label flipping to broaden our model’s exposure to data irregularities.

Subsequently, a series of neural networks were trained. Metrics were meticulously recorded, and confusion matrices generated to visualize performance.

Confusion Matrices: Best Neural Network Model

Understanding Differential Evolution (DE)

At its core, DE evolves a population of candidate solutions (in our case: solutions of the Neural Network) towards an optimum by iteratively applying mutation, crossover, and selection operations.

Mutation

In DE, mutation serves as a search mechanism. For each member of the population, a new vector (mutant) is created by adding the weighted difference between two population vectors to a third vector. This process injects diversity, allowing the algorithm to explore new regions in the solution space.

Below equations define different strategies for generating new candidate solutions in the Differential Evolution algorithm.

Crossover

Crossover, or recombination, is the mechanism by which the mutant vector exchanges information with the target vector (current member of the population). This step increases the genetic diversity of the subsequent generation.

Selection

The selection process in DE is survival of the fittest. Each mutant vector competes with the target vector, and the one with better fitness (lower cost or higher accuracy in our case) is selected for the next generation. This step ensures that the population gradually moves towards an optimal solution.

Implementing Differential Evolution

DE took center stage as we sought the optimal number of neurons in our hidden layer. Our fitness function gauged the accuracy of each variant, iterating over 500 generations.

This process wasn’t merely a search — it was an evolutionary journey leading to a remarkable fitness score of 0.94, signifying the accuracy of our optimized neural network.

Fitness function:
This function evaluates the fitness of a neural network by training it with a specified number of neurons in the hidden layer, then returning the model’s accuracy on a test set. It’s used within a differential evolution algorithm to find the optimal number of neurons for maximizing accuracy.

def fitness_function(num_neurons):
    tf.keras.backend.clear_session()  # Clear any leftover model state
    num_neurons = int(num_neurons[0])  # Ensure num_neurons is an integer

    # Verify dataset availability and shapes
    assert X_train_imbalanced.ndim == 2, "Check X_train_imbalanced shape"
    assert y_train_imbalanced.ndim == 1, "Check y_train_imbalanced shape"
    assert X_test_small.ndim == 2, "Check X_test_small shape"
    assert y_test_small.ndim == 1, "Check y_test_small shape"

    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(input_dim,)),
        tf.keras.layers.Dense(num_neurons, activation='sigmoid'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    
    # Proceed with model fitting and evaluation
    model.fit(X_train_imbalanced, y_train_imbalanced, epochs=10, batch_size=32, verbose=0)
    _, accuracy = model.evaluate(X_test_small, y_test_small, verbose=0)
    
    return accuracy

Differential Evolution Function:
This function uses Differential Evolution to optimize the neural network’s hidden layer size for maximum accuracy, iterating through mutations and selections to find the best neuron count within given bounds.

def differential_evolution(fitness_func, bounds, max_gen, pop_size, F=0.6, CR=0.7):
    # Initialize population
    pop = np.random.rand(pop_size, len(bounds))
    pop_denorm = np.array([bounds[:, 0] + individual * (bounds[:, 1] - bounds[:, 0]) for individual in pop])
    fitness = np.asarray([fitness_func(ind) for ind in pop_denorm])
    best_idx = np.argmax(fitness)
    best = pop_denorm[best_idx]
    best_fitness_history = [fitness[best_idx]]
    
    print(f"Initial population fitness: {fitness}")
    print(f"Initial best index: {best_idx}, Fitness: {fitness[best_idx]}")
    
    for gen in range(max_gen):
        print(f"\n--- Generation {gen+1}/{max_gen} ---")
        for i in range(pop_size):
            idxs = [idx for idx in range(pop_size) if idx != i]
            a, b, c = pop[np.random.choice(idxs, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0, 1)
            cross_points = np.random.rand(len(bounds)) < CR
            if not np.any(cross_points):
                cross_points[np.random.randint(0, len(bounds))] = True
                
            trial = np.where(cross_points, mutant, pop[i])
            trial_denorm = bounds[:, 0] + trial * (bounds[:, 1] - bounds[:, 0])
            f = fitness_func(trial_denorm)
            
            if f > fitness[i]:
                fitness[i] = f
                pop[i] = trial
                print(f"Individual {i}: Fitness improved to {fitness[i]}")
                
                if f > fitness[best_idx]:
                    best_idx = i
                    best = trial_denorm
                    print(f"New best found: Index {best_idx}, Fitness: {fitness[best_idx]}")
                    best_fitness_history.append(fitness[best_idx])
    
    print(f"\nBest solution found: Index {best_idx}, Fitness: {fitness[best_idx]}")
# Define bounds for the number of neurons
bounds = np.array([[10, 100]])    # Bounds for number of neurons in the hidden layer

# Run DE
best_hyperparams = differential_evolution(fitness_function, bounds, max_gen=500, pop_size=10)
print(f"Best Number of Neurons: {int(best_hyperparams[0])}")

Make sure to play with the following parameters according to the problem:

Population Size: A larger population ensures diversity but costs more computationally. A smaller one converges quickly but risks local optima. Typically, population sizes range from 50 to 100, often set at 5 to 10 times the number of parameters.
Number of Generations: Varies with problem complexity, often between 100 to 1000, with simpler problems needing a few hundred and complex ones requiring several thousand.
Mutation Factor (F): Controls the amplification of the differential variation between individuals. Common range for F is [0.5, 2].
Crossover Rate (CR): Determines the likelihood of each parameter being replaced with the mutant parameter. Typical values for CR are in the range [0.5, 1].
Bounds: Defines the lower and upper limits for each parameter to be optimized, ensuring the algorithm searches within a specified range.

DE has just a few control parameters, making it easier to use compared to other optimization algorithms.

Results

Confusion Matrix Analysis

The average performance of our initial models displayed a satisfactory balance in prediction capabilities. The best model improved on this baseline. However, the most significant enhancement was evident post-DE optimization, where the model achieved remarkable accuracy, drastically reducing misclassifications, as seen in the test set’s confusion matrix.

Fitness Improvement

The fitness values — reflective of our model’s accuracy — progressively improved across generations of DE, culminating in a model with a fitness value close to the ideal. The progression illustrates not only the potential for improvement but also the algorithm’s capacity to refine a neural network’s performance.

Conclusion

Through DE, we navigated the intricate landscape of hyperparameters to enhance our neural network. The upshot was a model not just trained, but evolved with the finesse to accurately predict the nuances of our dataset.

Stay tuned for subsequent entries in this series, where we will explore other evolutionary strategies and their prowess in neural network optimization.

Complete code and report: https://github.com/mandarangchekar/Evolutionary-Machine-Learning/tree/main/HW03

Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data

Lets Connect:

GitHub