Why Hyperparameters Matter!

Learn how hyperparameters shape the function space of neural networks.

Mohit Mishra
Nerd For Tech
6 min read · Jan 29, 2024


I recommend going through the previous parts of Neural Networks 101 to ensure you understand what I’m talking about and how everything works together.

This blog focuses on understanding hyperparameters, their importance, and their impact. We’ll also look at a learning rate example.

Now that everything is in order, let’s move on to our topic:

Understanding Hyperparameters

Hyperparameters are settings or configurations that affect the architecture and behavior of a neural network. Unlike parameters, which are learned from training data, hyperparameters are determined before the training process begins and remain constant throughout. Here’s an explanation of important hyperparameters commonly used in neural networks:

Number of Layers: The number of layers in a neural network refers to the total count of hidden layers between the input and output layers. Each layer contains neurons (nodes) that perform computations on the input data.

Increasing the number of layers allows the neural network to learn more complex patterns and relationships in the data. However, adding too many layers can lead to overfitting if the model becomes too complex for the given dataset.

Units Per Layer: The number of units (neurons) in each layer of the neural network determines the model’s capacity and expressiveness. To generate an output, each unit calculates the weighted sum of its inputs and applies an activation function.

Increasing the number of units per layer increases the model’s capacity to learn intricate patterns in the data. However, a larger number of units also increases the computational complexity of the model and may require more data for training to avoid overfitting.

Learning Rate: The learning rate is a hyperparameter that controls the size of the steps taken during gradient descent optimization. It determines how quickly the model parameters (weights and biases) are updated during training.

A higher learning rate allows the model to converge faster, but it may overshoot or oscillate around the optimal solution. In contrast, a lower learning rate causes slower convergence but may result in more stable and accurate solutions. Finding the right balance is critical to effective training.

Now let’s take one example to understand whatever we learned till now.

Consider training a neural network for image classification with the CIFAR-10 dataset, which includes 60,000 32x32 color images divided into ten classes. The hyperparameters for this task could be defined as follows:

  • Number of Layers: We could try out different architectures, such as a shallow network with one or two hidden layers or a deeper network with three or more.
  • Units per Layer: Depending on the architecture, we could adjust the number of units in each layer to control the model’s capacity and complexity.
  • Learning Rate: We would choose an appropriate learning rate, usually beginning with a low value (e.g., 0.001) and adjusting it based on the model’s performance during training.
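
To make these choices concrete, here is a minimal code sketch of how the three hyperparameters above might appear in practice. It assumes PyTorch purely for illustration (the post itself is framework-agnostic), and the helper build_mlp and the specific values shown are hypothetical rather than a prescription for CIFAR-10.

```python
import torch
import torch.nn as nn

def build_mlp(num_hidden_layers: int, units_per_layer: int) -> nn.Sequential:
    """A simple fully connected classifier for 32x32x3 CIFAR-10-sized images."""
    layers = [nn.Flatten()]
    in_features = 32 * 32 * 3                  # flattened image size
    for _ in range(num_hidden_layers):          # hyperparameter: number of layers
        layers += [nn.Linear(in_features, units_per_layer), nn.ReLU()]
        in_features = units_per_layer           # hyperparameter: units per layer
    layers.append(nn.Linear(in_features, 10))   # CIFAR-10 has 10 classes
    return nn.Sequential(*layers)

model = build_mlp(num_hidden_layers=2, units_per_layer=256)
# hyperparameter: learning rate, fixed before training begins
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
```

Changing num_hidden_layers, units_per_layer, or lr changes both the family of functions the network can represent and how it moves through that family during training, which is exactly the theme of the next section.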

Significance of Hyperparameters in Neural Network Training

Hyperparameters are core components of neural network training, with a major impact on the model’s performance and behavior. Understanding that impact is essential for achieving good results. Here’s why hyperparameters matter:

Impact on Model Performance

  • Hyperparameters determine the neural network’s architecture and its behavior during training. They govern properties such as model capacity, learning dynamics, and convergence behavior.
  • Proper hyperparameter selection and tuning can result in improved model performance, such as increased accuracy, faster convergence, and better generalization to new data.

Sensitivity to Hyperparameter Selection

  • Neural networks are extremely sensitive to hyperparameter settings. Small hyperparameter changes can cause significant variations in model outcomes.
  • For example, changing the learning rate can have an impact on the training process’s convergence speed and stability. Choosing the right learning rate is critical for avoiding problems like slow convergence, oscillations, and divergence.

Trade-offs and Compromises

  • Hyperparameter tuning frequently entails navigating trade-offs and compromises. For example, increasing the number of layers or units per layer can improve the model’s ability to capture complex patterns while also increasing the risk of overfitting.
  • Finding the optimal balance between model complexity and generalization capability necessitates careful consideration and experimentation with hyperparameter settings.

Generalization and Robustness

  • Well-tuned hyperparameters help the model generalize well to previously unseen data. A model that performs well on training data but poorly on unseen data may be overfitting due to suboptimal hyperparameter selection.
  • Robust hyperparameter settings improve the model’s ability to handle variations in input data and environmental conditions, resulting in more consistent performance in real-world scenarios.

Model Interpretability

  • Hyperparameters can also affect the trained model’s interpretability. For example, simpler models with fewer layers or units per layer may be easier to interpret and comprehend.
  • The selection of hyperparameters can influence the model’s internal representations and decision boundaries, affecting its interpretability and explainability.

Consider training a neural network for sentiment analysis using text data. Hyperparameters like the number of layers, units per layer, and learning rate can have a significant impact on the model’s performance.

  • Increasing the number of layers or units per layer can improve the model’s ability to capture nuanced semantic features in text, but it also increases the risk of overfitting.
  • Adjusting the learning rate can affect the training process’s convergence speed and stability, as well as the model’s final accuracy and generalization ability.

Enough theory. Let’s now look at a learning rate example and understand it using a mathematical equation.

To understand what I’m saying in the rest of this section, I recommend going through gradient descent at least once. I have already published a detailed blog on gradient descent that covers everything we will be discussing today.

Please click on this link and read through the Gradient Descent blog once.

Hyperparameter Impact: Learning Rate in Neural Network Training

Consider the update rule for the weights θ in a neural network trained using gradient descent:

θ := θ − α · ∇θ J(θ)

Where:

  • α is the learning rate.
  • J(θ) is the cost function representing the error between predicted and actual outputs.
  • ∇θ J(θ) is the gradient of the cost function with respect to the weights θ.

Gradient descent aims to minimize the cost function J(θ) by iteratively adjusting the weights θ in the direction opposite to the gradient of J(θ) with respect to θ.
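
For readers who prefer code to notation, here is the same update rule as a tiny Python sketch. The quadratic cost J(θ) = θ², with gradient 2θ and minimum at θ = 0, is my own toy example, chosen only because it is easy to check by hand.

```python
def gradient_descent_step(theta, grad_J, alpha):
    """One update: theta := theta - alpha * (gradient of J at theta)."""
    return theta - alpha * grad_J(theta)

# Toy cost J(theta) = theta^2, whose gradient is 2*theta and whose minimum is at 0.
grad_J = lambda t: 2.0 * t

theta = 5.0
theta = gradient_descent_step(theta, grad_J, alpha=0.1)  # 5.0 - 0.1 * 10.0 = 4.0
```

Every training step of a real network does exactly this, just with far more weights and a gradient computed by backpropagation.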

Let’s analyze how the learning rate (α) affects the convergence of the optimization process.

Case 1: High Learning Rate α

If the learning rate (α) is excessively high,

  • The large updates to weights (θ) cause the optimization process to take significant steps toward minimizing the cost function.
  • While this may result in faster convergence at first, it can also cause overshoots or oscillations around the minimum.
  • In extreme cases, the optimization process may diverge, resulting in an increase in the cost function rather than a decrease.

Case 2: Low Learning Rate α

If the learning rate (α) is excessively low,

  • The small updates to weights θ cause the optimization process to take tiny steps toward minimizing the cost function.
  • While this ensures stability and prevents overshooting, it also results in slow convergence because the optimization process moves very slowly.
  • In some cases, the optimization process may become stuck in local minima or saddle points, preventing the algorithm from achieving the global minimum.
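
Both failure modes can be reproduced numerically on the toy quadratic from the previous sketch. The specific values below (such as α = 1.1) are illustrative choices of mine, not recommendations for real networks.

```python
def run(alpha, steps=20, theta=5.0):
    """Repeatedly apply theta := theta - alpha * 2 * theta for J(theta) = theta^2."""
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta
    return theta

print(run(alpha=1.1))    # too high: the updates overshoot and theta keeps growing
print(run(alpha=0.001))  # too low: theta barely moves away from its starting value
print(run(alpha=0.3))    # balanced: theta ends up very close to the minimum at 0
```

With α = 1.1 each update multiplies θ by (1 − 2α) = −1.2, whose magnitude exceeds one, so θ grows at every step (the divergence of Case 1); with α = 0.001 the factor is 0.998, so θ shrinks only slightly per step (the slow convergence of Case 2).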

Optimal Learning Rate Selection

The optimal learning rate strikes a balance between convergence speed and stability. It should be chosen so that the optimization process converges efficiently to the minimum of the cost function without overshooting or becoming stuck.

In neural network training, the learning rate (α) is a crucial hyperparameter. Proper selection and tuning have a significant impact on the convergence behavior, stability, and performance of the optimization process. By carefully adjusting the learning rate, practitioners can ensure that neural network models are trained efficiently and effectively for various machine-learning tasks.
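
As a final hedged sketch, one simple way to search for that balance is to try a handful of candidate learning rates and keep the one that reaches the lowest cost. The candidate values and the toy quadratic below are again my own illustrative choices; on a real network you would typically compare validation loss instead and refine the search around the best candidate.

```python
def final_cost(alpha, steps=50, theta=5.0):
    """Final value of J(theta) = theta^2 after running gradient descent with this alpha."""
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta
    return theta ** 2

candidates = [1.0, 0.3, 0.1, 0.01, 0.001]
best_alpha = min(candidates, key=final_cost)  # keep the rate with the lowest final cost
print(best_alpha)
```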

I recommend reading this blog to gain a thorough understanding of learning parameters.
That’s all for today; see you all in the next blog. Until then, take care and love you all.

About Me

My name is Mohit Mishra, and I’m a blogger who creates intriguing content that leaves readers wanting more. Anyone interested in machine learning and data science should check out my blog. My writing is designed to keep you engaged and intrigued, with a regular publishing schedule of a new piece every two days. Follow along for in-depth information that will leave you wanting more!

If you liked the article, please clap and follow me, since it pushes me to write more and better content. I have also linked my GitHub account and portfolio at the bottom of the blog.

All images and formulas attached have been created by AlexNail and the CodeCogs site, and I do not claim ownership of them.

Thank you for reading my blog post on Why Hyperparameters Matter! I hope you find it informative and helpful. If you have any questions or feedback, please feel free to leave a comment below.

I also encourage you to check out my portfolio and GitHub. You can find links to both in the description below.

I am always working on new and exciting projects, so be sure to subscribe to my blog so you don’t miss a thing!

Thanks again for reading, and I hope to see you next time!

[Portfolio Link] [Github Link]
