DL: Hyperparameter Tuning for Neural Networks

Part 2 of Deep Learning Specialization

Pisit J.
Sum up As A Service
4 min read · May 24, 2019

--

1. Network Hyperparameters

Model complexity

1.1 Number of hidden layers & hidden units per layer

Network complexity depends more on the number of hidden layers than on the number of hidden units per layer.
The deeper the network (more layers), the more complex it becomes.

1.2 Activation functions of hidden layers

  • Sigmoid
  • Tanh
  • ReLU
  • Leaky ReLU
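
A minimal NumPy sketch of these four activations (the Leaky ReLU slope alpha = 0.01 is a common default assumed here, not a value fixed by the course):

    import numpy as np

    def sigmoid(z):
        # squashes values into (0, 1); saturates for large |z|
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):
        # zero-centered, output in (-1, 1)
        return np.tanh(z)

    def relu(z):
        # max(0, z); the usual default for hidden layers
        return np.maximum(0.0, z)

    def leaky_relu(z, alpha=0.01):
        # like ReLU, but keeps a small slope alpha for negative inputs
        return np.where(z > 0, z, alpha * z)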

2. Learning Hyperparameters

Weight & Bias update

Computational cost

Memory efficiency

2.1 Learning rate

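The learning rate scales every gradient descent update. A hedged sketch of the basic update rule (the parameter/gradient dictionary layout and the 0.01 default are illustrative assumptions):

    def gradient_descent_step(params, grads, learning_rate=0.01):
        # params and grads are dicts of arrays, e.g. {"W1": ..., "b1": ...};
        # a larger learning rate takes bigger steps (and may diverge),
        # a smaller one converges more slowly
        for key in params:
            params[key] = params[key] - learning_rate * grads[key]
        return params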

2.2 Number of Epochs

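One epoch is one full pass over the training set. A rough sketch of the outer loop (the epoch count of 30 is purely illustrative; in practice it is chosen by watching the training and validation cost):

    num_epochs = 30  # illustrative value, not a recommendation

    for epoch in range(num_epochs):
        # run one gradient step per mini-batch here (see 2.3),
        # then track the cost to decide whether more epochs still help
        pass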

2.3 Batch size of training data

  • Stochastic = batch size of 1 (one training example per update)
  • Mini-batch = batch size between 1 and the size of the training set
  • Batch = the whole training set per update
[Figure: batch vs. mini-batch vs. stochastic gradient descent]
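
A sketch of mini-batch partitioning, assuming the course's column-per-example layout (X of shape (n_features, m), Y of shape (1, m)); the function name and the batch size of 64 are illustrative:

    import numpy as np

    def random_mini_batches(X, Y, batch_size=64, seed=0):
        # batch_size=1 gives stochastic gradient descent,
        # batch_size=m gives (full) batch gradient descent
        rng = np.random.default_rng(seed)
        m = X.shape[1]
        perm = rng.permutation(m)
        X_shuf, Y_shuf = X[:, perm], Y[:, perm]
        return [(X_shuf[:, t:t + batch_size], Y_shuf[:, t:t + batch_size])
                for t in range(0, m, batch_size)]

One gradient step is taken per mini-batch; a pass over all mini-batches is one epoch (2.2).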

2.4 Optimizers (learning-rate adaptation)

  • Momentum
  • RMSProp
  • Adam
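
A hedged sketch of the three update rules for a single parameter w with gradient dw; the beta, learning-rate, and epsilon values are common defaults assumed here, not values prescribed by the article:

    import numpy as np

    def momentum_update(w, dw, v, lr=0.01, beta=0.9):
        # exponentially weighted average of gradients smooths the update direction
        v = beta * v + (1 - beta) * dw
        return w - lr * v, v

    def rmsprop_update(w, dw, s, lr=0.001, beta=0.9, eps=1e-8):
        # divide the step by the root of an average of squared gradients
        s = beta * s + (1 - beta) * dw ** 2
        return w - lr * dw / (np.sqrt(s) + eps), s

    def adam_update(w, dw, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Adam combines momentum (v) and RMSProp (s), with bias correction at step t
        v = beta1 * v + (1 - beta1) * dw
        s = beta2 * s + (1 - beta2) * dw ** 2
        v_hat = v / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s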

3. Initialization Hyperparameters

3.1 Input normalization

Input normalization: transform the inputs to zero mean and unit variance.
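
A small sketch, assuming a column-per-example layout (n_features, m); the mean and variance are computed on the training set and reused for the test set, with a small eps added here for numerical stability:

    import numpy as np

    def normalize_inputs(X_train, X_test, eps=1e-8):
        mu = X_train.mean(axis=1, keepdims=True)      # per-feature mean
        sigma2 = X_train.var(axis=1, keepdims=True)   # per-feature variance
        X_train_norm = (X_train - mu) / np.sqrt(sigma2 + eps)
        X_test_norm = (X_test - mu) / np.sqrt(sigma2 + eps)   # same statistics
        return X_train_norm, X_test_norm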

3.2 Batch normalization

Batch normalization normalizes z, the vector of linear combinations z = Wx + b, before the activation function is applied.
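
A sketch of the training-time forward pass, assuming Z has shape (n_units, batch_size) and gamma, beta have shape (n_units, 1); at test time, running averages of mu and var are used instead:

    import numpy as np

    def batch_norm_forward(Z, gamma, beta, eps=1e-8):
        # normalize each unit's z across the mini-batch,
        # then rescale and shift with the learnable gamma and beta
        mu = Z.mean(axis=1, keepdims=True)
        var = Z.var(axis=1, keepdims=True)
        Z_norm = (Z - mu) / np.sqrt(var + eps)
        return gamma * Z_norm + beta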

3.3 W, b initialization

  • Zero initialization

b = 0: OK

W = 0: bad (does not break the model’s symmetry)

  • Random initialization

Random W: OK (breaks the model’s symmetry)

random * large constant (c > 1): bad (exploding gradients)

random * small constant (0 < c < 1): good

  • Xavier initialization (for tanh activation)
  • He initialization (for ReLU activation)
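
A sketch of both schemes, assuming the usual scaling factors (sqrt(1 / n_prev) for Xavier, sqrt(2 / n_prev) for He); the function name and layer_dims layout are illustrative:

    import numpy as np

    def initialize_parameters(layer_dims, method="he", seed=0):
        # layer_dims e.g. [n_x, n_h1, n_h2, n_y]; biases start at zero
        rng = np.random.default_rng(seed)
        params = {}
        for l in range(1, len(layer_dims)):
            fan_in = layer_dims[l - 1]
            scale = np.sqrt(1.0 / fan_in) if method == "xavier" else np.sqrt(2.0 / fan_in)
            params[f"W{l}"] = rng.standard_normal((layer_dims[l], fan_in)) * scale
            params[f"b{l}"] = np.zeros((layer_dims[l], 1))
        return params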

Reference — https://www.kdnuggets.com/2018/06/deep-learning-best-practices-weight-initialization.html

4. Regularization Hyperparameters

Reduce overfitting

4.1 L2 regularization

lambda = regularization parameter

Larger lambda

  • W pushed closer to 0
  • less complex model

Smaller lambda

  • W farther from 0
  • more complex model
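
A sketch of the L2 penalty added to the cost, assuming weight matrices are stored in a dict under keys "W1", "W2", ...; the matching gradient change is an extra (lambda / m) * W added to each dW (weight decay):

    import numpy as np

    def l2_penalty(params, lambd, m):
        # (lambda / 2m) * sum of squared weights; biases are left unregularized
        weight_sq_sum = sum(np.sum(W ** 2)
                            for name, W in params.items() if name.startswith("W"))
        return (lambd / (2 * m)) * weight_sq_sum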

4.2 Dropout
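
Dropout randomly deactivates hidden units during training so the network cannot rely on any single unit. A sketch of inverted dropout for one layer's activations A; the keep probability of 0.8 is illustrative, and dropout is switched off at test time:

    import numpy as np

    def dropout_forward(A, keep_prob=0.8, rng=None):
        # inverted dropout: zero out units at random and scale the survivors
        # by 1/keep_prob so the expected activation is unchanged
        rng = rng or np.random.default_rng()
        mask = (rng.random(A.shape) < keep_prob).astype(A.dtype)
        return (A * mask) / keep_prob, mask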

5. Tuning Techniques

5.1 Grid search vs. Random search

Grid search

  • Assumes every hyperparameter matters equally for improving the model.
  • High computational cost (too many possible hyperparameter combinations to test before finding the best one).

Random search

  • Assumes only some hyperparameters significantly improve the model.
  • Lower computational cost (test a set of random hyperparameter configurations, identify the hyperparameters that matter, then search those more thoroughly, as in the sketch below).
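
A sketch of random search under assumed ranges (the hyperparameter names, ranges, and the log-scale sampling of the learning rate are illustrative choices, not values from the article):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_hyperparameters():
        return {
            "learning_rate": 10 ** rng.uniform(-4, -1),   # sampled on a log scale
            "batch_size": int(rng.choice([32, 64, 128, 256])),
            "lambda": 10 ** rng.uniform(-4, 0),
        }

    candidates = [sample_hyperparameters() for _ in range(20)]
    # train and evaluate a model for each candidate on the dev set,
    # then sample more finely around the best-performing region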

Reference

Deep Learning Specialization: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Coursera) (YouTube)
