DL: Hyperparameter Tuning for Neural Networks
Part 2 of Deep Learning Specialization
May 24, 2019
1. Network Hyperparameters
Model complexity
1.1 Number of hidden layers & hidden units per layer
1.2 Activation function of hidden layers
- Sigmoid
- Tanh
- ReLU
- Leaky ReLU
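A minimal NumPy sketch of these activations (the Leaky ReLU slope of 0.01 is an assumed default, not fixed by the notes above):

```python
import numpy as np

def sigmoid(z):
    # Squashes values into (0, 1); saturates for large |z|
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # Zero-centered output in (-1, 1)
    return np.tanh(z)

def relu(z):
    # max(0, z): cheap to compute, no saturation for z > 0
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Small slope alpha for z < 0 keeps gradients from dying
    return np.where(z > 0, z, alpha * z)
```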
2.2 Number of Epochs
3. Initializing Hyperparameters
3.1 Input normalization
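A small sketch of input normalization, assuming examples are stored as rows and the training-set mean and variance are reused for the test set so both stay on the same scale:

```python
import numpy as np

def normalize_inputs(X_train, X_test, eps=1e-8):
    # Compute statistics on the training set only
    mu = X_train.mean(axis=0)
    sigma2 = X_train.var(axis=0)
    # Apply the same mu / sigma2 to both sets so they remain comparable
    X_train_norm = (X_train - mu) / np.sqrt(sigma2 + eps)
    X_test_norm = (X_test - mu) / np.sqrt(sigma2 + eps)
    return X_train_norm, X_test_norm
```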
3.3 W, b initialization
- Zero initialization
b = 0 is OK
W = 0 is bad (does not break the model’s symmetry)
- Random initialization
Random W is OK (breaks the model’s symmetry)
random * large constant (c > 1) = bad (exploding gradients)
random * small constant (0 < c < 1) = good
- Xavier initialization (for Tanh activation)
- He initialization (for ReLU activation)
Reference — https://www.kdnuggets.com/2018/06/deep-learning-best-practices-weight-initialization.html
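A sketch of the initialization schemes above in NumPy; the layer-size list and the helper name are illustrative, not from the notes:

```python
import numpy as np

def initialize_parameters(layer_dims, method="he"):
    # layer_dims, e.g. [n_x, n_h1, n_h2, n_y]
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        if method == "zeros":        # bad: every unit in a layer learns the same thing
            W = np.zeros((layer_dims[l], fan_in))
        elif method == "xavier":     # suited to tanh activations
            W = np.random.randn(layer_dims[l], fan_in) * np.sqrt(1.0 / fan_in)
        elif method == "he":         # suited to ReLU activations
            W = np.random.randn(layer_dims[l], fan_in) * np.sqrt(2.0 / fan_in)
        else:                        # plain random * small constant
            W = np.random.randn(layer_dims[l], fan_in) * 0.01
        params["W" + str(l)] = W
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))  # zeros are fine for b
    return params
```

Scaling by the fan-in keeps the initial weights smaller as layers get wider, which helps the activations and gradients stay in a reasonable range early in training.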
4. Regularizing Hyperparameters
Reduce overfitting
4.1 L2 regularization
lambda = regularization parameter
Larger lambda
- pushes W closer to 0
- less complex model
Smaller lambda
- lets W stay farther from 0
- more complex model
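A sketch of how the L2 penalty enters the cost, assuming a cross-entropy cost has already been computed and `m` is the number of training examples:

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weights, lambd, m):
    # weights: list of W matrices, m: number of training examples
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy_cost + l2_penalty
```

The corresponding extra (lambd / m) * W term in the gradient is what pulls the weights toward 0 as lambda grows.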
4.2 Dropout
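Dropout randomly disables hidden units during training so the network cannot rely on any single unit. A minimal sketch of inverted dropout (the keep_prob value is an assumed example):

```python
import numpy as np

def dropout_forward(A, keep_prob=0.8):
    # Inverted dropout: zero out units at random, then scale up the survivors
    mask = np.random.rand(*A.shape) < keep_prob
    A_dropped = (A * mask) / keep_prob  # keeps the expected activation unchanged
    return A_dropped, mask              # the mask is reused in the backward pass
```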
5. Tuning Techniques
5.1 Grid search vs. Random search
Grid search
- Based on the assumption that every hyperparameter improves your model equally.
- High computational cost (too many possible hyperparameter sets to test before finding the best one).
Random search
- Based on the assumption that only a few hyperparameters significantly improve your model.
- Lower computational cost (test a limited number of random hyperparameter sets, identify the significant hyperparameters, then search those more thoroughly).
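A toy sketch contrasting the two strategies; the value ranges and the commented-out `train_and_evaluate` call are placeholders, not part of the notes:

```python
import itertools
import random

learning_rates = [0.0001, 0.001, 0.01, 0.1]
hidden_units = [16, 32, 64, 128]

# Grid search: try every combination (4 x 4 = 16 training runs)
grid_candidates = list(itertools.product(learning_rates, hidden_units))

# Random search: sample a fixed budget of combinations (here 5 runs),
# drawing the learning rate on a log scale
random_candidates = [
    (10 ** random.uniform(-4, -1), random.choice(hidden_units))
    for _ in range(5)
]

# best = max(candidates, key=lambda hp: train_and_evaluate(*hp))  # placeholder
```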