Random Model Hyperparameter Search for Convolutional Neural Networks: a PyTorch Example

Zhanwen Chen
Published in Repro Repo
Jul 24, 2018

When you Google “Random Hyperparameter Search,” you mostly find guides on randomizing training hyperparameters such as learning rate, momentum, dropout, and weight decay. What if you also want to experiment with model hyperparameters like convolutional kernel size, stride, number of kernels, and even the number of fully connected layers? Finding no ready answers, I worked out a solution on my own.

Randomizing model hyperparameters makes sense when you are solving a problem distinct from generic tasks such as classifying everyday objects. In my case, I’m tackling a time series regression problem, which calls for more experimentation with all of the hyperparameters.

To better illustrate my approach to random model hyperparameter search, I’m using the most basic CNN, LeNet. Note that although the layers themselves are randomly sized, the number of layers (other than fully connected layers) and the kinds of layers are hard-coded, so the result is still a LeNet.

The value of model hyperparameter search is that it abstracts layer sizes away from an architecture. For example, when we talk about LeNet-5, we no longer need to specify the number of kernels, the kernel size, the pooling stride, and so on. Those values are fairly arbitrary anyway, so don’t let any paper tell you otherwise!

Part I. The Concept: CNN Model Hyperparameters

0a. Convolution and Pooling Sizes

Thanks to Stanford CS231n, we can express the output sizes of the convolution and pooling layers as follows:

# Convolutional layer output shape
conv_output_size = (conv_num_kernels,
                    (input_size - conv_kernel_size) / conv_stride + 1)

# Pooling layer output shape
pool_output_size = (conv_num_kernels,
                    (conv_output_size[1] - pool_kernel_size) / pool_stride + 1)
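
As a quick sanity check, here are the formulas applied to the input size of 65 used later in this post, with a purely illustrative 6-wide convolution kernel at stride 1 followed by 2-by-2 max pooling:

# A sanity check of the size formulas (the kernel and stride values are illustrative).
input_size = 65
conv_num_kernels, conv_kernel_size, conv_stride = 8, 6, 1
pool_kernel_size, pool_stride = 2, 2

conv_output_size = (conv_num_kernels,
                    (input_size - conv_kernel_size) / conv_stride + 1)
pool_output_size = (conv_num_kernels,
                    (conv_output_size[1] - pool_kernel_size) / pool_stride + 1)

print(conv_output_size)  # (8, 60.0)
print(pool_output_size)  # (8, 30.0)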

This allows us to size the model dynamically at instantiation time:

# excerpt from lenet.py
# Model creation (omitting the fully connected layers)
class LeNet(nn.Module):
    def __init__(self, input_size, output_size, batch_norm, use_pooling,
                 pooling_method,
                 conv1_kernel_size, conv1_num_kernels, conv1_stride, conv1_dropout,
                 pool1_kernel_size, pool1_stride,
                 conv2_kernel_size, conv2_num_kernels, conv2_stride, conv2_dropout,
                 pool2_kernel_size, pool2_stride,
                 fcs_hidden_size, fcs_num_hidden_layers, fcs_dropout):
        super(LeNet, self).__init__()

        self.input_size = input_size
        self.hidden_size = fcs_hidden_size
        self.output_size = output_size
        self.batch_norm = batch_norm

        input_channel = 2

        # If not using pooling, set all pooling operations to 1 by 1.
        if use_pooling == False:
            warnings.warn('lenet: not using pooling')
            pool1_kernel_size = 1
            pool1_stride = 1
            pool2_kernel_size = 1
            pool2_stride = 1

        # Conv1
        conv1_output_size = (conv1_num_kernels, (input_size - conv1_kernel_size) / conv1_stride + 1)
        # if not conv1_output_size[1].is_integer():
        #     raise ValueError('lenet: conv1_output_size[1] %s is not an integer.' % conv1_output_size[1])
        # conv1_output_size = (conv1_num_kernels, int(conv1_output_size[1]))
        self.conv1 = nn.Conv1d(input_channel, conv1_num_kernels, conv1_kernel_size, stride=conv1_stride)  # NOTE: Conv1d only needs channel counts, not the feature length.
        nn.init.kaiming_normal_(self.conv1.weight.data)
        self.conv1.bias.data.fill_(0)
        self.conv1_drop = nn.Dropout2d(p=conv1_dropout)
        if self.batch_norm == True:
            self.batch_norm1 = nn.BatchNorm1d(conv1_num_kernels)

        # Pool1
        pool1_output_size = (conv1_num_kernels, (conv1_output_size[1] - pool1_kernel_size) / pool1_stride + 1)
        self.pool1 = nn.MaxPool1d(pool1_kernel_size, stride=pool1_stride)  # stride=pool1_kernel_size by default

        # Conv2
        conv2_output_size = (conv2_num_kernels, (pool1_output_size[1] - conv2_kernel_size) / conv2_stride + 1)
        self.conv2 = nn.Conv1d(conv1_num_kernels, conv2_num_kernels, conv2_kernel_size, stride=conv2_stride)  # Again, only channel counts matter here.
        nn.init.kaiming_normal_(self.conv2.weight.data)
        self.conv2.bias.data.fill_(0)
        self.conv2_drop = nn.Dropout2d(p=conv2_dropout)
        if self.batch_norm == True:
            self.batch_norm2 = nn.BatchNorm1d(conv2_num_kernels)

        # Pool2
        pool2_output_size = (conv2_num_kernels, (conv2_output_size[1] - pool2_kernel_size) / pool2_stride + 1)
        self.pool2 = nn.MaxPool1d(pool2_kernel_size, stride=pool2_stride)  # stride=pool2_kernel_size by default

0b. Progressive Layer Sizing

In the step above, you might notice that pool_output_size depends on conv_output_size, and that (input_size - conv_kernel_size) / conv_stride and (conv_output_size[1] - pool_kernel_size) / pool_stride must both divide evenly. If one layer has the wrong size, a later layer ends up with a non-integer shape, which is invalid and produces cryptic errors from PyTorch.
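
For example, with an input size of 65, a 6-wide kernel at stride 2 gives (65 - 6) / 2 + 1 = 30.5, which is not a valid layer width. A simple divisibility check (the values here are illustrative) catches such a combination before any model is built:

# Illustrative check: reject hyperparameter combinations that produce non-integer sizes.
input_size, conv_kernel_size, conv_stride = 65, 6, 2
if (input_size - conv_kernel_size) % conv_stride != 0:
    print('invalid size:', (input_size - conv_kernel_size) / conv_stride + 1)  # prints "invalid size: 30.5"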

Furthermore, the input size of the fully connected layers depends on the output shape of the layer that feeds them. In our case that layer is pool2:

# excerpt from lenet.py
# Model creation (the omitted fully connected layers)
        # FCs
        fcs_input_size = pool2_output_size[0] * pool2_output_size[1]
        if not fcs_input_size.is_integer():
            raise ValueError('lenet: fcs_input_size = ' + str(fcs_input_size) + ' is not an integer')
        fcs_input_size = int(fcs_input_size)  # nn.Linear needs an integer feature count
        self.fcs = FullyConnectedNet(fcs_input_size,
                                     output_size,
                                     fcs_dropout,
                                     batch_norm,
                                     fcs_hidden_size,
                                     fcs_num_hidden_layers)
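
The FullyConnectedNet class is not reproduced in this post. For a self-contained picture, here is a minimal sketch of what such a module might look like, assuming ReLU activations and the constructor signature used above; the author's actual implementation may differ:

# A minimal sketch of FullyConnectedNet (assumed interface, not the author's exact code).
import torch.nn as nn

class FullyConnectedNet(nn.Module):
    def __init__(self, input_size, output_size, dropout, batch_norm,
                 hidden_size, num_hidden_layers):
        super(FullyConnectedNet, self).__init__()
        layers = []
        in_features = input_size
        for _ in range(num_hidden_layers):
            layers.append(nn.Linear(in_features, hidden_size))
            if batch_norm:
                layers.append(nn.BatchNorm1d(hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(p=dropout))
            in_features = hidden_size
        layers.append(nn.Linear(in_features, output_size))  # final regression layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)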

To resolve this cross-layer size dependency, one naive approach is to keep re-drawing hyperparameters until every layer ends up with an integer size. A more efficient way is to enumerate all legal combinations of the size-related model hyperparameters up front and pick one at random, like this:

# Part of create_models.py: size-constrained random hyperparameter search
possible_size_combinations = []
for conv1_kernel_size in conv1_kernel_size_range:
    for conv1_stride in conv1_stride_range:
        # Satisfy the conv1 divisibility condition.
        if (input_size - conv1_kernel_size) % conv1_stride != 0:
            continue
        conv1_output_size = (conv1_num_kernels, (input_size - conv1_kernel_size) / conv1_stride + 1)
        for pool1_kernel_size in pool1_kernel_size_range:
            for pool1_stride in pool1_stride_range:
                if (conv1_output_size[1] - pool1_kernel_size) % pool1_stride != 0:
                    continue
                pool1_output_size = (conv1_num_kernels, (conv1_output_size[1] - pool1_kernel_size) / pool1_stride + 1)
                for conv2_kernel_size in conv2_kernel_size_range:
                    for conv2_stride in conv2_stride_range:
                        if (pool1_output_size[1] - conv2_kernel_size) % conv2_stride != 0:
                            continue
                        conv2_output_size = (conv2_num_kernels, (pool1_output_size[1] - conv2_kernel_size) / conv2_stride + 1)
                        for pool2_kernel_size in pool2_kernel_size_range:
                            for pool2_stride in pool2_stride_range:
                                if (conv2_output_size[1] - pool2_kernel_size) % pool2_stride != 0:
                                    continue
                                pool2_output_size = (conv2_num_kernels, (conv2_output_size[1] - pool2_kernel_size) / pool2_stride + 1)
                                possible_size_combinations.append((conv1_kernel_size, conv1_stride,
                                                                   pool1_kernel_size, pool1_stride,
                                                                   conv2_kernel_size, conv2_stride,
                                                                   pool2_kernel_size, pool2_stride))

if len(possible_size_combinations) == 0:
    raise ValueError('create_models: no legal size combination found for the given hyperparameter ranges')

(conv1_kernel_size, conv1_stride, pool1_kernel_size, pool1_stride,
 conv2_kernel_size, conv2_stride, pool2_kernel_size, pool2_stride) = random.choice(possible_size_combinations)

This way we can be sure we won’t get stuck in an endless re-randomizing loop because of badly chosen hyperparameter ranges, and we keep the logic in one place and relatively neat (it might not look the part, but try replacing the above with 10+ nested while loops). Keep reading for the full implementation details.
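
For comparison, the naive rejection approach would look roughly like the sketch below (illustrative only, with made-up ranges). It has to restart from the top whenever any check fails and can spin for a long time, or forever, if the ranges are badly chosen:

# Naive alternative (sketch only): keep re-drawing until the size checks pass.
import random

input_size = 65
conv1_kernel_size_range = list(range(6, 32))  # illustrative, mirrors the ranges above
conv1_stride_range = [1, 2]

while True:
    conv1_kernel_size = random.choice(conv1_kernel_size_range)
    conv1_stride = random.choice(conv1_stride_range)
    if (input_size - conv1_kernel_size) % conv1_stride != 0:
        continue  # invalid size: re-draw and try again
    # ... the full version repeats this draw-and-check pattern for pool1, conv2,
    # and pool2, restarting from the top whenever any later check fails ...
    break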

Part II. The Implementation: Random Model Hyperparameter Search

My overall approach consists of a creation process and a training process. They are kept separate for the sake of scientific evaluation, reproducibility, and parallel computation at scale. First, the create_models function takes the number of models to create and a hyperparameter range file, randomizes each model’s hyperparameters, and writes them to file.
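
The creation and training scripts below import a few helpers (save_model_params, read_model_params, ensure_dir) from a utils module that is not reproduced in this post. Here is a minimal sketch of what they might look like, assuming the hyperparameters are simply serialized as JSON; the author's actual model_params.txt format may differ:

# Sketch of the utils helpers (assumed JSON-on-disk storage, not the author's exact code).
import json
import os

def ensure_dir(path):
    # Create the directory if it does not already exist.
    if not os.path.exists(path):
        os.makedirs(path)

def save_model_params(path, model_params):
    # Write the hyperparameter dictionary to disk.
    with open(path, 'w') as f:
        json.dump(model_params, f, indent=2)

def read_model_params(path):
    # Read a hyperparameter dictionary back from disk.
    with open(path) as f:
        return json.load(f)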

Note: Please ignore the “k” submodels; I save one copy of the same model for each value of k (here k_list = [3, 4, 5]) for later domain-specific evaluation.

# create_models.py
# The creation process: randomize model and training hyperparameters and write them to file.
# Example usage: python lib/create_models.py 50 hyperparam_ranges.json

import os
import datetime
import random
import argparse
import json
import warnings
import sys

from utils import save_model_params, ensure_dir


def choose_hyperparameters_from_file(hyperparameter_ranges_file):
    with open(hyperparameter_ranges_file) as f:
        ranges = json.load(f)

    # Load constants.
    input_size = ranges['input_size']

    batch_norm = random.choice(ranges['batch_norm'])
    use_pooling = random.choice(ranges['use_pooling'])

    conv1_num_kernels = random.choice(list(range(*ranges['conv1_num_kernels'])))
    conv1_dropout = random.uniform(*ranges['conv1_dropout'])

    conv2_num_kernels = random.choice(list(range(*ranges['conv2_num_kernels'])))
    conv2_dropout = random.uniform(*ranges['conv2_dropout'])

    # Randomly choose model hyperparameters from ranges.
    conv1_kernel_size_range = list(range(*ranges['conv1_kernel_size']))
    conv1_stride_range = ranges['conv1_stride']
    pool1_kernel_size_range = ranges['pool1_kernel_size']
    pool1_stride_range = ranges['pool1_stride']

    conv2_kernel_size_range = list(range(*ranges['conv2_kernel_size']))
    conv2_stride_range = ranges['conv2_stride']
    pool2_kernel_size_range = ranges['pool2_kernel_size']
    pool2_stride_range = ranges['pool2_stride']

    # Size-constrained random hyperparameter search
    possible_size_combinations = []
    for conv1_kernel_size in conv1_kernel_size_range:
        for conv1_stride in conv1_stride_range:
            # Satisfy the conv1 divisibility condition.
            if (input_size - conv1_kernel_size) % conv1_stride != 0:
                continue
            conv1_output_size = (conv1_num_kernels, (input_size - conv1_kernel_size) / conv1_stride + 1)
            for pool1_kernel_size in pool1_kernel_size_range:
                for pool1_stride in pool1_stride_range:
                    if (conv1_output_size[1] - pool1_kernel_size) % pool1_stride != 0:
                        continue
                    pool1_output_size = (conv1_num_kernels, (conv1_output_size[1] - pool1_kernel_size) / pool1_stride + 1)
                    for conv2_kernel_size in conv2_kernel_size_range:
                        for conv2_stride in conv2_stride_range:
                            if (pool1_output_size[1] - conv2_kernel_size) % conv2_stride != 0:
                                continue
                            conv2_output_size = (conv2_num_kernels, (pool1_output_size[1] - conv2_kernel_size) / conv2_stride + 1)
                            for pool2_kernel_size in pool2_kernel_size_range:
                                for pool2_stride in pool2_stride_range:
                                    if (conv2_output_size[1] - pool2_kernel_size) % pool2_stride != 0:
                                        continue
                                    pool2_output_size = (conv2_num_kernels, (conv2_output_size[1] - pool2_kernel_size) / pool2_stride + 1)
                                    possible_size_combinations.append((conv1_kernel_size, conv1_stride,
                                                                       pool1_kernel_size, pool1_stride,
                                                                       conv2_kernel_size, conv2_stride,
                                                                       pool2_kernel_size, pool2_stride))

    if len(possible_size_combinations) == 0:
        raise ValueError('create_models: no legal size combination found for the given hyperparameter ranges')

    (conv1_kernel_size, conv1_stride, pool1_kernel_size, pool1_stride,
     conv2_kernel_size, conv2_stride, pool2_kernel_size, pool2_stride) = random.choice(possible_size_combinations)

    fcs_hidden_size = random.choice(list(range(*ranges['fcs_hidden_size'])))
    fcs_num_hidden_layers = random.choice(list(range(*ranges['fcs_num_hidden_layers'])))
    fcs_dropout = random.uniform(*ranges['fcs_dropout'])

    # Randomly choose training hyperparameters from ranges.
    cost_function = random.choice(ranges['cost_function'])
    optimizer = random.choice(ranges['optimizer'])
    if optimizer == 'SGD':
        momentum = random.uniform(*ranges['momentum'])
        learning_rate = random.uniform(*ranges['learning_rate_sgd'])
    elif optimizer == 'Adam':
        momentum = None
        learning_rate = random.uniform(*ranges['learning_rate_adam'])

    hyperparameters = {
        'input_size': input_size,
        'output_size': ranges['output_size'],
        'batch_norm': batch_norm,
        'use_pooling': use_pooling,
        'pooling_method': ranges['pooling_method'],
        'conv1_kernel_size': conv1_kernel_size,
        'conv1_num_kernels': conv1_num_kernels,
        'conv1_stride': conv1_stride,
        'conv1_dropout': conv1_dropout,
        'pool1_kernel_size': pool1_kernel_size,
        'pool1_stride': pool1_stride,
        'conv2_kernel_size': conv2_kernel_size,
        'conv2_num_kernels': conv2_num_kernels,
        'conv2_stride': conv2_stride,
        'conv2_dropout': conv2_dropout,
        'pool2_kernel_size': pool2_kernel_size,
        'pool2_stride': pool2_stride,
        'fcs_hidden_size': fcs_hidden_size,
        'fcs_num_hidden_layers': fcs_num_hidden_layers,
        'fcs_dropout': fcs_dropout,
        'cost_function': cost_function,
        'optimizer': optimizer,
        'learning_rate': learning_rate,
        'momentum': momentum,
    }

    return hyperparameters


def create_models(num_networks, hyperparameter_ranges_file):
    identifier = datetime.datetime.now().strftime('%Y%m%d%H%M%S')

    data_is_target_list = [0]
    num_scat_list = [1, 2, 3]
    batch_size_list = [32]
    data_noise_gaussian_list = [0, 1]
    weight_decay_list = [0]

    for count in range(num_networks):
        data_is_target = random.choice(data_is_target_list)
        n_scat = random.choice(num_scat_list)
        bs = random.choice(batch_size_list)
        data_noise_gaussian = random.choice(data_noise_gaussian_list)
        weight_decay = random.choice(weight_decay_list)

        # Get randomized model and training hyperparameters.
        model_params = choose_hyperparameters_from_file(hyperparameter_ranges_file)

        # Set the remaining (non-random) parameters.
        model_params['data_is_target'] = data_is_target
        home = os.path.expanduser('~')
        model_params['data_train'] = os.path.join(home, 'Downloads', '20180402_L74_70mm', 'train_' + str(n_scat) + '.h5')
        model_params['data_val'] = os.path.join(home, 'Downloads', '20180402_L74_70mm', 'val_' + str(n_scat) + '.h5')
        model_params['batch_size'] = bs
        model_params['data_noise_gaussian'] = data_noise_gaussian
        model_params['weight_decay'] = weight_decay
        model_params['patience'] = 20
        model_params['cuda'] = 1
        model_params['save_initial'] = 0

        k_list = [3, 4, 5]
        for k in k_list:
            model_params['k'] = k
            model_params['save_dir'] = os.path.join('DNNs', identifier + '_' + str(count + 1) + '_created', 'k_' + str(k))
            ensure_dir(model_params['save_dir'])
            save_model_params(os.path.join(model_params['save_dir'], 'model_params.txt'), model_params)

    return identifier


def main():
    # Parse input arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument('num_networks', type=int, help='The number of networks to create.')
    parser.add_argument('hyperparameter_ranges_file', type=str, help='Path to the hyperparameter ranges JSON file.')
    args = parser.parse_args()

    return create_models(args.num_networks, args.hyperparameter_ranges_file)


if __name__ == '__main__':
    main()

Here is an example of my hyperparameter ranges file, hyperparam_ranges.json:

{
    "input_size": 65,
    "output_size": 130,
    "batch_norm": [0, 1],
    "use_pooling": [0, 1],
    "pooling_method": "max",
    "conv1_kernel_size": [6, 32],
    "conv1_num_kernels": [8, 51],
    "conv1_stride": [1],
    "conv1_dropout": [0, 0],
    "pool1_kernel_size": [2, 3],
    "pool1_stride": [2],
    "conv2_kernel_size": [2, 20],
    "conv2_num_kernels": [2, 40],
    "conv2_stride": [1],
    "conv2_dropout": [0, 1],
    "pool2_kernel_size": [2],
    "pool2_stride": [2],
    "fcs_hidden_size": [25, 520],
    "fcs_num_hidden_layers": [1, 4],
    "fcs_dropout": [0, 1],
    "cost_function": ["MSE"],
    "optimizer": ["Adam", "SGD"],
    "momentum": [0.8, 1],
    "learning_rate_adam": [0, 0.0002],
    "learning_rate_sgd": [0, 0.02]
}
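
Note how choose_hyperparameters_from_file interprets these entries: keys like conv1_kernel_size, conv1_num_kernels, fcs_hidden_size, and fcs_num_hidden_layers are unpacked into range(start, stop) and sampled with random.choice; keys like conv1_stride, pool1_kernel_size, batch_norm, and optimizer are literal candidate lists sampled directly; and the dropout, momentum, and learning-rate pairs are bounds for random.uniform. For example:

# How the range file entries are consumed (mirrors choose_hyperparameters_from_file).
import random

ranges = {"conv1_kernel_size": [6, 32], "conv1_stride": [1], "conv1_dropout": [0, 0]}

conv1_kernel_size = random.choice(list(range(*ranges["conv1_kernel_size"])))  # an int in 6..31
conv1_stride = random.choice(ranges["conv1_stride"])                          # always 1 with this range
conv1_dropout = random.uniform(*ranges["conv1_dropout"])                      # always 0.0 with this range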

Lastly, we train the created random models with a standard training script:

# train.py
# Usage: python train.py "*"
#        python train.py 201807201428

import torch
import os
import numpy as np
import time
import argparse
import glob
import warnings
from pprint import pprint

from utils import read_model_params, save_model_params, ensure_dir, add_suffix_to_path
from dataloader import ApertureDataset
from lenet import LeNet
from logger import Logger
from trainer import Trainer


def train(identifier):
    # Match every model folder created under this identifier (or all folders for "*").
    models = glob.glob(os.path.join('DNNs', str(identifier) + '*_created'))
    for model_folder in models:
        ks = glob.glob(os.path.join(model_folder, 'k_*'))
        for k in ks:
            model_params_path = os.path.join(k, 'model_params.txt')
            print('train.py: training model', model_params_path, 'with hyperparams')

            # Load and print the model and training parameters.
            model_params = read_model_params(model_params_path)
            pprint(model_params)

            # CUDA flag
            using_cuda = model_params['cuda'] and torch.cuda.is_available()
            if using_cuda == True:
                print('train.py: Using ' + str(torch.cuda.get_device_name(0)))
            else:
                warnings.warn('train.py: Not using CUDA')

            # Load primary training data.
            num_samples = 10 ** 5
            dat_train = ApertureDataset(model_params['data_train'], num_samples, model_params['k'], model_params['data_is_target'])
            loader_train = torch.utils.data.DataLoader(dat_train, batch_size=model_params['batch_size'], shuffle=True, num_workers=1)

            # Load secondary training data, used to evaluate training loss after every epoch.
            num_samples = 10 ** 4
            dat_train2 = ApertureDataset(model_params['data_train'], num_samples, model_params['k'], model_params['data_is_target'])
            loader_train_eval = torch.utils.data.DataLoader(dat_train2, batch_size=model_params['batch_size'], shuffle=False, num_workers=1)

            # Load validation data, used to evaluate validation loss after every epoch.
            num_samples = 10 ** 4
            dat_val = ApertureDataset(model_params['data_val'], num_samples, model_params['k'], model_params['data_is_target'])
            loader_val = torch.utils.data.DataLoader(dat_val, batch_size=model_params['batch_size'], shuffle=False, num_workers=1)

            # Create the model.
            model = LeNet(model_params['input_size'],
                          model_params['output_size'],
                          model_params['batch_norm'],
                          model_params['use_pooling'],
                          model_params['pooling_method'],
                          model_params['conv1_kernel_size'],
                          model_params['conv1_num_kernels'],
                          model_params['conv1_stride'],
                          model_params['conv1_dropout'],
                          model_params['pool1_kernel_size'],
                          model_params['pool1_stride'],
                          model_params['conv2_kernel_size'],
                          model_params['conv2_num_kernels'],
                          model_params['conv2_stride'],
                          model_params['conv2_dropout'],
                          model_params['pool2_kernel_size'],
                          model_params['pool2_stride'],
                          model_params['fcs_hidden_size'],
                          model_params['fcs_num_hidden_layers'],
                          model_params['fcs_dropout'])
            if using_cuda == True:
                model.cuda()

            # Save initial weights.
            if model_params['save_initial'] and model_params['save_dir']:
                suffix = '_initial'
                path = add_suffix_to_path(model_params['save_dir'], suffix)
                print('Saving model weights in : ' + path)
                ensure_dir(path)
                torch.save(model.state_dict(), os.path.join(path, 'model.dat'))
                save_model_params(os.path.join(path, 'model_params.txt'), model_params)

            # Loss
            loss = torch.nn.MSELoss()

            # Optimizer
            if model_params['optimizer'] == 'Adam':
                optimizer = torch.optim.Adam(model.parameters(), lr=model_params['learning_rate'], weight_decay=model_params['weight_decay'])
            elif model_params['optimizer'] == 'SGD':
                optimizer = torch.optim.SGD(model.parameters(), lr=model_params['learning_rate'], momentum=model_params['momentum'], weight_decay=model_params['weight_decay'])
            else:
                raise ValueError('model_params[\'optimizer\'] must be either Adam or SGD. Got ' + model_params['optimizer'])

            logger = Logger()
            trainer = Trainer(model=model,
                              loss=loss,
                              optimizer=optimizer,
                              patience=model_params['patience'],
                              loader_train=loader_train,
                              loader_train_eval=loader_train_eval,
                              loader_val=loader_val,
                              cuda=using_cuda,
                              logger=logger,
                              data_noise_gaussian=model_params['data_noise_gaussian'],
                              save_dir=model_params['save_dir'])

            # Run training.
            trainer.train()

        # Mark the whole model folder as trained.
        os.rename(model_folder, model_folder.replace('_created', '_trained'))


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('identifier', help='The identifier (timestamp prefix) of the models to train, or "*" for all.')
    args = parser.parse_args()
    train(args.identifier)


if __name__ == '__main__':
    main()
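
Putting the two scripts together, a full run is just two commands. The identifier passed to train.py is the timestamp prefix of the folders that create_models writes under DNNs/ (the value shown below is only an example), or "*" to train everything:

# Step 1: create 50 randomized model configurations from the range file.
python lib/create_models.py 50 hyperparam_ranges.json
# Step 2: train every model folder matching the identifier.
python train.py 201807201428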

What do you think? Please share your comments and feedback!

Zhanwen Chen is a PhD student interested in learning from data.