Using an appropriate scale to pick hyperparameters

Bibek Shah Shankhar
Optimizing Hyperparameters
May 30, 2020

In the last article you saw how random sampling over the range of hyperparameters can let you search the hyperparameter space more efficiently. But it turns out that sampling at random doesn’t mean sampling uniformly at random over the range of valid values. Instead, it’s important to pick an appropriate scale on which to explore the hyperparameters.

Let’s look at an example. Say you’re searching for the hyperparameter alpha, the learning rate, and you suspect 0.0001 might be on the low end, while it could be as high as 1. If you draw the number line from 0.0001 to 1 and sample values uniformly at random along it, about 90 percent of the values you sample will land between 0.1 and 1.

So you would use 90 percent of your resources to search between 0.1 and 1, and only 10 percent to search between 0.0001 and 0.1. That doesn’t seem right. Instead, it makes more sense to search for this hyperparameter on a log scale: instead of a linear scale, mark 0.0001, then 0.001, 0.01, 0.1, and 1, and sample uniformly at random on this logarithmic scale. Now you devote equal resources to searching between 0.0001 and 0.001, between 0.001 and 0.01, and so on.
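To make the imbalance concrete, here is a quick check (a sketch I’m adding, not part of the original) of where uniform sampling over [0.0001, 1] actually puts its samples:

import numpy as np

samples = np.random.uniform(0.0001, 1.0, size=100_000)  # linear-scale sampling
print(np.mean(samples >= 0.1))   # about 0.90: 90% of samples land in [0.1, 1]
print(np.mean(samples < 0.001))  # about 0.001: almost nothing near the low end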

And the way you implement this log-scale sampling, in Python, is

import numpy as np

r = -4 * np.random.rand()  # -4 <= r <= 0, uniformly at random

And then compute the randomly sampled value of alpha as

alpha = 10 ** r  # 1e-4 <= alpha <= 1.0
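As a quick sanity check (again a sketch I’m adding, not from the original), drawing many values this way spreads them roughly evenly across the four decades:

import numpy as np

r = -4 * np.random.rand(100_000)  # 100,000 draws, uniform on [-4, 0]
alpha = 10 ** r                   # log-uniform on [1e-4, 1]
for lo, hi in [(1e-4, 1e-3), (1e-3, 1e-2), (1e-2, 1e-1), (1e-1, 1.0)]:
    print(lo, "to", hi, ":", np.mean((alpha >= lo) & (alpha < hi)))  # each about 0.25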

So r will be a random number between -4 and 0, and alpha will be between 10^-4 and 10^0. In the more general case, if you want to sample on the log scale from 10^a to 10^b, you sample r uniformly at random between a and b; in this example, that is between -4 and 0. Then you set the randomly sampled hyperparameter value to 10^r.

So, just to recap: to sample on the log scale, take the low value and take its log to figure out what a is, and take the high value and take its log to figure out what b is. You are now sampling on a log scale from 10^a to 10^b, so you draw r uniformly at random between a and b, and set the hyperparameter to 10^r. That is how you implement sampling on a logarithmic scale.
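The same recipe works for any range. Here is a small helper (my own sketch; the name sample_log_uniform is illustrative, not from the original):

import numpy as np

def sample_log_uniform(low, high):
    """Draw one value log-uniformly at random from [low, high]."""
    a = np.log10(low)            # e.g., log10(0.0001) = -4
    b = np.log10(high)           # e.g., log10(1.0) = 0
    r = np.random.uniform(a, b)  # r uniform on [a, b]
    return 10 ** r               # map back to the original scale

# For example, to sample the learning rate alpha:
alpha = sample_log_uniform(0.0001, 1.0)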

Credit: Andrew Ng



I post articles on Data Science | Machine Learning | Deep Learning. Connect with me on LinkedIn: https://www.linkedin.com/in/bibek-shah-shankhar/