My colleague Lavanya ran a large hyperparameter sweep on a Kaggle Simpsons dataset in Colab here, searching for the best model for the data. The sweep produced a lot of hyperparameter data, and I wondered whether I could find useful insights in it.
Here’s a parallel coordinates plot visualizing the results of the hyperparameter search. As you can see, she tried many different values for epochs, learning rate, weight decay, optimizer, and batch size. In this…
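One way to start mining a sweep like this for insights is to load the run table and rank configurations by their validation metric. The sketch below assumes the results fit in a pandas DataFrame; the column names and values are illustrative placeholders, not the actual schema of Lavanya’s sweep.

```python
import pandas as pd

# Hypothetical sweep results — column names and values are illustrative,
# not the real data from the sweep described above.
runs = pd.DataFrame({
    "epochs":        [5, 10, 10, 20],
    "learning_rate": [1e-2, 1e-3, 3e-4, 1e-4],
    "weight_decay":  [0.0, 1e-4, 1e-3, 1e-4],
    "batch_size":    [32, 64, 128, 64],
    "optimizer":     ["sgd", "adam", "adam", "rmsprop"],
    "val_accuracy":  [0.71, 0.88, 0.91, 0.84],
})

# Rank runs by validation accuracy to surface the best configuration.
best = runs.sort_values("val_accuracy", ascending=False).iloc[0]
print(best["optimizer"], best["learning_rate"])

# Average over each optimizer to see which family performs best overall,
# independent of any single lucky run.
by_opt = runs.groupby("optimizer")["val_accuracy"].mean().sort_values(ascending=False)
print(by_opt)
```

The same DataFrame feeds directly into a parallel coordinates plot (e.g. `plotly.express.parallel_coordinates`, with the categorical optimizer column encoded as category codes), which is presumably how a chart like the one above is produced.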


