…r fewer computations, since we do not use the entire dataset in each iteration. It should also, hopefully, lead to better performance: the noisier updates during training help the network escape local minima, and training on only small portions of the dataset at a time helps prevent overfitting.
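A minimal sketch of the mini-batch idea, using a toy least-squares problem (the data, learning rate, and batch size here are illustrative choices, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + noise
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=256)

w = 0.0          # single weight to learn
lr = 0.1
batch_size = 32  # each update sees only a small slice of the data

for epoch in range(20):
    # Shuffle, then update on each mini-batch rather than the full dataset
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]
        grad = 2.0 * np.mean((w * xb - yb) * xb)  # d/dw of mean squared error
        w -= lr * grad
```

Each gradient is computed from 32 points instead of all 256, so every update is cheaper and noisier, yet `w` still settles near the true slope of 3.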
…mly sampling hyperparameters and weight initializations. Unlike the traditional approach, PBT trains each model asynchronously and evaluates its performance periodically. If a model in the population is under-performing, it exploits the rest of the population by replacing itself with a better-performing model. At the same time, PBT explores new hyperparameters by perturbing the better model's hyperparameters before training continues.
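The exploit/explore cycle can be sketched with a toy population, where the objective function, population size, and perturbation factors are all illustrative stand-ins (real PBT would run the workers asynchronously and copy network weights along with hyperparameters):

```python
import random

random.seed(0)

# Toy stand-in for validation performance: rewards a learning rate near 0.3.
def score(lr):
    return -abs(lr - 0.3)

# Population starts from randomly sampled hyperparameters.
population = [{"lr": random.uniform(0.01, 1.0), "score": None} for _ in range(8)]
initial_best = max(score(w["lr"]) for w in population)

for step in range(30):
    # Periodic evaluation (sequential here; asynchronous in real PBT).
    for worker in population:
        worker["score"] = score(worker["lr"])
    population.sort(key=lambda w: w["score"], reverse=True)
    # Exploit: the worst workers replace themselves with a top performer...
    for loser in population[-2:]:
        winner = random.choice(population[:2])
        loser["lr"] = winner["lr"]
        # Explore: ...then perturb the copied hyperparameter before resuming.
        loser["lr"] *= random.choice([0.8, 1.2])

best = max(population, key=lambda w: score(w["lr"]))
```

Because top performers are copied and only the copies are perturbed, the best score in the population can only improve over time, while the perturbations keep searching the neighbourhood of whatever currently works.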