Tips to rapidly scale Deep Learning research.

3 min read · May 11, 2020


Deep learning research done wrong can be extremely expensive in computation and time, and the wasted compute also hurts the environment. A poorly structured deep learning project often follows this loop:

  1. Write some code.
  2. Start training.
  3. Wait for something to fail.
  4. Fix and repeat.

This process slows down progress and wastes resources, yet it is hard for a developer to break because it feels productive. The dopamine hit from solving an issue is addictive, and the time spent waiting for the model to finish training feels like work. The way around this is to design a more efficient process.

The following practical, time-tested tips will decrease ramp-up time for new projects and enable projects to scale efficiently across a large team.

1. Use a battle-tested boilerplate.

Investing in good boilerplate code is essential for scaling across multiple projects and teams. Boilerplate code reduces overhead by letting developers focus on the tasks that provide the most value while removing some of the expensive pitfalls they would commonly face. Time-wasters like duplicating code, refactoring a project, or debugging should be reduced as much as possible. Most of the sanity tests discussed below can be incorporated as unit tests in the boilerplate code, which will save a lot of time and resources as the company or team scales.

2. Load and save one checkpoint before training.

There’s nothing worse than training a model to convergence only to find that there’s a bug in the model persistence code. Testing the checkpointing code after a few iterations of training reduces the likelihood of running into such errors.
While doing this, there are a few things to check.

  1. All the variables needed to restart training (epoch, learning rate, optimizer state, etc.) are saved.
  2. The right checkpointing scheme is used (e.g. lowest loss, latest checkpoint, best top-k accuracy, etc.).
  3. The model is exported to CPU mode to prevent tightly coupling the model to a specific hardware configuration.
  4. Model and weights are not tightly coupled.
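As a rough illustration of the checklist above, here is a minimal, framework-agnostic sketch of a checkpoint round-trip test. It uses only the Python standard library; the function names, the keys in `state`, and the plain lists standing in for tensors are all assumptions made for this example. In a real project the state would come from your framework's own state dictionaries.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write to a temporary file first, then rename, so a crash cannot leave a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def check_round_trip(state, required_keys):
    """Save and reload `state`, failing loudly if anything is missing or altered."""
    missing = [key for key in required_keys if key not in state]
    if missing:
        raise KeyError(f"checkpoint is missing: {missing}")
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "smoke_test.ckpt")
        save_checkpoint(path, state)
        restored = load_checkpoint(path)
    if restored != state:
        raise ValueError("restored checkpoint differs from the saved state")
    return restored


# Hypothetical training state. Weights are stored as a name -> values mapping
# rather than as a pickled model object, so they stay decoupled from the model code.
state = {
    "epoch": 3,
    "learning_rate": 1e-3,
    "model_weights": {"layer1.weight": [0.1, -0.2], "layer1.bias": [0.0]},
    "optimizer_state": {"momentum": [0.01, 0.02, 0.0]},
    "seed": 1234,
}
REQUIRED_KEYS = ["epoch", "learning_rate", "model_weights", "optimizer_state", "seed"]
restored = check_round_trip(state, REQUIRED_KEYS)
```

A check like this can run as a unit test, or once at the start of every training job, so a persistence bug surfaces in seconds rather than after convergence.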

3. Validation before training.

In a similar vein to the point above on checkpoints, it's a good idea to check that the validation code works and produces sensible results before running the training loop. This ensures that your program doesn’t crash after the first epoch and that the statistics you’re tracking are correct.

Another good thing to check is that the randomly initialized model produces near-chance probabilities. In the case of a 10-class classification task, the cross-entropy loss should be around ln(10) ≈ 2.3026. A loss close to this value suggests that you’re tracking an unbiased metric.
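The chance-level check can be sketched in a few lines. This example is an assumption-laden stand-in, not a real validation loop: small random logits play the role of an untrained model's outputs, and `cross_entropy` is a hand-written helper rather than a framework function.

```python
import math
import random


def cross_entropy(logits, label):
    """Cross-entropy of one example from raw logits, using log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


NUM_CLASSES = 10
rng = random.Random(0)

# Stand-in for an untrained model: near-zero random logits and random labels.
losses = []
for _ in range(1000):
    logits = [rng.gauss(0.0, 0.01) for _ in range(NUM_CLASSES)]
    label = rng.randrange(NUM_CLASSES)
    losses.append(cross_entropy(logits, label))

mean_loss = sum(losses) / len(losses)
expected = math.log(NUM_CLASSES)  # ln(10), the loss of a uniform prediction

# A large gap here usually points at a bug in the loss, the labels, or the initialization.
assert abs(mean_loss - expected) < 0.05, (mean_loss, expected)
```

In a real project the same assertion would be applied to the loss reported by the validation code on the freshly initialized model.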

4. Overfit to a single batch of data.

Overfitting at the start ensures that the data is well-conditioned, the model has sufficient capacity, and the optimization code is correct. It's often hard to spot errors after a few iterations on a large training set, and the errors can be quite subtle. Overfitting once before training allows one to be confident about interpreting the curves during training.
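To make the idea concrete, here is a toy sketch that overfits a single four-example batch with plain logistic regression and gradient descent. Everything in it (the data, the helper names, the learning rate, and the 0.05 threshold) is an assumption chosen for illustration; with a real model the same pattern applies, only the training step changes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def batch_loss(weights, bias, batch):
    """Mean binary cross-entropy over the batch."""
    total = 0.0
    for features, label in batch:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(batch)


def train_step(weights, bias, batch, lr):
    """One full-batch gradient-descent step for logistic regression."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for features, label in batch:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        error = p - label
        for i, x in enumerate(features):
            grad_w[i] += error * x / len(batch)
        grad_b += error / len(batch)
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b


# One fixed batch: the label simply equals the first feature.
batch = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 1), ((1.0, 1.0), 1)]
weights, bias = [0.0, 0.0], 0.0

initial_loss = batch_loss(weights, bias, batch)
for _ in range(2000):
    weights, bias = train_step(weights, bias, batch, lr=1.0)
final_loss = batch_loss(weights, bias, batch)

# If the loss cannot be driven close to zero on one batch, something is broken.
assert final_loss < 0.05 < initial_loss
```

If this test fails on a real pipeline, the usual suspects are the data loading, the loss definition, or the optimizer wiring, and it is far cheaper to find that out here than after hours of training.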

5. Set a consistent random seed.

There are several components in a training pipeline that rely on a random number generator. These may include weight initialization, optimization, data augmentation, training curriculum, etc. To enable reproducibility and fair benchmarking, it's a good idea to explicitly set the random seed to a constant value that’s persisted with the checkpoint. Note that GPU or asynchronous execution can be non-deterministic and should be accounted for.
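A minimal sketch of the seeding idea, using only Python's built-in `random` module. The helper names and the checkpoint layout are assumptions for this example; a real pipeline would also need to seed whatever numerical or deep learning libraries it uses, and this sketch does nothing about GPU non-determinism.

```python
import random


def make_rng(seed):
    """Return a dedicated generator so unrelated code cannot perturb its state."""
    return random.Random(seed)


def init_weights(rng, n):
    return [rng.gauss(0.0, 0.02) for _ in range(n)]


def shuffled_indices(rng, n):
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


SEED = 1234

# Two runs with the same seed see the same initialization and the same data order.
run_a = make_rng(SEED)
run_b = make_rng(SEED)
assert init_weights(run_a, 5) == init_weights(run_b, 5)
assert shuffled_indices(run_a, 8) == shuffled_indices(run_b, 8)

# Persist the seed and the generator state so a resumed run continues the same sequence.
checkpoint = {"seed": SEED, "rng_state": run_a.getstate()}
resumed = random.Random()
resumed.setstate(checkpoint["rng_state"])
assert resumed.random() == run_a.random()
```

Saving the generator state, and not just the seed, matters when training is resumed mid-run: re-seeding from scratch would replay the random sequence from the beginning instead of continuing it.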

6. Config files instead of argument parsing.

Quite a few popular deep learning projects use a command-line argument parser to pass in experiment options. This makes it a lot harder to track and reproduce experiments. A much better option is to stick to configuration files and read the options from them. Use enums wherever possible and try to limit the use of strings to names and paths.
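One possible shape for such a config, sketched with JSON, a dataclass, and an enum from the standard library. The field names, the `Optimizer` choices, and the inline `CONFIG_TEXT` (standing in for a file on disk) are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Optimizer(Enum):
    SGD = "sgd"
    ADAM = "adam"


@dataclass(frozen=True)
class ExperimentConfig:
    name: str
    data_path: str
    optimizer: Optimizer
    learning_rate: float
    batch_size: int
    seed: int


def load_config(text):
    raw = json.loads(text)
    raw["optimizer"] = Optimizer(raw["optimizer"])  # ValueError on an unknown optimizer
    return ExperimentConfig(**raw)  # TypeError on missing or unexpected keys


def dump_config(config):
    """Serialize the config so it can be stored next to the checkpoint."""
    raw = asdict(config)
    raw["optimizer"] = config.optimizer.value
    return json.dumps(raw, indent=2, sort_keys=True)


# In practice this text would live in a version-controlled file.
CONFIG_TEXT = """
{
  "name": "baseline",
  "data_path": "data/train",
  "optimizer": "adam",
  "learning_rate": 0.001,
  "batch_size": 32,
  "seed": 1234
}
"""
config = load_config(CONFIG_TEXT)
```

Because the enum and the dataclass reject unknown values and keys at load time, a typo in the config fails immediately instead of silently selecting a default halfway through an experiment.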

7. Stop experiments early.

Use TensorBoard or an equivalent visualization dashboard to monitor experiments. If an experiment is performing worse than a previous benchmark, in most cases that’s sufficient signal to stop it. This speeds up iteration and allows you to focus on creating more value. If applicable, it also helps to use a conservative early stopping criterion that ends the experiment when there is no more progress.
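A conservative early stopping criterion can be as small as the sketch below. The class name, the `patience` and `min_delta` parameters, and the made-up validation losses are assumptions for illustration, not a reference to any particular library's API.

```python
class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` consecutive checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as progress
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: progress stalls after the third epoch.
val_losses = [1.00, 0.80, 0.70, 0.695, 0.70, 0.71, 0.72, 0.60]

stopper = EarlyStopping(patience=3, min_delta=0.01)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
```

Note the trade-off in this toy run: the criterion stops before the late improvement to 0.60 is ever seen. A larger `patience` is more conservative and costs more compute, so the right value depends on how noisy the validation curve is.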



