Norman Heckscher
1 min read · May 29, 2017


Great summary. Thanks for sharing. It’s similar to the path I’ve been down recently; I’m currently in the middle of benchmarking.

Instead of buying new gear, I sourced and reused old data centre hardware and purchased two new 1080 8 GB GPUs. My spend has been higher, at approximately US$2,600; however, I’ve been able to get Xeon 2600s that allow both GPUs to fully utilise their x16 PCIe slots. Pipelining the data into the GPUs is certainly a challenge. For datasets that don’t fit in memory (or that require dynamic augmentation), my first lot of numbers indicates that my GPUs are spending 20% of their time idle. As I’ve only just reached this point in the last few days and have only had the chance to test a couple of models, this idle time will undoubtedly change with code optimisation and with different datasets and models.
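That idle fraction comes from the GPU waiting on the CPU-side input pipeline, and the usual fix is to overlap batch preparation with compute via prefetching. The sketch below simulates this with a background loader thread and hypothetical timings (20 ms load, 80 ms compute — numbers chosen only to illustrate a roughly 20% idle loop, not measured from my rig):

```python
import queue
import threading
import time

LOAD_S = 0.02     # hypothetical CPU-side batch preparation time
COMPUTE_S = 0.08  # hypothetical GPU step time

def train_sequential(steps):
    """Load then compute in one thread: the 'GPU' idles during each load."""
    busy = 0.0
    start = time.perf_counter()
    for _ in range(steps):
        time.sleep(LOAD_S)     # GPU idle while the batch is prepared
        time.sleep(COMPUTE_S)  # GPU busy
        busy += COMPUTE_S
    total = time.perf_counter() - start
    return 1.0 - busy / total  # idle fraction

def train_prefetched(steps):
    """A background thread keeps a small queue of batches ready,
    so loading overlaps the previous compute step."""
    q = queue.Queue(maxsize=2)

    def producer():
        for i in range(steps):
            time.sleep(LOAD_S)  # batch prep happens off the compute thread
            q.put(i)

    threading.Thread(target=producer, daemon=True).start()
    busy = 0.0
    start = time.perf_counter()
    for _ in range(steps):
        q.get()                # usually ready: load overlapped compute
        time.sleep(COMPUTE_S)
        busy += COMPUTE_S
    total = time.perf_counter() - start
    return 1.0 - busy / total

if __name__ == "__main__":
    print(f"sequential idle fraction: {train_sequential(20):.0%}")
    print(f"prefetched idle fraction: {train_prefetched(20):.0%}")
```

With these timings the sequential loop idles roughly 20% of the time, while the prefetched loop pays the load cost only once up front. Real pipelines (e.g. TensorFlow queue runners in 2017) work on the same principle, just with GPU streams instead of `time.sleep`.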

What has become very clear to me is that deep learning is not just about tensors flowing through the model; it’s also about tensors flowing through the hardware.

