How Does Overclocking CPU/GPU Affect Deep Learning Training Speed?

Tim Yee
3 min read · Oct 26, 2018

Have you ever wondered what happens if you overclock your rig for deep learning? I have. So I decided to run a simple benchmark. Testing was done on a CNN trained on the MNIST dataset using Keras, run in a Jupyter notebook.

Hardware Baseline:

  • CPU: Intel i7 8700K — 4.3GHz all-core turbo
  • RAM: G.SKILL 48GB 3000MHz
  • GPU: GTX 1060 3GB — 1,900MHz Core Clock, 3,800MHz Memory Clock

Software Used For GPU Tuning:

Python Packages

  • Keras 2.2.4
  • TensorFlow-GPU 1.11

Hyper-parameter tuning:

  • GPU core clock: -300MHz, -200MHz, -100MHz, +0, +100MHz, +200MHz
  • GPU memory clock: -400MHz, -300MHz, -200MHz, -100MHz, +0, +100MHz
  • CPU: 4.3GHz — 4.8GHz in 100MHz increments (no AVX offset)
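As a rough sketch, the settings above amount to three independent sweeps; the lists and run count below are taken from this post, while the variable names are mine:

```python
# Clock settings tested (values from the lists above)
gpu_core_offsets = [-300, -200, -100, 0, 100, 200]   # MHz
gpu_mem_offsets  = [-400, -300, -200, -100, 0, 100]  # MHz
cpu_clocks_ghz   = [4.3, 4.4, 4.5, 4.6, 4.7, 4.8]

# Each variable was swept independently, with 5 timed runs per setting
runs_per_setting = 5
total_runs = (len(gpu_core_offsets) + len(gpu_mem_offsets)
              + len(cpu_clocks_ghz)) * runs_per_setting
print(total_runs)  # 90 timed training runs in total
```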

Testing Methodology:

  • Link to the Jupyter Notebook here
  • For each increment, 5 runs were performed; the highest and lowest times were dropped, and the mean and standard deviation of the remaining 3 runs were recorded
  • GPU runs at manufacturer's settings during CPU testing
  • CPU runs at 4.8GHz during GPU testing
  • GPU memory is 3,800MHz (default) during GPU core testing
  • GPU core runs at ~2,000MHz (+100MHz) during GPU memory testing
  • Only the CNN portion is timed
  • Verbosity is set to silent
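The run-aggregation step described above (drop the high and low of 5 runs, then summarize the middle 3) can be sketched in plain Python; the times below are hypothetical, chosen only to illustrate the method:

```python
import statistics

def summarize_runs(times):
    """Drop the fastest and slowest of 5 runs, then summarize the middle 3."""
    assert len(times) == 5
    middle = sorted(times)[1:-1]  # discard the low and high outliers
    return statistics.mean(middle), statistics.stdev(middle)

# Hypothetical training times (seconds) for one clock setting
mean_t, std_t = summarize_runs([61.2, 59.8, 60.4, 63.9, 60.1])
print(round(mean_t, 2), round(std_t, 2))  # 60.57 0.57
```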

(Figure: GTX 1060 3GB underclock lower limit)

Performance Charts:

All measurements are normalized

The Verdict:

GPU core and memory frequencies DO affect neural network training time! However, the results are lackluster: an overall 5.15% reduction in training time when running the CPU at 4.8GHz (+500MHz), the GPU core clock at 2,000MHz (+100MHz), and the GPU memory clock at 4,000MHz (+200MHz). Each variable contributed to the overall gain, but 5.15% is hardly worth writing home about.
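For reference, the headline number is a simple percentage reduction relative to the baseline; the normalized times below are hypothetical, chosen only to illustrate the arithmetic:

```python
def pct_reduction(baseline_s, tuned_s):
    """Percent reduction in training time relative to the baseline."""
    return (baseline_s - tuned_s) / baseline_s * 100

# Hypothetical normalized times: baseline 1.0 vs. fully tuned 0.9485
print(round(pct_reduction(1.0, 0.9485), 2))  # 5.15
```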

FAQs:

Q: Should you overclock your CPU to reduce Neural Network training time?

A: Probably not. The performance improvement is unremarkable.

Q: Were you running GPU version of Tensorflow backend?

A: Yes, 100% positive.

Q: What made you drop the lowest and highest runs and average the middle 3?

A: This trimmed-mean approach reduces the influence of background system overhead and run-to-run outliers.

Q: Why is verbosity silent?

A: I would imagine that keeping output to a minimum improves overall run-time performance. Verbosity is essentially system overhead: printing output is handled by the CPU and, to make matters worse, runs on a single core.

EDITS

(11/29/2018 update): I decided to re-test the entire benchmark using a more methodical approach.

Since the CNN was run using float32, I plan on performing a similar test comparing float16 to float32 using an RTX 2070 in the near future. Be on the lookout for that!

Additionally, my testing consisted of only CNN training, and results may differ from one neural network architecture to another (e.g., LSTMs). Something I plan on investigating soon~

If you enjoyed this, please let me know! If you have any questions or comments please leave them below.


Tim Yee

Software Engineering | Machine Learning | Data Science