How Does Overclocking CPU/GPU Affect Deep Learning Training Speed?

Tim Yee
3 min read · Oct 26, 2018

Have you ever wondered what happens if you overclock your rig for deep learning? I have. So I decided to run a simple benchmark. Testing was done on a CNN trained on the MNIST dataset using Keras, run in a Jupyter notebook.

Hardware Baseline:

  • CPU: Intel i7 8700K — 4.3GHz all-core turbo
  • RAM: G.SKILL 48GB 3000MHz
  • GPU: GTX 1060 3GB — 1,900MHz Core Clock, 3,800MHz Memory Clock

Software Used For GPU Tuning:

Python Packages

  • Keras 2.2.4
  • TensorFlow-GPU 1.11

Hyper-parameter tuning:

  • GPU core clock: -300MHz, -200MHz, -100MHz, +0, +100MHz, +200MHz
  • GPU memory clock: -400MHz, -300MHz, -200MHz, -100MHz, +0, +100MHz
  • CPU: 4.3GHz — 4.8GHz in 100MHz increments (no AVX offset)
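As a rough sketch, the settings above amount to three independent sweeps; the lists and run count below are taken from this post, while the variable names are mine:

```python
# Clock settings tested (values from the lists above)
gpu_core_offsets = [-300, -200, -100, 0, 100, 200]   # MHz
gpu_mem_offsets  = [-400, -300, -200, -100, 0, 100]  # MHz
cpu_clocks_ghz   = [4.3, 4.4, 4.5, 4.6, 4.7, 4.8]

# Each variable was swept independently, with 5 timed runs per setting
runs_per_setting = 5
total_runs = (len(gpu_core_offsets) + len(gpu_mem_offsets)
              + len(cpu_clocks_ghz)) * runs_per_setting
print(total_runs)  # 90 timed training runs in total
```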

Testing Methodology:

  • Link to the Jupyter Notebook here
  • For each increment, 5 runs were performed; the highest and lowest times were dropped, and the mean and standard deviation of the remaining 3 runs were recorded
  • GPU runs at manufacturer's settings during CPU testing
  • CPU runs at 4.8GHz during GPU testing
  • GPU memory is 3,800MHz (default) during GPU core testing
  • GPU core runs at ~2,000MHz (+100MHz) during GPU memory testing
  • Only the CNN portion is timed
  • Verbosity is set to silent
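The run-aggregation step described above (drop the high and low of 5 runs, then summarize the middle 3) can be sketched in plain Python; the times below are hypothetical, chosen only to illustrate the method:

```python
import statistics

def summarize_runs(times):
    """Drop the fastest and slowest of 5 runs, then summarize the middle 3."""
    assert len(times) == 5
    middle = sorted(times)[1:-1]  # discard the low and high outliers
    return statistics.mean(middle), statistics.stdev(middle)

# Hypothetical training times (seconds) for one clock setting
mean_t, std_t = summarize_runs([61.2, 59.8, 60.4, 63.9, 60.1])
print(round(mean_t, 2), round(std_t, 2))  # 60.57 0.57
```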

(Figure: GTX 1060 3GB underclock lower limit)

Performance Charts:

All measurements are normalized

The Verdict:

GPU core and memory frequencies DO affect neural network training time! However, the results are lackluster: an overall 5.15% reduction in training time when running the CPU at 4.8GHz (+500MHz), the GPU core clock at 2,000MHz (+100MHz), and the GPU memory clock at 4,000MHz (+200MHz). Each variable contributed to the overall gain, but 5.15% is hardly worth writing home about.
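For reference, the headline number is a simple percentage reduction relative to the baseline; the normalized times below are hypothetical, chosen only to illustrate the arithmetic:

```python
def pct_reduction(baseline_s, tuned_s):
    """Percent reduction in training time relative to the baseline."""
    return (baseline_s - tuned_s) / baseline_s * 100

# Hypothetical normalized times: baseline 1.0 vs. fully tuned 0.9485
print(round(pct_reduction(1.0, 0.9485), 2))  # 5.15
```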

FAQs:

Q: Should you overclock your CPU to reduce Neural Network training time?

A: Probably not. The performance improvement is unremarkable.

Q: Were you running GPU version of Tensorflow backend?

A: Yes, 100% positive.

Q: What made you drop the lowest and highest runs and average the middle 3?

A: This trimmed-mean approach reduces the influence of background system overhead and run-to-run outliers.

Q: Why is verbosity silent?

A: I would imagine that keeping output to a minimum improves overall run-time performance. Verbosity is essentially system overhead: printing output is handled by the CPU and, to make matters worse, runs on a single core.

EDITS

(11/29/2018 update): I decided to re-test the entire benchmark using a more methodical approach.

Since the CNN was run using float32, I plan on performing a similar test comparing float16 to float32 using an RTX 2070 in the near future. Be on the lookout for that!

Additionally, my testing consisted of only CNN training, and results may differ from one neural network architecture to another (e.g., LSTMs). Something I plan on investigating soon~

If you enjoyed this, please let me know! If you have any questions or comments please leave them below.


Tim Yee

Software Engineering | Machine Learning | Data Science