Anatoly Alekseev
Sep 7, 2018 · 1 min read

On a dual Xeon E5-2670 (Sandy Bridge EP, 2.6 GHz, SSE4.2, AVX, 8 cores / 16 threads), Windows Server 2016, vanilla TF 1.10, your original cifar10_train.py initially struggled:

2.7 examples/sec; 46.836 sec/batch

while utilizing 100% CPU. I guess something is wrong with how it detects the number of threads/cores.

I had to add

import tensorflow as tf

config = tf.ConfigProto(intra_op_parallelism_threads=5, inter_op_parallelism_threads=10, allow_soft_placement=True, device_count={'CPU': 2})
session = tf.Session(config=config)
tf.keras.backend.set_session(session)

near the top of the script to get at least 261.6 examples/sec; 0.489 sec/batch.

The intra_op_parallelism_threads setting is what mattered the most.

I'm still not sure about the optimal settings for that hardware configuration.
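For what it's worth, here is a minimal sketch (not from the original post) of how one could sweep the two thread settings on a given machine. It times a stand-in matmul workload rather than cifar10_train.py itself, so the absolute numbers will differ, but the relative ranking should give a reasonable starting point:

import time
import tensorflow as tf

def time_matmul(intra, inter, n=2000, repeats=10):
    """Time a dense n x n matmul under the given thread settings."""
    tf.reset_default_graph()
    config = tf.ConfigProto(intra_op_parallelism_threads=intra,
                            inter_op_parallelism_threads=inter,
                            allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        a = tf.random_normal([n, n])
        b = tf.random_normal([n, n])
        c = tf.matmul(a, b)
        sess.run(c)  # warm-up run, excluded from timing
        start = time.time()
        for _ in range(repeats):
            sess.run(c)
        return (time.time() - start) / repeats

for intra in (1, 4, 8, 16):
    for inter in (1, 2, 4):
        print("intra=%2d inter=%2d -> %.3f sec" % (intra, inter, time_matmul(intra, inter)))

On a dual-socket box the sweet spot for intra_op_parallelism_threads is often somewhere around the physical core count, but that is exactly the kind of thing a sweep like this is meant to confirm rather than assume.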

On my laptop's i7-7700HQ (Kaby Lake, 2.8 GHz, AVX2, FMA3, 4 cores / 8 threads), Windows 10, tf-gpu 1.10 built against CUDA 9.2 / cuDNN 7.2 with AVX support, GTX 1050 Ti mobile, it kept jumping between roughly 5400 and 3500 examples/sec.
