CPU Maximum Power for $1,000 (non desktop)
Laurae: This post is about which Intel CPU to look for if you want a powerful server/workstation. This is an insight from the "IT background" I supposedly do not even have (I hold over 40 IT certifications, duh). If you do not know what to look for, ask on Reddit how to find cheap Intel Xeon Engineering Samples and what their caveats are (there can be some). This post originally appeared on Kaggle.
I am buying a new computer.
I would appreciate advice from more experienced competitors on whether I would be able to draw benefits from an E5-2678 v3 (12 cores, 2.5 GHz) type of processor compared to an i7-6850K (6 cores, 3.6 GHz) type of processor, considering the rest of the system being approximately equal.
With an E5-2678 v3 you should see a moderate improvement over an i7-6850K (maybe 20-30% better performance). Generation differences plus clock speed mean that, for parallelized tasks, 1 core of the i7-6850K is approximately equal to 1.4-1.6 cores of the E5-2678 v3. If you use well-parallelized algorithms, the speed improvement is worth the cost increase. If you don't, it will be a waste of money (and might be slower than you expect).
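The back-of-the-envelope reasoning above can be sketched as a quick throughput comparison. The per-core factor and the resulting ~33% figure are illustrative assumptions derived from the 1.4-1.6 equivalence quoted above, not real benchmarks:

```python
# Rough effective-throughput comparison for a perfectly parallel workload.
# Assumption (from the text): 1 i7-6850K core ~ 1.5 E5-2678 v3 cores.

def relative_throughput(cores, per_core_factor):
    """Idealized throughput: core count times per-core speed factor."""
    return cores * per_core_factor

i7_6850k = relative_throughput(6, 1.5)    # 6 faster cores -> 9.0 "units"
e5_2678v3 = relative_throughput(12, 1.0)  # 12 slower cores -> 12.0 "units"

advantage = e5_2678v3 / i7_6850k - 1.0
print(f"E5-2678 v3 advantage on perfectly parallel work: ~{advantage:.0%}")  # ~33%
```

With these assumed numbers the 12-core chip comes out roughly a third faster, which is consistent with the 20-30% estimate once you account for imperfect parallel scaling.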
For that price range ($700-$3,000 now), I would recommend one or even two E5-2698 v4 engineering samples. That would be about $1,400 for 2x 20 cores at 2.2-3.6 GHz (or $700 for 20 cores). You may also try E5-2673 v4 qualification samples, at about $1,000 each (they also have 20 cores). The E5-2673 v4 (QS) is one of the most powerful Intel CPUs on the market, as it has a higher clock rate than its official release (sold under another name), while costing only a fraction of the official price.
I use (not for kaggling) 2x E5-2690 v2 (2x 10 cores) and an i7-3930K (6 cores), and the performance difference between them is massive when running (near-perfectly) parallelized tasks, such as segregated xgboost training (up to a 3x difference). However, my E5-2690 v2 is slower than an i7-6850K.
Here is what you should think about for both CPUs you are comparing:
- Single-threaded performance, for programs which cannot run multithreaded (like KNNs in scikit-learn)
- Linearity of multithreading performance (do you hit diminishing returns as you add threads?)
- Does heavy parallelization work properly in the programs you want to use? (xgboost sometimes has issues on multi-CPU setups)
- Do you work with large enough data to avoid parallelism spawning overhead? (for instance, on small data sets, single-core xgboost can be 10x faster than 20-thread xgboost)
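The last point, spawning overhead dominating on small workloads, can be demonstrated with the standard library alone (no xgboost needed). This is a minimal sketch: the task and sizes are made up for illustration:

```python
# On trivially small work items, spawning worker processes typically
# costs far more than the work itself -- the same effect as launching
# 20 xgboost threads on a tiny data set.
import time
from concurrent.futures import ProcessPoolExecutor

def tiny_task(x):
    return x * x  # deliberately cheap "work"

def run_serial(data):
    start = time.perf_counter()
    results = [tiny_task(x) for x in data]
    return results, time.perf_counter() - start

def run_parallel(data, workers=4):
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(tiny_task, data))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    data = list(range(1000))
    serial_res, serial_t = run_serial(data)
    parallel_res, parallel_t = run_parallel(data)
    assert serial_res == parallel_res
    # Expect the serial run to win here: pool startup dominates.
    print(f"serial: {serial_t:.4f}s, parallel: {parallel_t:.4f}s")
```

The same experiment with a genuinely expensive task (or a large data set) flips the result, which is exactly the trade-off the list above asks you to check for your own workloads.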
Here is an example of multithreading performance in xgboost on a customized Bosch data set (1M rows, 500 features, pre-cached in memory); notice the non-linearity (and that going over the number of physical cores for threading helps):
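One way to read a chart like that is to turn raw timings into speedup and parallel efficiency. The helper below does this; the timings in the example are placeholder numbers, not real Bosch measurements:

```python
# Quantify multithreading non-linearity: speedup vs. the single-thread
# baseline, and efficiency (speedup divided by thread count).

def scaling_table(timings):
    """timings: dict mapping thread count -> wall-clock seconds.
    Requires an entry for 1 thread as the baseline."""
    base = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = base / timings[threads]
        efficiency = speedup / threads
        rows.append((threads, speedup, efficiency))
    return rows

# Placeholder timings for illustration only:
example = {1: 100.0, 2: 55.0, 4: 32.0, 8: 22.0, 16: 18.0}
for threads, speedup, eff in scaling_table(example):
    print(f"{threads:>2} threads: {speedup:4.2f}x speedup, {eff:5.1%} efficiency")
```

Efficiency falling well below 100% as threads increase is the diminishing-returns pattern mentioned earlier; whether the remaining gains justify more cores is the question the rest of this post addresses.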
Note: if you intend to create virtual machines and allocate as many cores as you can to your virtual environment, you may hit licensing issues at 16 cores (8 cores + hyperthreading).
If you want to test whether it is worth switching to more threads instead of a higher clock rate (by spending a bit of money), you can run some benchmarks on an AWS C4 instance. It uses E5-2666 v3 CPUs (12 cores, 2.6-3.3 GHz); you would just have to limit the threads used to 12 or 24. If you do not own an i7-6850K, you can look for E5-1650 v3 servers and multiply the CPU benchmark results by 1.06-1.10 to approximate an i7-6850K. What is optimal depends on the value you see in adding more threads versus single-threaded performance, and only you can define it (is a 2x shorter training time worth spending 2x more? etc.).
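To cap the thread count for such a benchmark, one option is the standard OpenMP environment variable, which OpenMP-backed libraries like xgboost respect. A small sketch; note the variable must be set before the library is loaded, and passing `nthread` to xgboost directly is an alternative:

```python
# Cap OpenMP thread usage (e.g. to 12 physical cores or 24 hyperthreads
# on a C4 instance). Set this BEFORE importing an OpenMP-backed library.
import os

def cap_threads(n):
    """Set OMP_NUM_THREADS so OpenMP-based code uses at most n threads."""
    os.environ["OMP_NUM_THREADS"] = str(n)
    return os.environ["OMP_NUM_THREADS"]

cap_threads(12)  # physical cores only
# ... then: import xgboost and train (optionally also passing nthread=12)
```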
If you just intend to have a powerful workstation, an E5-2673 v4 is overkill, but for kaggling / machine learning it may be useful (though not for neural networks, which require a GPU for a significant speedup). Even an i7-6850K is already great.