<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Data Science & Design - Medium]]></title>
        <description><![CDATA[All about Data Science, Machine Learning, and Design. Also, lots of things about Statistics, Data Visualization, Benchmarking, and funny stuff. - Medium]]></description>
        <link>https://medium.com/data-design?source=rss----fe51a1842648---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Data Science &amp; Design - Medium</title>
            <link>https://medium.com/data-design?source=rss----fe51a1842648---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 25 May 2026 22:04:08 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/data-design" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[xgboost GPU performance on low-end GPU vs high-end CPU]]></title>
            <link>https://medium.com/data-design/xgboost-gpu-performance-on-low-end-gpu-vs-high-end-cpu-a7bc5fcd425b?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/a7bc5fcd425b</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Sun, 30 Dec 2018 14:01:00 GMT</pubDate>
            <atom:updated>2018-12-30T14:01:00.622Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pQMbwCbOuyiQ9gdKed6ebg.jpeg" /></figure><p>xgboost CPU with fast histogram is extremely fast compared to old school methods such as the exact method.</p><p><strong>How well does xgboost with a very high-end CPU fare against a low-end GPU?</strong> Let’s find out, in a very unfair comparison.</p><p><a href="https://xgboost.readthedocs.io/en/latest/gpu/">GPU xgboost has been available since last year to provide higher performance.</a> Let’s see how much better it is here!</p><p>Note: to run the benchmark on GPU, you will need an NVIDIA GPU. There is no workaround for that. NCCL is also mandatory for multi-GPU training.</p><p>CPUs vs CPUs: <a href="https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5">previous results</a></p><p>Here is a <strong>summary table of contents</strong> in case you are lost.</p><p><strong>Content</strong>:</p><ol><li>Potential criticism of using GPU xgboost</li><li>Hardware &amp; Software</li><li>What do we benchmark?</li><li>Benchmark results run by hand</li><li>More specific benchmark results</li><li>More about RAM usage</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/789/1*ob3aEEQJ3vXqM0VpYRpbog.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/967/1*6yjSnyg8rRbSBIxFJirRmQ.png" /></figure><h3>Potential criticism of using GPU xgboost</h3><p>You may hear many criticisms about (not) using GPU xgboost…</p><ul><li><strong><em>GPU does not provide reproducible results</em></strong>: this is actually true in most cases. <strong>xgboost GPU does not provide reproducible results.</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/702/1*vFVaVDfbmleR7fblT1c93Q.png" /><figcaption>Logloss of xgboost models for the benchmark run by hand for 12.5 million rows x 100 features. GPU xgboost seems to provide a random best logloss. 
Only CPU xgboost provides reproducible results.</figcaption></figure><ul><li><strong><em>GPUs have less RAM than CPUs</em></strong>: sorry, just purchase the expensive Titan / Quadro / Volta cards. <strong>Your business may thank you later</strong> (or fire you for reckless purchase orders).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*PXx6azBINKH2DEsGB9nJAw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*lcqLhzNfq5JTRcZNfZGi4Q.png" /><figcaption>NVIDIA RTX meme: this is not professional</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/270/1*VF5cnbTNCRRFau5n6rT-ow.png" /><figcaption>NVIDIA DGX-2 refresh ($490K price tag on a 36 month lease): 32GB RAM per GPU!!!</figcaption></figure><ul><li><strong><em>You can’t fit all the data in a GPU</em></strong>: sorry, but first, you need to know you can use multiple GPUs, and second, if the software you use is properly coded (and the algorithm behind it allows it), the memory is <strong>shared across GPUs</strong> thanks to NCCL (distributed computing).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/472/0*lJ2WDC11QDFnYuPe.png" /><figcaption>NCCL example: <a href="https://devblogs.nvidia.com/fast-multi-gpu-collectives-nccl/">https://devblogs.nvidia.com/fast-multi-gpu-collectives-nccl/</a></figcaption></figure><ul><li><strong><em>Machine learning on GPU is good only for “deep learning”</em></strong>: sorry, but this is plain wrong. One could say “deep learning is not yet at the level of Johnny Depp”. <strong>Small refresher</strong> for those of you who think “deep learning &gt; machine learning”:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/750/0*k_qgPicuesMQOPTL.jpg" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/991/1*3N4V_Y-s2DuetzBxN7v_3g.png" /><figcaption>Deep learning is just a very small subset of machine learning. 
LOL !!!!!!!!!!!!</figcaption></figure><ul><li><strong><em>Multiple GPUs do not scale. Just see how poor the performance is when using SLI in games!</em></strong>: you are comparing apples and oranges. <strong>This is the same as if you were comparing Geekbench on Android vs iOS (or Windows vs macOS)</strong>: pure, total nonsense.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/966/1*P7r246rEFOLanyo47MSk9Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/863/1*sxyIodoNFUWIvHUfFfMTzg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/859/1*OWA3xhc4pkmLVa0vaJ-P_w.png" /><figcaption>Geekbench 4 multithreaded test: same CPU spotted with widely varying results (Intel Xeon W-2191B: 47240 on iMac Pro, 35536 on Windows?)</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/320/1*UKr4LH2-WPnOso37kPivpQ.gif" /><figcaption>Singlethreading meme</figcaption></figure><ul><li><strong><em>GPU does not work well when it is too fast</em></strong>: if you cool it <strong>sufficiently</strong>, it will work well.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/997/1*5WuGZm8dENRUH5jtg_GlvA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*xE3hHAygDriPricNn-NZnw.png" /><figcaption>GPU toaster: not giving enough cooling to your toaster will make it die.</figcaption></figure><ul><li><strong><em>Why should I use xgboost on GPU when deep learning is always the best tool?</em></strong>: a tool is just a way to achieve an objective to meet a real need. In most cases, a neural network does not work well on tabular and business data. 
<strong>It’s similar to using a bazooka to kill a bee!</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/630/0*2puImMIZ8u9Dyb3L.png" /><figcaption>Are you deeper than deep learning?</figcaption></figure><ul><li><strong><em>I can just use a cluster and do the work faster than everyone else in the world!</em></strong>: not really, if there is <strong>no scalability</strong> and someone has found the gem for a single-threaded scenario.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/862/1*ojKbTE1QSEyjAjngdrJn3w.png" /><figcaption>First optimize the innermost element until you can’t anymore, then switch to the outer element, and do it again over and over.</figcaption></figure><ul><li><strong><em>GPU is always the fastest tool for everything! No need to test!</em></strong>: sorry, it depends on the <strong>use case</strong>.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/656/1*Xkysh8H_o99qCdBDmcdVFw.png" /><figcaption>Do you even need nuclear radiation protection for your smartphone? 
Better get helium protection first.</figcaption></figure><h3>Hardware &amp; Software</h3><p>To compare xgboost CPU and GPU, we will be using the following <strong>unfair hardware</strong> worth over $15K:</p><ul><li>CPU: <strong>Dual Intel Xeon Gold 6154</strong> (2x 18 cores / 36 threads, 3.7 GHz all-core turbo)</li><li>RAM: 4x 64GB RAM 2666 MHz (good to go for 80 GBps bandwidth)</li><li>GPU: <strong>4x NVIDIA Quadro P1000 4GB RAM</strong> (very similar to NVIDIA GeForce 1050 4GB RAM; 4 of them are similar to a 1080)</li><li>BIOS: NUMA enabled, Sub NUMA Clustering disabled</li><li>Operating System: Pop!_OS 18.10 (like Ubuntu 18.10)</li><li>R: 3.5.1, compiled with -O3 -march=native</li><li>NVIDIA versions: <strong>CUDA 10.0, NCCL 2.3.7</strong></li><li>xgboost version: commit a2dc929</li></ul><p>You might wonder, <strong>why compare a miserable Quadro P1000 with a super high-end CPU?</strong> You will find out later.</p><p><em>And yes, the hardware you see below is slower than you may think (against our server):</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/655/1*rdqWsZr-k7Y-zWPOIpR0sA.png" /><figcaption>Beware: dual Xeon E5-2699v4 (44 cores, 88 threads, 2.8 GHz all-core turbo) is slower overall!</figcaption></figure><p><em>Rationale: </em><strong><em>xgboost fast histogram does not scale well with threads</em></strong><em>. 
</em><a href="https://medium.com/data-design/benchmarking-new-xgboost-fast-histogram-xgboost-and-the-compiler-story-86f0c5a4bcd3"><em>This was already seen so many times…</em></a></p><h4>Compile xgboost for GPU in R</h4><p>To <strong>compile xgboost in R with GPU support</strong> (and <strong>multi GPU support through NCCL</strong>), we can use a one-liner in R, assuming you have my <a href="https://github.com/Laurae2/xgbdl">xgbdl package</a>:</p><pre>xgbdl::xgb.dl(compiler = &quot;gcc&quot;, commit = &quot;a2dc929&quot;, use_avx = FALSE, use_gpu = TRUE, CUDA = list(&quot;/usr/lib/cuda&quot;, &quot;/usr/bin/gcc-6&quot;, &quot;/usr/bin/g++-6&quot;), NCCL = &quot;/usr/lib/x86_64-linux-gnu&quot;)</pre><p><em>Note: the AVX option is deprecated. </em><strong><em>Drop the NCCL argument if you are using a single GPU.</em></strong><em> We assume you have already installed CUDA and NCCL.</em></p><p><strong>CUDA 10 requires gcc (version 6), and NCCL must be pointed to the right folder.</strong> xgb.dl takes all those inputs and performs the work on your behalf, so you do not have to do it manually in R.</p><p>Installing xgboost for GPU allows you to keep using CPU. 
Once the GPU version is installed, you are not restricted to using only the GPU (whereas the CPU version only lets you use the CPU).</p><h4>Monitoring GPUs in Linux</h4><p>Other than using nvidia-smi, you might be interested in <a href="https://github.com/Syllo/nvtop">nvtop</a>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/967/1*6yjSnyg8rRbSBIxFJirRmQ.png" /><figcaption>nvtop is like htop but for GPUs!</figcaption></figure><p>PuTTY users, please use the following to run nvtop:</p><p>NCURSES_NO_UTF8_ACS=1 nvtop</p><p>Otherwise, you may get funny stuff on screen.</p><p>I could not find anything better than that (other than nvidia-smi for more details “at X time”). If you have something interesting for GPU monitoring, feel free to share it in the comments (do not recommend something like glances, etc.: using a bazooka to solve a problem is not a solution).</p><h3>What do we benchmark?</h3><p>There is a <strong>very nice script</strong> created by <a href="https://github.com/dmlc/xgboost/blob/74db9757b38e51516289704ba236e14b8454d924/R-package/demo/gpu_accelerated.R">khotilov to benchmark xgboost on GPU</a>. 
We will adapt it to run on CPU as well, using the following setup:</p><ul><li>1 billion elements: 10 million rows, 100 columns</li><li>90% of data is used for training (9 million rows)</li><li>10% of data is used for validation (1 million rows)</li><li>500 training iterations</li><li>64 bins</li><li><strong>CPU vs GPU modes: hist vs gpu_hist</strong></li></ul><p>We will use the following script to benchmark xgboost CPU vs GPU.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9dbb6cd398539e294c67000f01341abc/href">https://medium.com/media/9dbb6cd398539e294c67000f01341abc/href</a></iframe><h3>Benchmark results run by hand</h3><p>The benchmark results run by hand are a bit different from the real benchmark:</p><ul><li>We are using 12.5 million rows instead of 10 million rows, and depth 6 only (a better fit for the 4GB GPU RAM of a Quadro P1000)</li><li>Other hardware is tested (i7-7700 + NVIDIA GeForce 1080, E5-1650 v3)</li></ul><h4>Tradeoffs of using GPU</h4><p>There are multiple tradeoffs when using xgboost on GPU:</p><ul><li><strong>GPU models are not reproducible</strong>: you will always get different results. If you are hunting for lucky runs, you will at least learn to look at the expected value (mean/average) over several runs. 
Or use statistical testing to compare means.</li><li><strong>GPU models are not cleared from memory after being run</strong>: you need to remove the model from memory, then run gc().</li><li><strong>xgboost crashes when using a lower number of threads than the number of GPUs used</strong>: use at least nthread equal to the number of GPUs used.</li><li><strong>xgboost crashes when changing the (number of) GPUs used after training a model on an identical xgb.DMatrix</strong>: remove the dataset and model from memory, run gc(), and reconstruct the needed xgb.DMatrix…</li><li><strong>xgboost GPU crashes for </strong><strong>max_depth &gt; X</strong>: use a maximum depth lower than or equal to X, otherwise you crash xgboost GPU. Rule of thumb I found: do not use more than 12 for approximately 100 features. The maximum depth before crashing seems to be linked to the number of features.</li><li><strong>xgboost cross-validation with GPU crashes after training multiple folds</strong>: most likely you are running out of GPU RAM. Extract what you need from the models, then delete them.</li><li><strong>xgboost ignores my hyperparameters</strong>: most likely you are using hyperparameters that are unavailable for GPU (not every hyperparameter is available with GPU; this is actually also true for fast histogram)</li></ul><p>Example of non-reproducible results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/702/1*J0Ajr74-391ySk_CmLkhmA.png" /><figcaption>xgboost GPU is not reproducible and you will get different results every time! Only CPU is! 
Note that even the CPU is not reproducible when changing computers: you need the exact same compiler to get reproducible results!</figcaption></figure><h4>Benchmark results run by hand</h4><p>For the benchmark results run by hand (12.5 million rows, 100 features), we have guest hardware provided by <a href="https://medium.com/u/9a0465191627">Miguel Perez Michaus</a>:</p><ul><li>Server 1: i7-7700 + 64GB RAM (4x 16GB RAM) + NVIDIA GeForce 1080 8GB RAM</li><li>Server 2: E5-1650 v3 + 128GB RAM (4x 32GB RAM)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/702/1*ikFVIwOg2c4Q_AOoKE5oDQ.png" /><figcaption>xgboost training time for 12,500,000 rows x 100 features, 500 iterations</figcaption></figure><p>Main conclusions to draw here:</p><ul><li><strong>xgboost CPU with a very high-end CPU (2x Xeon Gold 6154, 3.7 GHz all cores) is slower than xgboost GPU with a low-end GPU (1x Quadro P1000)</strong></li><li><strong>xgboost GPU seems to scale linearly with the number of GPUs</strong></li><li><strong>4 Quadro P1000 are faster than a single GeForce 1080</strong></li></ul><p>Extra conclusions, for those using CPU:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/667/1*n2qnfkh4hQ0aPzO1J2OWQA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/307/1*91IK6xkvDVmS4_QlzCSXkA.png" /></figure><ul><li>2x Xeon Gold 6154 (2x $3,543) gets you a training time of 700 seconds, <strong>25% faster than an i7-7700 (for 2,339% of the price) and 20% faster than an E5-1650 v3 (for 1,215% of the price)</strong></li></ul><blockquote>How much should I spend for ML stuff?</blockquote><p><strong>Is it worth purchasing 2x Xeon Gold 6154 when you can purchase an i7-7700 (for 4% of the price) to train a single model as fast as possible?</strong> It depends: if you value your time, then yes, it is worth it. 
Otherwise, it is a waste of your money (find other use cases to justify the 2x Gold 6154).</p><blockquote>The random guy who says R only works single-threaded is wrong</blockquote><p><strong>If we can call you <em>“The Parallelizer”</em>, then you know that with 2x Gold 6154, you can train 72 xgboost models at the same time fairly quickly.</strong> This is a godly use case for your server. <strong>R works extremely well for parallelization</strong>, and it is available by default (and <a href="https://github.com/Laurae2/LauraeParallel">my package LauraeParallel provides load balancing</a> in a functional programming fashion).</p><p>Also, a small note: do not buy a server just to boast. The next image proves it is just a waste of your money if you do not use it for a real task:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VjSxrPfK4KY3QcPu-8FzIg.png" /><figcaption>htop: Time to boast about those “72 high performance threads on the server for super duper fast artificial intelligence machine learning data science crypto (insert more buzzwords)”</figcaption></figure><p><em>Don’t do that unless you want to show something interesting (like purchasing a server and keeping cores unused for instance…)</em></p><h3>More specific benchmark results</h3><p>I suppose you were here for the GPU benchmarks? 
If we plot the raw data, we may end up with something that looks very wrong at first sight:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/953/1*S_LTGhd_2AGCZ1OFezSNQg.png" /><figcaption>Using no GPU seems so slow!!!!!</figcaption></figure><p>We have to understand several things from the plots, which are specific to our scenario (10M rows, 100 features):</p><ul><li><strong>Using no GPU is significantly slower.</strong></li><li><strong>CPUs have negative scalability with a large number of threads</strong>, and this is even more visible for larger maximum depths</li><li><strong>More GPUs mean faster training</strong> (seems correct), but it does not seem to scale linearly (because the chart is visually misleading)</li><li><strong>More CPU threads when using GPU is not faster</strong>: it is actually a flat line (in practice, this is not exactly true, as using too many threads leads to negative scalability, but for this experiment, we are keeping a flat line)</li></ul><p>To focus on the essentials, we have to invert what we are looking at: to get an idea of the benefit of using GPU over CPU, we have to analyze the speedup against CPU. 
Three different charts for GPU speedup are provided:</p><ul><li>Without free axes, and all data:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/953/1*zi63TMTFRTUwAhc1ws4t9A.png" /></figure><ul><li>With free axes:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/953/1*h2AtTqHEO6W4GVAX4i_zNw.png" /></figure><ul><li>With free axes, and restricted to 6 or more threads on CPU:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/953/1*lpjSxSRwCLQ2WrEQj2tVMw.png" /></figure><p>Better conclusions can be drawn using those charts:</p><ul><li><strong>A single GPU provides an excellent speedup</strong> against a small number of threads (5x or more)</li><li><strong>Multiple GPUs provide a huge speedup</strong> against a small number of threads (up to 20x)</li><li><strong>GPU speedup decreases as the maximum depth increases</strong></li><li><strong>Against peak CPU performance, GPU performance increase remains flat</strong> (but is still an increase in performance)</li><li><strong>Adding more GPUs increases the performance linearly as long as the maximum depth is low</strong>; otherwise, it increases the performance only marginally (see: depth 2 vs depth 12 scaling)</li></ul><p>From this point of view, it is very easy to draw the following conclusion:</p><blockquote>For <strong>small trees</strong> and if reproducibility is not an issue, using a weak GPU is faster than using a monster CPU, as long as the data fits in GPU RAM. 
Otherwise, using CPU remains the best choice.</blockquote><p>Just reiterating, in case, <strong>the hypotheses</strong> behind <strong>our GPU &gt; CPU conclusion</strong>:</p><ul><li><strong>Small trees</strong> (small maximum depth)</li><li><strong>Non-reproducible results</strong></li><li><strong>Data fits in GPU RAM</strong></li><li><strong>Weak GPU &gt; Strong CPU</strong></li></ul><p>If you can live with <strong>non-reproducible results to do cross-validation and compare feature performance</strong>, that’s another story where doing proper <strong>statistics</strong> can help you.</p><h3>More about RAM usage</h3><p>xgboost GPU is pretty smart at using multiple GPUs. By taking our benchmark script, and using a 500K rows x 100 features matrix with 10% as validation (450K training rows, a 343 MB training set), we get the following script:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/66aeb2e8c658e9824d29650f45d141ec/href">https://medium.com/media/66aeb2e8c658e9824d29650f45d141ec/href</a></iframe><p>Using nvidia-smi, you can pinpoint exactly the GPU RAM usage per process (we have to include the xgboost GPU process, which takes 55 additional MB):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/802/1*tD3hgxZZfOsxJumpCcWHUA.png" /><figcaption>nvidia-smi showing exactly which process takes how much GPU RAM. 
A fixed 55 MB overhead (actually variable, depending on the GPUs used…) is mandatory in xgboost for GPU with R (it might be identical for Python).</figcaption></figure><p>A more complete sample test script is below:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5ebdfd949d759d7944cc20bbd1379cb6/href">https://medium.com/media/5ebdfd949d759d7944cc20bbd1379cb6/href</a></iframe><h4>GPU RAM when modifying row count</h4><p>We get the following RAM results when using xgboost GPU with a matrix of size 450,000 x 100:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AmsoapVv9vpWVmO2L7-OwA.png" /><figcaption>How much GPU RAM is used per GPU for 450K row x 100 feature matrix?</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HrSLBJWM_M0EOvkh7z4kFg.png" /><figcaption>How much total GPU RAM is used for 450K row x 100 feature matrix?</figcaption></figure><p>As we can see, <strong>the total GPU RAM used increases dramatically as the maximum depth increases</strong>:</p><ul><li>The lowest GPU RAM usage is below depth 5 (between 1 and 4)</li><li>The GPU RAM usage spikes from depth 9</li><li>The GPU RAM usage for depth 12 is very high (3 times higher than the lowest RAM usage for our small data)</li></ul><p>Let’s try again, but with a matrix of size 1,000,000 x 100 (763 MB):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XDYEZZ8dLrP97MKR27NGyw.png" /><figcaption>How much GPU RAM is used per GPU for 1M row x 100 feature matrix?</figcaption></figure><p>What about 5,000,000 x 100, which is closer to the 4GB limit (the matrix is of size 3,814.7 MB)?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QW9rh5BLM17mvZTSwQVkaA.png" /><figcaption>How much GPU RAM is used per GPU for 5M row x 100 feature matrix?</figcaption></figure><p>And 10,000,000 x 100, a matrix of size 7,629.4 MB?</p><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/1*qf-WHFwaoHclj_8yLv69Rg.png" /><figcaption>How much GPU RAM is used per GPU for 10M row x 100 feature matrix?</figcaption></figure><p>Let’s step up the game until a single GPU crashes, with 25,000,000 x 100, a matrix of size 19,073.5 MB:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fVt-uSqesie7OtHe60V0AQ.png" /><figcaption>How much GPU RAM is used per GPU for 25M row x 100 feature matrix?</figcaption></figure><p>We can go further and crash when using 2 GPUs, with a 50,000,000 x 100 matrix (38,147 MB):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wzaKHVaQkNG64hWKj4BK0w.png" /><figcaption>How much GPU RAM is used per GPU for 50M row x 100 feature matrix?</figcaption></figure><p>Do you think a 75,000,000 x 100 matrix (57,220.5 MB) will work with 4 GPUs? It crashed!</p><p><strong>GPU RAM seems to increase by a fixed amount when using a larger depth</strong>, which explains why we may have thought GPU RAM explodes with high depth on our small data, while it does not on large data.</p><h4>GPU RAM when modifying feature count</h4><p>1,000,000 x 500 (3,814.7 MB): it crashes after depth 10!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ROV_6edofvh3PWiU0Q7GzQ.png" /><figcaption>How much GPU RAM is used per GPU for 1M row x 500 feature matrix?</figcaption></figure><p>1,000,000 x 1,000 (7,629.4 MB): it crashes after depth 9!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*h2JChk61CtzOMlcv7VREnA.png" /><figcaption>How much GPU RAM is used per GPU for 1M row x 1K feature matrix?</figcaption></figure><p>1,000,000 x 2,500 (19,073.5 MB): it crashes after depth 7, and requires at least 2 GPUs!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*54H7pzaOjoNQRfkB5CNkAg.png" /><figcaption>How much GPU RAM is used per GPU for 1M row x 2.5K feature matrix?</figcaption></figure><p>1,000,000 x 5,000 (38,147 MB): it crashes after 
depth 6, and requires at least 4 GPUs!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XipO2CGIRXrYcuTVMjV41A.png" /><figcaption>How much GPU RAM is used per GPU for 1M row x 5K feature matrix?</figcaption></figure><p><strong>For the same number of elements, adding more features seems to cost more xgboost GPU RAM</strong> than adding more observations.</p><h4>Conclusions about GPU RAM usage</h4><p>If we consider the number of elements in a matrix (number of rows x number of features), then we can conclude the following:</p><ul><li><strong>Multiple GPUs scale pretty well (nearly linearly) for GPU RAM usage</strong></li><li><strong>More depth means higher GPU RAM usage</strong></li><li><strong>The number of features has a higher weight than the number of observations</strong> (roughly 5-15% more?): for an equal number of elements in the matrix, adding more features requires more GPU RAM than adding more observations</li><li><strong>A higher number of features increases the risk of crashing xgboost GPU when using a large maximum depth</strong> (a magic number seems to exist)</li><li><strong>There seems to be a formula to predict the GPU RAM required</strong> depending on the <strong>number of observations, the number of features, and the maximum depth</strong>.</li></ul><h3>Mega Conclusion</h3><p>This is a very simple conclusion:</p><blockquote>xgboost GPU is fast. Very fast. As long as it fits in GPU RAM and you do not care about getting reproducible results (or about getting crashes).</blockquote><p>To keep getting those epic, stable and reproducible results (or if data is just too big for GPU RAM), keep using the CPU. 
There’s no real workaround (yet).</p><hr><p><a href="https://medium.com/data-design/xgboost-gpu-performance-on-low-end-gpu-vs-high-end-cpu-a7bc5fcd425b">xgboost GPU performance on low-end GPU vs high-end CPU</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Investigating xgboost Exact scalability]]></title>
            <link>https://medium.com/data-design/investigating-xgboost-exact-scalability-d562b2b501c0?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/d562b2b501c0</guid>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Mon, 21 May 2018 10:51:39 GMT</pubDate>
            <atom:updated>2018-01-31T23:18:15.096Z</atom:updated>
            <content:encoded><![CDATA[<p>xgboost is a very well known Machine Learning technique based on Gradient Boosted Trees. <strong>The default xgboost is an exact method</strong>, which does not use binning, and is significantly slower than the histogram-based version (fast histogram).</p><p>Two questions were asked recently about xgboost:</p><blockquote>do I see it right that less threads than physical cores can provide fastest runtime?</blockquote><blockquote>interesting. so without fast hist, it’s a different story?</blockquote><blockquote><strong>It is a whole new story.</strong></blockquote><h3>Context</h3><p>You may have seen my recent blog post (<a href="https://medium.com/@Laurae2/getting-the-most-of-xgboost-and-lightgbm-speed-compiler-cpu-pinning-374c38d82b86"><strong>Getting the most of xgboost and LightGBM speed: Compiler, CPU pinning</strong></a>), which compares two compilers (Visual Studio and MinGW) and CPU pinning/roaming to find the <strong>best software setup</strong> to run xgboost and LightGBM as fast as possible under Windows:</p><ul><li>We should use Visual Studio to compile xgboost and LightGBM</li><li>CPU pinning seems useful for xgboost</li></ul><h3>The question of the day</h3><p>What happens if we now look at xgboost Exact? This is today’s topic, and we will go straight to the results, as the benchmark setup is identical to the <a href="https://medium.com/@Laurae2/getting-the-most-of-xgboost-and-lightgbm-speed-compiler-cpu-pinning-374c38d82b86">previous blog post</a>.</p><p>The only differences are the following:</p><ul><li>Exact xgboost instead of fast histogram</li><li>Every run is repeated twice</li></ul><h3>Benchmark results: from xgboost Exact to Fast Histogram</h3><p><strong>The Bosch dataset is very large</strong>: 6,000+ seconds is what a user should expect to spend training using the fastest available machine learning libraries at that time.</p><p>Big Data software (Hadoop / Spark etc.) 
is not what will save you from longer runtimes: what matters here is the algorithm and its performance optimizations. <strong>xgboost Exact can be viewed as approximately 10 times faster than R’s gbm and scikit-learn Gradient Boosting.</strong></p><blockquote>The Bosch dataset is very large for machine learning</blockquote><p>Looking directly at the runtimes, we can see a <strong>large drop in runtime when increasing the number of threads</strong>:</p><ul><li>Roaming CPU: 6,255.4s (1 thread) to 263.7s (22.7x faster)</li><li>Pinned CPU: 6,487.5s (1 thread) to 266.3s (23.4x faster)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1010/1*o48IiWBeb6UU0WBAL9Blfg.png" /><figcaption>xgboost literally takes forever to learn</figcaption></figure><p><strong>xgboost seems to scale very well when adding more and more threads.</strong> We are going to qualify whether it scales very well or not in a more appropriate chart.</p><p>A quick reminder of the <strong>evolution between xgboost Exact and xgboost Fast Histogram can be visualized below</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1009/1*XxpOTQDn93_I5zKr3X4UMw.png" /><figcaption>Approximately 8 threads and 120 seconds?!</figcaption></figure><p>With the histogram technique, available first in LightGBM, xgboost Fast Histogram slashes the training time from 260 seconds (using a monster 56 threads) to 120 seconds (using a mere 8 threads): <strong>a 117% performance increase for a similar model performance, while using 7 times less computing power, seems like the best of both worlds!</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1012/1*FNV5L8MWsGBjWveK-2zQ9g.png" /><figcaption>LightGBM takes the crown of speed</figcaption></figure><p><strong>LightGBM, the direct competitor of xgboost, is significantly faster and drops the computation time</strong> from 120 seconds to 55 seconds for the same number of threads: a 118% 
performance increase.</p><h3>xgboost Exact efficiency curve</h3><p>So far we have covered the history from xgboost Exact up to LightGBM. Let’s now look at the computing efficiency of xgboost Exact when scaling to more threads:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1011/1*BsGLaim-ddI1Zz55p6gnuQ.png" /><figcaption>Ideally, we should have more than 28x efficiency using 56 threads</figcaption></figure><p>We can notice the following:</p><ul><li><strong>Not only does xgboost Exact scale very well</strong> (over 2,200% efficiency at 56 threads against a single thread, consistent with the runtimes reported above)</li><li><strong>But xgboost Exact also benefits a lot from hyperthreading</strong> (from threads 15 to 28, and 43 to 56, the efficiency keeps increasing)</li><li><strong>And xgboost still manages to scale properly when NUMA issues arise</strong> (we are using a Dual Xeon, therefore managing memory improperly causes slowdowns)</li></ul><p>We can’t say this is true for every situation, but <strong>this chart shows how well xgboost Exact scales when used on large datasets.</strong></p><blockquote><strong>We are counting in hours for a single thread, and in days for R’s gbm and Python’s scikit-learn.</strong></blockquote><h3>Conclusion</h3><p><strong>xgboost Exact scales very well</strong>: this is a good example of a very well made program, tailored to scale on servers. 
<strong>Although xgboost is a sequential algorithm (Gradient Boosting is sequential, not parallel by nature), it still runs extremely fast</strong> when throwing more and more threads at it.</p><p><strong>Recent advancements (throughout the last 4 years) slashed the training time from 100,000+ seconds to 50 seconds</strong> (a 2,000x, <em>“two thousand times”</em>, performance improvement) thanks to the following:</p><ul><li><strong>Parallelization / multithreading of the sequential task of Gradient Boosting</strong></li><li><strong>Code/Cache optimization of xgboost</strong></li><li><strong>Histogram/sketching idea of LightGBM</strong></li></ul><p>All of this came while <strong>improving, then maintaining, the high performance of the original machine learning algorithms</strong>, and while providing a proper and stable path to the <strong>industrialization of the algorithms</strong> (<strong><em>H2O still excels at this task</em></strong>).</p><p>The next part, if possible, will test LightGBM on dense data, as outlined <a href="https://github.com/Microsoft/LightGBM/issues/1225">here on GitHub</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d562b2b501c0" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-design/investigating-xgboost-exact-scalability-d562b2b501c0">Investigating xgboost Exact scalability</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Getting the most of xgboost and LightGBM speed: Compiler, CPU pinning]]></title>
            <link>https://medium.com/data-design/getting-the-most-of-xgboost-and-lightgbm-speed-compiler-cpu-pinning-374c38d82b86?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/374c38d82b86</guid>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Mon, 21 May 2018 10:51:07 GMT</pubDate>
            <atom:updated>2018-01-31T23:19:11.871Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1017/1*NILeuqe9aioGwY82nRGzuA.png" /><figcaption>Why should I change my computer setup if it works? To remove 1/3 of the time you spend waiting for results!!!</figcaption></figure><p><strong>Currently, xgboost and LightGBM are the two best performing machine learning algorithms for large datasets</strong> (both in speed and metric performance). They <strong>scale very well up to billions of observations and/or elements</strong> (ex: Reputation dataset, <a href="https://sites.google.com/view/lauraepp/new-benchmarks">53,181,000,000 elements</a>).</p><p><strong>xgboost and LightGBM were made primarily for speed</strong>: it is better to <strong>iterate quickly at high accuracy to try more different things</strong>, than to wait hours for your neural network to finish.</p><p>However, although they can be used on large datasets, <strong>the question of scalability was only </strong><a href="https://medium.com/data-design/benchmarking-xgboost-5ghz-i7-7700k-vs-20-core-xeon-ivy-bridge-and-kvm-vmware-virtualization-293807a13f1c"><strong>partially answered</strong></a>: how well do xgboost and LightGBM scale? <strong>Do they prefer high frequency cores or more cores?</strong></p><ul><li><strong>xgboost exact</strong> likes both many cores and high frequency, with a preference for both</li><li><strong>xgboost fast histogram</strong> needs high frequency</li><li><strong>LightGBM</strong> likes both many cores and high frequency, with a preference for high frequency</li></ul><p>As we already know the answer to this question, we are going to look at a more exotic situation: <strong>changing the compiler, and pinning the CPU</strong>.</p><p><strong>Are xgboost and LightGBM faster when swapping the compiler from MinGW to Visual Studio? 
Is CPU pinning a good thing to do?</strong></p><p>This was also partially answered in <a href="https://github.com/Microsoft/LightGBM/issues/749">this GitHub issue</a>. Therefore, we are back with our Windows machine to do some benchmarks.</p><p><strong>Interactive documents</strong>:</p><ul><li><a href="https://benchmark.laurae.design/speed_r_perf_analysis.html">xgboost and LightGBM raw data</a></li><li><a href="https://benchmark.laurae.design/speed_r_vs_mingw.html">Visual Studio 2017 vs MinGW 4.9</a></li><li><a href="https://benchmark.laurae.design/speed_r_roaming_pinning_cpu.html">CPU Roaming vs CPU Pinning</a></li><li><a href="https://benchmark.laurae.design/speed_r_perf_gpu_analysis.html">GPU xgboost raw data</a></li></ul><p>The conclusion also includes a brief look at <strong>GPU xgboost</strong>.</p><h3>A quick review of the definition of a compiler and CPU pinning</h3><h4>Defining a compiler and CPU pinning</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/0*X41dC85-9aXFv3jK.png" /><figcaption>The place of the compiler between source code and an executable</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/537/1*GadFVd3t3tIzCsZvtmQ3lA.png" /><figcaption>Haswell EP Xeon CPU die configuration: there are four RAM banks which do not have the same latency if you take two different groups of cores!</figcaption></figure><ul><li><strong>Compiler</strong>: the compiler <strong>transforms code in a source language into code in a target language (usually to generate an executable)</strong>. 
A compiler is similar to a <strong>translator</strong>, and we all know translators do <strong>not have the same level of performance</strong>: some produce gibberish, some produce excellent translations, which in turn makes your <strong>interpretation of the words slower or quicker</strong>.</li><li><strong>CPU pinning</strong>: CPU pinning is the <strong>binding of a process (or thread) to a specific range of CPU cores</strong>. This way, <strong>the process cannot roam across cores as freely as it could without CPU pinning</strong>. When the <strong>process roams across CPUs, it can incur significantly higher RAM and cache latency</strong>: this is even more severe with <strong>multi-socket systems</strong>.</li></ul><p><strong>CPU pinning is also named CPU affinity</strong>, although the <strong>wording is inexact </strong>(“affinity” could mean “preference”, which is not the case here: it means <strong><em>“this process uses this range and only this range of CPU cores”</em></strong>).</p><h4>Benchmarking the differences</h4><p>We are going to benchmark the difference between compilers and CPU pinning, for each number of threads available (1 to 56) on our server:</p><ul><li><strong>Two compilers</strong> to test: <strong>Visual Studio</strong> (Windows’ native) and <strong>MinGW</strong> (gcc)</li><li><strong>Two CPU behaviors</strong>: <strong>CPU roaming</strong> (no pinning) and <strong>CPU pinning</strong> (by socket, then by physical core, then by hyperthreaded core).</li></ul><p>The latter means the following: if we have 2 sockets, 4 physical cores on each socket, and hyperthreading enabled, we will try to <strong>contain all threads in one socket first</strong>, adding physical (yellow) cores, then logical (orange) cores, before moving to the second socket:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/965/1*_Z2mPQvpuna7yMa5tURfbg.png" /><figcaption>Activation order of CPUs: 1, 3, 5, 7, 2, 4, 6, 8, 9, 11, 13, 15, 10, 12, 14, 16</figcaption></figure><p>We are 
benchmarking xgboost and LightGBM under the following <strong>environment</strong>:</p><ul><li>CPU: Dual Intel Xeon E5-2697v3 (14 cores and 28 threads per CPU, 3.6 GHz singlethread, 3.1 GHz multithread)</li><li>RAM: 128GB DDR4 2133 MHz</li><li>GPU: none</li><li>OS: Windows Server 2012 R2 Datacenter, without Meltdown/Spectre patch</li><li>R version: default 3.4.3</li><li>Compiler: Visual Studio 2017, MinGW 4.9 (R)</li><li>xgboost: commit 3f3f54b (Jan 16, 2018, 5:16 PM GMT+1)</li><li>LightGBM: commit 3dc5716 (Jan 18, 2018, 2:16 AM GMT+1)</li></ul><p>The <strong>dataset</strong>:</p><ul><li><a href="https://www.kaggle.com/c/bosch-production-line-performance">Kaggle Bosch training dataset</a></li><li>Number of observations: 1,183,747</li><li>Number of features: 969</li><li>Sparsity: approx 81%</li></ul><p>The <strong>algorithm parameters</strong>:</p><ul><li>Number of boosting iterations: 200</li><li>Learning rate: 0.05</li><li>Maximum depth: 8</li><li>Maximum leaves: 255</li><li>Max bins: 255</li><li>Minimum hessian: 1</li><li>xgboost only: fast histogram, depth-wise</li><li>LightGBM only: minimum split loss of 1 (due to loss-guided optimization)</li></ul><p>Each run was repeated at least twice and up to 10 times. 
It took approximately 1 week to run the benchmark, thanks to having so many threads!!!</p><h3>Benchmark Results</h3><h4><strong>Reminder: xgboost and LightGBM do not scale linearly at all.</strong></h4><p>With multithreading, xgboost is at best 154% faster than with a single thread, while LightGBM is at best 1,116% faster than with a single thread.</p><p>If you have a workstation:</p><ul><li>If you have 56 threads, do not expect those 56 threads to be 5,500% more efficient than 1 thread (it will not train 55x faster).</li><li>If you have 28 cores, do not expect 28 threads to be 2,700% more efficient than 1 thread (it will not train 27x faster).</li><li><strong>If you have a small dataset, do not expect a lot of threads to scale well</strong> (adding threads will slow training down).</li></ul><p><strong>Results for the best-case scenario </strong>(Visual Studio, roaming CPUs) are shown below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IQM7RkVQ4XSAmiYUvLyuAQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dli-mR5mp1_XQJ6zW-nQZg.png" /></figure><h4>Compiler Performance</h4><p><strong>By far, Visual Studio is the compiler to go with on Windows.</strong> It is worth installing <a href="http://landinghub.visualstudio.com/visual-cpp-build-tools">Visual C++ Build Tools</a> to get the fastest training speed possible.</p><p><strong>With roaming CPUs</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1017/1*dWr11xiooSYIMGQ7uYFTDg.png" /><figcaption>xgboost is very fast using Visual Studio instead of MinGW/gcc</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1017/1*ytorJF9D0Zf5v6ts1X-BFA.png" /><figcaption>LightGBM is a bit faster with Visual Studio instead of MinGW/gcc. 
Keep in mind that, unfortunately, the MinGW slowdown happens at large depth.</figcaption></figure><p><strong>With CPU pinning</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*f-uG68YPvk_7V78c6KW6aw.png" /><figcaption>xgboost with MinGW exhibits huge RAM latencies when the CPU pinning spreads across physical cores on 2 sockets at the same time.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1019/1*u_cgwsTArXwGcKfhauPdSA.png" /><figcaption>LightGBM still prefers Visual Studio over MinGW.</figcaption></figure><h4>CPU pinning Performance</h4><p><strong>CPU pinning increases the performance of xgboost with Visual Studio significantly. Otherwise, we are seeing performance degradation.</strong></p><p>Moral of the story:</p><ul><li><strong>Use CPU pinning if you are using xgboost with Visual Studio.</strong></li><li>Another case: if you are training <strong>parallel xgboost and LightGBM on the same machine, pin the CPUs so that each process keeps benefiting from its CPU cache </strong>(ex: if you are training 4 xgboost models at the same time on a 4-core machine, pin each model process to a separate core).</li></ul><p>With <strong>Visual Studio</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1013/1*w5PKmWeLkFf5ORsgNpm-sA.png" /><figcaption>xgboost with Visual Studio requires CPU pinning for performance increases.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1022/1*86FZGuP7N-OfDIXF6oFdvw.png" /><figcaption>LightGBM seems faster without CPU pinning. 
Strange?</figcaption></figure><p>With <strong>MinGW</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1017/1*V-LlMOO7S06Zqvcrqz2u_Q.png" /><figcaption>With MinGW, xgboost does not need CPU pinning, IT SEEMS.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*51Xwbz7Z49oKn6QPs-tgGw.png" /><figcaption>LightGBM does not need CPU pinning either, IT SEEMS.</figcaption></figure><h3>Conclusion</h3><p><strong>Using Visual Studio without CPU pinning seems to be the best choice by far.</strong></p><p>Recommendations for power users wanting to get the most out of xgboost/LightGBM:</p><ul><li><strong>Use Visual Studio whenever possible</strong></li><li><strong>Train models without CPU pinning</strong></li><li><em>And attempt to get higher CPU frequencies…</em></li></ul><p>If you are <strong>forced to use xgboost on Windows, then use CPU pinning to increase the performance</strong>.</p><p>If you have <strong>single models to train, GPU xgboost</strong> seems the way to go given how stable it has become. <strong>You do not even need a powerful server: even a laptop’s NVIDIA 1050 Ti outperforms our monster server.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gMD5RJY1cxCO76NAhAGfcw.png" /><figcaption>NVIDIA 1050 Ti + GPU xgboost is FAST!</figcaption></figure><blockquote>For the curious: using an NVIDIA 1050 Ti (1.75 GHz) on a laptop, GPU xgboost takes 92 seconds to train a model. That’s 28 seconds faster than the fastest CPU xgboost (Visual Studio + CPU pinning + 9 physical cores). An overclocked workstation would slash that time to about 60 seconds.</blockquote><p>Find below the <strong>most brutal comparison in efficiency</strong>, when using xgboost and CPU pinning:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1020/1*ch6qPY84Kn2uK08hBwJeiQ.png" /><figcaption>Which one do you prefer? A tool with 349% efficiency or a tool with 180% efficiency? 
The answer is very easy!</figcaption></figure><p>Next part: <a href="https://medium.com/@Laurae2/investigating-xgboost-exact-scalability-d562b2b501c0">Investigating xgboost Exact scalability</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=374c38d82b86" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-design/getting-the-most-of-xgboost-and-lightgbm-speed-compiler-cpu-pinning-374c38d82b86">Getting the most of xgboost and LightGBM speed: Compiler, CPU pinning</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Gigabyte Aero 14 review & benchmarks: laptop versus servers]]></title>
            <link>https://medium.com/data-design/gigabyte-aero-14-review-benchmarks-laptop-versus-servers-ff1458e4028d?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/ff1458e4028d</guid>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Tue, 30 Jan 2018 21:00:54 GMT</pubDate>
            <atom:updated>2017-11-07T20:36:01.122Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*AE6uqysrHzU0ja05.jpg" /></figure><h3>Introduction</h3><p>I recently purchased a <strong>Gigabyte Aero 14K v7</strong> (shortened to Aero 14 in this post) after 6 months of scouring the Internet for THE laptop I wanted. I am very picky about laptop specifications and usage, which makes it very difficult to match my needs with what I can get <em>(read more about my needs later in this post)</em>.</p><p>The previous <strong>laptop which met all my needs was the HP Elitebook 840 G1</strong>: IPS 14&quot; Full HD screen, i7, 16GB RAM, two SSDs (SATA + M.2 2242), nearly fully silent… Fully spec-ed out, <strong>it cost over €4,000 five years ago with an international next day 3-year on-site warranty (a 3/3/3 warranty) </strong>(I got it for under €700 due to the wrong keyboard in France).</p><h4>Battle of Computers</h4><p><strong>Getting a laptop without comparing how good it is against its competition is futile</strong>. 
Actually, we will <strong>compare it against insanity</strong>, which you can find below:</p><ul><li>A simple ultraportable laptop under the name <strong>Acer Aspire 13 S5-371</strong> (i7-7500U, 8GB RAM, 256GB SSD) with one of the most annoying fans in the world of laptops</li></ul><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FSFF7qhemoKU%3Ffeature%3Doembed&amp;url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DSFF7qhemoKU&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FSFF7qhemoKU%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/888a0a7d308f1ee97fde3220d55a8ce3/href">https://medium.com/media/888a0a7d308f1ee97fde3220d55a8ce3/href</a></iframe><ul><li>A <strong>workstation </strong>equipped with an i7-7700, 64GB RAM, 2x 500GB SSD, and an NVIDIA 1080</li><li>A <strong>server with a Dual Quanta Freedom</strong> (Ivy Bridge 2x 10 cores, 2.7 GHz), 128GB RAM, and 2x 500GB SSD</li><li>A <strong>server with a Dual Xeon E5-2697v3</strong> (2x 14 cores, 3.1 GHz), 128GB RAM, and LSI MegaRAID with 4x 500GB SSD in RAID 10</li></ul><p>The latter <strong>broke into the top 20 world ranking of Cinebench R11.5 on October 12th, 2017</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4M6ae4viijiW_5dZ.jpg" /><figcaption>Crushing Cinebench R11.5 with a 56 thread monster</figcaption></figure><h4>Specifications</h4><p>You may wonder: what are the <strong>specs of my Gigabyte Aero 14 laptop</strong>? 
Note that the list includes <strong>my upgrades</strong> to my Aero 14 (<strong>model name: Aero 14K v7</strong>):</p><ul><li>Screen: <strong>14&quot;, 2560x1440, matte, non touch</strong></li><li>CPU: <strong>i7-7700HQ (undervolt -130mV)</strong></li><li>GPU: <strong>Intel HD Graphics, NVIDIA 1050 Ti 4GB RAM (undervolt -150mV)</strong></li><li>Thermal Paste for CPU/GPU: <strong>Thermal Grizzly Kryonaut</strong></li><li>RAM: <strong>2x 16GB RAM 2400 MHz (Crucial CT16G4SFD824A)</strong></li><li>SSD: <strong>Transcend 256GB MTS800 (default), Samsung 960 Evo 1TB</strong></li><li>OS: <strong>Windows 8.1 Pro Update 3</strong></li><li>Factory extras: <strong>USB to Ethernet cable, CPU and GPU -50mV undervolt</strong></li><li>Weight: <strong>1.9kg, plus about 500g for the charger</strong></li><li>Thickness: <strong>19mm</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/738/0*b3zQkHSotlSUGu_d.jpg" /><figcaption><em>(From notebookcheck) Left side: Kensington lock, HDMI, USB 3.0, audio in/out, SD card</em></figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/738/0*ijFSJYl4wuHPdiZq.jpg" /><figcaption><em>(From notebookcheck) Right side: USB 3.1 Type-C, Mini DisplayPort, 2 x USB 3.0, power</em></figcaption></figure><h4>Cost &amp; Upgrades</h4><p>The cost breaks down approximately as follows:</p><ul><li>Laptop: <strong>€2,000</strong></li><li>RAM upgrade (1x 16GB RAM): <strong>€190 (critical UPS shipping method is a bunch…)</strong></li><li>SSD upgrade (Samsung 960 Evo 1TB): <strong>€450</strong></li><li>Thermal Paste upgrade (Thermal Grizzly Kryonaut): <strong>€5</strong></li><li>Operating System downgrade: €0 (got tons of MSDN licenses)</li><li>Grand total: <strong>€2,645</strong></li></ul><h4>End of Introduction</h4><p><strong>This post is divided into multiple sections:</strong></p><ul><li>What are my laptop usages?</li><li>Why did I choose this laptop? 
Because magic?</li><li>Purchasing online is weird?</li><li>Some “synthetic” benchmarks?</li><li>Real world usage?</li></ul><p>Some images were taken from the <strong>notebookcheck reviews of the Gigabyte Aero 14 (</strong><a href="https://www.notebookcheck.net/Gigabyte-Aero-14-7700HQ-GTX-1060-Laptop-Review.211666.0.html"><strong>NVIDIA GTX 1060</strong></a><strong>, </strong><a href="https://www.notebookcheck.net/Gigabyte-Aero-14K-i7-7700HQ-GTX-1050-Ti-QHD-Laptop-Review.258384.0.html"><strong>NVIDIA GTX 1050 Ti</strong></a><strong>)</strong>. For pictures of the laptop, go check those reviews, as theirs are way better =)</p><h3>How do I use my laptop?</h3><p>This section was added after a question was asked about the usage of my laptop:</p><blockquote>What is this laptop used for? Machine learning? Gaming? General usage? Or all of the above?</blockquote><p>All of the above. <strong>I will describe here in more detail the main usages of my laptop</strong>. This should <strong>explain how I use my laptop and why it was very difficult to find such a laptop</strong>.</p><h4>Machine Learning &amp; Data Analysis</h4><p>An important point for me was to be able to use the machine for machine learning, for the following tasks:</p><ul><li><strong>Parallel xgboost</strong> (not xgboost multithreaded): requires a bunch of CPU threads with high frequency, and a lot of RAM</li><li><strong>Deep learning / neural networks</strong>: GPU is mandatory</li><li><strong>Lots of vertically scaling data analysis</strong>: more threads and more RAM help a bunch</li><li><strong>OpenCL / CUDA optimized code in R</strong>: needs a dedicated GPU…</li><li><strong>32-bit data analysis</strong>: we are still in the JMP 10 / SPSS 21 era</li><li><strong>Business Intelligence</strong>: Tableau and Qlik are CPU bound</li></ul><p>Example: when analyzing the <a href="https://www.kaggle.com/c/porto-seguro-safe-driver-prediction">Porto Seguro dataset</a>, if I do not refrain from using too many 
resources, my server (56 CPU threads…) needs 20 minutes and 110GB RAM to produce meaningful automated reports for human analysis. On my new laptop, the same meaningful analysis takes 4 hours and 30GB RAM (this means you can go shopping, eat, and come back to a finished report).</p><h4>Typing Every Day, Anywhere / Programming</h4><p>I use my laptop to <strong>type stuff every day and anywhere when not at work</strong>. This includes emails, chats (Slack, etc.), blogging (Medium), websites, programming…</p><p>When I need to write stuff, I use the following tools:</p><ul><li><strong>RStudio</strong> for R</li><li><strong>Spyder</strong> for Python</li><li><strong>Visual Studio / RStudio</strong> for C++</li><li><strong>Git Bash</strong> for Git and Bash</li><li><strong>Notepad++ and Visual Studio Code</strong> for other languages</li><li><strong>Word / Excel / PowerPoint / Visio</strong> for documents</li><li><strong>KiTTY, MobaXterm, Bitvise SSH Client</strong>… for SSH-ing</li><li><strong>Remote Desktop (mstsc.exe)</strong> for remoting into another machine</li><li><strong>Photoshop / Illustrator / InDesign</strong> for anything graphic related, visually critical on screen</li><li><strong>Axure RP / Mockplus / JustInMind</strong> for anything UI/UX design related, visually critical on screen</li></ul><p>Believe it or not, having <strong>keyboard macro keys</strong> helps tremendously in reaching a very high typing speed. And a <strong>very light laptop</strong> (under 2kg) is a gigantic plus when you cannot stay in the same place.</p><h4>Rendering Scenes</h4><p>I use <strong>Daz Studio Pro 4.9</strong> and <strong>KeyShot 7</strong>, and require a beefy CPU and GPU depending on my needs.</p><p><strong>CPU helps a lot when processing single elements sequentially </strong>(think: load app, load textures, etc.), while <strong>GPU is the bazooka for final rendering of scenes</strong>. 
When the GPU cannot be used for some reason, <strong>LuxRender allows blazing fast renderings using CPU only</strong>.</p><h4><strong>Virtual Machines</strong></h4><p>I often need virtual machines, literally every day. I use Hyper-V and VirtualBox for virtualization. This allows me to:</p><ul><li>Have <strong>multiple operating systems</strong> booted at the same time</li><li>Test <strong>distributed programs / machine learning</strong> properly</li><li>Test <strong>web services in an isolated and fully controlled environment</strong></li><li>Test <strong>malware behavior</strong></li><li>Run <strong>Windows 10</strong> when applications refuse to run without it (eyes rolling at Adobe’s new software)</li></ul><h3>How did I choose my laptop?</h3><p>Those who know me personally know I am very picky when it comes to purchasing (actually, investing in) a new laptop. Typically, here are my required specs:</p><ul><li>Screen size: <strong>15&quot; maximum, matte mandatory</strong>, touch screen optional</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Digbe_zBvBX9xxMz.jpg" /><figcaption>Good luck being able to read anything when you put this in direct sunlight (Dell XPS 15)</figcaption></figure><ul><li>Screen resolution: <strong>1920x1080 minimum</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*rrqBq9oqhW2tARk-.jpg" /><figcaption>Comfortable reading (from <a href="https://insider.razerzone.com/index.php?threads/re-razer-blade-2016-fhd-1080p-vs-uhd-4k.21931/">Razer Insider</a>)</figcaption></figure><ul><li>Webcam: <strong>at the top mandatory (Dell is blacklisted due to this)</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*aBr08th6jdJRZHMyuVcfNw.png" /><figcaption>Dell XPS 15 webcam placement: LOL</figcaption></figure><ul><li>Operating System: <strong>must be able to install Windows 8.1</strong></li></ul><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/640/0*76gvvRqV7v45T0wG.png" /><figcaption>Kill all those metro apps starting from Windows 8 (from <a href="https://www.technorms.com/30735/close-apps-windows-8-1">TechNorms</a>)</figcaption></figure><ul><li>CPU: <strong>7th gen (Kaby Lake), hyperthreading available, ultra low voltage (Intel U) or quad core (Intel Q)</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*7VQGmY52rPre9Zmz.jpg" /><figcaption>Who needs more performance? Dual Xeon E5–2697v3 in action (28 cores / 56 threads, 3.1GHz)</figcaption></figure><ul><li>GPU: dedicated NVIDIA Pascal GPU optional, with power savings (Optimus), Intel Iris Plus/Pro preferred</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/684/1*qz_y_p32bagPQ0JrWEbACA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/531/1*mctjSUZw9OaHauxkOT28gw.png" /><figcaption>Holding an NVIDIA Volta GPU and letting it compute 24/7/365 makes your home burn</figcaption></figure><ul><li>RAM: <strong>16GB RAM minimum</strong>, 32GB or 64GB RAM preferred</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*XbP47-YUXCCigR6w.jpg" /><figcaption>Much RAM coming soon (by <a href="https://twitter.com/incero">Incero</a>)</figcaption></figure><ul><li>Drives: <strong>SSD</strong>, preferred NVMe with 4x PCIe lanes (~4GBps), preferred two SSDs</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*G_O1YIqWGH2UJXTq.jpg" /><figcaption>SSDNodes SSDs (by <a href="https://twitter.com/incero">Incero</a>)</figcaption></figure><ul><li>Network: <strong>Wi-Fi, non Killer versions (Intel-only network cards)</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/793/1*ihF7VKzEVxFXj9poLYCrSA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/788/1*Un_KR3NU1tE0RBOcylJXRg.png" /><figcaption>ATM running Windows XP — Killer Wi-Fi cards are prone to crash under load 
and requiring an OS reboot</figcaption></figure><ul><li>Mobile internet: a big plus but not mandatory (Huawei / Qualcomm preferred)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/688/0*cpMU5f-8674F9TmJ.jpg" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/380/1*8M928MOrE-PZ4EFo16GZhQ.png" /><figcaption>Slow mobile internet (by TechandGio) vs “fast Internet”</figcaption></figure><ul><li>Ports: <strong>3x USB any version is mandatory, VGA or HDMI is mandatory, mini Display Port is mandatory</strong>, Thunderbolt 3 is a big plus, charger with charging USB port is a big plus</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/458/1*hOQOEegA_4suadBTHJ2SyQ.png" /><figcaption>Need MORE USB ports? Here are 40.</figcaption></figure><ul><li>Keyboard: <strong>backlight mandatory (on all keys + all elements)</strong>, macro keys is a big plus, mechanical feel preferred, centered touchpad preferred, Macbook keyboard style forbidden (no butterfly keys)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/738/0*s_3oT5Eqj0lL3_qF.jpg" /><figcaption>A keyboard like this (Gigabyte Aero 14)</figcaption></figure><ul><li>Case: <strong>must be able to be opened</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/738/0*J-bOvGYneOwJf6ku.jpg" /><figcaption>Gigabyte Aero 14 opened (NVIDIA 1060 version)</figcaption></figure><ul><li>BIOS: must be able to edit more than what we get in a Surface Pro</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4Tdw8ZmR2KIh5Igl.jpg" /><figcaption>Holy moly Surface Pro 1 BIOS is only this!</figcaption></figure><ul><li>Battery life: <strong>at least 8.5 hours of battery life doing web browsing / work in Google Chrome</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/920/0*2P7YUGotE6LSZ7AY.jpg" /><figcaption>Dead battery by <a 
href="http://www.collegehumor.com/post/7038231/6-stages-of-your-dead-phone-battery">CollegeHumor</a></figcaption></figure><ul><li>Weight: <strong>less than 2kg, less than 2.5kg with charger</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*a668FArci6lN28ZT.jpg" /><figcaption>Acer Predator 21X by <a href="https://www.youtube.com/watch?v=bnpoAIfUWIk">linustechtips</a></figcaption></figure><ul><li>Fan noise: <strong>next to none in any available silent mode after undervolting, manual throttling, CPU/GPU repasting, etc.</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*6PEaPZSGLTdoOkeq.jpg" /><figcaption>“FIX THE NOISE” (<a href="https://www.youtube.com/watch?v=mKKGEKh8yWA">PS4 Fan Noise</a>)</figcaption></figure><p>If you read the <a href="https://www.notebookcheck.net/Gigabyte-Aero-14K-i7-7700HQ-GTX-1050-Ti-QHD-Laptop-Review.258384.0.html">notebookcheck review of my laptop</a>, you will find my laptop ticks every box I listed as mandatory.</p><h3>Shopping Online and Refunds</h3><p>Before definitively choosing my laptop, I went through many poor laptop choices which fit nearly all my needs.</p><h4>What did I try?</h4><p>The laptops I tried (non-exhaustive list) include:</p><ul><li><strong>Razer Blade Stealth 4K 12.5&quot;</strong>, i7-7500U, 16GB RAM, 512GB SSD, which I returned for a refund because the <strong>coil noise (on both the charger and the laptop) was driving me nuts even after the “BIOS update”</strong> which was supposed to fix it (hint: it just does some software tweaks but will not fix the coil noise)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*GkbbiWAfSrcDaS_s.jpg" /><figcaption>Razer Blade Stealth: does it suck? 
(<a href="https://www.youtube.com/watch?v=d2kH0fqWCn0">YouTube</a>) — yes it does, far from a “Windows Macbook”</figcaption></figure><ul><li><strong>Apple Macbook Pro 13</strong>, 16GB RAM, 512GB SSD, which was delivered with the <strong>wrong keyboard (good luck doing development tasks using a French Apple keyboard!!!)</strong>, with a <strong>small scratch on the back of the screen </strong>(small enough that at an Apple Store they could not even find it without looking carefully)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/899/0*jyaOgBVHsXyKMTHx.jpg" /><figcaption>Apple Macbook Pro 13? — the keyboard!!!!!</figcaption></figure><ul><li><strong>Dell XPS 13, non Iris Plus version</strong>: screw this laptop as I <strong>returned it before Dell started to refuse returns due to coil noise</strong> (they preferred trying keeping my money and sending me a technician under their Premium Support — <strong>yes, not the ProSupport, </strong><a href="https://www.reddit.com/r/Dell/comments/5szi8g/technician_visited_me_to_fix_coil_whine_xps_13/">it would have ended up like this</a>)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/965/0*EDGThia_vnEkC0aa.jpg" /><figcaption>Dell XPS 13 does not fit all my needs but it is better than nothing</figcaption></figure><ul><li><strong>Dell XPS 13, Iris Plus version, 8GB RAM</strong>: except the RAM issue (need more…), <strong>it could fit my needs if I could even order it…</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/965/0*EDGThia_vnEkC0aa.jpg" /><figcaption>A stronger Dell but could not even order it?! 
Only 8GB RAM though.</figcaption></figure><ul><li><strong>HP Spectre x2, Iris Plus, 16GB RAM</strong>: this laptop heats up very quickly and has poor battery life (4h or even less?), not recommended</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/573/0*2lrUHGQegHWDM8xP.png" /><figcaption>HP Spectre x2 is just a toaster.</figcaption></figure><ul><li><strong>HP Spectre x360, 16GB RAM, 15&quot; version</strong>: you just installed a private jet at home</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zdlL7ZGzAh1Q_h7KOqSB7w.png" /><figcaption>HP Spectre x360 (15&quot;, with GPU) is the same as having a private jet at home</figcaption></figure><h4>Dealing with Amazon, Apple, HP, Dell, etc.</h4><p>I am putting only my results when dealing with Amazon, HP, and Dell support. Note that it may apply to France only.</p><p><strong>Amazon France (top tier support/behavior)</strong>:</p><ul><li>Delivery: <strong>same-day delivery</strong> (19h–22h delivery), which is perfect when you work during the day</li><li>Support hours: I could <strong>contact support from 6h to midnight</strong>, which is again perfect when you work during the day</li><li>Support behavior: so far I <strong>did not see any customer support which beats Amazon</strong></li><li>Behavior towards laptops: <strong>laptops must be wiped before sending</strong>, they also <strong>allow to fully wipe the drives before sending back the laptops</strong> (perfect for privacy-minded users)</li><li>Returns: <strong>print a prepaid paper, paste it on the original Amazon package, and send it for free</strong> (approximately 2 weeks to get refunded, which is fast)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*rGHKzZbhKUjQvbaT.jpg" /><figcaption>Full format of drives!</figcaption></figure><p><strong>Apple</strong>:</p><ul><li>Delivery: <strong>custom to order (CTO) laptops takes 2 weeks</strong>, but when you will get delivered you get warned 
of the day and the hour by email/text</li><li>Support hours: <strong>anytime an Apple Store is open is better</strong></li><li>Support behavior: they listen first then they ask (appropriate) questions</li><li>Behavior towards laptops: as I did not use my laptop, no idea whether we should wipe the drives or not</li><li>Returns: <strong>in Apple Stores, takes 2 days ONLY to get the money back</strong>, otherwise same as Amazon</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/694/0*OqFxqXeJ1FldsBIT.png" /><figcaption>Apple Store is best</figcaption></figure><p><strong>HP France</strong>:</p><ul><li>Delivery: 2 days to 1 week, <strong>no control over when your package gets delivered</strong> (have fun if you work, because you will struggle very hard to get your package)</li><li>Support hours: did not have to deal with them for a refund</li><li>Support behavior: unknown</li><li>Behavior towards laptops: unknown, but I wiped the drives before sending them back (no issue for refund)</li><li>Returns: same as Amazon</li></ul><p>(HP has no physical shops in France?!)</p><p><strong>Dell France</strong>: holy moly when trying to pay using PayPal (if you read French, read the <a href="https://lehollandaisvolant.net/?d=2016/08/13/00/00/59-experience-commander-un-ordi-chez-dell">Le Hollandais Volant</a>):</p><ul><li>You need an ID</li><li>You need proof of residence at the delivery address</li><li>They attempt to charge you 20% more than what you were initially charged (you can view this as double VAT for payment authorization)</li><li>They say they tried to call you, when they NEVER attempted it</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MGSa54Un0-uxnHOOat8Hcg.png" /><figcaption>They want your ID card and proof that you live at the delivery address: good luck trying to send gifts for instance</figcaption></figure><p>The whole block of text (translated from French):</p><blockquote>Dell — Internal Use — Confidential</blockquote><blockquote>Dear customer,<br> 
<br>At Dell, we strive to ensure the integrity of bank card transactions in order to protect our customers. We verify orders in order to validate the payment details.<br> <br>Your order has been verified, but unfortunately we were unable to reach you at the telephone numbers you provided when placing your order. Consequently, we are obliged to cancel your order. The missing document is a proof of residence at the delivery address dated within the last three months (latest EDF, France Télécom, or mobile phone bill) + a copy of an identity document of the credit card holder, or a Kbis. As soon as we receive the requested documents, your order will be validated and sent to production. Please send us these documents as soon as possible by email to the following address: <a href="mailto:SER_CC_Validation@dell.com">SER_CC_Validation@dell.com</a>. We apologize for the inconvenience caused, and rest assured that your reply will receive our particular attention.</blockquote><p><strong>I would just put Dell in a blacklist and try to get their laptops through Amazon FR or Amazon DE.</strong></p><h3>Synthetic Benchmarks</h3><p>Here, we will take our machines and make them fight against each other in benchmarks. 
We are taking useful (comparable) fighting cases for our machines.</p><h4>What are we testing?</h4><p>We are going to use three benchmarks:</p><ul><li><strong>Cinebench R11.5 and R15, on CPU</strong>: get the magnitude of difference between a laptop and a powerful server</li><li><strong>Cinebench R11.5 and R15, on GPU</strong>: how powerful is our NVIDIA GTX 1050 Ti against Intel HD Graphics?</li><li><strong>AS SSD</strong>: how crazy can be the Samsung 960 Evo 1TB?</li></ul><h4>What are we playing against?</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/660/1*Df2gUP-_wMLIxBCdyWLDiw.png" /><figcaption>CPU modern warfare: 28c/56t monster versus small machines</figcaption></figure><p>My Gigabyte Aero 14 is going to be tested against several machines, in the following increasing order of performance:</p><ul><li><strong>Acer Aspire 13 S5–371</strong>: i7–7500U (2c/4t, 3.5/3.5GHz), Intel HD Graphics 620</li><li><strong>Gigabyte Aero 14, near-silent wattage (33dB)</strong>: i7–7700HQ (4c/8t, 2.3/3.6GHz), NVIDIA GTX 1050 Ti 4GB</li><li><strong>Gigabyte Aero 14, “Gaming” fans mode (37dB)</strong>: i7–7700HQ (4c/8t, 3.5/3.9GHz), NVIDIA GTX 1050 Ti 4GB</li><li><strong>Workstation</strong>: i7–7700 (4c/8t, 4.0/4.2GHz)</li><li><strong>Server 1</strong>: Dual Quanta Freedom Ivy Bridge (2x 10c/20t, 2.7/3.3GHz)</li><li><strong>Server 2</strong>: Dual Xeon E5–2697v3 (2x 14c/28t, 3.1(2.9)/3.5GHz)</li></ul><p>The server 2 cannot sustain 3.1GHz as it exceeds its power limits (145W), it throttles down to 2.9GHz (which is still higher than its base clock).</p><h4>Extra additions</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*NEwrjxhe3AUYYMRSfeBKFg.png" /><figcaption>I applied stronger undervolt but it did not improve anything (it actually consumed more watts at the wall!) 
— lowering undervolt by 10mV was beneficial on CPU and the integrated GPU (not the dedicated GPU)</figcaption></figure><p>The following was applied on our Gigabyte Aero 14 past our benchmarks, and it did not change any results (other than making the laptop less loud):</p><ul><li>Thermal Paste: <strong>Thermal Kryonaut Grizzly</strong></li><li>CPU undervolt: <strong>-130mV</strong></li><li>GPU undervolt: <strong>-150mV</strong></li></ul><p>We also did the following on our Acer Aspire 13 S5–371:</p><ul><li>CPU undervolt: <strong>-90mV</strong></li><li>GPU undervolt: <strong>-90mV</strong></li><li>Turbo Boost Power Max: <strong>25W</strong></li></ul><h4>Cinebench R15</h4><p>As expected, <strong>our Aero 14 is getting crushed by our workstation and servers</strong>, but it beats very easily old Intel CPUs (in singlethread) and our ultra low voltage laptop (being twice as slow for multithreaded tasks).</p><p><strong>If CPU performance matters, don’t buy a laptop: purchase or rent a server.</strong> A desktop with a simple i7–7700K will not be up for your task, as you can get Intel i7–7820HK overclockable CPU in a mobile laptop under 3kg (check for Clevo chassis laptops).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*XJ0jAe52H4w276zdEjg6rw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*7gqZnob6-It7lcBLos8JCw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*7ePPoGdqmituAWkYRjCsOg.png" /></figure><p>Screenshot of results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oA90N9csJLDEsFOOJNxHsg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*C6uxcM4iSRhgv-GC4ZOjVA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*49Bttp_FjfOq5Z1UIAEbTA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Qgtu3-6j1vM8FUtn9ihhRg.png" /></figure><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/1*qJUTDWBLned3_Q9Ji99Ocw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YzeBOk9X8Kjkg728yYDVtw.png" /><figcaption>Cinebench R15: Acer Aspire (i7–7500U), Workstation (i7–7700), 20 core Quanta Freedom, 28 core Xeon E5–2697v3, Aero 14, Aero 14 Throttled</figcaption></figure><h4>Cinebench R11.5</h4><p>The Gigabyte Aero 14 is doing fairly well as on Cinebench R15 despite being only a quad core mobile CPU. When it comes to GPU, the NVIDIA GTX 1050 Ti just crushes our Intel HD Graphics 620.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*wpLp1mk-e1m1MO4Q_-BaVw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*Kl0aaf4qYsG4nOtMd_Uk_Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*Bsgo7negfh--FHoPLX97Og.png" /></figure><p>Screenshot of results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ONsU7m4WdOLgmMs8wRbw9w.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5p3DJ6S1FLcSevT-zbvk1A.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fdl-ELbIEavtZBHObhM5rw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vaI79fw-qOaKCCVkWv3EFA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6bFIVnQvEkrGdJYiLxPGYQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PT8q3UqLo_uDbOQ0gnLkWA.png" /><figcaption>Cinebench R11.5: Acer Aspire (i7–7500U), Workstation (i7–7700), 20 core Quanta Freedom, 28 core Xeon E5–2697v3, Aero 14, Aero 14 Throttled</figcaption></figure><h4>AS SSD Scores</h4><p><strong>The Samsung 960 Evo makes every non NVMe SSD look idiot. 
Cheap prices, high capacity, what are you expecting for €450?</strong> (a Samsung 960 Evo 1TB, supposing we ignore the Samsung 960 Pro 1TB at €600)</p><p>However, when the laptop costs €2,000, finding a Transcend MTS800 SSD inside (which is also only 256GB) is unacceptable. I’ll contact Gigabyte to ask for their point of view on this, as we all know we want a Samsung PM961 inside and not that poor Transcend SSD, which cannot even go faster than SATA SSD speeds.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*WzLKjoJRUt7LJcRYuQTyyg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*ytM98NegodikbYCsv3Sotg.png" /></figure><h4>AS SSD General Results (MBps)</h4><p>Just so you can check how the Samsung 960 Evo crushes the competition, and how bad the factory Transcend MTS800 is.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*_zEBzQxQmdm3KUd8uloO7A.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*SfdHIkWsFFptv_0TmTx2PA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*IZ7zlrIPiyJxBlx8Ip3NGQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*65Eykcnv_RB3Odoa7BIEeA.png" /></figure><p>Screenshot of results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*50iQH2AF8spK0pvbuF5LJQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*RsCOjKwcLpiaIeKKn5GiDQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*beK2nCXCmBFrTaAgNKdYMw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*FtKq8TWEkDwF9KvMbRcUAw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/503/1*0uHgRrhMaq_DvbEmAGznkQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/503/1*s0LiCpwik5TKgfFwaFhJow.png" /><figcaption>AS SSD Benchmark: Acer Aspire, Workstation, 20 
core Quanta Freedom, 28 core Xeon E5–2697v3, Aero 14 (Samsung 960 Evo), Aero 14 (Transcend)</figcaption></figure><h4>AS SSD General Results (IOPS)</h4><p>Just so you can check how the Samsung 960 Evo crushes the competition, and how bad the factory Transcend MT800 is.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*k4bLJxpGyoDB4susBsFsSQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*qSk-GD0wcTbtoXRhW3XxLg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*n7Kyb5Nqz7V3ZYUXYe9gHw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*s1R2GL8wwmpClwcAFHxLjw.png" /></figure><p>Screenshot of results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*gt_hQXerGOMDr1JUqjeqnQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*_rZq_eQy49W9j_pEqqeQJQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*-jEKfUpxKwvANVen8A7LdQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*FK-8JMzclaohguSEnm792w.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/503/1*TocAZV-KOFpgCQUlWDBO1g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/503/1*hoMtbN35frmm4jCPHacIhQ.png" /><figcaption>AS-SSD Benchmark: Acer Aspire, Workstation, 20 core Quanta Freedom, 28 core Xeon E5–2697v3, Aero 14 (Samsung 960 Evo), Aero 14 (Transcend)</figcaption></figure><h4>AS SSD Copy Results</h4><p>The Transcend MTS800 is getting crushed by a Crucial MX300.</p><p>Screenshot of results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*4Th89meAb13zKNrfB_d2rw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*36BB9Hfu8ZnjdpizxjOy8w.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/513/1*6My913mEuMIi56-Fw4lHIA.png" /></figure><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/513/1*gsy604PQJK-vH85lHRTAKg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/503/1*VQG512TDer1QzAOw0iT7Og.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/503/1*xkVPsE590AF77gA901Xy3w.png" /><figcaption>AS-SSD Benchmark: Acer Aspire, Workstation, 20 core Quanta Freedom, 28 core Xeon E5–2697v3, Aero 14 (Samsung 960 Evo), Aero 14 (Transcend)</figcaption></figure><h4>AS SSD Compression Results</h4><p><strong>How fast is the Samsung 960 Evo compressing data? Too fast!</strong></p><p>Screenshot of results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/807/1*OCV_JlzZ2_4CxXnnVTp5GA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/807/1*zbIYiF-4V7Onxb1Bi7vwWw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/807/1*WXw-Z3h2vGV3Sn2Hpi6TqA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/807/1*KEBwn0eKfW7IXqceuP87qQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/797/1*KBiEUxe0mWciDUZgVCH2SQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/797/1*dA2cW6odeWAmQRS7HqTeOQ.png" /><figcaption>AS-SSD Benchmark: Acer Aspire, Workstation, 20 core Quanta Freedom, 28 core Xeon E5–2697v3, Aero 14 (Samsung 960 Evo), Aero 14 (Transcend)</figcaption></figure><h3>Real world usage of the laptop</h3><p>I got my Gigabyte Aero 14 since last week, and so far I am very happy when it comes to the performance, noise, keyboard, screen, and battery. 
However, I hit a major issue with drivers and random crashes caused by (old) NVIDIA drivers.</p><p>When opening the package, two DVDs are provided in addition to the laptop and the USB / Ethernet cable:</p><ul><li>7GB DVD for drivers</li><li>Cyberlink PowerDVD 12</li></ul><h4>The major issue: drivers</h4><ul><li>First of all, this major driver issue could have been even worse had Gigabyte not provided a 7GB DVD with all drivers and the 17GB backup “GIGAWIN10RC” USB (you need to make the USB yourself, but you get prompted right at the beginning to do it).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/760/1*ltjGV96g1ccB3u5hSOQMVw.png" /><figcaption>There is some Windows 8.1 stuff inside?!</figcaption></figure><ul><li>Second, I am using the laptop in an unsupported scenario: Windows 8.1 Update 3 (however, <strong>they do have </strong><a href="http://www.gigabyte.us/Laptop/AERO-14--GTX-1050-Ti#support-dl"><strong>drivers for Windows 7</strong></a><strong> even when using “unsupported” Kaby Lake CPUs, which is very rare!!!</strong>).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SWhXTs-ectbGUv9hA28Akw.png" /><figcaption>Broken hopes for finding a Windows 8.1 image =(</figcaption></figure><ul><li>Third, all installable software is very easy to install.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*Sink9JXFkWGi_BLNrCvatg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/780/1*6vMNr5DZyU2gAHhMnnsulA.png" /><figcaption>Just click everywhere to install (for instance, it says I do not have Thunderbolt drivers installed at that time, for Bluetooth drivers it is bugged)</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*f-wJU_xqdVry0LWor5sv3A.png" /><figcaption>Bundled software — did not install Intel RST Premium as I use AHCI and Samsung 960 Evo drivers</figcaption></figure><p>When it comes to installing GPU drivers, this is another story:</p><ul><li>Intel 
HD Graphics drivers cannot be installed on Windows 8.1 without some small hacks</li><li>NVIDIA drivers cannot be installed without installing Intel HD Graphics drivers before</li></ul><p><strong>Solution: do some Intel HD Graphics driver hacking (there are guides online), install the drivers, and install NVIDIA drivers afterwards.</strong> This solution alone would put off anyone who does not want to get into driver hacking.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*q0FKAHuDGVBpoDWtCIpTsg.png" /><figcaption>oh my gawd: Windows 8.1 + Kaby Lake!!!</figcaption></figure><h4>Performance and Noise</h4><p>There is nothing wrong with the performance of this laptop, except some weird and buggy behavior of the laptop graphic cards:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/974/1*eOoX-iGQGikzWxp44OEDCA.png" /><figcaption>Let me use PhysX using my NVIDIA GTX 1050 Ti please…</figcaption></figure><p>When it comes to the noise, the only issue is when putting the fans into Gaming Mode. 
I tend to use Quiet Mode instead, although it throttles the CPU quite a lot for a near silent operation (33dB at full load after undervolting):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*YVkxy4VYwdQXYS5y1Gbw5Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/449/1*eBKfya83S7QA-xdF_1CdtQ.png" /><figcaption>Gigabyte Smart Manager — Clicking on help brings a nice PDF which explains everything you can do in the Smart Manager</figcaption></figure><p>The Gigabyte Smart Manager allows to control everything on the laptop, except the Bluetooth button which is broken in Windows 8.1.</p><p>The controls are the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/601/1*kdyO_pNsm59KkJy9CpOXwQ.png" /></figure><ul><li>Change volume (nothing exceptional)</li><li>Mute sounds (nothing exceptional)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*qqcFmmxnIylgMsnmjLvqRw.png" /></figure><ul><li>Change brightness / Automatic brightness (the latter can only be done here)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*3oW0VMLq1CdoBunUYPPTgQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*A7oHbocEAKA3zup9GuTMiQ.png" /></figure><ul><li>Power Mode (same as Control Panel &gt; Hardware &amp; Sound &gt; Power Options)</li><li>Wi-Fi on/off (nothing exceptional)</li><li>Bluetooth (is broken)</li><li>Camera (nothing exceptional)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*SLK47JB_6-4h_ucqu0eNyA.png" /></figure><ul><li>Keyboard Backlight (can also do Fn+Space)</li><li>Monitor Switch (opens the charms)</li><li>Mouse Speed (faster than going into Control Panel)</li><li><strong>Windows Key Lock</strong> (holy moly the amount of time you might accidentally press the Windows button)</li><li>Font Setting (DPI change…)</li></ul><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/600/1*VgONwOli4jHKxizqJJh9CQ.png" /></figure><ul><li><strong>X-Rite Pantone</strong> (calibrated display, I measured less than 2 of difference)</li><li><strong>White Color / Blue-light Killer</strong> (nice to have)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*PPWb2fdmC1diRKrqlm9RXQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*nQz3wZnXtPPgEtHAZNvBAQ.png" /></figure><ul><li><strong>Fan Tweaks</strong> (this one is vital as it allows you to control the fans directly, and their modes: Quiet (silent or near fully silent), Normal (not silent but not loud), Gaming (not silent to loud), Custom Auto (not silent, maximum noise allowed), Custom Fixed (permanent noise))</li><li><strong>Smart Dashboard</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*XaFc0qDHvmpjO8WU_OBo7g.png" /><figcaption>Gigabyte Smart Dashboard (yes, 0 RPM fans since I’m writing this post)</figcaption></figure><p>As for the fans, here are the settings:</p><ul><li><strong>Quiet fan: 0 RPM (until you go over 60°C CPU/GPU, CPU throttles, no GPU throttle — maximum fan seems 30% on CPU, 40% on GPU)</strong></li><li>Normal / Gaming fan base: 2167 RPM (same as 30% auto fan noise)</li><li>30% auto fan (ex-25%): 2204 RPM (32dB, you will not even notice it from far)</li><li>30% fixed fan: 2551 RPM (33dB, a bit sleepy)</li><li>40% fixed fan: 3208 RPM (35dB, maximum you will encounter in quiet mode?)</li><li>50% fixed fan: 3660 RPM (37dB, your occasional peak)</li><li>70% fixed fan: 4647 RPM (you can feel the wind from your seat, 42dB)</li><li>100% fixed fan: 5615 RPM (oh my god you have a private jet at home, 50dB)</li></ul><p>The laptop is full aluminium with a bit of plastic (a lot of plastic for the screen and the ports cough cough), and gets warm quickly if you stress the CPU and GPU a lot. 
In addition, <strong>the fans blow air up onto the monitor, which might feel unusual if you are not used to it</strong>.</p><p>When opening the case, everything is so tight that the cracking noises may make you think you are breaking the laptop.</p><h4>Laptop Life</h4><p>This laptop is a gem as it combines the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/740/1*sAyy2cXfh-jgjLs0lHwEsQ.png" /><figcaption>Reported battery life of the Gigabyte Aero 14 on notebookcheck</figcaption></figure><ul><li><strong>Not so bad GPU</strong> (NVIDIA 1050 Ti)</li><li><strong>Small weight / high portability</strong>: 14&quot;, less than 2kg (1.9kg)</li><li><strong>High performance mobile CPU</strong>: i7–7700HQ</li><li><strong>Long battery life</strong>: I usually hit 10h to 11h battery life</li><li><strong>High quality calibrated screen</strong></li><li><strong>Two drives with 4x PCIe</strong></li><li><strong>Mechanical keyboard and Macro keys</strong></li></ul><p>As for the keyboard, <strong>it feels close to a mechanical keyboard and will hurt anyone who uses only membrane keyboards</strong>.</p><p>I do not recommend this laptop to those who are not used to typing on mechanical keyboards, because they will get tired very quickly. However, if they keep at it, they will be rewarded with the <strong>Macro keys, which allow you to perform an action / series of actions automatically on a single key press</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/544/1*YKPpzFcne5NyjdhHVFhskA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/544/1*kj9ppQ2cAlVtuO720mBzsw.png" /><figcaption>Macro keys, when empty. 
You can record up to 88 macros (out of 100).</figcaption></figure><p><strong>We can also switch the macro used by pressing the “G” button, effectively changing the column of the macros used</strong> (there are 5 columns with their respective colors, visible on the keyboard).</p><p>Note that when holding the laptop, it is easy to cut the skin of your hands because of how sharp the edges of the case can be. <strong>The solution is to stop trying to be MacGyver and to learn to pick up and hold the laptop properly.</strong></p><p>For the fans, as long as you use the Quiet mode, you will very rarely encounter the fan noise. <strong>Even watching YouTube videos does not trigger the fans.</strong></p><h3>Conclusion</h3><p><strong>This laptop is very expensive: it is a luxury for users who need to make the most out of their laptop.</strong></p><p>Do not purchase this laptop if you are looking to do a single thing, because this laptop fits the following type of user:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F_cbPjxOCNVE%3Ffeature%3Doembed&amp;url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D_cbPjxOCNVE&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F_cbPjxOCNVE%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/7e440bfffed6106651bef3ff7394de99/href">https://medium.com/media/7e440bfffed6106651bef3ff7394de99/href</a></iframe><blockquote>tl;dr: one size fits all, jack of all trades</blockquote><p>Checklist:</p><ul><li><strong>Portable laptop</strong> (light, small screen, noiseless)</li><li><strong>High quality matte screen</strong> (QHD IPS, 2560x1440, well placed webcam)</li><li><strong>High battery life</strong> (8h+)</li><li><strong>Many ports</strong> (USB 3, HDMI, mini Display Port, Thunderbolt 3, SD card)</li><li><strong>Acceptable performance both on CPU</strong> (4 cores) 
<strong>and GPU</strong> (not Intel)</li><li><strong>Self-serviceable / upgradable</strong> (2x DDR4–2400 RAM, 2x NVMe M.2 2280 storage)</li></ul><p>If you do not need one of those, you can get an equivalent of that “Gigabyte Aero 14” for half the price (don’t say Dell XPS 15 is the answer, its noise is insane).</p><blockquote>What else would you need?</blockquote><p>Oh wait. <strong>If you click the touchpad while the laptop is off, you can get an idea of how much battery you have left. Perfect for travelers.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ff1458e4028d" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-design/gigabyte-aero-14-review-benchmarks-laptop-versus-servers-ff1458e4028d">Gigabyte Aero 14 review &amp; benchmarks: laptop versus servers</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Is KVM virtualization slowing down CPU computations?]]></title>
            <link>https://medium.com/data-design/is-kvm-virtualization-slowing-down-cpu-computations-5a29697a9511?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/5a29697a9511</guid>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Thu, 31 Aug 2017 10:12:06 GMT</pubDate>
            <atom:updated>2017-08-31T10:13:30.792Z</atom:updated>
            <content:encoded><![CDATA[<p>When it comes to pure CPU computations, <a href="https://medium.com/data-design/benchmarking-xgboost-5ghz-i7-7700k-vs-20-core-xeon-ivy-bridge-and-kvm-vmware-virtualization-293807a13f1c"><strong>KVM is doing a great job</strong></a> at providing maximum performance <strong>when tuned properly</strong>. And when it comes to <strong>raw CPU performance</strong>, Cinebench R15 is one of the best benchmarking tools: CPU-bound, slightly RAM dependent.</p><p><strong>Failing to tune CPU pinning and NUMA nodes properly may lower the Cinebench R15 scores by about 10 to 20%</strong> (Geekbench 4 may also get… 70% lower scores, when it comes to using hugepages).</p><p>We will take <strong>our 20 core machine as the baseline</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*F5bv1H3lM5dCghl7yCAZnA.png" /><figcaption>CPU topology of our 20 core server</figcaption></figure><ul><li>Quanta Freedom Ivy Bridge (2x 10c/20t, 3.1/2.7GHz) with air cooling</li><li>96GB RAM</li><li>2x 525GB SSDs</li><li>Host machine: Ubuntu 16.04 with stock kernel</li><li>Virtual machine: Windows Server 2012 R2 Datacenter, using KVM, with replicated host topology (2 sockets, 10 cores, 2 threads)</li><li>Baremetal machine: Windows Server 2008 R2 SP1 Datacenter, without Hyper-V role</li></ul><p><strong>Cosmetic differences may be brutal</strong> for those who have not used Windows XP or Windows 7 for a while:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/967/1*UMqA3aW7IGHeNl6Vwswv2A.png" /><figcaption>Linux htop, pretty clear</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/692/1*xeg53TnH3eJlW8QqskM8uA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/754/1*uRilqV8StjeWgzmp8WbF3w.png" /><figcaption>Windows XP (Windows 7 without Aero), and Windows 7 Task Managers</figcaption></figure><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/658/1*e02sXpPDLlShEjLYTyPLDQ.png" /><figcaption>Windows 8 Task Manager</figcaption></figure><p>You may try to find out (without looking at the operating system…) <strong>which one is the virtual machine below</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Rp-ZqsN2WWYWLiJPrvZnbw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gwWR7QPtoTcoXoFLHxTVPA.png" /></figure><p><strong>The difference is unnoticeable</strong>, and the runs shown here are the median runs (11 runs, 6th best run). Scores ranged approximately from 2270 to 2310 on both machines.</p><p><strong>Conclusion: KVM does not affect the CPU performance of your machine on CPU-bound tasks, provided your virtualization setup is correct.</strong></p><hr><p><a href="https://medium.com/data-design/is-kvm-virtualization-slowing-down-cpu-computations-5a29697a9511">Is KVM virtualization slowing down CPU computations?</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[LightGBM on Windows: Visual Studio vs MinGW (gcc), R with Visual Studio]]></title>
            <link>https://medium.com/data-design/lightgbm-on-windows-visual-studio-vs-mingw-gcc-r-with-visual-studio-417fc14eca2c?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/417fc14eca2c</guid>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Sat, 10 Jun 2017 12:21:16 GMT</pubDate>
            <atom:updated>2017-06-10T12:29:07.303Z</atom:updated>
            <content:encoded><![CDATA[<p>Thinking of using LightGBM on Windows? You know you are given two hard choices: Visual Studio or MinGW (gcc).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/377/1*YMRnp3wXIBM8Uk7SS9fuFg.png" /><figcaption>Visual Studio 2017 alone is a whopping 2GB, excluding external dependencies.</figcaption></figure><p>But everyone knows <strong>Visual Studio</strong> is a <strong>pain to install</strong>. Even the <a href="https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2017">Microsoft Build toolset</a> does not alleviate the pain of a <strong>large download</strong> before even being able to compile something.</p><p>Even though the installation is about <strong>2GB for Visual Studio 2017</strong> (because you may want the GUI to test R/Python integration after all), it is <strong>significantly better than the previous 8GB</strong> for Visual Studio 2015!</p><p>Meanwhile, with <a href="https://sourceforge.net/projects/mingw-w64/files/">MinGW</a> (x86_64-posix-seh, aka 64-bit + posix threads + seh exception handling), a <strong>simple 50MB file to download</strong> and extract makes life easier!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/377/1*bbc4u1qWs-tDoyRd625WRw.png" /><figcaption>MinGW x86_64-posix-seh is big? Think again.</figcaption></figure><p>But are you losing something when using MinGW and going the “easy way”? 
This is what we are going to check (quickly)…</p><h3>What sparked the need to compare Visual Studio and MinGW?</h3><p>I think you will understand visually, there is no need to explain.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/659/0*3aJAwTAKnW7xSm_Z.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*aIKTe7QK9B2IyCLk.png" /><figcaption>MinGW/gcc (left) vs Visual Studio (right): CPU usage differs under the same settings, with only one difference: the compiler?</figcaption></figure><p>It becomes obvious from this comparison picture that we have a <strong>major issue with MinGW/gcc</strong>: the <strong>CPUs are not busy enough</strong> on large datasets, while <strong>Visual Studio keeps all cores busy</strong>!</p><h3>Some benchmark comparisons of Visual Studio and MinGW</h3><p>You can find all detailed benchmarks at the following links:</p><ul><li>GitHub: <a href="https://github.com/Microsoft/LightGBM/issues/542">Microsoft/LightGBM#542</a> (Visual Studio reports higher CPU usage than MinGW)</li><li>GitHub: <a href="https://github.com/Laurae2/gbt_benchmarks/issues/1">Laurae2/gbt_benchmarks#1</a> (some questions)</li></ul><h4>Laptop benchmark (2 physical cores)</h4><p>My main laptop has an <strong>i7–4600U CPU with 16GB RAM</strong>. We can very quickly check its performance on the <strong>Bosch dataset</strong> (1M observations and 1K features), which fits nicely in our RAM.</p><p>We are <strong>testing LightGBM</strong> under the following scenarios:</p><ul><li><strong>Visual Studio 2017 on CLI</strong> (master)</li><li><strong>MinGW 7.1 on R</strong> (master and v2.0)</li><li><strong>MinGW 7.1 on CLI</strong> (master and v2.0)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iG-c2CNKrldIImqiOdJDsA.png" /></figure><p>Unexpectedly, <strong>Visual Studio is slower than MinGW</strong>. 
For a <strong>small number of threads</strong>, it seems <strong>MinGW is better</strong> (even with R callback and processing overhead) than Visual Studio.</p><p>When comparing CLIs (Visual Studio and MinGW), the <strong>difference is a sizeable 5%</strong>.</p><p><strong>R overhead</strong> is approximately <strong>3% of the computation time</strong>.</p><h4>Server benchmark (20 physical cores)</h4><p>My main server has a <strong>Dual Xeon Ivy Bridge (Quanta Freedom) with 80GB RAM</strong> allocated to a <strong>virtual machine</strong>. The performance check is again done on the Bosch dataset.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4qSyLev_KOBLPeHNApShiw.png" /></figure><p>We quickly notice that <strong>the more threads we throw at it, the more performance we get</strong>. The difference is so large that it reaches:</p><ul><li><strong>15% worse for not using hyperthreaded cores</strong></li><li><strong>Up to 40% worse for using MinGW and not hyperthreaded cores</strong> instead of Visual Studio with hyperthreaded cores</li></ul><p>It is <strong>obvious who is the winner here</strong>: Visual Studio.</p><h4>Versus xgboost?</h4><p>Just to eyeball it, using my <strong>laptop</strong> with 2 physical cores (4 threads):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/606/1*HEi7nG4jESLb7yuvnBNflw.png" /></figure><p><strong>xgboost (fast histogram) has bridged the performance gap with LightGBM.</strong> They are only 5% apart in this case.</p><h3>Conclusion</h3><p>A quick conclusion could be the following:</p><blockquote>Windows users should use <strong>MinGW for LightGBM when they are using low-end machines</strong>, such as <strong>laptops with 2 cores only</strong>. 
When reaching <strong>more cores</strong> (<a href="https://github.com/Microsoft/LightGBM/pull/598">like 4 physical cores</a>), it is recommended to use <strong>Visual Studio to reach maximum performance</strong>.</blockquote><p>This is the reason the <strong>pull request</strong> <a href="https://github.com/Microsoft/LightGBM/pull/584"><em>“Compile R package by custom tool chain”</em></a> exists: if you have a <strong>high performance tool</strong>, then make sure you are using that <strong>high performance to its fullest</strong>! In our case, that means: <strong>compile with Visual Studio, but use in R</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/992/1*yX_yMwxJSJdkHP0yf9Ulzg.png" /><figcaption>“I have no idea what I’m doing” meme</figcaption></figure><p>Apparently, it also <strong>eases the installation</strong>, especially for Mac OS users.</p><p><strong>If you do not know what you are doing, use Visual Studio.</strong></p><p>This is as simple as basic addition: <strong>set up your PATH environment variable correctly</strong>!</p><hr><p><a href="https://medium.com/data-design/lightgbm-on-windows-visual-studio-vs-mingw-gcc-r-with-visual-studio-417fc14eca2c">LightGBM on Windows: Visual Studio vs MinGW (gcc), R with Visual Studio</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Interview: AMD Ryzen as a workstation]]></title>
            <link>https://medium.com/data-design/interview-amd-ryzen-as-a-workstation-4d409eec25e2?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/4d409eec25e2</guid>
            <category><![CDATA[fun]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Thu, 25 May 2017 13:48:34 GMT</pubDate>
            <atom:updated>2017-05-25T13:48:22.528Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IdL4GFdpqNWBlJCd2N1uXQ.png" /></figure><p>Today, we have an interview with <a href="https://www.kaggle.com/dustinverzal">drverzal</a> about a brand new AMD Ryzen rig!</p><p><em>Is AMD Ryzen a good CPU? What can you do with it? What should you expect from it? A separate post will compare it against the i7–7700K to check the performance difference!</em></p><h3>Interview Questions</h3><p>We are going to ask our new AMD Ryzen owner some questions…!</p><h4>What are the specs of your new rig?</h4><ul><li><strong>CPU: AMD Ryzen 1700</strong></li><li><strong>GPU: NVIDIA 1080 Ti</strong></li><li>Motherboard: ASUS ROG Crosshair VI Hero</li><li>RAM: G.SKILL TridentZ RGB Series <strong>32GB</strong> (4 x 8GB) 288-Pin DDR4 SDRAM DDR4 2400</li><li>CPU Cooling: NZXT Kraken X62</li><li><strong>Hard Drive: M.2 Samsung EVO 960</strong></li></ul><h4>What are you aiming to do with your rig?</h4><p>I wanted my rig to be <strong>well-rounded</strong>. It’s not a monstrous computing machine, but it’s <strong>sufficient for exploring small to medium sized models</strong> in a timely manner.</p><h4>Why choose AMD instead of Intel for this specific rig?</h4><p>From all the research that I did, the <strong>price-to-performance ratio of the Ryzen CPUs was astounding in multithreaded applications</strong>. 
Thus far, my benchmarks seem to agree.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mRczyf7AhNSLRZLcLQuF9g.png" /><figcaption>drverzal results with a stock AMD Ryzen 7 1700 on Laurae’s xgboost benchmarks: 247 seconds on exact with 16 threads (8% faster than an overclocked i7–7700K at 5.0GHz), 400 seconds on fast with 6 threads (100% slower than an overclocked i7–7700K at 5.0GHz).</figcaption></figure><p>See here for the detailed benchmarks: <a href="https://medium.com/data-design/benchmarking-xgboost-with-and-without-virtualization-2357c3c64947"><strong>Benchmarking xgboost with and without virtualization</strong></a></p><h4>What would be equivalent pricing using Intel-only CPUs?</h4><p>The two chips nearest my scores come in at <strong>$613 and $660</strong>, whereas the <strong>Ryzen 1700 came in at $330</strong>.</p><h4>AMD Ryzen 1800X + 32GB RAM with 1Gbps network setups are available for rent online for $79.00/month. Is it better to own your desktop than to rent one?</h4><p>I utilize my machine for <strong>much more than only machine learning</strong>. I <strong>produce/write music</strong> where I utilize Ableton as my DAW (Digital Audio Workstation) and I enjoy a video game or two. To me, it was a better decision to buy.</p><h4>Advantages/Disadvantages of choosing AMD instead of Intel?</h4><p>The biggest advantage is undoubtedly the <strong>price-to-performance</strong>, with a large caveat: it applies to multithreaded applications!</p><blockquote>Intel still takes the cake in single threaded performance.</blockquote><h4>What was your previous rig?</h4><ul><li>AMD: FX 8350</li><li>GPU: RX 480</li><li>RAM: 8GB</li></ul><h4>Which would you prefer for data science: a desktop or a laptop?</h4><p>Personally, I’d go with a <strong>desktop</strong>. There’s nothing better to me than having a <strong>nice home office</strong>. 
That and having a <strong>multi-monitor setup is always enjoyable</strong>!</p><h4>How do AMD CPUs compare against Intel CPUs in raw performance?</h4><p>Cinebench R15:</p><ul><li>Singlethread: 127</li><li>Multithread: 1378</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*-v3QeR2XbxXl-KVh." /><figcaption>Screenshot for Cinebench R15 score with AMD Ryzen 7 1700</figcaption></figure><blockquote>Comparison: in multithread, we are hitting “only” 1000 with an i7–7700K.</blockquote><h4>Did you overclock? How far did you go? Was it easy?</h4><p>That was <strong>easy</strong>, 4.0 GHz on 8 cores!</p><blockquote>Comparison: i7–7700K can reach 5.0 GHz on 4 cores, but many reports online are showing dead CPUs.</blockquote><h4>How do AMD CPUs compare to your previous rig for raw machine learning speed?</h4><p>To compare scores: <a href="https://docs.google.com/spreadsheets/d/1sxzGshuqVtFe_2zgRhN3gXCraR7d8p-NazJ6z0nsGGc/edit#gid=0">Linus Tech Tips Cinebench R15 score list</a></p><p>Looking through the list of Cinebench scores from above, the highest overclocked FX 8350 came in at #348 with a score of 842. <strong>The stock 1700 came in at 1378.</strong></p><p>There are <strong>two added benefits</strong> I’m feeling so far:</p><ul><li>One, <strong>small models are nearly instant</strong> and allow me to <strong>work without interruption</strong>;</li><li>Two, the <strong>reduction in training time for larger models</strong> means I’m not waiting on results as long.</li></ul><p>Basically, I get <strong>results faster</strong> and more importantly, I can <strong>learn faster</strong>.</p><h4>Opinion on AMD CPUs = too fast too furious?</h4><p>I’m loving the Ryzen chip so far. As you might be able to tell from above, I’ve been <strong>supporting AMD</strong> for quite some time now.</p><p>I’m a big fan of <strong>keeping competition</strong> in there to keep Intel honest. 
It makes it a lot easier to support Team Red when they’re putting out <strong>killer products like the Ryzen lineup</strong>.</p><h4>What about RAM? Are you a multiprocessing or multithreading user?</h4><p>I upped to <strong>32GB of RAM for training larger models</strong>. I’m typically not building models on anything much larger than this at home.</p><h4>ANY COMPUTER PICS? FLASHY COLORS? VROOM VROOM NOISE? OVERCLOCKING MANIAC? WATERCOOLING MAGICIAN?</h4><p>I’m still <strong>waiting</strong> on the mounting bracket for my water cooler :*(</p><hr><p><a href="https://medium.com/data-design/interview-amd-ryzen-as-a-workstation-4d409eec25e2">Interview: AMD Ryzen as a workstation</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Benchmarking xgboost with and without virtualization]]></title>
            <link>https://medium.com/data-design/benchmarking-xgboost-with-and-without-virtualization-2357c3c64947?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/2357c3c64947</guid>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Thu, 25 May 2017 13:19:32 GMT</pubDate>
            <atom:updated>2017-05-25T13:21:23.092Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2Zr8gIYZPfKfp5hdW3Uvfw.png" /></figure><p>We have seen <a href="https://medium.com/data-design/benchmarking-new-xgboost-fast-histogram-xgboost-and-the-compiler-story-86f0c5a4bcd3">previously</a> that <strong>xgboost had a new fast histogram method</strong> leading to blazing performance. All our tests were done in a virtualized environment. <strong>What if we compare it in the most unfair scenario?</strong>:</p><ul><li>Virtualized machine: Linux host, KVM virtualization, Windows client</li><li>Baremetal machine: Linux</li></ul><p>This is what we are going to do. We have access to <strong>two extra machines, thanks to </strong><a href="https://www.kaggle.com/yifanxie"><strong>Yifan Xie</strong></a><strong> (Intel machine) and </strong><a href="https://www.kaggle.com/dustinverzal"><strong>drverzal</strong></a><strong> (AMD machine) who helped with the benchmarking of xgboost exact and fast histogram</strong>:</p><ul><li>Intel i7–7700K overclocked 5.0/4.7GHz, 64GB RAM, baremetal Linux</li><li>AMD Ryzen 7 1700 3.7/3.2GHz, 16GB RAM, baremetal Windows</li></ul><h3>Benchmarking</h3><p>We are going to use the following to benchmark the three machines:</p><ul><li><strong>xgboost Exact</strong></li></ul><pre># i is the number of threads under test; System$currentTimeMillis() is provided by the R.utils package<br>gc(verbose = FALSE)<br>set.seed(11111)<br>StartTime &lt;- System$currentTimeMillis()<br>temp_model &lt;- xgb.train(data = xgb_data,<br>                       nthread = i,<br>                       nrounds = 50,<br>                       max_leaves = 255,<br>                       max_depth = 6,<br>                       eta = 0.20,<br>                       tree_method = &quot;exact&quot;,<br>                       booster = &quot;gbtree&quot;,<br>                       objective = &quot;binary:logistic&quot;,<br>                       verbose = 2)</pre><ul><li><strong>xgboost Fast Histogram (old version)</strong></li></ul><pre>gc(verbose = 
FALSE)<br>set.seed(11111)<br>StartTime &lt;- System$currentTimeMillis()<br>temp_model &lt;- xgb.train(data = xgb_data,<br>                        nthread = i,<br>                        nrounds = 200,<br>                        max_leaves = 255,<br>                        max_depth = 12,<br>                        eta = 0.05,<br>                        tree_method = &quot;hist&quot;,<br>                        max_bin = 255,<br>                        booster = &quot;gbtree&quot;,<br>                        objective = &quot;binary:logistic&quot;,<br>                        verbose = 2)</pre><p>Our <strong>xgboost tests consist of training with the parameter sets above on the full numeric Bosch dataset</strong> (1,183,747 observations, 969 features, unbalanced dataset with 6,879 positive cases only).</p><p><strong>Think it is hard to compile xgboost? Not at all</strong>:</p><pre>devtools::install_github(&quot;Laurae2/ez_xgb/R-package@2017-02-15-v1&quot;)</pre><h4>Exact xgboost</h4><p><strong><em>tl;dr</em></strong>: baremetal wins.</p><p><strong><em>Normalization per thread comparison</em></strong>:</p><ul><li>Baremetal is faster overall.</li><li>AMD Ryzen is slower overall.</li><li>If we <strong>use AMD hyperthreading</strong>, our virtualized Intel machine gets smoked (in fact, the baremetal machine also gets smoked).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*97WVjiNd2wx4IN3ah8BK6w.png" /></figure><p><strong><em>Cumulated Normalization per thread comparison</em></strong>:</p><ul><li>Seen in this cumulated way, AMD is not that slow.</li><li>In fact, we would expect Intel to do much better: a 47% higher clock for an approximately 30% faster average time is clearly not that efficient.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*p7j7e077MiARluUBJHp5HQ.png" /></figure><p><strong><em>Detailed Data Chart</em></strong>:</p><ul><li><strong>Need details?</strong> Ranking is obvious: Baremetal Intel 
(Linux) &gt; Virtualized Intel (Windows) &gt; Baremetal AMD (Windows)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Z6DvRCH78yXOMchzXeYe8Q.png" /></figure><h4>Fast Histogram xgboost</h4><p><strong><em>tl;dr</em></strong>: gcc 7.1 wins.</p><p><strong><em>Normalization per thread comparison</em></strong>:</p><ul><li><strong>With fast histogram, the GHz showoff starts.</strong> 35% higher clock rate for nearly 50% higher speed, isn’t it <strong>marvelous</strong>? (singlethread performance)</li><li><strong>AMD comes nowhere near Intel (yet).</strong> Keep in mind if you are looking to get faster training from exact xgboost, <strong>fast histogram xgboost will just do it 10x to 30x faster</strong> (or even more) on large datasets.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*aVw_EoAyz4Cr5EI0Fd66qQ.png" /></figure><p><strong><em>Cumulated Normalization per thread comparison</em></strong>:</p><ul><li>I think the conclusion is very easy to draw: the advantage of a virtualized Windows with Intel vs a baremetal AMD is 2/3 of the best scenario (baremetal Linux with Intel).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8PxB8-SsPJCs9q5b-HU9Cw.png" /></figure><p><strong><em>Detailed Data Chart</em></strong>:</p><ul><li>I don’t think you can complain about doing 200 training iterations on Bosch in only 400 seconds (or less) these days.</li><li>Remember we are talking about training on 1,147,050,843 elements, and <strong>even a 90% sparsity would still leave over a hundred million elements</strong>.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XO1drU-oW4WqsTv3n7Cz3g.png" /></figure><h3>Conclusion</h3><p>Some simple key takeaways:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PKzfj8xxkwBHKWsNYLji7w.png" /></figure><ul><li>Use a <strong>baremetal machine if you want maximum performance.</strong> It does <strong>not really matter 
whether you want Linux or Windows</strong>, you already have plenty of performance.</li><li><strong>Fast Histogram xgboost already delivers plenty of performance.</strong> And you can get even more performance using the new fast histogram!</li><li>Throwing <strong>more cores at it is not ideal for fast histogram xgboost</strong>, while <strong>exact xgboost likes getting more cores</strong>.</li><li>When comparing two algorithms, take the same baseline. The picture shown before <strong>does not take into account the difference in the number of training iterations</strong>. You are actually doing <strong>4 times more iterations</strong> with fast histogram xgboost than exact xgboost, thus getting to the point of convergence. Also, the hyperparameters are <strong>RAM intensive</strong> for fast histogram xgboost (larger depth).</li></ul><p><strong>Previous posts in this series</strong>:</p><ul><li><a href="https://medium.com/data-design/benchmarking-xgboost-5ghz-i7-7700k-vs-20-core-xeon-ivy-bridge-and-kvm-vmware-virtualization-293807a13f1c"><strong>Benchmarking xgboost: 5GHz i7–7700K vs 20 core Xeon Ivy Bridge, and KVM/VMware Virtualization</strong></a></li><li><a href="https://medium.com/data-design/benchmarking-xgboost-fast-histogram-frequency-versus-cores-many-cores-server-is-bad-8d333e9b0b27"><strong>Benchmarking xgboost fast histogram: frequency versus cores, many cores server is bad!</strong></a></li><li><a href="https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5"><strong>Exact xgboost and Fast Histogram xgboost training speed comparison</strong></a></li><li><a href="https://medium.com/data-design/benchmarking-new-xgboost-fast-histogram-xgboost-and-the-compiler-story-86f0c5a4bcd3"><strong>Benchmarking new xgboost fast histogram: xgboost and the compiler story</strong></a></li></ul>
<hr><p><a href="https://medium.com/data-design/benchmarking-xgboost-with-and-without-virtualization-2357c3c64947">Benchmarking xgboost with and without virtualization</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Benchmarking new xgboost fast histogram: xgboost and the compiler story]]></title>
            <link>https://medium.com/data-design/benchmarking-new-xgboost-fast-histogram-xgboost-and-the-compiler-story-86f0c5a4bcd3?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/86f0c5a4bcd3</guid>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Sun, 14 May 2017 16:07:06 GMT</pubDate>
            <atom:updated>2017-05-14T20:33:42.437Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*1EgosG6xf_xXrFgC0BQrgw.png" /></figure><p>We have seen <a href="https://medium.com/data-design/benchmarking-xgboost-fast-histogram-frequency-versus-cores-many-cores-server-is-bad-8d333e9b0b27">previously</a> that the <strong>new xgboost fast histogram method</strong> had an issue: it was <strong>awfully slow</strong>. But we fixed it. By recompiling R with gcc 7.1.</p><blockquote>What do you call someone compiling R from scratch on Windows?</blockquote><p><strong>Compiling R was <em>something tough</em></strong>, but I now have an <strong>executable I can use on all my servers to deploy R with gcc 7.1</strong> without any issue:</p><ul><li>Even better, all <strong>libraries are compiled with gcc 7.1</strong>!</li><li>It makes the <strong>new xgboost fast histogram fly</strong>!</li></ul><p>Therefore, we are going to <strong>benchmark two different things from xgboost</strong>:</p><ul><li><strong>xgboost old fast histogram</strong> with gcc 4.9 (Rtools) and gcc 7.1 (MinGW)</li><li><strong>xgboost new fast histogram</strong> with gcc 7.1 (MinGW)</li></ul><h3>Comparing xgboost old fast histogram with gcc 4.9 and gcc 7.1</h3><p>To compare the <strong>xgboost old fast histogram</strong> with <strong>different compilers</strong>, we will use:</p><ul><li><strong>R/xgboost compiled with gcc 4.9</strong></li><li><strong>R/xgboost compiled with gcc 7.1</strong></li></ul><p>And no, do not tell me to compile it with something else. 
It is already <strong>difficult enough to compile R on Windows</strong>.</p><h4>Intel i7–3930K: gcc 4.9 vs gcc 7.1</h4><p><strong><em>tl;dr</em></strong>: gcc 7.1 wins.</p><p><strong><em>Normalization per thread comparison</em></strong>:</p><ul><li>gcc 7.1 is the winner overall.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FqLlZaybOIq8Qe9Si5rVsQ.png" /></figure><p><strong><em>Cumulated Normalization per thread comparison</em></strong>:</p><ul><li>gcc 7.1 clearly wins.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*J_4ak5RkCPMGPgmew63Xig.png" /></figure><p><strong><em>Detailed Data Chart</em></strong>:</p><ul><li>gcc 7.1 is the winner 11 times out of 12.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dh8AKzQRpnpgGl6JAMdfVA.png" /></figure><h4>Intel i7–7700K: gcc 4.9 vs gcc 7.1*</h4><p><strong><em>tl;dr</em></strong>: gcc 4.9 wins but… (* read the conclusion before drawing conclusions, there was a Linux kernel version issue)</p><p><strong><em>Normalization per thread comparison</em></strong>:</p><ul><li>gcc 4.9 is the winner overall.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ULhmTpAtXA3Hss562G-p1w.png" /></figure><p><strong><em>Cumulated Normalization per thread comparison</em></strong>:</p><ul><li>gcc 4.9 clearly wins.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*K8jfe3SDRaKQNgQTgLY50A.png" /></figure><p><strong><em>Detailed Data Chart</em></strong>:</p><ul><li>gcc 4.9 is the winner 100% of the time (8 out of 8).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mHidNxZqiaRr6OkBV7SSSw.png" /></figure><h4>Dual Quanta Freedom Ivy Bridge: gcc 4.9 vs gcc 7.1</h4><p><strong><em>tl;dr</em></strong>: gcc 7.1 wins.</p><p><strong><em>Normalization per thread comparison</em></strong>:</p><ul><li>gcc 7.1 is the winner overall.</li></ul><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/1*Yqa365fk-eRp8u70dhZb2w.png" /></figure><p><strong><em>Cumulated Normalization per thread comparison</em></strong>:</p><ul><li>gcc 7.1 clearly wins.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dJOFLiYhVtLIqAOrLBfCwg.png" /></figure><p><strong><em>Detailed Data Chart</em></strong>:</p><ul><li>gcc 7.1 is the winner 18 times out of 20.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oqR93ThIQuL3ffz4p3AcXQ.png" /></figure><h3>Conclusion about gcc and xgboost old fast histogram</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MK4OgPfHLchTRVsRisbhTA.png" /></figure><ul><li>i7–3930K: gcc 7.1 won</li><li>i7–7700K: gcc 4.9 won*</li><li>20 core server: gcc 7.1 won</li></ul><p>In the case of the i7–7700K, I <strong>reinstalled the whole virtualization machine (host machine)</strong>, which means it also <strong>changed the Linux kernel</strong> (4.10 for gcc 4.9, 4.9 for gcc 7.1). Running the same benchmark with kernel 4.9 and gcc 4.9 <strong>leads to gcc 7.1 winning 100% of the time</strong>.</p><blockquote>gcc 7.1 won “all the time” against gcc 4.9.</blockquote><p>So the real conclusion would be… <strong>gcc 7.1 <em>“won all the time”</em></strong> (apart from losing a little bit here and there).</p><h3>Comparing xgboost fast histogram: old vs new</h3><p>Now we are interested in <strong>comparing xgboost fast histogram old and new versions</strong>. Will the new version reign supreme? 
This is what we will check.</p><p>You can install the used xgboost versions using the commands below:</p><ul><li>old xgboost fast histogram: devtools::install_github(&quot;Laurae2/ez_xgb/R-package@2017-02-15-v1&quot;)</li><li>new xgboost fast histogram: devtools::install_github(&quot;Laurae2/ez_xgb/R-package@2017-05-02-v2&quot;)</li></ul><p>I think I will not even have to comment, results are obvious.</p><h4>Intel i7–3930K: old vs new xgboost fast histogram</h4><p><strong><em>tl;dr</em></strong>: new fast histogram wins.</p><p><strong><em>Normalization per thread comparison</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DewxgHNqAVxVVUCIKTHkmQ.png" /></figure><p><strong><em>Cumulated Normalization per thread comparison</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Y6vUSQgMf75vhzq715Q1bA.png" /></figure><p><strong><em>Detailed Data Chart</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*A56LwkZnylO0_MEsb6qeOA.png" /></figure><h4>Intel i7–7700K: old vs new xgboost fast histogram</h4><p><strong><em>tl;dr</em></strong>: new fast histogram wins.</p><p><strong><em>Normalization per thread comparison</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TyyWV42kH5MGE4wkflBxKA.png" /></figure><p><strong><em>Cumulated Normalization per thread comparison</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CGzwY3a3cmy0HW_Uq3wBiQ.png" /></figure><p><strong><em>Detailed Data Chart</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qM1Fz2ZSaf8I7L1PBjXThQ.png" /></figure><h4>Dual Quanta Freedom Ivy Bridge: old vs new xgboost fast histogram</h4><p><strong><em>tl;dr</em></strong>: new fast histogram wins.</p><p><strong><em>Normalization per thread comparison</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lzZotmSHEDcMdofBDZdPwA.png" 
/></figure><p><strong><em>Cumulated Normalization per thread comparison</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*84i86cszv7tLmAxnV_SfdA.png" /></figure><p><strong><em>Detailed Data Chart</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7gWD3lhbjwPGOwEcLizL5Q.png" /></figure><h3>Old vs New Fast Histogram: all servers together</h3><p><strong>Need to compare the performance visually with big charts?</strong> Here you are served:</p><ul><li><strong>i7–7700K is just the <em>“KING”</em></strong> (or the <em>“</em><strong><em>QUEEN</em></strong><em>”</em> if you want it that way)</li><li><strong>The new xgboost fast histogram is just smoking everything</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4lnRzaHLH-8yyRmS12fJfQ.png" /></figure><p>Clearly, going beyond 1 thread already gives a poor ROI (return on investment, but applied to CPU threads). For instance, on the i7–7700K, you are better off running a 4-fold cross-validation in parallel:</p><ul><li>Parallelized cross-validation: less than 5 minutes to run 4 trainings in parallel using 1 thread each.</li><li>Sequential cross-validation: about 12 minutes to run the trainings one by one using 3 threads each (assuming you found the sweet spot).</li></ul><blockquote>Did you ever want to get a <strong>cross-validation speedup</strong>?</blockquote><blockquote><strong>Assuming you have enough RAM, here you have it.</strong></blockquote><h3>Conclusion</h3><p><strong>VERY simple key takeaways</strong>:</p><ul><li><strong>New xgboost fast histogram is crushing</strong> everything.</li><li><strong>With 1 thread, the new xgboost fast histogram is 75% faster</strong> than the old xgboost fast histogram.</li><li><strong>gcc 7.1 is approximately 3% faster</strong> than gcc 4.9 for xgboost fast histogram.</li></ul><p>Still using the old xgboost fast histogram? 
<strong>Switch to the new one!</strong></p><blockquote>But are you satisfied enough?</blockquote><p>We have <strong>ONE blog post which will follow this series</strong>:</p><ul><li><strong>Benchmarking Baremetal Linux vs Virtualized Windows: how slow are we? AMD Ryzen showing up!</strong></li></ul><p><strong>Previous posts in this series</strong>:</p><ul><li><a href="https://medium.com/data-design/benchmarking-xgboost-5ghz-i7-7700k-vs-20-core-xeon-ivy-bridge-and-kvm-vmware-virtualization-293807a13f1c"><strong>Benchmarking xgboost: 5GHz i7–7700K vs 20 core Xeon Ivy Bridge, and KVM/VMware Virtualization</strong></a></li><li><a href="https://medium.com/data-design/benchmarking-xgboost-fast-histogram-frequency-versus-cores-many-cores-server-is-bad-8d333e9b0b27"><strong>Benchmarking xgboost fast histogram: frequency versus cores, many cores server is bad!</strong></a></li><li><a href="https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5"><strong>Exact xgboost and Fast Histogram xgboost training speed comparison</strong></a></li></ul><hr><p><a href="https://medium.com/data-design/benchmarking-new-xgboost-fast-histogram-xgboost-and-the-compiler-story-86f0c5a4bcd3">Benchmarking new xgboost fast histogram: xgboost and the compiler story</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exact xgboost and Fast Histogram xgboost training speed comparison]]></title>
            <link>https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5?source=rss----fe51a1842648---4</link>
            <guid isPermaLink="false">https://medium.com/p/17f95cee68b5</guid>
            <category><![CDATA[benchmark]]></category>
            <dc:creator><![CDATA[Laurae]]></dc:creator>
            <pubDate>Sat, 29 Apr 2017 23:25:39 GMT</pubDate>
            <atom:updated>2017-04-29T23:22:34.934Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*quaUGrMu6V-v0Sm3KMyv_g.png" /></figure><p><strong>Did you ever want to compare <em>“unfairly”</em> Exact xgboost and Fast Histogram xgboost?</strong> Here you are served.</p><p><strong>How unfair will our comparisons be?</strong> We are using the results from our series:</p><ul><li><a href="https://medium.com/data-design/benchmarking-xgboost-5ghz-i7-7700k-vs-20-core-xeon-ivy-bridge-and-kvm-vmware-virtualization-293807a13f1c"><strong>Benchmarking xgboost: 5GHz i7–7700K vs 20 core Xeon Ivy Bridge, and KVM/VMware Virtualization</strong></a></li><li><a href="https://medium.com/data-design/benchmarking-xgboost-fast-histogram-frequency-versus-cores-many-cores-server-is-bad-8d333e9b0b27"><strong>Benchmarking xgboost fast histogram: frequency versus cores, many cores server is bad!</strong></a></li></ul><p>This post is mainly for the <em>“eyes”</em> of the reader.</p><h3>Comparison Setup</h3><h4>Hardware Virtualization</h4><p>Three servers, each with its best cumulated runs:</p><ul><li>i7–3930K, 6 cores, 12 threads, 3.9/3.5GHz, VMware virtualization</li><li>i7–7700K, 4 cores, 8 threads, 5.0/4.7GHz, KVM virtualization</li><li>Dual Quanta Freedom Ivy Bridge, 20 cores, 40 threads, 3.1/2.7GHz, KVM virtualization, NUMA fully optimized</li></ul><h4>Software Setup</h4><p>Exact xgboost:</p><pre>gc(verbose = FALSE)<br>set.seed(11111)<br>temp_model &lt;- xgb.train(data = xgb_data,<br>                        nthread = i,<br>                        nrounds = 50,<br>                        max_leaves = 255,<br>                        #max_depth = 6,<br>                        eta = 0.20,<br>                        tree_method = &quot;exact&quot;,<br>                        #max_bin = 255,<br>                        booster = &quot;gbtree&quot;,<br>                        objective = &quot;binary:logistic&quot;,<br>                        verbose = 2)</pre><p>Fast Histogram 
xgboost:</p><pre>gc(verbose = FALSE)<br>set.seed(11111)<br>temp_model &lt;- xgb.train(data = xgb_data,<br>                        nthread = i,<br>                        nrounds = 200,<br>                        max_leaves = 255,<br>                        max_depth = 12,<br>                        eta = 0.05,<br>                        tree_method = &quot;hist&quot;,<br>                        max_bin = 255,<br>                        booster = &quot;gbtree&quot;,<br>                        objective = &quot;binary:logistic&quot;,<br>                        verbose = 2)</pre><h3>Benchmarking unfairly xgboost: Exact vs Fast Histogram</h3><p>Remember you are doing the <strong>comparison for yourself</strong> and to <strong>please your mind</strong>! (or maybe you really want to compare because you want to know…)</p><h4>i7–3930K: Best Runs</h4><ul><li>Unfair fast histogram xgboost is kicking exact xgboost as expected.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KrTZiGaa__MyCyyrMASWUQ.png" /></figure><h4>i7–3930K: All Runs</h4><ul><li>Unfair fast histogram xgboost is kicking exact xgboost as expected.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*aOsI2v75IOOwPfcQ3qBl2Q.png" /></figure><h4>i7–7700K: Best Runs</h4><ul><li>Unfair fast histogram xgboost is kicking exact xgboost as expected.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nrLBlHbiDaaCGsxL-7guCg.png" /></figure><h4>i7–7700K: All Runs</h4><ul><li>Unfair fast histogram xgboost is kicking exact xgboost as expected.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*N0yrA2D02wWrY-tkTxpK6w.png" /></figure><h4>Dual Xeon: Best Runs</h4><ul><li>Unfair fast histogram xgboost is kicking exact xgboost as expected.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Nhb_W5o0bAvZ5ITqVNfLwA.png" /></figure><h4>Dual Xeon: All Runs</h4><ul><li>Unfair fast histogram xgboost is 
kicking exact xgboost as expected.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qKorY9WF_J541xtjOnEBpg.png" /></figure><h4>All Together: Best Runs</h4><ul><li>Unfair fast histogram xgboost is kicking exact xgboost as expected.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cUhd6Iy2eXIfSmfKr2SoQQ.png" /></figure><p><strong>Need more?</strong> We will soon have a comparison against Baremetal Linux on an i7–7700K, and we will also be able to compare with the AMD Ryzen 7 1700!</p><hr><p><a href="https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5">Exact xgboost and Fast Histogram xgboost training speed comparison</a> was originally published in <a href="https://medium.com/data-design">Data Science &amp; Design</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>