<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Deep learning benchmark tool | DLBT - Medium]]></title>
        <description><![CDATA[DLBT is the first user-friendly app for Ubuntu Linux to benchmark your hardware - Medium]]></description>
        <link>https://medium.com/deep-learning-benchmark-tool-dlbt?source=rss----44c245e1d229---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Deep learning benchmark tool | DLBT - Medium</title>
            <link>https://medium.com/deep-learning-benchmark-tool-dlbt?source=rss----44c245e1d229---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 19 May 2026 12:15:02 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/deep-learning-benchmark-tool-dlbt" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[4x RTX 2080 TI with Quadro Nvlink | Performance Test]]></title>
            <link>https://medium.com/deep-learning-benchmark-tool-dlbt/4x-rtx-2080-ti-with-quadro-nvlink-performance-test-acc061dc9ad?source=rss----44c245e1d229---4</link>
            <guid isPermaLink="false">https://medium.com/p/acc061dc9ad</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[rtx-2080-ti]]></category>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[nvlink]]></category>
            <dc:creator><![CDATA[TECHNO PREMIUM]]></dc:creator>
            <pubDate>Mon, 08 Apr 2019 04:30:25 GMT</pubDate>
            <atom:updated>2019-08-27T16:41:07.607Z</atom:updated>
            <content:encoded><![CDATA[<h4><strong>TensorFlow CNN: ResNet-50 FP16 &amp; FP32</strong></h4><p>Deep learning benchmark 2019: TensorFlow, NVIDIA, deep learning workstation, Threadripper</p><p><strong>Convolutional Neural Nets:</strong> Docker container image <strong>TensorFlow:18.03-py2</strong> from NGC</p><h3>Hardware used:</h3><p>CPU: Threadripper 1900</p><p>RAM: 32 GB DDR4</p><p>GPUs: 4x RTX 2080 Ti with 2x Quadro NVLink</p><p>Power supply: EVGA 1600 W</p><p>Motherboard: MSI Carbon X399</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4o38f02xp8GCiu-s-YSUYQ.jpeg" /><figcaption>Photo By: <a href="https://medium.com/@rubenRfernandez">Ruben Roberto Fernandez</a></figcaption></figure><h3>Test rules: FP32 &amp; FP16</h3><p>1- 2x 2080 Ti w/o NVLink</p><p>2- 2x 2080 Ti w/ NVLink</p><p>3- 4x 2080 Ti w/o NVLink</p><p>4- 4x 2080 Ti w/ NVLink</p><p>5- 2x 2080 Ti w/ NVLink + 2x w/o NVLink</p><p>We test all of these configurations.</p><h3>Checking the NVLink status:</h3><pre>techno@dl:~$ nvidia-smi nvlink --status -i 0<br>GPU 0: GeForce RTX 2080 Ti (UUID: GPU-c8aa2ad3-943c-665e-90fc-c9af727289cc)<br>         Link 0: 25.781 GB/s<br>         Link 1: 25.781 GB/s<br>techno@dl:~$ nvidia-smi nvlink --status -i <br>Option &quot;-i&quot; is missing its value.<br>techno@dl:~$ nvidia-smi nvlink --status <br>GPU 0: GeForce RTX 2080 Ti (UUID: GPU-c8aa2ad3-943c-665e-90fc-c9af727289cc)<br>         Link 0: 25.781 GB/s<br>         Link 1: 25.781 GB/s<br>GPU 1: GeForce RTX 2080 Ti (UUID: GPU-31f2f22f-b288-01f6-c102-c9990658aebe)<br>         Link 0: 25.781 GB/s<br>         Link 1: 25.781 GB/s<br>GPU 2: GeForce RTX 2080 Ti (UUID: GPU-6be7a8ec-bc7f-9347-6d5c-5557e23d4b37)<br>         Link 0: 25.781 GB/s<br>         Link 1: 25.781 GB/s<br>GPU 3: GeForce RTX 2080 Ti (UUID: GPU-7a82b7e5-96b1-11aa-5413-82fcdca4554f)<br>         Link 0: 25.781 GB/s<br>         Link 1: 25.781 GB/s<br>techno@dl:~$</pre><p><strong>Working well: 25 GB/s P2P, so 50 GB/s bidirectional. OK.</strong></p><p>Downloading the Docker 
containers for the test (NGC containers require a Docker installation and an NGC account; log in from a terminal to pull the images):</p><p>1- sudo docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.03-py2</p><h3>Frameworks and Model used:</h3><p>TensorFlow 1.4.0</p><p>CUDA 9</p><p>Multi-GPU support utilizing the NCCL communication library for the CNN code</p><h3>Benchmark Results:</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*kUhHMD4uCmVlmx6f.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Fo29FLlI6LtxPkWu.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*RWYKBUUfgtVpWzDo.png" /></figure><h3>Conclusions:</h3><p>According to our tests, using Quadro NVLink bridges increases the number of images that can be processed. The greatest impact is seen in the 4-card system, in which two NVLink bridges each connect a pair of cards.</p><p>In our opinion, the best configuration would be a workstation with 4x 2080 Ti and 2 Quadro NVLink bridges, since we see an increase of 13% when using NVLink.</p><p>DLBT is our Deep Learning Benchmark Tool. We make benchmarking easy; to download our free app for Linux, check here:</p><p><a href="https://www.technopremium.com/">https://www.technopremium.com/</a></p><hr><p><a href="https://medium.com/deep-learning-benchmark-tool-dlbt/4x-rtx-2080-ti-with-quadro-nvlink-performance-test-acc061dc9ad">4x RTX 2080 TI with Quadro Nvlink | Performance Test</a> was originally published in <a href="https://medium.com/deep-learning-benchmark-tool-dlbt">Deep learning benchmark tool | DLBT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Testing Hardware for Deep Learning with DLBT]]></title>
            <link>https://medium.com/deep-learning-benchmark-tool-dlbt/testing-hardware-for-deep-learning-with-dlbt-a7e43e00cfe9?source=rss----44c245e1d229---4</link>
            <guid isPermaLink="false">https://medium.com/p/a7e43e00cfe9</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[computer-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[tensorflow]]></category>
            <dc:creator><![CDATA[TECHNO PREMIUM]]></dc:creator>
            <pubDate>Tue, 02 Apr 2019 03:15:56 GMT</pubDate>
            <atom:updated>2019-08-27T16:43:44.954Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*k9IJfikCdPecjqOUikExww.png" /><figcaption>Screenshot from real results</figcaption></figure><p>So you are building a new deep learning workstation to perform state-of-the-art computations and run really deep and sophisticated models, but you are undecided about which GPU to go for. Or you already have a set of GPUs that you are planning to use, but need to know how efficient they are compared to what’s out there. In this blog post, I present an app that solves both of these problems for you, at no cost.</p><p>Deep learning is a field that requires serious computational power: on a CPU, you might spend weeks training your model, while a strong GPU would finish the job within a day. This is mainly because of the difference in design between these two pieces of hardware, as we shall see in a minute when we discuss the different types of hardware used for deep learning. For now, just bear in mind that more efficient hardware means not only faster training, but also more room for model tuning and algorithm testing, which will make your life as a deep learning developer a lot easier.</p><h3>Types of Hardware</h3><p>Before discussing which pieces of hardware are best for deep learning tasks, we should first take a look at the different types. The following diagram shows the classification, broken down into four classes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/451/1*1hIPUvo3dg1iUy79gVprvg.png" /></figure><p>As we can see in the previous diagram, the general-purpose hardware category splits into Central Processing Units (CPU) and Graphics Processing Units (GPU). 
The former is designed to be latency oriented: it should be able to do big, complicated tasks one after the other, just like a big elephant. The GPU, in contrast, is throughput oriented: it specializes in performing a great many small, simple tasks simultaneously, resembling a group of small ants.</p><p>A Field Programmable Gate Array (FPGA) is a special piece of hardware that allows for programmable logic: the developer can redesign the hardware structure of the device repeatedly to implement a particular application. This can really come in handy if you want to try out new ideas and prototypes, and it can outperform general-purpose hardware as long as the design is efficient enough.</p><p>Application-Specific Integrated Circuits (ASIC) are much rarer to come by. An ASIC implies that someone took on the job of carefully designing hardware that solves the problem at hand and fabricated the circuit, so this hardware only makes sense when used for that application. Google’s Tensor Processing Units (TPU) are state-of-the-art ASICs. Although ASICs turn out to be faster than FPGAs, they are harder to obtain and assemble into our deep learning workstations.</p><p>The Deep Learning Benchmark Tool (DLBT) application focuses on general-purpose hardware, as it is by far the most widely used.</p><h3>DLBT Application</h3><p>Suppose you just bought your graphics card(s) and plugged them into your motherboard, expecting to run some next-level algorithms very fast. It would be very useful to have a tool that tells you how fast the combination of your CPU and the Graphics Processing Units at your disposal is, and that on top of that lets you compare the results to other deep learning workstations around the globe to see if you’re happy where you stand. 
Well, look no further: DLBT is the answer.</p><p>This benchmark tool automatically recognizes the machine-learning-capable hardware in your computer. This might be just the CPU, if you have no GPU or haven’t installed the required drivers (if this is the case, we walk you through how to do this, line by line), or it may be multiple GPUs, in which case you can choose where to run the benchmark models.</p><h3>Model Used</h3><p>In its current version, the DLBT app runs a Convolutional Neural Net with a standard structure in the background, while taking note of how long an episode lasts. For more advanced users, it also splits this time into the prediction (forward-pass) time and the back-propagation time.</p><p>The structure of the model used can be seen in the following image.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/553/1*dN7WrG7CkqgD1a5u5RGd0w.png" /><figcaption>Convolutional Neural Net used in the Test Bench</figcaption></figure><p>As a future update, we are currently working on extending this feature to multiple known benchmarks involving Recurrent Neural Networks, Natural Language Processing, etc.</p><h3>Obtaining the rating</h3><p>How do we measure exactly how effective the device is? We use the formula displayed below. Intuitively, the rating should increase as hardware efficiency rises. The scaling factor K spreads the results out to allow for better comparison.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/141/1*gn4ViIpn_wd3nQwLte0fJg.png" /></figure><h3>Results</h3><p>This application has been run on many GPUs to measure their performance on the model explained previously. The following table shows some of the results produced by the app. 
You will find many more results from other pieces of hardware <a href="https://www.technopremium.com/results">here</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YFC8lm58LWb4dBYkrtpSuw.png" /></figure><h3>Conclusions</h3><p>There you have it: you just discovered an easy way to measure your hardware performance without writing a single line of code. <a href="https://drive.google.com/file/d/1gDMs3pLG-UzcEZ0AunnWzWHMwGdfqKxm/view?usp=sharing">DLBT</a> is a GUI application that automatically detects your GPUs, lets you monitor them, and runs deep learning benchmarks so you can compare their performance against results from other workstations.</p><h3>App download:</h3><p>Anyone can download the app and test their hardware: <a href="https://technopremium.com/">check here</a>.</p><p><a href="https://technopremium.com/blog/">https://technopremium.com/blog/</a></p><hr><p><a href="https://medium.com/deep-learning-benchmark-tool-dlbt/testing-hardware-for-deep-learning-with-dlbt-a7e43e00cfe9">Testing Hardware for Deep Learning with DLBT</a> was originally published in <a href="https://medium.com/deep-learning-benchmark-tool-dlbt">Deep learning benchmark tool | DLBT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>