4x RTX 2080 TI with Quadro Nvlink | Performance Test

TECHNO PREMIUM — Mon, 08 Apr 2019 04:30:25 GMT

TensorFlow CNN: ResNet-50 FP16 & FP32

Deep learning benchmark 2019/ Tensorflow, Nvidia, Deep learning Workstation, THREADRIPPER

Convolutional Neural Nets

Docker container image TensorFlow:18.03-py2 from NGC

Hardware used:

CPU — THREADRIPPER 1900

32 GB ram DDR4

4X RTX 2080 Ti with 2X Nvlink Quadro

EVGA 1600w

MSI carbon x399

Photo By: Ruben Roberto Fernandez

TEST rules: FP32 & FP16

1- 2x 2080 ti w/o nvlink

2- 2x 2080 ti w/ nvlink

3- 4x 2080 ti w/o nvlink

4- 4x 2080 ti w/ nvlink

5- 2x 2080 ti w/ nvlink 2x w/o nvlink

In this case, we test all possibilities

Checking the nvlink status:

techno@dl:~$ nvidia-smi nvlink --status -i 0
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-c8aa2ad3-943c-665e-90fc-c9af727289cc)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
techno@dl:~$ nvidia-smi nvlink --status -i 
Option "-i" is missing its value.
techno@dl:~$ nvidia-smi nvlink --status 
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-c8aa2ad3-943c-665e-90fc-c9af727289cc)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-31f2f22f-b288-01f6-c102-c9990658aebe)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
GPU 2: GeForce RTX 2080 Ti (UUID: GPU-6be7a8ec-bc7f-9347-6d5c-5557e23d4b37)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
GPU 3: GeForce RTX 2080 Ti (UUID: GPU-7a82b7e5-96b1-11aa-5413-82fcdca4554f)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
techno@dl:~$

The links are working: each link reports about 25 GB/s per direction, so about 50 GB/s bidirectional.
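The check above can also be scripted. The following is a minimal sketch, assuming the text layout of `nvidia-smi nvlink --status` shown in the transcript; the function names `parse_nvlink_status` and `summarize` are hypothetical helpers, not part of any NVIDIA tool, and the sample is a two-GPU excerpt of the output above.

```python
import re

# Excerpt of the transcript above, used here as sample input.
SAMPLE_STATUS = """\
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-c8aa2ad3-943c-665e-90fc-c9af727289cc)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-31f2f22f-b288-01f6-c102-c9990658aebe)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
"""

GPU_RE = re.compile(r"^GPU (\d+):")
LINK_RE = re.compile(r"^\s*Link (\d+): ([\d.]+) GB/s")


def parse_nvlink_status(text):
    """Map each GPU index to the list of per-link rates (GB/s) it reports."""
    links = {}
    current = None
    for line in text.splitlines():
        gpu = GPU_RE.match(line)
        if gpu:
            current = int(gpu.group(1))
            links[current] = []
            continue
        link = LINK_RE.match(line)
        if link and current is not None:
            links[current].append(float(link.group(2)))
    return links


def summarize(links):
    """Count the active links per GPU and add up their reported rates."""
    return {
        gpu: {"active_links": len(rates), "total_gb_per_s": round(sum(rates), 3)}
        for gpu, rates in links.items()
    }


if __name__ == "__main__":
    for gpu, info in summarize(parse_nvlink_status(SAMPLE_STATUS)).items():
        print(f"GPU {gpu}: {info['active_links']} links, {info['total_gb_per_s']} GB/s reported")
```

A GPU that shows no `Link` lines would appear with zero active links, which is a quick way to spot a bridge that is not seated properly.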

Downloading the Docker container for the test (NGC containers require a Docker installation and an NGC account; log in from a terminal to pull the images):

1- sudo docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.03-py2

Frameworks and Model used:

Tensorflow 1.4.0

Cuda 9

Multi-GPU support utilizing the NCCL communication library for the CNN code

Benchmark Results:

Conclusions:

According to our tests, using the Quadro Nvlink bridges increases the number of images that can be processed per second. The greatest impact is seen in the 4-card system, in which the two Nvlink bridges connect the cards in pairs.

In our opinion, the best configuration is a workstation with 4x 2080 Ti and 2 Quadro Nvlink bridges, since we see an increase of 13% when using Nvlink.
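For readers who want to compute the same kind of percentage from their own runs, here is a minimal sketch. The function name `percent_increase` is hypothetical, and the two throughput values are placeholders chosen only to illustrate the arithmetic; they are not measurements from this test.

```python
def percent_increase(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (improved - baseline) / baseline * 100.0


# Placeholder throughputs in images/sec (illustrative only, NOT measured values).
without_nvlink = 1000.0
with_nvlink = 1130.0

gain = percent_increase(without_nvlink, with_nvlink)
print(f"Nvlink gain: {gain:.1f}% more images/sec")
```

Comparing runs this way only makes sense when the batch size, precision (FP16 or FP32) and GPU count are the same in both runs.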

DLBT (Deep Learning Benchmark Tool) is our app for making benchmarking easy. To download the free app for Linux, check here:

https://www.technopremium.com/

4x RTX 2080 TI with Quadro Nvlink | Performance Test was originally published in Deep learning benchmark tool | DLBT on Medium, where people are continuing the conversation by highlighting and responding to this story.

Testing Hardware for Deep Learning with DLBT

TECHNO PREMIUM — Tue, 02 Apr 2019 03:15:56 GMT

Photo By: Screenshot from real results

So you are building your new Deep Learning workstation to perform some state-of-the-art computations and run really deep and sophisticated models, but you are undecided about which GPU to go for. Or you already have a set of GPUs that you are planning to use, but need to know just how efficient they are compared to what's out there. In this blog post, I present an app that solves both of these problems for you, at no cost.

Deep Learning is a field that requires serious computational power. Using a CPU, you might spend weeks training your model, while a strong GPU would finish the job within a day. This is mainly because the two pieces of hardware are designed differently, as we shall see in a minute when we discuss the different types of hardware used for Deep Learning. For now, it's enough to bear in mind that more efficient hardware means not only faster training, but also more room for model tuning and algorithm testing, which will make your life as a Deep Learning developer a lot easier.

Types of Hardware

Before discussing which pieces of hardware are best for deep learning tasks, we should first take a look at the different types. The following diagram shows the classification, breaking it down into four classes.

As we can see in the previous diagram, the general-purpose hardware category splits into Central Processing Units (CPU) and Graphics Processing Units (GPU). The CPU is designed to be latency oriented: it should be able to do big, complicated tasks one after the other, just like a big elephant. The GPU is throughput oriented: it specializes in performing a great many small, simple tasks simultaneously, resembling a group of small ants.

A Field Programmable Gate Array (FPGA) is a special piece of hardware that allows for programmable logic, which means the developer can redesign the hardware structure of the device several times to implement a particular application. This comes in handy if you want to try out new ideas and prototypes, and it can outperform general-purpose hardware as long as the design is efficient enough.

Application-Specific Integrated Circuits (ASIC) are much rarer to come by. An ASIC implies that someone took on the job of carefully designing hardware that solves the problem at hand and manufactured the circuit, so this hardware only makes sense when used for that application. Google's Tensor Processing Units (TPU) are a state-of-the-art ASIC. Although ASICs turn out to be faster than FPGAs, they are harder to obtain and assemble into our deep learning workstations.

The Deep Learning Benchmark Tool (DLBT) application focuses on general-purpose hardware, as it is by far the most widely used.

DLBT Application

Suppose you just bought your graphics card(s) and plugged them into your motherboard, expecting to run some next-level algorithms very fast. It would be very useful to have a tool that tells you how fast the combination of your CPU and the Graphics Processing Units at your disposal is, and that also lets you compare the results to other deep learning workstations around the globe to see if you're happy where you stand. Well, look no further: DLBT is the answer.

This hardware benchmark tool automatically recognizes the Machine Learning capable hardware in your computer. This might be just the CPU, if you have no GPU or haven't installed the required drivers (in which case we walk you through how to do this, line by line), or it may be multiple GPUs, in which case you choose where to run the benchmark models.
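To give an idea of what this kind of detection can look like, here is a minimal sketch. It is not DLBT's actual implementation: it assumes an NVIDIA driver that provides the `nvidia-smi -L` listing command, and the helper names `parse_gpu_list` and `detect_devices` are hypothetical.

```python
import re
import shutil
import subprocess

# Matches lines such as:
# GPU 0: GeForce RTX 2080 Ti (UUID: GPU-c8aa2ad3-943c-665e-90fc-c9af727289cc)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_list(text):
    """Turn `nvidia-smi -L` style output into a list of GPU records."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append({
                "index": int(match.group(1)),
                "name": match.group(2),
                "uuid": match.group(3),
            })
    return gpus


def detect_devices():
    """Return the detected GPUs, or an empty list when only the CPU is usable."""
    if shutil.which("nvidia-smi") is None:
        return []  # no driver tools found: fall back to the CPU
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # driver present but not working: fall back to the CPU
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = detect_devices()
    if not gpus:
        print("No usable GPU found; the benchmark would run on the CPU.")
    for gpu in gpus:
        print(f"[{gpu['index']}] {gpu['name']}")
```

Returning an empty list for both "no driver" and "driver error" keeps the fallback to the CPU simple; a real tool would probably report the two cases separately.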

Model Used

In its current version, the DLBT app runs a Convolutional Neural Net with a standard structure in the background, recording how long an episode lasts and, for more advanced users, splitting this time into the prediction time and the back-propagation time.

The structure of the model used can be seen in the following image.
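The split between prediction time and back-propagation time can be illustrated with a minimal sketch. This is not the DLBT model or its code: it uses a tiny linear model in pure Python instead of a Convolutional Neural Net, and the names `forward`, `backward` and `timed_pass` are hypothetical.

```python
import time


def forward(weights, inputs):
    """Prediction step: one dot product per sample."""
    return [sum(w * x for w, x in zip(weights, row)) for row in inputs]


def backward(weights, inputs, targets, predictions, lr=0.01):
    """Back-propagation step for mean squared error: gradient, then update."""
    n = len(inputs)
    grads = [0.0] * len(weights)
    for row, target, pred in zip(inputs, targets, predictions):
        err = pred - target
        for j, x in enumerate(row):
            grads[j] += 2.0 * err * x / n
    return [w - lr * g for w, g in zip(weights, grads)]


def timed_pass(weights, inputs, targets):
    """Run one pass over the data and time the two phases separately."""
    t0 = time.perf_counter()
    predictions = forward(weights, inputs)
    t1 = time.perf_counter()
    new_weights = backward(weights, inputs, targets, predictions)
    t2 = time.perf_counter()
    timings = {
        "prediction_s": t1 - t0,
        "backprop_s": t2 - t1,
        "total_s": t2 - t0,
    }
    return new_weights, timings


# Synthetic data: three features per sample, targets from a known linear rule.
inputs = [[float(i % 7), float(i % 3), 1.0] for i in range(2000)]
targets = [2.0 * a - b + 0.5 for a, b, _ in inputs]

weights = [0.0, 0.0, 0.0]
for _ in range(5):
    weights, timings = timed_pass(weights, inputs, targets)
print(timings)
```

On a GPU the same idea needs one extra step: kernels run asynchronously, so the device has to be synchronized before each timestamp is taken, otherwise the measured times are too short.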

Convolutional Neural Net used in the Test Bench

As a future update, we are working on extending this feature to multiple known benchmarks involving Recurrent Neural Networks, Natural Language Processing, etc.

Obtaining the rating

How do we measure exactly how effective the device is? We use the formula displayed below. Intuitively, the rating should increase as the hardware efficiency rises. The scaling factor K spreads the results further apart to allow for better comparison.

Results

This application has been run on many GPUs to measure their performance on the model described previously. The following table shows some of the results reported by the app. Here you will find many more results from other pieces of hardware.

Conclusions

There you have it: you just discovered an easy way to measure your hardware performance without writing a single line of code. DLBT is a GUI application that automatically detects your GPUs, lets you monitor them, and runs deep learning benchmarks so you can compare their performance against results from other hardware.

App download:

Anyone can download the app and test their hardware, check here

https://technopremium.com/blog/

Testing Hardware for Deep Learning with DLBT was originally published in Deep learning benchmark tool | DLBT on Medium, where people are continuing the conversation by highlighting and responding to this story.
