Use Your AMD GPU to Speed Up Training (ROCm)
Before going further, here is my system specification:
CPU: Intel® Core™ i5-7400
GPU: AMD RX570 8GB
OS: Ubuntu 20.04.1 (kernel Linux MS 5.8.0-55-generic)
Note: this will not work on Windows! It works only on Linux-based systems.
Lately, I have been using Google Colab a lot for training and testing an LSTM model, taking advantage of its GPU runtime. But since Colab provides only a limited number of GPU sessions in a given period, I was not able to test models back to back. Before I could train another model, I had to wait for some time.
Apart from that, I wanted a powerful laptop, but since I cannot afford one yet, I was looking for an alternative. Luckily, my brother has a gaming PC with an RX570 8GB in it. That card is AMD, not NVIDIA. However, just as there is CUDA for NVIDIA, there is ROCm for AMD.
The installation steps can be found at the following link:
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
Following are the steps that I followed. (They are the same as those on the website above, but these are exactly the ones I used.)
Step 1: Add your user to the video group and, on Ubuntu 20.04.1, to the render group as well.
After that, run “groups” and you should see both groups listed in the output.
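If you prefer to check this from Python instead of reading the “groups” output by eye, here is a minimal sketch. The function names are mine, not from the ROCm guide, and it only reports what the current login session sees, so a new group may not show up until you log in again or reboot.

```python
import grp
import os

# Groups named in Step 1 (render is the one tied to Ubuntu 20.04.1).
REQUIRED_GROUPS = ("video", "render")


def missing_groups(user_groups, required=REQUIRED_GROUPS):
    """Return the required groups that are absent from user_groups."""
    present = set(user_groups)
    return [g for g in required if g not in present]


def current_user_groups():
    """Names of the groups attached to the current process (like `groups`)."""
    gids = set(os.getgroups()) | {os.getegid()}
    names = []
    for gid in sorted(gids):
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid with no entry in the group database: skip it.
            pass
    return names


if __name__ == "__main__":
    missing = missing_groups(current_user_groups())
    if missing:
        print("Not yet a member of:", ", ".join(missing))
    else:
        print("User is in video and render.")
```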
Step 2: Now install ROCm
Run the installation commands from the guide linked above, one by one.
Step 3: After that, reboot the system.
Now, to confirm the installation, run the following commands:
Step 4: Update PATH
If ROCm is installed properly, you can run the “rocm-smi” command and you will see the following output.
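If “rocm-smi” is not found, the usual cause is that the ROCm bin directory is missing from PATH. The sketch below checks for that. It assumes the default install location /opt/rocm/bin, which may differ on your machine, and the helper names are only for illustration.

```python
import os
import shutil

# Assumption: ROCm was installed to its default prefix.
ASSUMED_ROCM_BIN = "/opt/rocm/bin"


def dir_on_path(directory, path_value):
    """True if directory is one of the entries in a PATH-style string."""
    entries = [p.rstrip("/") for p in path_value.split(os.pathsep) if p]
    return directory.rstrip("/") in entries


def find_rocm_smi(extra_dir=ASSUMED_ROCM_BIN):
    """Return the full path to rocm-smi, or None if it cannot be found.

    Looks on the current PATH first, then in extra_dir as a fallback.
    """
    found = shutil.which("rocm-smi")
    if found:
        return found
    fallback_path = os.environ.get("PATH", "") + os.pathsep + extra_dir
    return shutil.which("rocm-smi", path=fallback_path)


if __name__ == "__main__":
    print("ROCm bin on PATH:", dir_on_path(ASSUMED_ROCM_BIN, os.environ.get("PATH", "")))
    print("rocm-smi found at:", find_rocm_smi())
```

If the first line prints False but the second prints a path, ROCm is installed and only the PATH update from this step is missing.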
Step 5: Install TensorFlow for ROCm
Run the following commands one by one.
Fix Issues
So now everything is installed properly, but when you run any TensorFlow code it will throw an error. To check that, run the following Python script and look at the output.
It will fail with an error that says “Segmentation Fault”.
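A segmentation fault kills the Python process, so it cannot be caught with try/except. One way to detect it is to run a tiny TensorFlow snippet in a child process and look at the return code. This is only a sketch and not the script I used: the choice of tf.random.uniform as the probe (picked because the fix involves rocRAND) and the function names are my assumptions.

```python
import signal
import subprocess
import sys

# Assumed probe: list the GPUs, then run one random op on the default device.
PROBE = (
    "import tensorflow as tf\n"
    "print(tf.config.list_physical_devices('GPU'))\n"
    "print(tf.random.uniform((2, 2)))\n"
)


def classify(returncode):
    """Map a child return code to 'ok', 'segfault' or 'error'."""
    if returncode == 0:
        return "ok"
    # On POSIX, a negative return code -N means "killed by signal N".
    if returncode == -signal.SIGSEGV:
        return "segfault"
    return "error"


def probe_tensorflow():
    """Run PROBE in a fresh interpreter and return (status, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", PROBE],
        capture_output=True,
        text=True,
    )
    return classify(proc.returncode), proc.stdout


# Usage:
#   status, output = probe_tensorflow()
#   print(status)   # "segfault" matches the problem described above
#   print(output)
```

A status of "error" usually just means TensorFlow could not be imported or raised a normal Python exception.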
To solve that, we have to reinstall “rocRAND”. We can follow the method provided at https://github.com/xuhuisheng/rocm-build/tree/master/gfx803.
Run the following to solve the issue. If you are still not able to fix it, please check the GitHub repository above and ask the author about it. When I was writing this, the author had updated the repository 11 days earlier. Make sure to install “git” and set the global git user.email and user.name values (git config --global), otherwise it will not download the rocRAND source code.
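Before starting the rebuild, it can save time to confirm that git is installed and that the global name and email are set. The following is a small sketch of such a check. The function name is mine; it only reads the values through git config and changes nothing.

```python
import shutil
import subprocess


def git_identity():
    """Return the global git user.name and user.email, or None if git is absent.

    A value of None inside the dict means that key is not set globally.
    """
    if shutil.which("git") is None:
        return None
    identity = {}
    for key in ("user.name", "user.email"):
        proc = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True,
            text=True,
        )
        # git prints nothing (and exits non-zero) when the key is unset.
        identity[key] = proc.stdout.strip() or None
    return identity


if __name__ == "__main__":
    ident = git_identity()
    if ident is None:
        print("git is not installed")
    else:
        for key, value in ident.items():
            print(key, "=", value if value else "(not set)")
```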
Now run that Python script again, and it should give the following output.
Now you can start using this GPU for deep learning.
I was training an LSTM with [400,100,100,100], and it was taking about 25–30 minutes to complete one epoch. The batch size was 128, the total number of rows was 110000, and there were 10 features.
With this GPU, that dropped to 160 seconds!
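To put those numbers in perspective, here is the arithmetic as a short sketch. It assumes each row is one training sample and that the last, partial batch counts as a step (the usual Keras behaviour). I did not measure per-step times separately, so that figure is derived, not observed.

```python
import math

rows = 110_000
batch_size = 128

# Batches per epoch, counting the final partial batch.
steps_per_epoch = math.ceil(rows / batch_size)   # 860

# Epoch time before (25 to 30 minutes) and after (160 s), in seconds.
before_s = (25 * 60, 30 * 60)
after_s = 160

speedup_low = before_s[0] / after_s    # 9.375
speedup_high = before_s[1] / after_s   # 11.25

# Average time per training step on the GPU.
seconds_per_step = after_s / steps_per_epoch

print(f"steps per epoch: {steps_per_epoch}")
print(f"speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")
print(f"GPU time per step: {seconds_per_step * 1000:.0f} ms")
```

So one epoch became roughly 9 to 11 times faster.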
So this is very useful, and I hope you find it useful too!
Thanks for reading to the end. If you have any questions or confusion about this, please comment.
References:
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html