Setting up GTX-1070 Passthrough with ESXi
I’ve had to do this a sufficient number of times at this point that I figured I’d write a post about it.
Getting ESXi working isn’t too hard *if* you have the right hardware. For me (and for the LSI controller I bought to allow me to virtualize FreeNAS) it took a lot of trial and error. If I had it to do over again, I’d probably just buy a separate case/motherboard for my FreeNAS machine instead of virtualizing it.
For background, see the article on www.v-front.de describing how, in ESXi 5.5, VMware removed driver support not only for some commodity network cards, but also for lots of SATA controllers.
V-Front.de is the best website for figuring all of this out. I ended up setting my BIOS (and the BIOS of my LSI card) to boot off of the LSI card, and passing the onboard AHCI controller of my motherboard through to my FreeNAS instance.
Setting up the Image
Hopefully, without too much hair pulling out, you’ve managed to get ESXi up and running, and successfully set up a VM with access to the graphics card PCI slot (add other device > select the PCI slot for the GTX).
I called my machine ‘machine-learn-box’ and saved it in ‘datastore1’. After successfully creating the VM, we need to edit the *.vmx file for the instance. You can do this from the command line if you’ve enabled the shell/ssh on your server:
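For example, the path follows from the VM name and datastore above; the exact location is an assumption, so adjust it for your own layout:

```shell
# on the ESXi host, over SSH; path derived from the datastore/VM names above
vi /vmfs/volumes/datastore1/machine-learn-box/machine-learn-box.vmx
```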
Then add the line
hypervisor.cpuid.v0 = "FALSE"
But it’s easier to accomplish the same thing from the Edit Settings > VM Options > Advanced > Edit Configuration… menu.
This setting hides the virtualization from the guest OS, so it thinks it is running on bare metal (and it allows the Nvidia drivers to actually work unencumbered). This setting will only take effect with the VM powered down/on reset.
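A quick way to confirm the flag took effect, assuming a Linux guest: the hypervisor CPU flag should disappear from /proc/cpuinfo once the VM boots with the setting in place.

```shell
# should print 0 (no matching lines) when the hypervisor is hidden
grep -c hypervisor /proc/cpuinfo
```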
It took a couple of tries before Xubuntu would install, owing to some nondeterministic preloaded Nvidia driver errors, but the third try worked for me. This is with Xubuntu 16.04.
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-384
After running these commands and restarting, you should see a table when you run nvidia-smi.
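The driver package installs the nvidia-smi utility, which prints that table:

```shell
# lists detected GPUs, driver version, and per-process GPU memory usage
nvidia-smi
```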
This should be enough for something like an ethereum miner script to utilize the GPU.
Bonus: Install Tensorflow
Tensorflow requires a few more fun bits to actually get it to work. As of now, the most important thing to check is that you download the correct version of the CUDA library. Currently (March ’18), Nvidia is on 9.1, but that won’t work with tensorflow unless you build it from source; we need 9.0 instead.
##install pip, vim, and virtualenv
sudo apt install python-pip
sudo apt install vim
pip install virtualenv virtualenvwrapper
echo 'export WORKON_HOME=$HOME/.virtualenvs' >> ~/.bashrc
echo 'export PROJECT_HOME=$HOME/Devel' >> ~/.bashrc
echo 'source $HOME/.local/bin/virtualenvwrapper.sh' >> ~/.bashrc
Let the fun begin…
##install cuda 9.0
# CTRL+ALT+(FN)+F1 to leave the X session, then at the prompt:
sudo service lightdm stop
sudo apt purge nvidia*
sudo sh cuda_9.0.176_384.81_linux-run
sudo service lightdm start
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
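After opening a new shell (or sourcing ~/.bashrc), you can sanity-check the toolkit install; nvcc ships with CUDA:

```shell
source ~/.bashrc
nvcc --version # should report release 9.0
```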
Of course, to get *all* of the dependencies you need from Nvidia, you must sign up as a developer; that is, give them your email address, fill out a short survey, and “opt in” to receiving their emails (required). I’m excited for the coming ASICs that will hopefully give us a realistic option to take our business elsewhere. Until then…
Once you’ve logged in, clicked around, and forfeited your information to the Nvidia gods, you’ll eventually be bestowed with a download link. Be sure to choose cuDNN 7.0.* for CUDA 9.0.
##install cudnn 7.0
#check Ubuntu version
lsb_release -a
# Download cuDNN 7.0 with CUDA 9.0 support for our Ubuntu version
# the exact filename depends on the cuDNN 7.0.x point release you downloaded
sudo dpkg -i libcudnn7_7.0.*+cuda9.0_amd64.deb
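You can confirm the package registered with dpkg:

```shell
dpkg -l | grep libcudnn # should list libcudnn7 with a 7.0.x version
```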
Once that’s installed, we should be ready to go!
##make a virtualenv
mkvirtualenv ml # any name works; mkvirtualenv is provided by virtualenvwrapper
##install tensorflow and keras
pip install --upgrade tensorflow-gpu
pip install keras h5py
Now we should be able to run tensorflow. I’ve been using this program to test the GPU access:
import tensorflow as tf
# log_device_placement logs which device (CPU or GPU) each op is placed on;
# the GTX 1070 should show up as /device:GPU:0
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
Unfortunately for me, I got an ‘Illegal Instruction (core dumped)’ error when I tried to import tensorflow (version 1.6). This is because I’m using an older CPU; the prebuilt tensorflow 1.6 binaries use AVX instructions that older CPUs don’t support. To fix it, I needed to:
#fix Illegal Instruction (core dumped) error
pip uninstall tensorflow-gpu
pip install tensorflow-gpu==1.5
and it works!
A Note for Other Machines
It’s not uncommon to have to run tensorflow on a shared GPU cluster that might (unfortunately) be slightly out of sync with the proper dependencies. If you get an error when importing tensorflow, but you see a GPU when you check nvidia-smi, the right thing to do is to look in /usr/local for the versions of cuda that are installed, and to check /usr/local/cuda to see what has been symlinked to it. If you see cuda-8.0 and can’t install cuda 9.0, you can reference this as your base cuda library ($CUDA_HOME) and work with cuDNN 5.1 and tensorflow 1.4. The tested build configurations table in the tensorflow install documentation lists which dependencies each version of tensorflow requires.
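Concretely, those checks look something like this:

```shell
# which CUDA toolkits are installed?
ls -d /usr/local/cuda*
# which one does the unversioned path point to?
readlink -f /usr/local/cuda
```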
For example, if you find that a root user has installed cuda-8.0 but not 9.0, the following should work out of the box:
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64"' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda-8.0' >> ~/.bashrc
pip install tensorflow-gpu==1.4
pip install keras h5py
Good luck with your new machine learning setup!