Work remotely with PyCharm, TensorFlow and SSH

Wouldn’t it be awesome to sit at a café with your laptop, creating large neural networks in TensorFlow and crunching data at several teraFLOPS, without even hearing your fan spin up? This is possible using a remote interpreter in PyCharm, and working remotely feels almost the same as working locally.

However, this is currently only possible in PyCharm Professional (Community Edition will not do). If you are a student, your university should have an arrangement so you can download it for free; otherwise you’ll have to buy it. Here is how I set it up from scratch (you may want to skip some of the steps):

Remote data crunching machine

Hopefully your remote machine doesn’t look like this.

This is your stationary remote machine, perhaps fitted with one or several state-of-the-art GPUs from Nvidia! (I don’t like the current deep learning monopoly, but TensorFlow can only use Nvidia GPUs.) First, install the latest Ubuntu. I recommend the desktop version; you can always kill the GUI service later to free up graphics memory. Connect it to the Internet and check your LAN IP address by opening a terminal and typing ifconfig. I will assume it is 192.168.0.1 in the instructions later.

Setup SSH

In order to be able to communicate with your crunching-machine, you need to install SSH on it. Open up a terminal on your stationary computer and get it:

sudo apt-get install ssh

Enable SSH X11-forwarding so that you can plot things. Open the configuration file like this:

sudo gedit /etc/ssh/sshd_config

Then locate the row that says

# X11Forwarding yes

Simply remove the hash-sign to uncomment the line, save and close the file.
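
If you want to double-check the edit without opening the file again, you can read the option back with a few lines of Python. This is only a minimal sketch: the helper name sshd_option is made up for this example, and it assumes plain "Keyword value" lines (it ignores Match blocks). It returns the first active value because sshd uses the first value it finds for a keyword.

```python
def sshd_option(config_text, keyword):
    """Return the first active value for keyword, or None if it is unset.

    Hypothetical helper: handles simple "Keyword value" lines only.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out lines such as "# X11Forwarding yes".
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # sshd keywords are case-insensitive.
        if parts[0].lower() == keyword.lower():
            return parts[1].strip() if len(parts) > 1 else ""
    return None


if __name__ == "__main__":
    before = "# X11Forwarding yes\n"
    after = "X11Forwarding yes\n"
    print(sshd_option(before, "X11Forwarding"))  # still commented out
    print(sshd_option(after, "X11Forwarding"))   # now active
```

To check the real file, pass it the contents of /etc/ssh/sshd_config, for example sshd_option(open("/etc/ssh/sshd_config").read(), "X11Forwarding").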

Graphics

Next, install the graphics drivers. They are usually proprietary, so you need to add a new repository to your package manager. Which package you need depends on your graphics card and Ubuntu version. As of writing, nvidia-367 is the latest one; see more on this page.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367

Cuda and cuDNN

Now it’s time to install the Cuda toolkit and cuDNN, which are required to run GPU-enabled TensorFlow. They are available from Nvidia’s webpage, and to download cuDNN you are required to register. As of writing, Cuda 8.0 and cuDNN 5.1 are the latest versions. For Cuda I prefer using the built-in package manager, since it makes it easier to keep track of what you have installed:

sudo dpkg -i cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-8.0

Make sure that the symlink is set up correctly:

readlink -f /usr/local/cuda
>> /usr/local/cuda-8.0

This is how to extract the cuDNN files, copy the header and libraries into the Cuda folder, and make them readable for all users, all from the terminal (some of the filenames may be different for you):

tar xvzf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Finally add the environment variables you will need, append them to your .bashrc file and then source it:

echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
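
A quick way to see whether the variables actually took effect is to inspect them from Python in a fresh terminal. The sketch below is not part of the original setup; the helper name missing_library_dirs is invented here, and the two directories are simply the ones appended to LD_LIBRARY_PATH above.

```python
import os

# The two directories added to LD_LIBRARY_PATH in ~/.bashrc above.
REQUIRED_DIRS = [
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/extras/CUPTI/lib64",
]


def missing_library_dirs(environ, required=REQUIRED_DIRS):
    """Return the required directories that LD_LIBRARY_PATH does not list."""
    # Empty entries appear when the variable was unset before the export.
    entries = [p for p in environ.get("LD_LIBRARY_PATH", "").split(":") if p]
    return [d for d in required if d not in entries]


if __name__ == "__main__":
    print("Missing from LD_LIBRARY_PATH:", missing_library_dirs(os.environ))
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
```

An empty list means both directories are on the path. The same check is handy later, when the variables have to be entered by hand in PyCharm.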

Python and TensorFlow

Install some required Python libraries:

sudo apt-get install python-pip python-dev build-essential python-numpy python-scipy python-matplotlib

Then install GPU-enabled TensorFlow. Check the version you need on this page (TF_BINARY_URL is different for different systems):

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0rc2-cp27-none-linux_x86_64.whl
pip install --ignore-installed --upgrade $TF_BINARY_URL
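
Why the URL differs between systems becomes clearer if you take the wheel filename apart. Wheel names follow the pattern name-version-pythontag-abitag-platformtag.whl (with an optional build tag after the version), so the file above targets CPython 2.7 on 64-bit Linux. The helper below is a small illustrative sketch; wheel_tags is a made-up name and not part of pip.

```python
def wheel_tags(url):
    """Split a wheel URL or filename into its name, version and tags.

    Hypothetical helper for illustration; it assumes the standard
    wheel naming scheme with an optional build tag.
    """
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %s" % filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],    # e.g. cp27 means CPython 2.7
        "abi": parts[-2],
        "platform": parts[-1],  # e.g. linux_x86_64
    }


if __name__ == "__main__":
    url = ("https://storage.googleapis.com/tensorflow/linux/gpu/"
           "tensorflow-0.11.0rc2-cp27-none-linux_x86_64.whl")
    print(wheel_tags(url))
```

If the python or platform tag does not match your machine, pip will refuse to install the wheel, which is a good hint that you picked the wrong URL.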

Verify that the installation is working by typing the following in your terminal:

python
import tensorflow

You should get output similar to this if you have installed it on a GPU enabled system:

>I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
>I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
>I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
>I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
>I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally

Did it work? Great! Let’s move on to your laptop.
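
One optional check before you close the terminal: if the import prints a long log, you can let a script confirm that all five libraries were opened instead of eyeballing it. This is a rough sketch that assumes the log lines look like the ones shown above; the function name and the EXPECTED_LIBRARIES list are made up for this example, and other TensorFlow versions may word their log differently.

```python
import re

# Library names taken from the example log output above.
EXPECTED_LIBRARIES = [
    "libcublas.so",
    "libcudnn.so",
    "libcufft.so",
    "libcuda.so.1",
    "libcurand.so",
]

CUDA_LIB_PATTERN = re.compile(r"successfully opened CUDA library (\S+) locally")


def missing_cuda_libraries(log_text, expected=EXPECTED_LIBRARIES):
    """Return the expected libraries that the log does not report as opened."""
    opened = set(CUDA_LIB_PATTERN.findall(log_text))
    return [name for name in expected if name not in opened]


if __name__ == "__main__":
    sample = (
        "I tensorflow/stream_executor/dso_loader.cc:111] "
        "successfully opened CUDA library libcublas.so locally\n"
    )
    print(missing_cuda_libraries(sample))
```

Paste the log text into the function; an empty list means every library was found.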

Super sleek ultrabook

Open up your laptop and connect it to the same local network as your stationary machine.

Install stuff

I’m using a MacBook, which lets me install programs with a very nice package manager called Homebrew. Even desktop apps can easily be downloaded with Homebrew Cask.

Install Homebrew and Cask:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew tap caskroom/cask

Get what you need, including the PyCharm IDE.

brew install cask ssh-copy-id python
brew cask install java pycharm xquartz

Setup SSH

Generate an SSH key-pair by executing the command below and then walk through the guide (if you haven’t done this already):

ssh-keygen -t rsa

Now copy the key to your remote machine so you can connect to it without typing a password every time. The first time you do this, you need to authenticate yourself with the password of your remote machine:

ssh-copy-id [remote username here]@[remote Ip here]

Enable compression and X11-forwarding (useful for plotting data) by appending this to your config file on your local machine.

echo 'ForwardX11 yes' >> ~/.ssh/config
echo 'Compression yes' >> ~/.ssh/config
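
One thing to be aware of: running the echo commands twice appends the lines twice. If you prefer something you can re-run safely, here is a small sketch of an append that skips lines already present. The helper name ensure_line is invented for this example. Also keep in mind that ssh applies options placed after a Host line only to that host, so if your config already has Host sections, appended lines end up under the last one.

```python
import os


def ensure_line(path, line):
    """Append line to the file at path unless it is already there.

    Returns True if the line was appended, False if it already existed.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in [existing.strip() for existing in content.splitlines()]:
        return False
    with open(path, "a") as handle:
        # Start on a fresh line if the file does not end with a newline.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

You would call it as ensure_line(os.path.expanduser("~/.ssh/config"), "ForwardX11 yes") and again with "Compression yes".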

Verify that everything is working by connecting to your remote machine from your laptop.

ssh [remote username here]@[remote Ip here]

While still logged in, you should disable password login on your remote machine for security reasons. Open the configuration file with your favorite command-line editor.

sudo vim /etc/ssh/sshd_config

Then uncomment the following line by removing the hash-sign, and make sure the value is no:

PasswordAuthentication no

Restart your SSH server while still logged in on your remote (you have to authenticate yourself again).

sudo service ssh restart

The final thing you should do while still logged in with SSH on your remote is to find your display environment variable. This will be used later for plotting. I usually get localhost:10.0.

echo $DISPLAY
> localhost:10.0

Remember the output of this command; we will use it later.
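
In case the value looks cryptic: an X display name has the form host:display.screen, so localhost:10.0 means display 10, screen 0 on localhost. The 10 comes from sshd, which by default starts numbering forwarded displays at 10. The sketch below just splits the value into its parts; parse_display is a made-up helper name.

```python
def parse_display(value):
    """Split an X display name like "localhost:10.0" into its parts.

    Returns (host, display_number, screen_number). The screen number
    defaults to 0 when it is left out, as in ":0".
    """
    host, separator, rest = value.rpartition(":")
    if not separator:
        raise ValueError("not a display name: %r" % value)
    display, _, screen = rest.partition(".")
    return host, int(display), int(screen) if screen else 0


if __name__ == "__main__":
    print(parse_display("localhost:10.0"))
```

If echo $DISPLAY prints nothing at all, X11-forwarding is not active for that SSH session.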

Remote interpreter in PyCharm

This is the fun part: setting up the remote interpreter so that your scripts execute on your remote machine. Let’s get started. Start up PyCharm and create a new Python project.

Interpreter

Open “Preferences > Project > Project Interpreter”. Click on the “Dotted button” in the top-right corner and then “Add remote”.

Click on the “SSH Credentials” radio-button and input your information. Select “Key pair” on the “Auth type”, and select the “Private Key file”. It should be located in /Users/<your username>/.ssh/id_rsa.

Click on “OK > Apply”. Notice the “R” for remote on the Project Interpreter.

Deployment

The remote interpreter cannot execute a local file, so PyCharm has to copy your source files (your project) to a destination folder on your remote server. This is done automatically, and you don’t need to think about it! While still in the “Preferences” pane, open “Build, Execution, Deployment > Deployment > Options”. Make sure that “Create empty directories” is checked. This way PyCharm will automatically synchronize when you create folders:

Now go back to “Build, Execution, Deployment > Deployment” and click on the “Plus button”, select “SFTP” and give a name to your remote. Click on “OK”:

Set up the connection by first typing the IP of your remote in “SFTP host”, then selecting “Key pair” as the “Auth type”, and finally selecting the “Private Key file”. It should be located in /Users/<your username>/.ssh/id_rsa, as shown in the screenshot below. You may then click on “Test SFTP connection”. Once you can connect successfully, you should set up mappings. If you’d like, you can click on “Autodetect” beside the “Root path”, and it will find the location of your home directory on the remote. All paths you specify after this will be relative to this home path. Then go to the “Mappings” tab.

As soon as you save or create a file in your local path, it will be copied to the “Deployment path” on your remote server. Perhaps you want to deploy it in a DeployedProjects/ folder as shown below. This will be relative to the “Root path” specified earlier, so the absolute deployment path will in our case be /home/username/DeployedProjects/TestProject/.
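
To make the mapping concrete, here is a small sketch of how a local file ends up at its remote location: the part of the path below your local project folder is kept, and it is placed under root path + deployment path. This only illustrates the idea and is not PyCharm’s actual code; the function name and the local path /Users/me/PycharmProjects/TestProject are made up for the example.

```python
import posixpath


def remote_path(local_file, local_root, root_path, deployment_path):
    """Map a local project file to its location on the remote server."""
    relative = posixpath.relpath(local_file, local_root)
    if relative == ".." or relative.startswith("../"):
        raise ValueError("%s is outside the project folder" % local_file)
    # The deployment path is relative to the root path, so drop any
    # leading slash before joining.
    return posixpath.normpath(
        posixpath.join(root_path, deployment_path.lstrip("/"), relative)
    )


if __name__ == "__main__":
    print(remote_path(
        "/Users/me/PycharmProjects/TestProject/test.py",  # hypothetical local file
        "/Users/me/PycharmProjects/TestProject",          # hypothetical local path
        "/home/username",                                 # root path
        "DeployedProjects/TestProject",                   # deployment path
    ))
```

With these values the file lands at /home/username/DeployedProjects/TestProject/test.py on the remote.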

Now we are finished with the preferences, click on “Apply” > “OK”, and then click “Tools > Deployment > Automatic Upload” and confirm that it is checked:

To do the initial upload, right-click on your project folder in the project explorer and click on “Upload to remote”:

You should get a “File transfer” tab on your bottom pane where you can see all the progress:

Then click on “Tools > Deployment > Browse Remote Host”. Drag and drop the window just beside the Project tab to the left. That way it will be really simple to switch between your local and remote project.

These deployment settings work seamlessly: as soon as you save and run a file, it is uploaded so quickly you won’t even notice it.

Setup the Console

Open “Preferences > Build, Execution, Deployment > Console > Python console” and select the “Python interpreter” to be your remote one. Next click on the “Dotted button” and input the required environment variables that we added before to ~/.bashrc when we set up the server. Notice that we also added a value to the “DISPLAY” variable we found out earlier when connecting to the server with SSH:

Then go back to “Build, Execution, Deployment > Console” and select “Always show the debug console”. It will be very handy when we’re debugging:

Create a run configuration

Create a simple test-file called test.py in your project, just containing this.

import tensorflow
print("Tensorflow Imported")

Now go to “Run > Edit Configurations…” Click on the “Plus button” and create a new Python configuration. Name it and select the script to run:

Now enter the required environment variables as before. Tip: you can copy them all from the console settings we specified earlier, by using Ctrl+A and then the copy/paste buttons in the lower left corner. You access them by clicking the “Dotted button” just to the right of the “Environment variables” line.

Click on “OK > OK”. It’s time for testing!

Testing the setup

Now we should be all done, so it’s time to test our setup. First open a terminal and make sure that you have at least one SSH channel with X-forwarding connected to your server. If you have had a connection open for a while, you may have to exit and restart it:

ssh [remote username here]@[remote Ip here]

Console

Then open the “Python Console” in the lower bar in PyCharm and type import tensorflow. Then you may type ls / to verify that you are actually executing the commands on your server! This is what the output should be:

Running script

Now go over to your test.py script and select “Run > Run…” from the top toolbar. Select your newly created run configuration “Test”. It should output something like this:

Plotting

Let’s do some plotting, change your test.py file to this:

import tensorflow
import matplotlib
matplotlib.use('GTKAgg')
import matplotlib.pyplot as plt
import numpy as np

print("Tensorflow Imported")
plt.plot(np.arange(100))
plt.show()

Then run it again with your run configuration “Test”. You should get this plot.

The plot is actually drawn on your remote server, but the window data is forwarded to your local machine. Notice that we changed the backend with matplotlib.use('GTKAgg'), because it’s an X11-supported display backend. You can read more about Matplotlib backends here. You can also change the default behavior in your matplotlibrc-file. Remember that you need to have at least one open SSH-connection in a separate terminal to get this to work, with the correct value of the DISPLAY environment variable. If it didn’t work, try restarting your SSH connection.
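
If you sometimes run the same script without an X-forwarded session, the script stops at plt.show() with a display error. One way around that, sketched below, is to pick the backend based on whether DISPLAY is set and fall back to Agg, Matplotlib’s non-interactive backend that only renders to files. The helper name choose_backend is made up, and GTKAgg is simply the backend used in this tutorial; depending on your Matplotlib version you may need a different interactive one.

```python
import os


def choose_backend(environ, interactive="GTKAgg"):
    """Pick an interactive backend if a display is available, else Agg."""
    return interactive if environ.get("DISPLAY") else "Agg"


if __name__ == "__main__":
    backend = choose_backend(os.environ)
    print("Would use backend:", backend)
    # In a real script, call this before importing pyplot:
    #   import matplotlib
    #   matplotlib.use(backend)
    #   import matplotlib.pyplot as plt
```

With the Agg fallback, save the figure with plt.savefig("plot.png") instead of calling plt.show().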

Debugging script

Finally, do some debugging. Click on the left bar to put a breakpoint, then go to “Run > Debug…” and select the “Test” configuration. You will see that the execution has halted and you are debugging your script remotely.

Next step

In order to access your machine over the internet, you have to forward ports on your home router, and how to do that differs between vendors. I recommend forwarding a different port than 22 on your router. There are plenty of bots out there trying to break in, and they will check that port by default, which might slow your connection (although you are pretty secure since you have turned off password authentication). So you could perhaps forward port 4343 on your router to port 22 on IP 192.168.0.1 (the IP we assumed for our remote in this tutorial). Also, to speed up the plotting, you may switch to a faster encryption cipher.
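
Once the forwarding is in place, a quick way to see whether the port is reachable from outside is to try opening a TCP connection to it. Below is a minimal sketch using only the standard library; port_is_open is a made-up helper name, and you would pass it your router’s public address together with the forwarded port (4343 in the example above).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    connection.close()
    return True
```

Run it from a network outside your home, for example a phone hotspot, since many routers do not forward connections that originate from inside the LAN. A True result only tells you that something accepts connections on that port, not that it is your SSH server.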

Next, let’s do some more TensorFlow, perhaps experimenting with matrix multiplication on the CPU and GPU? (coming soon)