Setting up SSH tunnelling for your Jupyter and PyCharm
Increasing the productivity of your working-from-home environment
TL;DR, here’s my situation:
- I work in Python and need access to a GPU server in my lab.
- Since this pandemic started, I’ve been working from home.
- The server sits on a private network: you first SSH to a proxy server, then SSH again from there to the GPU server, so you have to authenticate twice.
- I want to develop code locally in PyCharm, deploy it directly to the server, and open a Jupyter notebook in a browser to debug, visualise and experiment with it.
Here’s my recipe for setting up SSH tunnelling so that I can write, deploy and debug my code without entering passwords.
SSH Config
Usually I need to run this command to connect to my gpu-server through a proxy-server:
$ ssh -L 8888:localhost:8888 -J username@proxy-server.mycompany.com username@gpu-server.mycompany.com
The -J option tells ssh to connect through a jump host (here, the proxy-server). The -L option forwards port 8888 on my local computer to port 8888 on the gpu-server. Jupyter uses this port, so I can open a notebook running on the gpu-server from my local browser. However, this command prompts for a password twice: once for the proxy-server and once for the gpu-server.
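To make the anatomy of that command a bit more concrete, here is a small sketch that assembles it from its parts. The helper name tunnel_cmd is made up for illustration; it only prints the command, it does not connect. Note that the "localhost" inside the -L argument is resolved on the gpu-server side, not on your own computer.

```shell
#!/bin/sh
# Hypothetical helper (not part of ssh): prints the tunnel command for a
# given pair of ports, e.g. when local port 8888 is already in use.
tunnel_cmd() {
    local_port="$1"   # port opened on your own computer
    remote_port="$2"  # port Jupyter listens on, on the gpu-server
    printf 'ssh -L %s:localhost:%s -J %s %s\n' \
        "$local_port" "$remote_port" \
        "username@proxy-server.mycompany.com" \
        "username@gpu-server.mycompany.com"
}

# Forward local port 8889 to port 8888 on the gpu-server.
tunnel_cmd 8889 8888
```

With these arguments you would then browse to port 8889 locally, while Jupyter keeps listening on 8888 on the server.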
That long command line is also horrible to remember. Instead, you can put the same settings in an SSH config file.
If you don’t have an SSH config file yet, you can create an empty one:
$ touch ~/.ssh/config
Edit the config file and create the following section:
Host gpu-server
    Hostname gpu-server.mycompany.com
    User username
    ProxyJump username@proxy-server.mycompany.com
    ServerAliveInterval 30
    ServerAliveCountMax 3
    LocalForward 8888 localhost:8888
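I included ServerAliveInterval and ServerAliveCountMax so that the SSH connection survives idle periods while I’m coding: the client sends a keepalive message after 30 seconds without traffic and gives up only after 3 of them go unanswered.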
With this configuration, I can just call the following command:
$ ssh gpu-server
You still need to enter your password twice, though.
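If you would rather script this step, the sketch below appends the same Host block to a config file, but only once, and keeps the file private to your user. The function name add_gpu_host is an assumption of mine; the path is passed in so you can try it on a scratch file before pointing it at ~/.ssh/config.

```shell
#!/bin/sh
# Sketch: append the gpu-server block to an SSH config file, once.
# add_gpu_host is a hypothetical helper name, not an ssh tool.
add_gpu_host() {
    config="$1"
    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"          # readable and writable by you only
    # Do nothing if a "Host gpu-server" line is already present.
    if grep -q '^Host gpu-server$' "$config"; then
        return 0
    fi
    cat >> "$config" <<'EOF'
Host gpu-server
    Hostname gpu-server.mycompany.com
    User username
    ProxyJump username@proxy-server.mycompany.com
    ServerAliveInterval 30
    ServerAliveCountMax 3
    LocalForward 8888 localhost:8888
EOF
}

# Try it on a scratch file; the second call is a no-op.
scratch="$(mktemp -d)/config"
add_gpu_host "$scratch"
add_gpu_host "$scratch"
grep -c '^Host gpu-server$' "$scratch"
```

Once you are happy with the result, the same call with "$HOME/.ssh/config" as the argument writes to your real config.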
SSH Keys
To log in without typing passwords, you need an SSH key pair: the private key stays on your local computer, and the public key is stored on both the proxy-server and the gpu-server. We will use the ssh-keygen tool to create the key pair.
Warning: by default, ssh-keygen writes the private key to ~/.ssh/id_rsa. If that file already exists, it asks whether to overwrite it, and answering yes destroys your previous key pair.
On your local terminal, run:
$ ssh-keygen -t rsa
It will ask for a passphrase to encrypt the private key, but this is optional. I did not use a passphrase.
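If you want to be sure an existing key is never touched, a small guard around ssh-keygen helps. This is only a sketch: make_key is a name I made up, and it sets an empty passphrase with -N "" because that is what I did above; drop that option if you want to be prompted for one.

```shell
#!/bin/sh
# Sketch: create a key pair only if the target file does not exist yet.
# make_key is a hypothetical helper name.
make_key() {
    key_file="$1"
    if [ -e "$key_file" ]; then
        echo "keeping existing key: $key_file"
        return 0
    fi
    # -f sets the output file, -N "" sets an empty passphrase.
    ssh-keygen -t rsa -f "$key_file" -N ""
}
```

Calling make_key "$HOME/.ssh/id_rsa" then either generates the key pair or leaves the one you already have in place.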
Now you need to transfer your public key to both the proxy-server and the gpu-server. We will do this differently for each, as the proxy server can be reached directly but the gpu server cannot.
Copying public key to the proxy-server
Use ssh-copy-id for the proxy server. From your local terminal:
$ ssh-copy-id username@proxy-server.mycompany.com
This will append your public key (~/.ssh/id_rsa.pub) to ~/.ssh/authorized_keys on the proxy-server.
Now you should be able to ssh to the proxy server without a password. Try:
$ ssh username@proxy-server.mycompany.com
Copying public key to the gpu-server
We do the same for the gpu-server, but through the proxy. From your local terminal:
$ cat ~/.ssh/id_rsa.pub | ssh -J username@proxy-server.mycompany.com username@gpu-server.mycompany.com "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys"
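The remote half of that command is a plain append, so running it twice stores the key twice. As a sketch of a slightly more careful version, the function below (install_key is a hypothetical name) skips keys that are already present and also tightens the permissions on authorized_keys. It takes the directory as an argument so it can be tried locally; the example key is made up.

```shell
#!/bin/sh
# Sketch of the remote side of the command above, written as a function.
# install_key is a hypothetical helper name.
install_key() {
    pub_key="$1"      # one line, e.g. the contents of ~/.ssh/id_rsa.pub
    ssh_dir="$2"      # normally ~/.ssh on the server
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
    # -x: match the whole line, -F: treat the key as a fixed string
    grep -qxF -- "$pub_key" "$ssh_dir/authorized_keys" ||
        printf '%s\n' "$pub_key" >> "$ssh_dir/authorized_keys"
}

# Demo on a scratch directory with a made-up key; no duplicate is stored.
demo="$(mktemp -d)/.ssh"
install_key "ssh-rsa AAAA-example-key user@laptop" "$demo"
install_key "ssh-rsa AAAA-example-key user@laptop" "$demo"
grep -c 'ssh-rsa' "$demo/authorized_keys"
```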
Now you can try to SSH to your gpu-server using the config file:
$ ssh gpu-server
Voila!! No passwords.
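If you want a check you can rerun later (say, after the servers get reinstalled), here is a minimal sketch. BatchMode=yes makes ssh fail instead of falling back to a password prompt, so a success really means the keys work. The names check_login and SSH_BIN are my own assumptions; SSH_BIN exists only so the function can be exercised without a real server.

```shell
#!/bin/sh
# Sketch: check that key-based login works without any prompt.
# check_login and SSH_BIN are hypothetical names, not ssh features.
check_login() {
    host="$1"
    # BatchMode=yes: never ask for a password, fail instead.
    # ConnectTimeout=10: give up after 10 seconds if the host is unreachable.
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "$host: key login OK"
    else
        echo "$host: key login FAILED"
        return 1
    fi
}
```

Running check_login gpu-server exercises both hops, because the config file routes the connection through the proxy-server.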
Starting Jupyter / Jupyter-Lab server
Here’s what you need to install on your gpu-server:
- Create a Python environment, using either pip or anaconda.
- Within your environment, install jupyter or jupyterlab, and any other Python packages you need for your work.
Assuming you have done the above steps, you can run Jupyter to test whether you can reach it from your local computer.
Log in to the gpu-server and activate your Python environment. Then call:
[gpu-server] $ jupyter notebook --no-browser --ip=0.0.0.0 --port=8888
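You do not need to open the link that Jupyter prints, but if it contains a token, keep it: Jupyter asks for the token the first time you connect. Open your local browser and go to http://127.0.0.1:8888/.
Voila!! Your localhost on port 8888 has been forwarded to the gpu-server through your SSH tunnel.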
To run Jupyter in the background, use the nohup command:
[gpu-server] $ nohup jupyter notebook --no-browser --ip=0.0.0.0 --port=8888 &> /tmp/jupyter.out &
You can see info and debug messages from Jupyter in the /tmp/jupyter.out file on the gpu-server.
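When Jupyter runs in the background, the login token ends up in that log file instead of on your screen. The sketch below pulls it out again. It assumes the log contains a URL of the form http://<host>:8888/?token=<hex>, which is what Jupyter prints when token authentication is on; jupyter_token is a made-up helper name and the token in the demo is fake.

```shell
#!/bin/sh
# Sketch: extract the first login token from a Jupyter log file.
# jupyter_token is a hypothetical helper name.
jupyter_token() {
    log_file="$1"
    grep -o 'token=[0-9a-f]*' "$log_file" | head -n 1 | cut -d= -f2
}

# Demo on a fake log line (the token value is made up).
log="$(mktemp)"
echo '    http://gpu-server:8888/?token=0123abcd' > "$log"
echo "http://127.0.0.1:8888/?token=$(jupyter_token "$log")"
```

On the gpu-server you would call jupyter_token /tmp/jupyter.out and paste the resulting URL into your local browser. Depending on your Jupyter version, jupyter notebook list may also show the running servers together with their tokens.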
IMPORTANT NOTE: The SSH tunnel is only open while you are connected to the gpu-server using your config file. As soon as you log out, the tunnel disappears. You may want to use some other tool to keep the SSH tunnel running in the background.
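One option that needs no extra tool is ssh itself: with -f it goes to the background after authenticating, and with -N it runs no remote command and only forwards ports. The sketch below wraps that in a function. The names start_tunnel and SSH_BIN are assumptions of mine; setting SSH_BIN=echo gives a dry run that prints the arguments instead of connecting.

```shell
#!/bin/sh
# Sketch: open the tunnel without keeping a terminal session open.
# start_tunnel and SSH_BIN are hypothetical names, not ssh features.
start_tunnel() {
    host="$1"
    # -f: go to the background after authentication
    # -N: run no remote command, forward ports only
    # ExitOnForwardFailure=yes: fail if the local port cannot be forwarded
    "${SSH_BIN:-ssh}" -f -N -o ExitOnForwardFailure=yes "$host"
}

# Dry run: print the arguments that would be passed to ssh.
SSH_BIN=echo
start_tunnel gpu-server
```

Unset SSH_BIN to open the tunnel for real. The LocalForward line in the config file still applies, so port 8888 is forwarded just as in an interactive session.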
Setting up PyCharm to auto-deploy
One nice feature of PyCharm is automatic deployment to a remote server whenever you save a file. You can also use a Python interpreter on the remote server, but I prefer a local interpreter to reduce traffic while PyCharm indexes and searches the code.
- From a project, open the PyCharm preferences dialog and go to Deployment under the Build, Execution, Deployment menu on the left sidebar.
- Create a new connection and give it any name you want.
- Use the SFTP type.
- In the SSH configuration, click the triple dots (…). This will pop up the SSH Configurations dialog box.
- Add a new configuration and set the host (gpu-server), port (22) and username, and set the authentication type to “OpenSSH config and authentication agent”.
- Test the connection.
- Back in the new deployment connection window, click on the Mappings tab to set the deployment path. If your connection has been set up properly, you can browse the remote server path.
You can now switch on Automatic Upload under the Tools → Deployment menu. Every time you save, the file will be deployed automatically to the gpu-server.
Happy coding!!