Dynamic SSL Proxy For Jupyter Notebook

How to forward traffic to a dynamic target destination

Avi Turkewitz
May 1 · 5 min read

We are creating Docker containers for students to run a Jupyter Notebook and then embedding that notebook on Learn.co in an iframe. Because Learn.co is served over SSL, the connection with the Jupyter Notebook must use SSL as well, otherwise…

This is happening because when the user’s page loads, it makes a request to connect to a Jupyter Notebook. This request goes through a GeoDNS server, then a location-based load balancer (where the SSL is terminated), until it finally hits an instance of our app (Phoeyonce) that will create a Docker container running a Jupyter Notebook for them. The app then sends back the server and port that the user’s Docker container is running on so that the user’s webpage can connect directly to the Jupyter Notebook.

As you can see from the diagram though, our app was in charge of terminating the SSL certificate. Now, we no longer have a secure connection when the client connects directly to its Jupyter Notebook container. Not good.

No big deal, why not just…

Why not just follow the same path to the server as you did the first time?

The problem with trying to connect through the GeoDNS and load balancer again is that we cannot be sure we’ll end up at the same server, and we need to get back to the same server and port because that is where our Jupyter Notebook is already running. It’s not the load balancer’s job to send us back to the same place; its job is to send us to the server with the most available resources. Right from the start, it was clear that this approach wasn’t going to work.

Why not just have the app that sets up the Jupyter Notebook terminate the SSL cert?

Doing this would require that each Docker container have a copy of the certificate. This would end up exposing the certificate to any user that knew to dig around and look in the right place. That’s definitely not something we want to allow.

Our Solution

Our solution was to set up a proxy server that would terminate the SSL and then forward the connection to the correct server. For this to work, every request sent to the Jupyter Notebook needed to go through our ide-proxy.ide.learn.co proxy server and carry both the address of the server that the user’s container was already running on and the port. This setup looks something like this:

This seemed like a great job for query params! We tried something like ide-proxy.ide.learn.co/notebook?server=nyc-01&port=6578. Unfortunately, we ran into quite a large problem once the notebook started loading on the page and requesting more assets from the container: the notebook did not know that it needed to add these particular query params to the end of each request. After trying for some time to add the query params to every request made by the notebook (or to the headers of those requests), we realized this was not a viable solution.
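To make the failure concrete, here is a minimal Python sketch of how a relative asset reference gets resolved against the query-param URL. It is only an illustration, not our actual code: the asset path static/style.css and the helper names are made up, and urljoin stands in for the browser's URL resolution.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# URL the page would first load under the query-param scheme.
notebook_url = "https://ide-proxy.ide.learn.co/notebook?server=nyc-01&port=6578"


def resolve_asset(page_url: str, asset_path: str) -> str:
    """Resolve a relative asset reference against the page URL."""
    return urljoin(page_url, asset_path)


def routing_params(url: str) -> dict:
    """Return the query params the proxy would need to route the request."""
    return parse_qs(urlsplit(url).query)


# Hypothetical asset path, chosen only for illustration.
asset_url = resolve_asset(notebook_url, "static/style.css")

print(asset_url)                     # the query string is gone
print(routing_params(notebook_url))  # server and port are present
print(routing_params(asset_url))     # nothing left for the proxy to route on
```

The follow-up request arrives at the proxy with no server or port attached, so the proxy has no way to know which container it belongs to.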

The solution that we landed on was using a particularly formatted sub-domain to act as the server and port identifier!

We constructed the URL sent back from the initial request to look something like servername-port.ide-proxy.ide.learn.co (e.g. nyc-01-6578.ide-proxy.ide.learn.co). Because nyc-01-6578 is a subdomain, the request still came into the proxy server we set up at ide-proxy.ide.learn.co. We were able to accomplish this by making the new GeoDNS a Wildcard DNS. Now every request made by the Jupyter Notebook carries, in its hostname, the info needed to find the exact server and port that it’s running on.
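As a rough sketch of why this works where query params did not, the Python below builds the servername-port hostname and resolves a relative asset against it. The function name notebook_origin and the asset path are hypothetical, and this is not the code our app actually runs.

```python
from urllib.parse import urljoin

PROXY_DOMAIN = "ide-proxy.ide.learn.co"  # proxy domain named in this post


def notebook_origin(server: str, port: int) -> str:
    """Build the servername-port subdomain URL handed back to the user's page."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"https://{server}-{port}.{PROXY_DOMAIN}/"


origin = notebook_origin("nyc-01", 6578)

# Hypothetical asset path; relative references inherit the host automatically.
asset_url = urljoin(origin, "static/style.css")

print(origin)
print(asset_url)
```

Because the routing info lives in the hostname rather than the query string, every relative request the notebook makes keeps it without the notebook having to know anything about the proxy.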

The next step was to extract this oddly constructed subdomain at our proxy server to forward the request to the right place. To do this, we configured our nginx.conf to look as follows:

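http {
  server {
    listen 443 ssl;
    # generic ssl cert settings...

    # Named captures pull the host and port out of the subdomain, e.g.
    # nyc-01-6578.ide-proxy.ide.learn.co -> phoeyonce_host=nyc-01, jupyter_port=6578
    server_name ~^(?<phoeyonce_host>(.*?))-(?<jupyter_port>\d+)\.ide-proxy\.ide\.learn\.co$;

    location / {
      # Note: with variables in proxy_pass, nginx resolves the hostname at
      # request time, which requires a resolver directive (not shown here).
      proxy_pass http://$phoeyonce_host.ide.learn.co:$jupyter_port;
      proxy_read_timeout 300s;
      proxy_set_header Host $host;
      proxy_set_header X-Real-Ip $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}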

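I think the trickiest bit here is the regex at server_name: ~^(?<phoeyonce_host>(.*?))-(?<jupyter_port>\d+)\.ide-proxy\.ide\.learn\.co$ uses named capture groups so that, for nyc-01-6578.ide-proxy.ide.learn.co, phoeyonce_host is assigned nyc-01 and jupyter_port is assigned 6578.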

We can then use those variables in proxy_pass to forward all traffic to the correct location!
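If you want to sanity-check the pattern outside of nginx, the sketch below mirrors it in Python. It is an approximation rather than what nginx runs: Python spells named groups (?P<name>...), and the function name upstream_for is made up for this example.

```python
import re
from typing import Optional

# Same pattern as the server_name regex, using Python's (?P<name>...) spelling.
SERVER_NAME = re.compile(
    r"^(?P<phoeyonce_host>.*?)-(?P<jupyter_port>\d+)\.ide-proxy\.ide\.learn\.co$"
)


def upstream_for(host_header: str) -> Optional[str]:
    """Return the upstream URL proxy_pass would target, or None if no match."""
    match = SERVER_NAME.match(host_header)
    if match is None:
        return None
    host = match["phoeyonce_host"]
    port = match["jupyter_port"]
    return f"http://{host}.ide.learn.co:{port}"


print(upstream_for("nyc-01-6578.ide-proxy.ide.learn.co"))
print(upstream_for("ide-proxy.ide.learn.co"))  # no servername-port prefix
```

The lazy (.*?) still captures the full nyc-01, because the match only succeeds once the digits are followed directly by .ide-proxy.ide.learn.co.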

This setup allows the ide-proxy.ide.learn.co proxy server to terminate the SSL cert, forward traffic to the correct server and port for a user’s Jupyter container and allow all subsequent requests to securely follow the same path!

Thanks for reading! Want to work on a mission-driven team that loves finding creative solutions to dev ops challenges? We’re hiring!

Flatiron Labs

We're the technology team at The Flatiron School (a WeWork company). Together, we're building a global campus for lifelong learners focused on positive impact.

Written by Avi Turkewitz

Former student and Software Engineer @ The Flatiron School. Learning Ruby / Rails / JS / Elixir / Phoenix
