Jupyter Hub for Data Science team — Weekend of a Data Scientist

Alexander Osipenko
Sep 7, 2018 · 4 min read

Weekend of a Data Scientist is a series of articles about cool stuff I care about. The idea is to spend a weekend learning something new, reading, and coding.

What is Jupyter Hub?

I assume that you are already familiar with Jupyter, but in short, it’s a pretty handy IDE that’s very popular in the Data Science community because it makes sharing your findings much easier. And it’s pretty customizable too! I made a review of my favorite Jupyter Lab extensions a few months ago.

Jupyter Hub is an environment that you run on a remote server. It gives your Data Science team a shared workspace where you can run your code and share findings without the painful process of setting up virtual environments and without dependency conflicts (wow! amazing!).

When do you need Jupyter Hub?

You probably need it when:

  1. You have a team of Data Scientists with more than 1 person in it;
  2. Your team is tired of dependency conflicts and mismatched environment versions;
  3. You want to use cloud resources from within Jupyter Lab (or a notebook), for example, to run your code on a GPU cluster.

How to set up Jupyter Hub on a remote machine?

As a compulsory requirement, you will need Python 3 on your server. You will be able to use Python 2.7 from within Jupyter Lab, but Jupyter Hub itself runs only on Python 3.

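You can install it with three simple commands, as listed in the project README: python3 -m pip install jupyterhub, npm install -g configurable-http-proxy, and python3 -m pip install notebook.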

Now you can run Jupyter Hub with a simple command: jupyterhub

It will start Jupyter Hub on the default port, 8000.
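Before the first start, it can help to confirm that the pieces Jupyter Hub depends on are actually visible on the server. The following is a minimal sketch, not part of Jupyter Hub itself: the function name preflight is made up for illustration, and it only checks the Python version and whether the jupyterhub and configurable-http-proxy executables are on PATH.

```python
import shutil
import sys


def preflight():
    """Return a dict saying whether the basic Jupyter Hub prerequisites are met."""
    return {
        # Jupyter Hub itself only runs on Python 3.
        "python3": sys.version_info >= (3,),
        # The jupyterhub launcher should be on PATH after installation.
        "jupyterhub": shutil.which("jupyterhub") is not None,
        # The default proxy is a separate Node.js program installed with npm.
        "configurable-http-proxy": shutil.which("configurable-http-proxy") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If either executable is reported as missing, the install step most likely went into a different environment than the one you are running from.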

I’ve had difficulties installing it on Linux, and the only solution that worked for me was installing it with the conda command.

The first command to run

Jupyter Hub allows you to generate a config file with a simple command: jupyterhub --generate-config

This command creates a Python file, jupyterhub_config.py, that has almost everything you need inside, along with pretty good instructions.
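To give a feel for the file, here is a small sketch of what an edited jupyterhub_config.py might contain. It assumes Jupyter Hub 0.9 or later (where the bind_url option is available), and the address and port are only illustrative. Binding to 127.0.0.1 makes sense when a reverse proxy sits in front of the hub.

```python
# jupyterhub_config.py (fragment)
c = get_config()  # noqa: F821 -- get_config is provided by Jupyter Hub when it loads this file

# Public address of the hub. 127.0.0.1 keeps it reachable only from the
# server itself, so a reverse proxy has to forward outside traffic to it.
c.JupyterHub.bind_url = "http://127.0.0.1:8000"
```

Start the hub with this file by running jupyterhub -f /path/to/jupyterhub_config.py, or run jupyterhub from the directory that contains it.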

Some additional setup you may need:

Nginx

If you are using a VPN, you may need to configure nginx, because you want to make sure that Jupyter Hub is available to remote users, but by default it runs on localhost.

Here is the relevant piece of my nginx config file.
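A minimal sketch of such a proxy config, under a few assumptions: the hub listens on 127.0.0.1:8000, plain HTTP is enough (TLS is left out), and hub.example.com is a placeholder hostname. The config is rendered by a small Python helper (render_nginx_server_block is a made-up name) so that the hostname and upstream address are explicit parameters. The important part is the websocket upgrade headers, which notebooks need.

```python
def render_nginx_server_block(server_name, hub_address="127.0.0.1:8000"):
    """Render an nginx config that proxies all traffic to Jupyter Hub."""
    return f"""\
map $http_upgrade $connection_upgrade {{
    default upgrade;
    ''      close;
}}

server {{
    listen 80;
    server_name {server_name};

    location / {{
        proxy_pass http://{hub_address};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Notebook kernels and terminals talk over websockets,
        # so the upgrade headers must be passed through.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }}
}}
"""


print(render_nginx_server_block("hub.example.com"))
```

The printed text can be saved as an nginx site config and adjusted from there.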

Supervisor

I like to use supervisor as a process control system, because if you connect to your server through ssh and start Jupyter Hub, it will unfortunately stop as soon as your ssh connection is closed.

Here is a piece of the config file for supervisor:
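A minimal sketch of what such a supervisor section could look like, assuming the hub config lives in /etc/jupyterhub and logs go to /var/log/jupyterhub.log (both paths are placeholders). Supervisor configs use INI syntax, so the section is rendered here with Python's standard configparser; render_supervisor_program is a made-up helper name.

```python
import configparser
import io


def render_supervisor_program(command, directory, user):
    """Render a [program:jupyterhub] section in supervisor's INI format."""
    config = configparser.ConfigParser()
    config["program:jupyterhub"] = {
        "command": command,
        "directory": directory,
        "user": user,
        # Start with supervisor and restart if the process exits.
        "autostart": "true",
        "autorestart": "true",
        # Send stderr to the same log file as stdout.
        "redirect_stderr": "true",
        "stdout_logfile": "/var/log/jupyterhub.log",
    }
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_supervisor_program(
    command="jupyterhub -f /etc/jupyterhub/jupyterhub_config.py",
    directory="/etc/jupyterhub",
    # The default setup needs a privileged user to start servers for other system users.
    user="root",
))
```

With autorestart enabled, supervisor brings the hub back up if it crashes, and the process no longer depends on your ssh session.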

If it’s your first time hearing about Nginx and Supervisor, I have listed some pretty useful tutorials on both in the references below.

Jupyter Hub authentication

For security’s sake, it is always good to use user authentication. Luckily, it’s more than easy with Jupyter Hub.

You will need to install oauthenticator: python3 -m pip install oauthenticator

For example, you can set up authentication with a GitHub account by simply adding two lines to jupyterhub_config.py:
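A sketch of what the GitHub setup might look like in jupyterhub_config.py. It assumes you have registered an OAuth application on GitHub and exported its values as environment variables; the variable names below follow the OAuthenticator README, and the callback URL normally has the form https://your-host/hub/oauth_callback.

```python
# jupyterhub_config.py (fragment)
import os

c = get_config()  # noqa: F821 -- get_config is provided by Jupyter Hub when it loads this file

# Use GitHub OAuth instead of the default system-account login.
c.JupyterHub.authenticator_class = "oauthenticator.github.GitHubOAuthenticator"

# Values from the OAuth application registered on GitHub. Reading them from
# the environment keeps secrets out of the config file.
c.GitHubOAuthenticator.oauth_callback_url = os.environ["OAUTH_CALLBACK_URL"]
c.GitHubOAuthenticator.client_id = os.environ["GITHUB_CLIENT_ID"]
c.GitHubOAuthenticator.client_secret = os.environ["GITHUB_CLIENT_SECRET"]
```

If any of the three variables is missing, the hub fails at startup with a KeyError, which is easier to debug than a broken login page.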

The login screen you will see in Jupyter Hub
It will redirect you to the GitHub sign-in page

There are many different ways to authenticate users: besides GitHub, OAuthenticator supports providers such as GitLab, Google, and Bitbucket.

Here is how Jupyter Hub looks for an admin

In jupyterhub_config.py you can set up admin users. They will have access to the admin panel, where they can manage other people’s notebooks.
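As a sketch, admin users are declared with a single option. The usernames below are hypothetical; with GitHub authentication they would be GitHub usernames.

```python
# jupyterhub_config.py (fragment)
c = get_config()  # noqa: F821 -- get_config is provided by Jupyter Hub when it loads this file

# Users listed here see the admin panel after logging in.
c.Authenticator.admin_users = {"alice", "bob"}  # hypothetical usernames
```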

Simply manage users

In conclusion:

  1. Jupyter Hub is awesome and super handy for a Data Science / Machine Learning team.
  2. It allows you to create a shared environment and forget about the headache of dependency conflicts.
  3. It may be a tedious process to set up the first time, but it is definitely worth it!

References:

  1. Jupyter Hub: https://github.com/jupyterhub/jupyterhub
  2. JupyterHub OAuthenticator: https://github.com/jupyterhub/oauthenticator
  3. Nginx proxy configuration: https://www.nginx.com/resources/wiki/start/topics/examples/full/
  4. Supervisor configuration: https://github.com/illagrenan/ubuntu-supervisor-configuration

Have you tried using Jupyter Hub?


Cindicator

Cindicator is a fintech company that enables effective asset management through predictive analytics based on Hybrid Intelligence. Here we share our news & views on token economy, smart money, Black Swans, data analysis, AI, Machine Learning, and other topics.

Written by Alexander Osipenko

Data Scientist passionate about Deep Learning and Time Series analysis
