Fast calculations: how to use IPython Notebook on Amazon EC2

Not long ago I got a task that required running several Jupyter Notebooks and analyzing the results. It was a test assignment for a job application, so I couldn’t use my work servers. The test data totaled about 7GB, but I was working on an old MacBook Air (2011) with 4GB of RAM, which couldn’t handle it. So I decided to try Amazon spot instances for fast model fitting.

There are a lot of tutorials on launching Jupyter on Amazon, but the process can still be a bit challenging for newcomers. In this article I try to explain everything step by step.

  1. Open the AWS Management Console and go to Services -> EC2 -> Spot Requests
  2. Click the blue “Request spot instances” button (a spot instance is a server for which you set a maximum hourly price. It’s very useful when you don’t need a server running 24/7 but want to compute something and then “kill” it)
  3. Below I’ll explain all options step by step:
  4. Request type: Request
  5. Target capacity: 1
  6. AMI: Ubuntu Server 14.04
  7. Instance type: c3.4xlarge (it’s okay for test purposes)
  8. Allocation strategy: Lowest Price
  9. Leave Network and Availability Zone unchanged
  10. Maximum price: choose “Set your max price (per instance/hour)” (you’ll see the current average price), then set your maximum (the higher the price, the less likely the instance is to be interrupted) and press the “Next” button
  11. EBS volumes: set 30GB
  12. Create a new key pair and download it to your computer (e.g. my_key.pem)
  13. Create a new security group: add a “Custom TCP rule” for port 8888, and set the source of the HTTP, HTTPS and SSH rules to “My IP”
  14. Press “Preview” and then “Launch”
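
If you prefer scripting to clicking, the options above map roughly onto the parameters of boto3's `request_spot_instances` call. The sketch below only builds the parameter dictionary; the AMI ID, security group ID, key name, maximum price and root device name are placeholders or assumptions of mine, not values from the console walkthrough. The console form with an allocation strategy may create a different kind of request, so treat this as an approximation.

```python
def build_spot_request(ami_id, key_name, security_group_id, max_price):
    """Build keyword arguments for boto3's EC2 request_spot_instances call."""
    return {
        "SpotPrice": str(max_price),   # maximum price per instance-hour, passed as a string
        "InstanceCount": 1,            # "Target capacity: 1"
        "Type": "one-time",            # do not re-request after the instance is terminated
        "LaunchSpecification": {
            "ImageId": ami_id,         # placeholder: the Ubuntu AMI ID differs per region
            "InstanceType": "c3.4xlarge",
            "KeyName": key_name,
            "SecurityGroupIds": [security_group_id],
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",  # assumed root device name
                    "Ebs": {"VolumeSize": 30, "DeleteOnTermination": True},
                }
            ],
        },
    }


# All four arguments here are made-up placeholders.
params = build_spot_request("ami-PLACEHOLDER", "my_key", "sg-PLACEHOLDER", 0.30)
print(params["LaunchSpecification"]["InstanceType"])

# To actually submit the request (needs boto3 and configured AWS credentials):
#   import boto3
#   boto3.client("ec2").request_spot_instances(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review the request before sending it.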

Open a terminal and type (replace <HOST> with the address of your instance):

chmod 400 my_key.pem
ssh -i my_key.pem ubuntu@<HOST>

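If everything is OK, you should now be on the server. Let’s install Miniconda, the light version of Anaconda (replace <MINICONDA_INSTALLER_URL> with the Linux installer link from the Miniconda download page, and <MINICONDA_INSTALLER>.sh with the name of the downloaded file):

wget -c <MINICONDA_INSTALLER_URL>
bash <MINICONDA_INSTALLER>.sh -b -p ~/miniconda
export PATH=~/miniconda/bin:$PATH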

Next, install a few popular ML libraries, for example:

conda install -y numpy scipy pandas scikit-learn jupyter
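
To confirm that the libraries ended up in the environment you are actually using, a quick check like the one below can help. It is a minimal sketch that relies only on the standard library; the list of import names is my own mapping of the conda package names (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the ML packages installed above
# (the conda package scikit-learn is imported as "sklearn").
PACKAGES = ["numpy", "scipy", "pandas", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(PACKAGES)
print("missing:", missing if missing else "none")
```

If anything is reported as missing, check that `~/miniconda/bin` comes first in your PATH.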

Then generate a self-signed certificate so that the notebook can be served over HTTPS:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout jupyter.pem -out jupyter.pem

Now create a password for the Jupyter notebook. In a Python session (e.g. ipython), run the following and enter your passphrase when prompted:

from IPython.lib import passwd
passwd()

You’ll get a hash (e.g. 'sha1:a096208d6c69:7c077f88031686abb2656c627ec7060dd8d47aa0'). Copy it.
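
To see why the server stores a hash while you later log in with the passphrase, here is a small sketch of a salted hash in the same `algorithm:salt:digest` layout. It assumes the digest is SHA-1 over the passphrase followed by the salt, which is my understanding of the scheme; the function names are hypothetical and this is not a replacement for `passwd()`.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase, salt=None):
    """Return 'sha1:<salt>:<digest>' for the passphrase (illustrative only)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 random hex characters
    digest = hashlib.sha1((passphrase + salt).encode("utf-8")).hexdigest()
    return "sha1:{}:{}".format(salt, digest)


def check_hash(hashed, passphrase):
    """Recompute the digest with the stored salt and compare it to the stored one."""
    algorithm, salt, digest = hashed.split(":")
    candidate = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)


stored = make_hash("my pass phrase")
print(check_hash(stored, "my pass phrase"))  # passphrase matches the stored hash
print(check_hash(stored, "wrong guess"))     # a different passphrase does not
```

The salt is stored in the clear, and the passphrase itself never appears in the config file.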

Type the following commands in the terminal (replace the hash with yours)

ipython profile create nbserver
printf "\n# Configuration file for ipython-notebook.\n
c = get_config()\n
# Notebook config\n
c.NotebookApp.ip = '*'\n
c.NotebookApp.password = u'sha1:a096208d6c69:7c077f88031686abb2656c627ec7060dd8d47aa0'\n
c.NotebookApp.open_browser = False\n
c.NotebookApp.port = 8888\n" >> ~/.ipython/profile_nbserver/
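
Shell quoting is easy to get wrong when pasting a command like this, so the same settings can also be written from Python. The sketch below renders the configuration text and appends it to a file of your choice; the function names and the demo file name are hypothetical, and the port defaults to 8888, the port opened in the security group.

```python
import tempfile
from pathlib import Path


def render_config(password_hash, port=8888):
    """Return the notebook configuration as text."""
    lines = [
        "# Configuration file for ipython-notebook.",
        "c = get_config()",
        "# Notebook config",
        "c.NotebookApp.ip = '*'",
        "c.NotebookApp.password = u'{}'".format(password_hash),
        "c.NotebookApp.open_browser = False",
        "c.NotebookApp.port = {}".format(port),
    ]
    return "\n".join(lines) + "\n"


def append_config(path, password_hash, port=8888):
    """Append the rendered configuration to the given file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(render_config(password_hash, port))


# Demo: write to a throwaway file instead of the real profile directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_config.py"  # hypothetical file name
    append_config(demo_path, "sha1:a096208d6c69:7c077f88031686abb2656c627ec7060dd8d47aa0")
    print(demo_path.read_text())
```

To use it for real, pass the path of the config file inside your nbserver profile directory.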

That’s it! Now you can launch your Jupyter notebook on the server

ipython notebook --config="~/.ipython/profile_nbserver/" --certfile=jupyter.pem

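Go to https://<HOST>:8888 (<HOST> is the address of your instance) and enter your password (the passphrase itself, not the hash). Because the certificate is self-signed, your browser will show a security warning that you need to accept. Now you can work on your ML problems in the cloud.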
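
If the page does not load, the usual suspects are the security group rule or a notebook server that is not listening. A quick way to tell these apart from your laptop is to test whether the port accepts TCP connections at all. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, and the address in the demo is a stand-in for your instance.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the address of your instance.
    print(port_is_open("127.0.0.1", 8888))
```

A False result from your laptop combined with a True result when run on the server itself points at the security group rather than at Jupyter.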