Jupyter Notebooks on AWS EC2 in 15 (mostly easy) steps


My need to run Jupyter Notebooks on EC2 came from needing more powerful resources to train machine learning models for a Kaggle challenge. On my personal MacBook Pro it would take almost a month, but on a 16-core cloud machine it would take 5–7 hours. Fortunately, there were some resources available online to guide me, and I’d like to thank Chris Albon for his incredibly helpful guides, on which this one is based. Hopefully, this guide will save people time so they can get coding faster!

  1. Use your current Amazon user id and password or create an IAM user with your existing login.
  2. Follow the default settings to create an EC2 instance and choose the Amazon Linux OS. For now, just select the free-tier t2.micro as your instance type. You can change this in the future to something more powerful that fits your project’s needs.
  3. Instead of clicking the “Review and Launch” button right away, click the “Next:” button until you get to Security Groups. Under Security Groups, select an existing group or create a new one, then open inbound port 8888. Port 8888 is what we’ll use for the Jupyter Notebook server towards the end of this tutorial. Port 22 (for SSH-ing in) should already be open by default.

4. Follow the rest of the prompts using whatever the default values are and then finally click “Launch.” At this point, you should be prompted with some security key options. You can use an existing key or download a new one. Let’s assume you don’t have one yet. The PEM file is a key that AWS will check when you try to access (SSH into) your EC2 instance from your local computer’s terminal.

Select the option to create a new one and give it an easy-to-remember name (all lower case with no spaces, for ease of typing too). Download the PEM file and put it in an easy-to-reach location like your home folder. Now you can finally launch your EC2 instance by selecting… you guessed it… Launch Instance.

5. Once the EC2 instance is up (it usually takes a minute or less), SSH into your EC2 instance from your terminal by typing:

ssh -i "tutorialexample.pem" ec2-user@ec2-54-144-47-199.compute-1.amazonaws.com

This assumes your PEM file is in the same directory as your present working directory (type “pwd” into your terminal if you’re unsure what your present working directory is). Also, you’ll need to type in your own user@ec2-instance address. If you’ve been following my steps so far, the user is “ec2-user” (this is the default) and the address after “@” is your instance’s Public DNS (IPv4) address, which you can view by selecting your instance (see screenshot below). If you get an error that your PEM file’s permissions are too open (i.e., it’s publicly viewable), you may need to execute this command:

chmod 400 tutorialexample.pem

Alternatively, you can click the Connect button next to Launch Instance (see above) and AWS will give you your instance-specific instructions for SSH-ing in (see below). Thanks AWS!

Once you execute the SSH command, you’ll be prompted with a yes/no question. Type yes and you should be SSH-ed into your instance! (see screenshot below).

6. Give yourself a big pat on the back. You’re now running a virtual machine in the cloud!

7. Download the Anaconda3 installer by typing this command:

wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh

Note: If you’re reading this tutorial in the distant future, you can get the exact link to the latest version of Anaconda3 by going to https://www.continuum.io/downloads and copying the link to the latest command line installer (see below).

8. Install Anaconda3 by typing:

bash Anaconda3-4.4.0-Linux-x86_64.sh

You’ll have to hit ENTER to get through all the legalese, but eventually you need to type yes to agree and then just hit ENTER again to install Anaconda3 into the default directory. Once it starts installing, you’ll see all the packages included in Anaconda3 also being installed. Anaconda3 gives you everything you’ll need to get your Jupyter Notebook running. At the end you’ll be prompted to add Anaconda3 to the PATH in your .bashrc. Make sure to type “yes” (see below).

IF YOU TYPED YES TO ADD THE PATH (as instructed) THEN SKIP TO STEP 9

If you accidentally hit enter before typing “yes”, it will default to “no.” To correct this you’ll have to manually type the PATH into your .bashrc file. To do this type:

vim .bashrc

Your screen will now be in vim mode. If you’ve never used vim before you might feel like you’ve entered an alternate reality. Basically, yes. Vim is a text editor for your terminal and is one of the greatest tools to learn and master if you plan on doing a significant amount of hacking. But let’s focus on the task ahead of us! You need to add the text:

export PATH="/home/ec2-user/anaconda3/bin:$PATH"

First you’ll need to get to the bottom of the file. This can be achieved in a number of ways, but the most straightforward is to just hit the down arrow key until you reach the bottom (I encourage you to learn more efficient ways of getting to the bottom of a file outside of this tutorial). Now, just hit the “i” key and you’ll be in INSERT mode (think EDIT mode), which will allow you to type and edit the file; you can also copy and paste with COMMAND-c and COMMAND-v. After typing in the path, your file should look similar to this:
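(Roughly; the default contents of .bashrc above the export line may differ on your instance.)

# User specific aliases and functions

export PATH="/home/ec2-user/anaconda3/bin:$PATH"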

To save your edits and exit out of vim, hit ESC (this takes you out of INSERT mode), then type “:wq” (stands for write-quit) and hit ENTER, which will bring you back to your EC2 command line. If you fuck this up, no worries. You can exit vim anytime without saving by ESC-ing out of whatever mode you’re in and typing “:q!”, and then just vim back in. Good luck!

9. Set Anaconda3 as your default Python environment. You’ll notice that your EC2 instance is configured to use the system’s Python 2.7. This is fine if you want to use 2.7 for all your projects, but you really should be using Python 3 for new projects, which is why we installed Anaconda3: so we can switch to the latest stable version of Python 3 instead of the default 2.7. To see which Python your shell is currently pointing to, type:

which python

which should print /usr/bin/python (the system Python). To pick up the Anaconda3 path we added to .bashrc, type

source .bashrc

it should look like this:
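(Roughly; your shell prompt and exact paths may differ.)

[ec2-user@ip-xxx-xx-xx-xx ~]$ which python
/usr/bin/python
[ec2-user@ip-xxx-xx-xx-xx ~]$ source .bashrc
[ec2-user@ip-xxx-xx-xx-xx ~]$ which python
/home/ec2-user/anaconda3/bin/python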

Now if you type “python” you’ll see that you’re using Python 3.6.1 |Anaconda 4.4.0 (64-bit)! To get out of the Python 3 REPL just hold CONTROL then hit “d” or type “quit()”.

10. Create your Jupyter/IPython password. In this step, we create a password to access our Jupyter Notebook from the web (remember that Jupyter runs notebooks on a server which anyone can access with an internet browser, so we need to set a password to prevent any unauthorized access to our notebooks). First we access the IPython console by typing:

ipython

Now we want to create a password, so we import the password module and type in our password to generate an SHA-hashed version. You’ll need to remember the actual password you typed in and copy the SHA version for use in the next step. Type:

from IPython.lib import passwd

then

passwd()

You’ll be prompted to type in your password and then verify it. After that you’ll see an SHA version, and then you can exit by typing “exit.” Make sure to save your SHA hash for future reference by copying and pasting it into a text file; you’ll need it later. It should look something like this:
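(Roughly; the hash shown is just a placeholder, not a real one.)

In [1]: from IPython.lib import passwd

In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:262....your hash here.........65f'

In [3]: exit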

11. Configure the Jupyter/IPython server so you can access your notebooks from your local computer via your internet browser. First we’ll create a default config file by just typing:

jupyter notebook --generate-config

Next, we’ll need to generate SSL certificates so our browser will trust our Jupyter Notebooks server (*sigh* I know). Luckily, this is pretty straightforward. Just type:

mkdir certs

then go into your certs directory by typing:

cd certs

then create your PEM file (this is separate from the PEM file on your local computer which we downloaded from AWS):

sudo openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem

This certificate is good for one year. You will be prompted to enter some personal info for your certificate, but you can type in whatever you want. When you’re done, it should look something like this:
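(Roughly; the exact prompts depend on your OpenSSL version, and the answers below are placeholders you can fill in however you like.)

Generating a 1024 bit RSA private key
writing new private key to 'mycert.pem'
-----
Country Name (2 letter code) [XX]:US
State or Province Name (full name) []:California
Locality Name (eg, city) [Default City]:San Francisco
Organization Name (eg, company) [Default Company Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:
Email Address []: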

Great, now let’s go back to our home directory by typing “cd” and hitting ENTER.

12. Edit your Jupyter configuration file. Let’s use vim to edit the configuration file we created in the previous step:

vim .jupyter/jupyter_notebook_config.py

You’ll notice that all the text in the config file is commented out and there’s no real structure beyond what you decide. So really, just paste the following text anywhere you want (remember to use your SHA hash instead of mine).

Hit “i” to enter INSERT mode and type in the configuration or copy and paste it in with COMMAND-c and COMMAND-v. Use the arrow keys to navigate around if you need to. See screenshot below for reference.

c = get_config()

# Kernel config
c.IPKernelApp.pylab = 'inline' # if you want plotting support always in your notebook

# Notebook config
c.NotebookApp.certfile = u'/home/ec2-user/certs/mycert.pem' #location of your certificate file
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False #so that the notebook server does not open a browser by default
c.NotebookApp.password = u'sha1:262....your hash here.........65f' #edit this with the SHA hash you generated in Step 10
# This is the port we opened in Step 3.
c.NotebookApp.port = 8888

Once you’ve got that in, hit ESC and type “:wq” to save and quit out of vim.

13. Create a folder for your notebooks and start Jupyter Notebook:

mkdir Notebooks
cd Notebooks

Now that you’re in the Notebooks directory you created, you can open Jupyter Notebooks:

jupyter notebook

You should see something like this now:
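(Roughly; timestamps and version details will differ.)

[I 17:42:16.545 NotebookApp] Serving notebooks from local directory: /home/ec2-user/Notebooks
[I 17:42:16.545 NotebookApp] 0 active kernels
[I 17:42:16.545 NotebookApp] The Jupyter Notebook is running at: https://[all ip addresses on your system]:8888/
[I 17:42:16.546 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).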

Woohoo! We got the server running!

14. Access Jupyter Notebooks from your browser. To get there, take the Public DNS (IPv4) address we used in Step 5 and build a URL like this:

https://ec2-54-144-47-199.compute-1.amazonaws.com:8888/

Make sure you put “https://” at the beginning and “:8888” at the end. This will take you to a warning screen:

Chrome will show you this because it realizes the SSL certificate being used is the one we issued ourselves in Step 11. We are aware of this, so to continue on we just click the “Advanced” button and then follow the “Proceed to…” link at the bottom.

Next you’ll see the password protection screen for Jupyter. Just type in the actual password you chose in Step 10 (not the SHA hash):

Click “Log In” and you’re ready to create and run your own notebooks on Jupyter!

15. *HIGH FIVE* Congrats, you can now harness the power of AWS to run your Python 3 code!

16. One more thing! You got everything up in just 15 mostly easy steps, but I’m including a 16th step because, while working with notebooks, you may want to save your models or objects so you can access them later on another machine. Also, you may need to import data from files on your machine to use in your EC2 notebooks. To do this you just need the boto3 package and an S3 bucket. Setting up an S3 bucket is pretty simple, and using boto3 is also very straightforward, so I will point you to boto3’s documentation for examples: https://boto3.readthedocs.io/en/latest/guide/quickstart.html#installation

And here’s AWS’s documentation for getting started with S3: https://aws.amazon.com/documentation/s3/

Just make sure to have your credentials properly stored. This is documented in the boto3 docs, but you can also install the AWS CLI for easy configuration and access of AWS from the command line: https://aws.amazon.com/cli/
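To give you an idea, here’s a minimal boto3 sketch for saving a file to S3 and pulling it back down. It assumes your credentials are already configured, and the bucket name (my-tutorial-bucket) is made up, so swap in your own.

import boto3

# assumes credentials are already configured (e.g. via `aws configure`)
s3 = boto3.client('s3')

# upload a file from the EC2 instance to S3 (bucket name is hypothetical)
s3.upload_file('model.pkl', 'my-tutorial-bucket', 'models/model.pkl')

# later, on another machine, download it again
s3.download_file('my-tutorial-bucket', 'models/model.pkl', 'model.pkl')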

Important housekeeping:

If you don’t know all the ins and outs of AWS, it’ll be good to know the following. When you’re done working on your EC2 instance you should stop it so you’re not charged for time you’re not using it. To do this, just go to the AWS console, then to EC2, and click Instances on the sidebar:

Once you stop your instance you’ll no longer be charged for using it, with one caveat: all the installations you did and files you created will be saved for you, since you’re using an instance with EBS storage. AWS charges you for keeping this data for the next time you start up an instance. The cost depends on how much data is saved. Most instances with EBS will store your data on an EBS General Purpose SSD (gp2) volume, which as of June 2017 has a monthly cost of $0.10 per GB. So not bad if you’re only storing code. Go here to check pricing for storing data: https://aws.amazon.com/ebs/pricing/

Let’s say you want to run another instance using the same data/configuration. What you’ll want to do is create an AMI from your instance. Then you can even delete your instances, but still have all the work and files you created saved for future use. This is incredibly easy:

Now under “AMIs” on the side bar, you’ll have everything saved and can launch new EC2 instances directly from the AMI. AMIs can also be saved in S3 for much less money by following these steps.

When you want to start your instance again, just go to the Instance State and select Start. When you start an instance you get a new Public DNS address so make sure to use the new address when you SSH in.

And lastly, when running computationally expensive code, you’ll probably want to use a more powerful type of instance than the t2.micro we started with. Once your instance is stopped, you just need to right-click on it and under “Instance Settings” choose “Change Instance Type.” Make sure you’re aware of the costs of using your new instance type (https://aws.amazon.com/ec2/pricing/on-demand/) and that your code makes the most of multi-core processors (i.e. if your code isn’t set to use multi-core functionality in Python, then having 64 cores will run no faster than 1 core).
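For example, many scikit-learn estimators take an n_jobs parameter (a hypothetical snippet; scikit-learn comes with Anaconda3, and the model and parameters here are just for illustration):

from sklearn.ensemble import RandomForestClassifier

# n_jobs=-1 uses every available core; the default trains on a single core,
# so on a 64-core instance it would run no faster than on 1 core.
clf = RandomForestClassifier(n_estimators=500, n_jobs=-1)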

Hope this was useful! Let me know if I made any mistakes or if something was particularly obscure.