Set up AWS EC2 for Deep Learning in 17 minutes
In this post I will give a step-by-step explanation of how to set up an Amazon EC2 cloud instance for deep learning. When I first used an EC2 instance, getting the cloud environment going from a Windows machine was a real pain: I spent countless hours googling and piecing together information from all over the internet before I could train an image recognition model. This post assumes you are on a Windows machine, but the steps are almost the same, and somewhat easier, on Linux. The steps also stick to the default settings, which are more than enough to get you up and running in no time.
Deep learning is a sub-field of machine learning concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the brain. Deep learning has gained a lot of momentum lately, especially since we learned how to train these networks on GPUs in place of CPUs.
CPUs alone are not enough. They can do the processing, but the sheer volume of unstructured data that needs to be analysed to build and train deep learning models can leave them maxed out for weeks on end. Even multi-core CPUs struggle with deep learning, which is where the GPU (Graphics Processing Unit) comes in.
GPUs are specialized processors originally developed to handle graphics and image processing. They pack many more, simpler cores onto a chip than a CPU does, so they can perform thousands of arithmetic operations in parallel. That makes them particularly good at processing matrices, in particular the large matrix multiplications that dominate neural network training, where a CPU with its handful of powerful cores is comparatively slow. It is this that suits them to specialized applications like deep learning.
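To make the matrix point concrete: a dense neural network layer applied to a batch of inputs is a matrix multiplication, and every cell of the result can be computed independently of the others, which is exactly the kind of work a GPU spreads across its cores. The pure-Python sketch below only illustrates that structure (real frameworks call optimized GPU libraries instead), and the batch and layer sizes are arbitrary examples.

```python
def matmul(a, b):
    """Multiply an m x k matrix `a` by a k x n matrix `b` (lists of lists).

    Each output cell out[i][j] reads only row i of `a` and column j of `b`,
    so all m * n cells are independent and could be computed in parallel.
    """
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("columns of a must equal rows of b")
    n = len(b[0])
    return [
        [sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
        for row in a
    ]


def multiply_adds(m, k, n):
    """Number of multiply-add operations in an (m x k) @ (k x n) product."""
    return m * k * n


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
    # A batch of 64 inputs through a 1024 -> 1024 dense layer:
    print(multiply_adds(64, 1024, 1024))  # 67108864
```

Even one modest 1024-to-1024 layer needs about 67 million multiply-adds per batch of 64, and a network stacks many such layers, which is why spreading the independent cells across thousands of GPU cores pays off.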
Still, GPUs are expensive to purchase, and a decent setup will cost you an arm and a leg, which puts developing deep learning models out of reach for most individuals without the backing of an organization. This is where cloud development comes in. As of today, many vendors will let you rent GPUs for the duration of your research, including Amazon EC2, Microsoft Azure and Google Cloud.
Step By Step Instructions
- Go to signin.aws.amazon.com and sign in to your Amazon AWS account.
- Once you are logged in, you'll land on the AWS services page. Find the “Build a Solution” section, which is the second one from the top.
- Click the “Launch a virtual machine” link.
- This takes you to a page where you can select an Amazon Machine Image (AMI) or build your own. If you need a custom solution, select a base operating system such as Linux or Windows and do a fresh install of the libraries you need. Otherwise, you can use one of the many available images from different vendors, including Amazon, that come with artificial intelligence and machine learning libraries pre-installed. Let's go with the Deep Learning AMI (Ubuntu) Version 12.0, which you can find by scrolling down.
- Click on the Select button to the right to choose the pre-configured image.
- The Select button takes you to the next page, where you choose the instance type, i.e. how many CPUs and, more importantly for us, GPUs your machine gets. Keep in mind that the more GPUs you pick, the more expensive the instance is to run. For deep learning we need at least the p2.xlarge configuration, the smallest instance in the GPU-equipped P2 family. To find it, click the drop-down that says All Instance Types and select GPU Compute.
- Next, click Next: Configure Instance Details at the bottom right. Accept the default settings on that page, then click Next: Add Storage to configure your storage options.
- Select the default storage or configure your own setup.
- Next, configure the security group. Do not skip this step: this is where you open the ports that let you reach the instance, SSH for the terminal session and the Jupyter port (8888 in this post) for the browser. Add the inbound rules on the security group page and save them under a custom name.
- Next, review your settings and launch your instance.
- Next, create a key pair, which will be used to authenticate you when you SSH into the instance. Download it before you continue, because AWS does not let you download the private key again later. Then click Launch Instances at the bottom right. A key pair consists of a public key that AWS stores and a private key file that you store (downloaded as a .pem file). PEM (Privacy-Enhanced Mail) is a widely used text format for keys and certificates such as X.509 certificates: Base64-encoded data between a -----BEGIN ...----- header and a matching -----END ...----- footer. Together, the two keys let you connect securely to your EC2 instance over SSH.
- The next page confirms that your instance is launching.
- You can now view your running instance on the dashboard.
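If you later want to script these console steps, the same launch can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch and not part of the original walkthrough: it assumes boto3 is installed and configured with your AWS credentials, that you launch into your account's default VPC, that the key pair already exists (for example the one created above), and that the inbound rules you need are SSH (port 22) and the Jupyter port (8888) used later in this post. Restricting each rule to your own IP address (a /32 CIDR) is safer than opening the ports to everyone.

```python
# Sketch only: assumes boto3 is installed and AWS credentials are configured.

# Inbound ports used in this post: 22 for SSH (PuTTY), 8888 for Jupyter.
JUPYTER_SG_PORTS = (22, 8888)


def ingress_rules(client_cidr, ports=JUPYTER_SG_PORTS):
    """IpPermissions entries for authorize_security_group_ingress (TCP only)."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": client_cidr}],
        }
        for port in ports
    ]


def run_instances_params(ami_id, key_name, group_id, instance_type="p2.xlarge"):
    """Keyword arguments for run_instances: one instance, other options at API defaults."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": [group_id],
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(ami_id, key_name, group_name, client_cidr, region=None):
    """Create the security group, open its ports and start one instance.

    Assumes the account's default VPC; region=None uses your configured default.
    Returns the new instance ID.
    """
    import boto3  # imported here so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    group = ec2.create_security_group(
        GroupName=group_name, Description="SSH and Jupyter access"
    )
    ec2.authorize_security_group_ingress(
        GroupId=group["GroupId"], IpPermissions=ingress_rules(client_cidr)
    )
    response = ec2.run_instances(
        **run_instances_params(ami_id, key_name, group["GroupId"])
    )
    return response["Instances"][0]["InstanceId"]
```

For example, launch("ami-0123456789abcdef0", "my-dl-key", "jupyter-sg", "203.0.113.5/32") would return the new instance ID; all four arguments are hypothetical placeholders, and you would look up the real Deep Learning AMI ID for your region in the console.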
SSH into the EC2 Cloud Instance from a Windows Machine
Older versions of Windows do not come with an SSH client, but this workaround using PuTTY should do the trick.
Download and Install PuTTY
- If you don’t have the PuTTY software installed on your system, you will need to download it from www.putty.org.
- Locate the key pair (.pem file) you downloaded in the steps above and convert it to PPK format. PuTTY does not support the PEM format that AWS uses, so you first need to convert your PEM file to a PPK file with the PuTTYgen utility. To start it, type puttygen in the Windows Start menu search box.
- On the PuTTYgen dialog box, click the Load Button and then select the .pem file that you downloaded from AWS. Note: when browsing for your pem file be sure to select All Files in the dropdown list that is located to the right of the File name field. PuTTYgen will then load and convert your file.
- As the message indicates, you then need to click on “Save private key”. You will receive a warning message asking if you want to save this key without a passphrase. Be sure to select Yes. Provide a name for your ppk file and click save.
Launch PuTTY
- Now that you have converted the pem file to a ppk file, you are ready to use the PuTTY utility. In the Windows start dialog box, type in putty to start the utility.
- In the Host Name field, enter your host name in the format user_name@public_dns_name. Make sure you use the user name that matches the AMI: for our Ubuntu AMI, the user name is ubuntu.
- Copy the public DNS name of your instance from the dashboard and append it after the @ in the host name above.
- Next, in the Category tree, click the + button next to SSH (under Connection) to expand that section. Then click Auth (short for authentication) and enter the path of your private key file (i.e. the .ppk file) in the Private key file for authentication field.
- Finally, click Open to start your SSH session. If this is the first time you are logging into the instance, PuTTY shows a security alert about the server's host key; click Yes to continue. If you did everything correctly, a new window appears with your command-line SSH session.
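On Linux, macOS, or a Windows 10 or later machine that has the OpenSSH client installed, you can skip the PuTTYgen conversion and use the .pem file directly. The sketch below is our own addition, not part of the PuTTY steps above: it builds the equivalent ssh command and first tightens the key's permissions, because OpenSSH refuses private keys that other users can read. The helper names, key file name and public DNS name in the usage note are hypothetical.

```python
import os
import stat


def restrict_key_permissions(pem_path):
    """Make the key readable by its owner only, like `chmod 400 key.pem`.

    OpenSSH rejects private keys that group or others can read.
    On Windows, os.chmod only toggles the read-only flag.
    """
    os.chmod(pem_path, stat.S_IRUSR)


def ssh_command(pem_path, public_dns, user="ubuntu"):
    """Return the OpenSSH argument list matching the PuTTY session above.

    `user` must match the AMI: "ubuntu" for the Ubuntu Deep Learning AMI.
    """
    return ["ssh", "-i", pem_path, f"{user}@{public_dns}"]
```

For example, after restrict_key_permissions("my-dl-key.pem"), passing ssh_command("my-dl-key.pem", "<public DNS from the dashboard>") to subprocess.run opens the same session PuTTY does.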
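Serving Jupyter Notebook from the EC2 Instance
- Since our image comes with machine learning libraries pre-installed, Jupyter Notebook is already available on the instance. In the SSH session, generate a Jupyter config file, create a self-signed SSL certificate so the notebook is served over HTTPS, and open the config file for editing: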
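jupyter notebook --generate-config   # writes ~/.jupyter/jupyter_notebook_config.py
mkdir ~/certs
cd ~/certs
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mycert.pem -out mycert.pem   # answer or skip the prompts; key and cert both go into mycert.pem
cd ~/.jupyter/
nano jupyter_notebook_config.py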
- This opens the Python config file in nano for editing.
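The settings to add depend on your setup. A common configuration for this kind of server, sketched below under our own assumptions, points Jupyter at the certificate created above (assumed to be /home/ubuntu/certs/mycert.pem), listens on all network interfaces so the server is reachable through the instance's public IP, skips opening a browser, and fixes the port at 8888 to match the URL used later. These option names are for the classic Notebook server of this AMI's era; newer Jupyter Server releases use c.ServerApp instead of c.NotebookApp. You can type the generated lines in nano, or run the hypothetical helper below to append them.

```python
import os

# Default location written by `jupyter notebook --generate-config`.
CONFIG_PATH = os.path.expanduser("~/.jupyter/jupyter_notebook_config.py")


def notebook_config_lines(certfile, port=8888):
    """Settings for serving the classic Notebook over HTTPS from the instance."""
    return [
        # The openssl command wrote key and certificate into one file,
        # so no separate keyfile setting is needed.
        f"c.NotebookApp.certfile = {certfile!r}",
        # Listen on all interfaces so the public IP reaches the server.
        "c.NotebookApp.ip = '0.0.0.0'",
        # There is no desktop browser on the instance.
        "c.NotebookApp.open_browser = False",
        # Must match the port opened in the security group.
        f"c.NotebookApp.port = {port}",
    ]


def append_config(lines, path=CONFIG_PATH):
    """Append the settings to the end of the Jupyter config file."""
    with open(path, "a", encoding="utf-8") as config:
        config.write("\n" + "\n".join(lines) + "\n")
```

On the instance, append_config(notebook_config_lines("/home/ubuntu/certs/mycert.pem")) adds the four lines to the end of the generated config file.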
- Save the file, exit nano (Ctrl+O, Enter, then Ctrl+X) and type the following into the terminal:
jupyter notebook
- This starts a Jupyter Notebook server, which you can now use to do machine learning from the browser.
- To access the server, take the URL printed in the terminal (it includes a login token) and replace the host with your instance's public IP address from the AWS dashboard:
https://ip-address:port/
In our case this is:
https://18.222.255.167:8888/?token=8f71a160cc33cf5fcc1f17ad59698481272a3bc7655ba495
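The login URL has a fixed shape: HTTPS (because of the certificate), the instance's public IP address or DNS name, the port from the config, and the token that jupyter notebook prints when it starts (jupyter notebook list shows it again for a running server). A small sketch of putting it together; the helper name is ours, and the IP in the test values is a documentation placeholder rather than a real instance.

```python
from urllib.parse import urlencode


def notebook_url(public_host, token, port=8888):
    """Build the HTTPS login URL for a Jupyter server running on the instance.

    public_host: public IPv4 address or public DNS name from the EC2 dashboard.
    token: the login token printed in the terminal when the server starts.
    """
    return f"https://{public_host}:{port}/?{urlencode({'token': token})}"
```

urlencode takes care of escaping, so the same helper works whether you paste in the IP address or the longer public DNS name.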
- The first time you open the URL, your browser will warn you that the connection is not private. This is expected: the certificate is self-signed, and since you created it yourself you can safely proceed past the warning.
- You are now connected to the notebook server.
- You can see the Anaconda installation and start a new notebook with the library of your choosing.
- To get started with a project from GitHub, download it onto the instance from its URL with wget and you are good to go.
- Now we can open the project from the Jupyter Notebook.
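If you prefer to stay inside a notebook cell instead of using wget, Python's standard library can do the same download. A minimal sketch, assuming the project is a public GitHub repository: the owner, repository and branch names are values you supply, and the /archive/refs/heads/<branch>.zip address is the one behind GitHub's Download ZIP button.

```python
import urllib.request
import zipfile


def github_zip_url(owner, repo, branch):
    """URL of the ZIP archive GitHub serves for a branch of a public repository."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def download(url, filename):
    """Save `url` to `filename` (what `wget <url>` does) and return the path."""
    path, _headers = urllib.request.urlretrieve(url, filename)
    return path


def extract(zip_path, dest_dir="."):
    """Unpack the archive into dest_dir and return the names of its entries."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For example, extract(download(github_zip_url("some-user", "some-repo", "main"), "project.zip")), with hypothetical names, unpacks the repository into a folder that then shows up in the Jupyter file browser.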
I hope this post gave you detailed enough step-by-step instructions to set up an AWS EC2 instance and train some models. If you have questions, anything to add or some feedback, please post in the comments section.