Using Amazon Web Services for Machine Learning with EC2

Published in Geek Culture, Nov 15, 2021

As a data scientist, analyst or engineer, you often need to run heavy computational operations on datasets. With the prevalence of big data and the compute requirements of many ML algorithms, this can be very difficult to do on your local machine.

In this article we will create a basic Amazon Web Services (AWS) server, also called an EC2 instance, to run our Python and notebook operations.

The benefits

Always online
Large computational capabilities (server dependent)
Relatively cheap, or free (if you are within the AWS free tier period)

The best part of this setup is that the server is yours and stays online for as long as you want it to, although you can run into additional costs if you do not monitor it. The difference is especially noticeable compared with other notebook platforms such as Google Colab or Kaggle, which terminate notebooks after a certain time or limit the server type.

AWS Sign-up

We first want to sign up to AWS with their “free tier” option. You can do this on the AWS Signup page, which gives you 12 months free on specific services.

Click the “Create a Free Account” button

The rest of the sign-up process is a simple step-by-step flow, which we won’t go over as the page layout changes regularly. However, the information required is usually the same:

Contact information (name, surname, email, password etc.)
Billing information (valid credit/debit card)
ID verification

Your card won’t be charged as long as you only use free tier services, within their limits, during the first year.
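To see why a single always-on t2.micro can stay within the free tier, a quick back-of-the-envelope check helps. The sketch below assumes an allowance of 750 instance-hours per month, which is the figure we understand AWS to advertise for t2.micro; please confirm the current number on the AWS Free Tier page. The function names are made up for illustration.

```python
# Rough check of free tier usage. FREE_TIER_HOURS is an assumption
# (750 instance-hours per month); confirm the current figure with AWS.
FREE_TIER_HOURS = 750


def hours_used(instance_count, days_in_month=31):
    """Instance-hours consumed if every instance runs 24/7 for the month."""
    return instance_count * 24 * days_in_month


def fits_free_tier(instance_count, days_in_month=31):
    """True if the always-on instances stay within the assumed allowance."""
    return hours_used(instance_count, days_in_month) <= FREE_TIER_HOURS


if __name__ == "__main__":
    for count in (1, 2):
        print(count, hours_used(count), fits_free_tier(count))
```

Under that assumption, one instance running all month uses 744 hours and fits, while two always-on instances use 1,488 hours and would be billed for the excess.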

Once your account has been set up and validated, sign in and we can start creating an EC2 server.

Server (EC2) creation

Once logged in, we need to locate the EC2 service. In the top left-hand corner, click the Services button and look for EC2 (under the Compute section).

EC2 may also be in your recent services list

In the EC2 management page, click the Launch Instances button (top right orange button).

Image Type

Next, select your image (server) type. We will use the free tier Amazon Linux server, which is usually the first on the list. Make sure it has a “free tier eligible” tag under its name.

Instance Type

Tick the smallest server, t2.micro, as this is free tier eligible, and click Review and Launch.

You can select other instance types depending on the purpose (general purpose, memory, storage, performance etc.) or processor type (CPU, GPU). Review all the Instance Types here as well as their pricing.

When clicking Review and Launch we skip the configuration of the EC2 server’s security group and accept the launch wizard’s default, which is not restricted to your own IP address. This would not be the case in a live production environment, where we would restrict the settings to allow only certain traffic from certain IP addresses. Keep in mind that the default group typically opens only SSH (port 22), so you may need to add an inbound rule for port 8888 before you can reach Jupyter later in this article.
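For a more locked-down setup, the inbound rules could be limited to a single IP address. The sketch below only builds the rule data. The dictionary layout mirrors the `IpPermissions` argument that boto3's `authorize_security_group_ingress` accepts, but boto3 is not used anywhere else in this article, so treat that as an assumption. The function name and the example address are hypothetical. Ports 22 (SSH) and 8888 (Jupyter) are the two used in this guide.

```python
import ipaddress


def build_ingress_rules(my_ip, ports=(22, 8888)):
    """Return inbound TCP rules that allow only `my_ip` on the given ports."""
    # Raises ValueError if my_ip is not a valid IPv4 address.
    cidr = f"{ipaddress.IPv4Address(my_ip)}/32"  # /32 means exactly one address
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


if __name__ == "__main__":
    # 203.0.113.5 is a documentation-only placeholder address.
    for rule in build_ingress_rules("203.0.113.5"):
        print(rule)
```

The same values can be entered by hand in the console when editing the security group's inbound rules: one row per port, with your own IP address as the source.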

Launch

In the Launch screen you can review and edit the server that will be created, for example choosing larger storage options if you need more hard drive space or a faster SSD.

Once you click Launch, you will receive a popup screen to download your key pair.

Key pair

The key pair is a file that will be used to connect (SSH) into your server.

Select Create a new key pair and give it a recognizable name.

The private key file is provided once and never again, so make sure to download it and save it in a secure place.

Once it is downloaded and safely saved at a location on your PC, click Launch Instances.

Monitor Server

The server may take a few minutes to launch, and can be monitored by clicking its launch ID or the button at the bottom right of the screen.

You can also go to the EC2 services section, as shown in the previous steps, to view all running servers.

Instance address (Public IPv4 DNS)

Click on the InstanceID and copy the Public IPv4 DNS.

Paste this address somewhere handy, as we will use it later in the article to connect to our server.

Connect to Server using PuTTY (Windows)

We can now connect to our server from any Windows computer using PuTTY. A full guide can be found in the AWS documentation; we will quickly go through the step-by-step process below:

Step 1: Download PuTTY

Go to the link above to download and install PuTTY.

Step 2: Convert the key pair file (.pem) to a PuTTY private key file (.ppk) using PuTTYgen

In the Windows Start menu, search for PuTTYgen. Once it is open, ensure RSA is ticked and click Load.

In order to see the key pair file (.pem), select All files in the file dialog. Then select the file and click OK.

Step 3: Save a new .ppk file

Click “Save private key”, give it a name and save it in a secure location.

NB: Do NOT click Generate. That would create a brand-new key (PuTTYgen asks you to move your mouse around to generate randomness when it does so), and a new key will not match the key pair on your server.

You can now close PuTTYgen.

SSH to Server (Windows) with PuTTY

Launch PuTTY, expand SSH and select Auth, then Browse for your private key file (.ppk) saved earlier.

Once loaded, go back to the Session section.

In the Host Name field enter username@PublicIPv4DNS

The username depends on the image type. As we are using an Amazon Linux instance, the username is ec2-user.

ec2-user@<PublicIPv4DNS address>

Remember to give the session a name and Save the configuration so you can Load it every time you want to connect.

Once done, click Open to launch a PuTTY session to our server.

Connect to Server using SSH (Mac)

Alright, I’m going to cheat here and only point to the Amazon guide, as we don’t have a Mac readily available to test the process.

The AWS guides are usually very detailed. However, if you run into issues, please comment with your errors and we will assist as much as possible.
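As a rough pointer only (we have not run this on a Mac), the built-in OpenSSH client is normally invoked as `ssh -i <key file> user@host`, using the original .pem file rather than a .ppk. OpenSSH also tends to refuse private keys that other users can read, so the key's permissions usually need tightening first. The sketch below assembles that command; the function names, key file name and host are placeholders.

```python
import os


def restrict_key_permissions(key_path):
    """Make the private key readable by its owner only (like chmod 400)."""
    os.chmod(key_path, 0o400)


def build_ssh_command(key_path, public_dns, username="ec2-user"):
    """Return the ssh argument list for connecting to the instance."""
    return ["ssh", "-i", key_path, f"{username}@{public_dns}"]


if __name__ == "__main__":
    # Both values below are placeholders, not real resources.
    command = build_ssh_command("my-key.pem", "ec2-host.example.com")
    print(" ".join(command))
```

Replace the placeholders with your own key file and the Public IPv4 DNS copied earlier, then run the printed command in the Terminal app.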

Machine Learning Setup

Python — update
pip — install
Jupyter Notebook — install

Python

Luckily, python3 is already installed on the Amazon Linux instance, which we can check by using

> python3 --version

Update Python if needed (for other instance types refer to this guide).
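If you want to decide programmatically whether an update is needed, the version string can be compared against a minimum. This is a small sketch: the (3, 6) threshold is an assumed example rather than a requirement stated by Jupyter or AWS, and the helper names are hypothetical.

```python
import re


def parse_version(output):
    """Turn text like 'Python 3.7.10' into a tuple such as (3, 7, 10)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(group) for group in match.groups())


def meets_minimum(output, minimum=(3, 6)):
    """True if the reported version is at least `minimum` (assumed threshold)."""
    return parse_version(output) >= minimum


if __name__ == "__main__":
    sample = "Python 3.7.10"  # example text; paste your own output here
    print(parse_version(sample), meets_minimum(sample))
```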

pip

We can now install pip using

> sudo yum install pip

Type “y” to complete the process

Jupyter Setup

In the command line, type

> pip3 install jupyter

It’s good practice to create a virtual environment. However, as we won’t be using this server in production, all we need to do is create a certificate (.pem) file, configure Jupyter and run the service (the detailed guide can be found here).
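If you do want the virtual environment mentioned above, Python's standard `venv` module can create one. This is a minimal sketch, assuming the instance's Python ships with `venv` and `ensurepip`. The helper name is hypothetical, and the rest of this article does not depend on it.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir).expanduser()
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

For example, `create_environment("~/ml-env")` (a made-up location) would create the environment in your home directory. You could then activate it with `source ~/ml-env/bin/activate` before running the pip3 install step.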

In the Putty window, on the command line, run the below:

Configure the notebook:

> jupyter notebook --generate-config

Create a private key (.key) and cert (.pem) file:

> cd ~
> mkdir ssl 
> cd ssl
> openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

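You will be asked a few basic questions (country, organisation, common name etc.). As this is a self-signed certificate for personal use, you can press Enter to accept the defaults.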

Navigate to the config file folder:

> cd ~/.jupyter/

Edit the config file:

> vi jupyter_notebook_config.py

Press i to enter insert mode so that you can edit the file. Then paste the following at the top of the file:

# Set options for certfile, ip, password, and toggle off
# browser auto-opening
c.NotebookApp.certfile = u'/home/ec2-user/ssl/mycert.pem'
c.NotebookApp.keyfile = u'/home/ec2-user/ssl/mykey.key'
# Set ip to '*' to bind on all interfaces (ips) for the public server
c.NotebookApp.ip = '*'
# Don't open browser by default
c.NotebookApp.open_browser = False
# It is a good idea to set a known, fixed port for server access
c.NotebookApp.port = 8888

Press Esc to stop editing. Then type the below to save and exit the file

:wq
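If you would rather not edit the file in vi, the same settings could be written with a few lines of Python. This is only a sketch of that alternative: the helper names are hypothetical, and it assumes the config file generated earlier already exists at the path you pass in.

```python
from pathlib import Path


def render_config(cert_path, key_path, port=8888):
    """Return the notebook settings from this article as config-file text."""
    lines = [
        f"c.NotebookApp.certfile = {str(cert_path)!r}",
        f"c.NotebookApp.keyfile = {str(key_path)!r}",
        "c.NotebookApp.ip = '*'",
        "c.NotebookApp.open_browser = False",
        f"c.NotebookApp.port = {int(port)}",
    ]
    return "\n".join(lines) + "\n"


def prepend_to_config(config_file, text):
    """Insert text at the top of an existing config file."""
    config_file = Path(config_file).expanduser()
    config_file.write_text(text + config_file.read_text())
```

On the server this would be called with the paths used above, for example `prepend_to_config("~/.jupyter/jupyter_notebook_config.py", render_config("/home/ec2-user/ssl/mycert.pem", "/home/ec2-user/ssl/mykey.key"))`.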

Start Jupyter

To start the Jupyter service, run the following command:

> jupyter notebook

The Jupyter notebook service is now running

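To access it, open a browser (e.g. Chrome or Edge) on your LOCAL computer and go to the address

https://PublicDNSAddress:8888

where PublicDNSAddress is the Public IPv4 DNS we saved in the previous steps. The https:// prefix is needed because we configured a certificate, and as the certificate is self-signed your browser will show a security warning that you have to accept. When prompted, enter the token that Jupyter printed in the terminal on start-up. If the page does not load, check that the instance’s security group allows inbound traffic on port 8888.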
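If the page does not load, it helps to find out whether the port is reachable at all before looking at Jupyter itself. The sketch below is a simple TCP reachability check you could run on your local machine. The function name is made up, and it only tells you whether something accepts connections on that port, not whether it is Jupyter.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

Calling `port_is_open("<your Public IPv4 DNS>", 8888)` and getting `False` while the notebook service is running usually points to a firewall or security group rule rather than to Jupyter.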

Well done! You should now have a Jupyter notebook set up to start running some machine learning models.

Terminate your Instance (optional)

The last step in the process is to terminate your server once you are done. This stops any additional charges to your account (although within the free tier you can run it for quite a while before being charged). Note that terminating deletes the instance permanently.

In AWS, go to the EC2 service.

Look for any running instances, select them and terminate them.
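The same check can be scripted. The sketch below only does the filtering step: it picks the running instance IDs out of a response shaped like the one boto3's EC2 client returns from `describe_instances()`. boto3 is not otherwise used in this article, so treat that as an assumption; the helper name and the sample IDs are made up.

```python
def running_instance_ids(describe_response):
    """Collect the IDs of instances whose state is 'running'."""
    return [
        instance["InstanceId"]
        for reservation in describe_response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
        if instance.get("State", {}).get("Name") == "running"
    ]


if __name__ == "__main__":
    # Made-up sample data in the shape of a DescribeInstances response.
    sample = {
        "Reservations": [
            {
                "Instances": [
                    {"InstanceId": "i-0aaa", "State": {"Name": "running"}},
                    {"InstanceId": "i-0bbb", "State": {"Name": "stopped"}},
                ]
            }
        ]
    }
    print(running_instance_ids(sample))
```

With boto3 installed and credentials configured, the resulting list could be passed to the client's `terminate_instances(InstanceIds=...)` call. Double-check the IDs first, as termination cannot be undone.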