Using RDS on AWS with Jupyter Notebooks

Creating, Connecting, and Querying a PostgreSQL Database on AWS

Samantha Jackson
Aug 12 · 5 min read

Amazon Web Services is a widely used cloud-computing platform, that offers a free tier of many of their services, so newbies like me can play around with the functionality in the hopes of making myself more marketable, and eventually getting a job!

That said, AWS can look overwhelming to a beginner, since they offer so much. Today I’m going to focus on their Relational Database Services (RDS) — creating and populating a PostgreSQL table on AWS. I’m assuming you already have an AWS account. If you don’t, you can follow the instructions from AWS, here.

Setting Up the Database

Getting started with RDS is the easy part. On your AWS Homepage, scroll down to “Database” and click RDS

Once on the Amazon RDS Dashboard, scroll down to Create database section and click the orange button — “Create database”

Note that mine is set up to launch in Ohio. You can change your location (best to chose the closest location) in the navigation bar on the top left corner of the page.

Next comes a page with a bunch of options. It’s probably fine to keep most of these as the default, but there are some you should make sure are filled out.

Select Engine

We’re setting up a PostgreSQL DB, so we’ll pick PostgreSQL.

Select Template

I’m going to stick with “Free tier” for this one.

Settings

Create a name for your DB instance, and come up with credentials. I got real creative with mine.

Don’t forget your Master UN or Password! You’ll need them!

Connectivity

I’m making my database publicly accessible so it’s accessible outside of the default VPC.

Additional Configuration

Give your database a name!

Again, all other options were just left to their default settings.

Finally

Make sure your estimated monthly costs are still free, and click “Create database”

Connecting to the Database from a Jupyter Notebook

Ok, now that we have a database instance setup, and a database ready to store some data on the cloud, we can start populating our database with tables. However, we need a way to connect to our database instance from our computers. The simplest way I found to do this with Python was by downloading psycopg2. I was having trouble with the pip install for this package, so I used the binary version — pip install psycopg2-binary, and that worked just fine for my purposes.

Setting up your config file

Once you have psycopg2, you should create a config.py file to store the details for accessing your database. To connect, we’ll need the database:

  1. Endpoint
  2. Port
  3. Name
  4. User’s Name
  5. User’s Password

So, with all of that in mind, my config.py file looks something like this:

All of these details can be found on your AWS page!

Navigate to the RDS Dashboard, then to Databases. You should see your new database instance listed, and available to click on. Clicking will bring you to a page with details about the database instance.

Click on your new db instance

Once you click, you’ll first see a summary of your database instance. Scrolling just beneath the summary, you’ll see a number of tabs listed horizontally. Starting with the “Connectivity and security” tab, you’ll be able to find your endpoint and port.

Then checkout the “Configuration” tab for the rest of your details: DB name & Master username.

Hopefully you remember your password that you chose when you were configuring your database. That’s the password you’ll put in your config.py file.

Connecting & Creating a table

After importing psycopg2 and my config file, I typically create a function that connects to the database and sets up a cursor for me as well.

This function uses our credentials from our config.py file to create the conn_string, and uses the conn_string to create the connection to our database hosted by AWS.

Once we have a connection and a cursor, we can start writing our SQL queries in Python as we normally would. PostgreSQL queries from Python using the psycopg2 library need four elements:

  1. Establish Connection
  2. Establish Cursor
  3. Execute Cursor
  4. Commit Connection

If you’re new to SQL and relational databases (or even if you aren’t) I’d recommend using some sort of database manager — something that will allow you to visualize your database and tables as you create and query them. This will help you to double check that your connection and queries are working as expected. I’ve been using TablePlus.

Using the same connection details that we used for our config.py files, we’ll connect TablePlus to our database on AWS. Once TablePlus connects to our brand new & empty database, it’ll look like this:

Whole lot of nothing

Then, we can start creating tables with the 4 elements above:

After running these few lines of code, we can see in TablePlus that we’ve created an “inventory” table in our test_db on our med-blog-db instance:

Now we’ve added the inventory table

The table can be populated and queried similarly, using the four elements.

That’s a wrap

Using AWS free tier is a great way to get some experience with the platform. There are so many different ways it can be used, and today we covered one way — hosting a database on AWS and connecting to it using the pyscopg2 library for Python.

I hope you found this guide helpful, and please do reach out if you have any questions, comments, or constructive criticisms!

Samantha Jackson

Written by

Data Scientist // Flatiron School Alumni

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade