Using RDS on AWS with Jupyter Notebooks
Creating, Connecting, and Querying a PostgreSQL Database on AWS
Amazon Web Services is a widely used cloud-computing platform, that offers a free tier of many of their services, so newbies like me can play around with the functionality in the hopes of making myself more marketable, and eventually getting a job!
That said, AWS can look overwhelming to a beginner, since they offer so much. Today I’m going to focus on their Relational Database Services (RDS) — creating and populating a PostgreSQL table on AWS. I’m assuming you already have an AWS account. If you don’t, you can follow the instructions from AWS, here.
Setting Up the Database
Getting started with RDS is the easy part. On your AWS Homepage, scroll down to “Database” and click RDS
Once on the Amazon RDS Dashboard, scroll down to Create database section and click the orange button — “Create database”
Note that mine is set up to launch in Ohio. You can change your location (best to chose the closest location) in the navigation bar on the top left corner of the page.
Next comes a page with a bunch of options. It’s probably fine to keep most of these as the default, but there are some you should make sure are filled out.
We’re setting up a PostgreSQL DB, so we’ll pick PostgreSQL.
I’m going to stick with “Free tier” for this one.
Create a name for your DB instance, and come up with credentials. I got real creative with mine.
I’m making my database publicly accessible so it’s accessible outside of the default VPC.
Give your database a name!
Again, all other options were just left to their default settings.
Make sure your estimated monthly costs are still free, and click “Create database”
Connecting to the Database from a Jupyter Notebook
Ok, now that we have a database instance setup, and a database ready to store some data on the cloud, we can start populating our database with tables. However, we need a way to connect to our database instance from our computers. The simplest way I found to do this with Python was by downloading psycopg2. I was having trouble with the pip install for this package, so I used the binary version —
pip install psycopg2-binary, and that worked just fine for my purposes.
Setting up your config file
Once you have psycopg2, you should create a
config.py file to store the details for accessing your database. To connect, we’ll need the database:
- User’s Name
- User’s Password
So, with all of that in mind, my config.py file looks something like this:
All of these details can be found on your AWS page!
Navigate to the RDS Dashboard, then to Databases. You should see your new database instance listed, and available to click on. Clicking will bring you to a page with details about the database instance.
Once you click, you’ll first see a summary of your database instance. Scrolling just beneath the summary, you’ll see a number of tabs listed horizontally. Starting with the “Connectivity and security” tab, you’ll be able to find your endpoint and port.
Then checkout the “Configuration” tab for the rest of your details: DB name & Master username.
Hopefully you remember your password that you chose when you were configuring your database. That’s the password you’ll put in your
Connecting & Creating a table
After importing psycopg2 and my config file, I typically create a function that connects to the database and sets up a cursor for me as well.
This function uses our credentials from our
config.py file to create the
conn_string, and uses the
conn_string to create the connection to our database hosted by AWS.
Once we have a connection and a cursor, we can start writing our SQL queries in Python as we normally would. PostgreSQL queries from Python using the psycopg2 library need four elements:
- Establish Connection
- Establish Cursor
- Execute Cursor
- Commit Connection
If you’re new to SQL and relational databases (or even if you aren’t) I’d recommend using some sort of database manager — something that will allow you to visualize your database and tables as you create and query them. This will help you to double check that your connection and queries are working as expected. I’ve been using TablePlus.
Using the same connection details that we used for our
config.py files, we’ll connect TablePlus to our database on AWS. Once TablePlus connects to our brand new & empty database, it’ll look like this:
Then, we can start creating tables with the 4 elements above:
After running these few lines of code, we can see in TablePlus that we’ve created an “inventory” table in our test_db on our med-blog-db instance:
The table can be populated and queried similarly, using the four elements.
That’s a wrap
Using AWS free tier is a great way to get some experience with the platform. There are so many different ways it can be used, and today we covered one way — hosting a database on AWS and connecting to it using the pyscopg2 library for Python.
I hope you found this guide helpful, and please do reach out if you have any questions, comments, or constructive criticisms!