Retrieving data from Amazon Relational Database Service (RDS) — PostgreSQL

Nukky
2 min readJun 23, 2020

--

The aim of this article is to demonstrate how to connect to PostgreSQL and convert the SQL tables into DataFrames in Python.

I have encountered a few data scientist interview tests that require me to connect to AWS relational database in order to retrieve data as the starting point of the tests, so I felt I should share my experience on Medium.com. Without further ado, let’s jump into it!

STEP 1: The prerequisite packages are Psycopg 2 and Pandas, and you can install these two Python packages by pip install:

pip install psycopg2
pip install pandas

STEP 2: After installing the required packages, we can import these two packages:

import psycopg2
import pandas as pd

STEP 3: Now we are ready to create an engine to connect with PostgreSQL database in AWS:

engine = psycopg2.connect(
host = "xxxx.rds.amazonaws.com",
database = "NAME_OF_DATABASE",
user = "guest",
password = "PASSWORD",
port = "5432" )

Usually the above 5 parameters will be given by the SQL database owners, and here are my experiences:

  • The host is the host address for the database and it should usually look like a URL, if it’s a RDS from AWS it should end with something like “rds.amazonaws.com”.
  • The database is the name of the database, and should be given to you by the database owner.
  • The user is the user name for accessing the database, which should be created and provided by the database owner.
  • The password is the password for accessing the database, which should be also created and provided by the database owner.
  • You can always assume the default port is “5432”, unless provided otherwise.

STEP 4: The connection is established in step 3, and now we can retrieve data from AWS:

cur = engine.cursor()cur.execute("select * from TABLE_NAME")
data= cur.fetchall()
data_df = pd.DataFrame(data)
engine.close()

Here a cursor was created first, then you simply execute a SQL query, such as “select * from TABLE_NAME” to get the desired data from the database. Thirdly, use pd.DataFrame method to convert data into a DataFrame. Finally close the engine, as you don’t want to send unwanted queries to the database by accident.

Viola! I hope you can now use this article to help you to retrieve data from PostgreSQL database from AWS RDS.

--

--