Part 5.2 !! Pandas DataFrame to PostgreSQL using Python
Extract, Transform, and Load PostgreSQL Data in Python
Prerequisites
Python 3.8.3 : Anaconda download link
PostgreSQL 13 : Download link
Psycopg2 and petl : To install both packages, use the command: pip install psycopg2 petl
Objective
Generally, we create training and testing data by importing a CSV file into a pandas DataFrame. When a large dataset is stored on a database server, however, we need a way to extract it directly from the server. In this part, we walk step by step through working code that does this with petl: extracting PostgreSQL data, transforming it, and loading it into a CSV file.
Step 1 : Import libraries
# import sys to get more detailed Python exception info
import sys
# import the connect library for psycopg2
import psycopg2
# import the error handling libraries for psycopg2
from psycopg2 import OperationalError, errorcodes, errors
import psycopg2.extras as extras
from sqlalchemy import create_engine
import pandas as pd
# Extract, Transform & Load
import petl as etl
Step 2 : Specify the connection parameters
# Note: please change your database, username & password as per your own values
conn_params_dic = {
"host" : "localhost",
"database" : "irisdb",
"user" : "postgres",
"password" : "Passw0rd"
}
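Hard-coding the password is fine for a local demo, but the same dictionary can also be filled from environment variables. The sketch below is one possible way to do that, assuming the standard libpq variable names (PGHOST, PGDATABASE, PGUSER, PGPASSWORD). The helper name `params_from_env` is hypothetical and not part of the original notebook.

```python
import os

def params_from_env(defaults):
    """Return a copy of `defaults`, overridden by libpq-style environment variables.

    `params_from_env` is a hypothetical helper used for illustration only.
    """
    env_names = {
        "host": "PGHOST",
        "database": "PGDATABASE",
        "user": "PGUSER",
        "password": "PGPASSWORD",
    }
    params = dict(defaults)
    for key, env_name in env_names.items():
        value = os.environ.get(env_name)
        if value:  # keep the default when the variable is unset or empty
            params[key] = value
    return params

# Same keys as the dictionary above; the values act as fallbacks.
conn_params_dic = params_from_env({
    "host": "localhost",
    "database": "irisdb",
    "user": "postgres",
    "password": "Passw0rd",
})
```

The resulting dictionary has the same shape as the hard-coded one, so it can be passed to the same connection code unchanged.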
Step 3 : Support Functions
# Define a function that handles and parses psycopg2 exceptions
def show_psycopg2_exception(err):
    # get details about the exception
    err_type, err_obj, traceback = sys.exc_info()
    # get the line number where the exception occurred
    line_n = traceback.tb_lineno
    # print the connect() error
    print("\npsycopg2 ERROR:", err, "on line number:", line_n)
    print("psycopg2 traceback:", traceback, "-- type:", err_type)
    # psycopg2 extensions.Diagnostics object attribute
    print("\nextensions.Diagnostics:", err.diag)
    # print the pgcode and pgerror exceptions
    print("pgerror:", err.pgerror)
    print("pgcode:", err.pgcode, "\n")

# Define a connect function for PostgreSQL database server
def connect(conn_params_dic):
    conn = None
    try:
        print('Connecting to the PostgreSQL...........')
        conn = psycopg2.connect(**conn_params_dic)
        print("Connection successful..................")
    except OperationalError as err:
        # passing exception to function
        show_psycopg2_exception(err)
        # set the connection to 'None' in case of error
        conn = None
    return conn
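The line-number lookup in show_psycopg2_exception relies on sys.exc_info(), which only returns a traceback while an exception is being handled. Here is a minimal sketch of the same pattern with a built-in exception, so it can be tried without a database. The function name `describe_current_exception` is hypothetical.

```python
import sys

def describe_current_exception():
    """Return (type name, line number) for the exception being handled."""
    err_type, err_obj, tb = sys.exc_info()
    if tb is None:
        # called outside an except block: there is nothing to report
        return None
    return err_type.__name__, tb.tb_lineno

try:
    int("not a number")  # raises ValueError
except ValueError:
    info = describe_current_exception()
    print("caught:", info)
```

This is why the helper has to be called from inside the except block, as connect() does.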
Step 4: Loading PostgreSQL Data into a CSV File : Extract, Transform, and Load the PostgreSQL Data
We can use petl to extract, transform, and load the PostgreSQL data. In this example, we extract PostgreSQL data, sort the data by the species column, and load the data into a CSV file.
# Connect to the PostgreSQL database
conn = connect(conn_params_dic)
# Create a SQL statement to query PostgreSQL
# sql = "SELECT * FROM iris WHERE species = 'testing'"
sql = "SELECT * FROM iris"
extractData = etl.fromdb(conn, sql)
extractData.head()
Transform
Here, you can perform different operations on the data after extracting it. For example, you can:
- join tables
- sort rows
- fill missing values
- reshape tables, etc.
transformData = etl.sort(extractData,'species')
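To make two of the listed operations concrete without a database, the sketch below fills a missing value and then sorts by species on a small in-memory table. It uses only the standard library rather than petl's functions. The rows are made-up sample values, and the column name sepal_length is an assumption about the iris table.

```python
# Made-up sample rows in a header-plus-rows layout; None marks a missing value.
table = [
    ["sepal_length", "species"],
    [5.9, "virginica"],
    [None, "setosa"],
    [7.0, "versicolor"],
]

header, rows = table[0], table[1:]
species_idx = header.index("species")
length_idx = header.index("sepal_length")

# Fill missing sepal_length values with the mean of the known ones.
known = [r[length_idx] for r in rows if r[length_idx] is not None]
mean_length = sum(known) / len(known)
filled = [
    [mean_length if v is None and i == length_idx else v for i, v in enumerate(r)]
    for r in rows
]

# Sort by species, the same ordering requested from petl in the sort step.
transformed = [header] + sorted(filled, key=lambda r: r[species_idx])
for row in transformed:
    print(row)
```

In a petl pipeline each of these steps would be a table-to-table function call instead of a list comprehension, but the order of operations is the same.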
Load (Write data)
After all operations on the data, you can save the result to disk for reporting or later use.
etl.tocsv(transformData,'../Learn Python Data Access/iris_v1.csv')
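As a design alternative, the sorting can be pushed into the query itself with ORDER BY, which leaves only extract and load on the Python side. The sketch below shows that variant with the standard library. Note the assumptions: sqlite3 stands in for PostgreSQL so the code runs without a server, the rows are made up, and the file is written to a temporary directory rather than the path used above.

```python
import csv
import os
import sqlite3
import tempfile

# sqlite3 stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (sepal_length REAL, species TEXT)")
conn.executemany(
    "INSERT INTO iris VALUES (?, ?)",
    [(6.3, "virginica"), (5.1, "setosa"), (7.0, "versicolor")],  # made-up rows
)

# Extract: run the query; ORDER BY does the sorting inside the database.
cur = conn.execute("SELECT * FROM iris ORDER BY species")
header = [col[0] for col in cur.description]
rows = cur.fetchall()
conn.close()

# Load: write the header and the rows to a CSV file in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "iris_v1.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

print("wrote", len(rows), "rows to", out_path)
```

Sorting in SQL avoids holding the full table in Python just to reorder it, while sorting in petl keeps the whole pipeline in one place; which is preferable depends on the size of the data.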
Conclusion : This ends our Part 5.2. In this tutorial we have learned how to extract, transform and load data.
All code for this article is available as a Jupyter Notebook on GitHub.
Previous Learning :
Part 1: Introduction, Connection & Database Creation
Part 2: Create Table in PostgreSQL Database Using Python
Part 3.1: Insert Bulk Data Using executemany() Into PostgreSQL Database
Part 3.2: Insert Bulk Data Using execute_batch() Into PostgreSQL Database
Part 3.3: Insert Bulk Data Using execute_values() method into PostgreSQL Database using Python
Part 3.4: Insert Bulk Data Using mogrify() Into PostgreSQL Database
Part 3.5: Insert Bulk Data Using copy_from() Into PostgreSQL Database
Part 3.6 : Insert Bulk Data Using copy_from() with StringIO Into PostgreSQL Database
Part 3.7 : Insert Bulk Data Using to_sql()(alchemy) Into PostgreSQL Database
Part 4: Comparison of Methods for Importing bulk CSV data Into PostgreSQL Using Python
Part 5.1 : How to read data from PostgreSQL to Pandas DataFrame?
Reference: petl : petl — Extract, Transform and Load (Tables of Data) — petl 1.6.8 documentation
Stay Positive !! Stay Safe !! Keep Learning :):)
Thank you for reading!!