PostgreSQL Integration with Python: A Simple Guide
Table of Contents
· Introduction
· What is PostgreSQL?
· Advantages of Using PostgreSQL
· Configuring PostgreSQL on macOS
· Establishing a Database
∘ — Set Password
∘ — Create Database
· Constructing a Database Table
· Connecting to PostgreSQL using Python
∘ — Create Environment Variables
∘ — Create a Python File
∘ — Execute Python File
· Conclusion
Introduction
Effective database storage is critical for the preservation and accessibility of information. The story of databases is closely tied to the evolution of computers. Back before computers existed, people kept their information in things like journals, libraries, and filing cabinets. This made it hard to find what you needed and to make backups. But once the 1960s rolled around and computers became more affordable, we started to see the rise of computerized databases, which made managing data much easier.
PostgreSQL will serve as the database platform utilized in this straightforward guide. I will delineate its advantages, and subsequently demonstrate the process of transferring and retrieving data using Python. But first, let me provide an overview of PostgreSQL.
What is PostgreSQL?
PostgreSQL is an object-relational database management system (ORDBMS) that is available as free and open-source software. It supports both SQL for relational data querying and JSON for non-relational data querying, making it a versatile and powerful choice for managing various types of data. It’s known for its reliability, flexibility, and support of open technical standards.
Advantages of Using PostgreSQL
Here are 5 advantages of using PostgreSQL:
- Open Source: PostgreSQL is free and open-source, making it accessible to all without licensing costs.
- Feature-Rich: It provides a wide array of advanced features, such as support for complex data types like JSONB, arrays, and geometric types, as well as full-text search, Common Table Expressions (CTEs), and window functions.
- Reliability: PostgreSQL provides robust concurrency control and transaction management, ensuring data integrity and reliability, even in highly concurrent environments.
- Community Support: It has a vibrant and active community of users and developers, offering extensive documentation, tutorials, and support resources.
- Scalability: PostgreSQL supports various scaling techniques, allowing it to handle growing data volumes and user loads effectively.
The favorable attributes outlined above played a significant role in my choice to utilize PostgreSQL in a recent Data Engineering project. These advantages make PostgreSQL a compelling choice for organizations seeking a feature-rich, extensible, and scalable database solution that can handle a wide range of workloads and application requirements.
Configuring PostgreSQL on macOS
Install PostgreSQL and start the database server. You can watch the macOS installation video to assist you with the installation. After installing PostgreSQL, you will need to complete the below steps so that the psql
command will work in Terminal.
These steps will add the PostgreSQL binaries path to the paths
file on your computer:
- Open Terminal and type
sudo nano /etc/paths
, then press Enter to open thepaths
file.
Note: You will be prompted to enter a password to edit the paths
file.
- Open the
PostgreSQL
app and make sure the server is running by clicking Start. - Click on Server Settings and copy the binaries path.
- Paste the binaries path in the
paths
file in Terminal.
- Press
Control + O
, then Enter to save the contents to thepaths
file. - Lastly, press
Control + X
to exit thepaths
file. Close Terminal and relaunch it.
Upon relaunching Terminal, you should find that you can now utilize the psql
command. If you do not complete the above steps, then you will get psql: command not found
when trying to execute the psql
command in Terminal.
Establishing a Database
Set Password
We need to set a password for the local database connection which will be used later to communicate to the database using Python.
- Enter
psql -U postgres
in Terminal to launch the PostgreSQL command line interface (CLI) - To set the password, enter
\password postgres
. You’ll be prompted to create a password.
Create Database
To facilitate data transmission to the database via Python, it is necessary to establish a database. In this example, I will create a database named , though you have the liberty to assign any desired name.
- To create the database, type
CREATE DATABASE example_database;
in Terminal and press Enter.
Note: Do forget to add the semicolon at the end.
The database you created should show up in the PostgreSQL app or you can type \l
in Terminal to list all the databases on the server.
Finally, you will connect to the database by entering \c example_database
in Terminal and pressing Enter. If successful, you will see a message saying, “You are now connected to database example_database
as user postgres
“. This step is crucial as we will subsequently proceed to create a table within this database to accommodate incoming data.
Constructing a Database Table
Prior to initiating data insertion into the database, it is imperative to establish a table structure to accommodate the incoming data. Given my passion for video games, I will proceed to design a table specifically tailored to store information regarding my favorite video games.
To create the video_game
table, copy the below code and press Enter.
CREATE TABLE video_game(
game_id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
genre VARCHAR(20) NOT NULL,
price FLOAT NOT NULL,
release_date DATE
);
To view the table created, type \d+
and press Enter.
Connecting to PostgreSQL using Python
Create Environment Variables
As a security measure, it is customary to utilize environment variables for accessing sensitive information that one prefers not to disclose explicitly in the code, such as passwords, access keys, server addresses, and similar credentials.
You will need to create a source file named .dbcreds
to add your credentials. Open a new Terminal window and type the below command and press Enter.
cd ~ && touch .dbcreds && nano .dbcreds
The above command will create the .dbcreds
file on your home directory and open the file. You and paste the below to the .dbcreds
file.
# PostgreSQL database credentials
export PG_HOST="localhost"
export PG_PORT="5432"
export PG_USER="postgres"
export PG_DBNAME="example_database"
export PG_PASS="password"
- Press
Control + O
, then Enter to write to the.dbcreds
file. - Lastly, press
Control + X
to exit the.dbcreds
file. Close this Terminal window.
Note: The above information is based upon the setup for this example. You will need to replace the above values with the information for your database.
Create a Python File
Copy the below code to a file and name it postgres_db.py
.
import os
import psycopg2
import pandas as pd
def write_db():
# Create a cursor object to execute queries
cur = conn.cursor()
# Insert data into the database table
cur.execute('INSERT INTO video_game (game_id, name, genre, price, release_date) VALUES (%s, %s, %s, %s, %s)', (1, 'Resident Evil', 'Survival, Adventure', 219.99, '1996-03-22'))
cur.execute('INSERT INTO video_game (game_id, name, genre, price, release_date) VALUES (%s, %s, %s, %s, %s)', (2, 'Mario Kart 64', 'Racing', 32.99, '1997-02-10'))
cur.execute('INSERT INTO video_game (game_id, name, genre, price, release_date) VALUES (%s, %s, %s, %s, %s)', (3, 'GoldenEye 007', 'Shooter', 21.99, '1997-08-23'))
cur.execute('INSERT INTO video_game (game_id, name, genre, price, release_date) VALUES (%s, %s, %s, %s, %s)', (4, 'Sonic the Hedgehog', 'Adventure', 66.30, '1991-06-23'))
cur.execute('INSERT INTO video_game (game_id, name, genre, price, release_date) VALUES (%s, %s, %s, %s, %s)', (5, 'Mortal Kombat', 'Fighting', 28.79, '1992-10-08'))
# Commit the transaction
conn.commit()
# Close the cursor and connection
cur.close()
conn.close()
def read_db():
# Create a cursor object to execute queries
cur = conn.cursor()
# Define schema for the company info DataFrame.
game_schema = {
'game_id': 'int64',
'name': 'str',
'genre': 'str',
'price': 'float64',
'release_date': 'datetime64[ns]',
}
# Set column names for company info
game_cols = ['game_id', 'name', 'genre', 'price', 'release_date']
# Query company info table, fetch all rows from the result set, and create DataFrame
cur.execute('SELECT * FROM video_game')
rows = cur.fetchall()
game_df = pd.DataFrame(rows, columns=game_cols).astype(game_schema)
# Commit the transaction
conn.commit()
# Close the cursor and connection
cur.close()
conn.close()
print(game_df)
if __name__ == '__main__':
# Connect to the the local PostgreSQL database
conn = psycopg2.connect(
dbname = os.environ.get('PG_DBNAME'),
user = os.environ.get('PG_USER'),
password = os.environ.get('PG_PASS'),
host = os.environ.get('PG_HOST'),
port = os.environ.get('PG_PORT'), # Default PostgreSQL port
connect_timeout = 30 # Set timeout to 30 seconds
)
# Write to database
write_db()
# Read from database
read_db()
Execute Python File
Open Terminal and navigate to the directory of the postgres_db.py
file. Execute the below commands to write the data to the PostgreSQL database then read the data the was written.
# Get database credentials
# Write and read from database
# Display database contents
source ~/.dbcreds && python postgres_db.py
If successful, you will see the contents of the database table.
The contents are also visible from the PostgresSQL CLI. Open a new Terminal and type psql -U postgres
and press Enter. Finally, type SELECT * FROM video_game;
and press Enter to display all the contents in the table.
Conclusion
At this point, you should now possess a simple understanding of PostgreSQL, encompassing its definition, advantages, and practical implementation through Python. Should you require further clarification or assistance, please feel free to reach out, as I am readily available to provide guidance and support.
Should you wish to connect, feel free to visit my LinkedIn profile. I’m eager to engage with fellow enthusiasts within the dynamic realm of Data Engineering. Thank you for your time, and I anticipate interacting with you in the comments section.