PostgreSQL Integration with Jupyter Notebook
For a more seamless data extraction experience.
Jupyter Notebook is a great tool for cataloging the steps taken during data visualization and analysis. It supports Python code, Markdown, HTML and, thanks to a few libraries, PostgreSQL! This lets you write SQL right in the notebook to query or modify the database, and you can load any query's results into a pandas DataFrame to go straight to analysis. It has made me enjoy using SQL and Jupyter Notebook even more!
Prerequisites:
- Jupyter Notebook
- PostgreSQL and a database (sample database & how to load)
Libraries to install:
- ipython-sql
- sqlalchemy
- A Python database API (DBAPI) driver
The first two libraries can be installed with pip:
pip install ipython-sql
pip install sqlalchemy
The third library depends on which SQL database you use. For PostgreSQL, the driver is psycopg2:
pip install psycopg2
(If the build fails due to missing PostgreSQL headers, pip install psycopg2-binary installs a precompiled version.)
Different DBAPIs are documented on sqlalchemy’s website, like MySQL, Oracle, and SQLite (thanks Dave Campbell).
ipython-sql
Created by catherinedevlin on GitHub, this library enables the %sql and %%sql magic functions, allowing you to write SQL-style code right in Jupyter Notebook.
sqlalchemy
Originally authored by Michael Bayer, sqlalchemy is touted as a “SQL toolkit and object-relational-mapper” for Python. For the purposes of this tutorial, it will mainly be used to store SQL queries into a pandas dataframe.
Now that we have our libraries installed, let’s move on to Jupyter Notebook!
Getting Started
To load ipython-sql, use the following magic command:
%load_ext sql
Next, we will only need the create_engine() function from sqlalchemy, so let's import it with the following line:
from sqlalchemy import create_engine
Once we’ve laid the groundwork, we can now connect to a PostgreSQL database!
Connecting to a PostgreSQL database
To connect ipython-sql to your database, use the following format:
# Format
%sql dialect+driver://username:password@host:port/database

# Example
%sql postgresql://postgres:password123@localhost/dvdrental
- dialect+driver in this case would just be postgresql, but feel free to use a different database software here.
- username:password is where you substitute your username and password.
- host is usually just localhost.
- In my experience, port does not need to be specified.
- database is the name of the database to connect to.
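If you do want to specify the port, PostgreSQL listens on 5432 by default, so the connection string from the example above would become:

```
%sql postgresql://postgres:password123@localhost:5432/dvdrental
```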
To see if it successfully connected, Jupyter Notebook will print the following:
'Connected: username@database'
To connect sqlalchemy to the database, the format is largely the same, but we will create a new object called engine using the create_engine() function:
# Format
engine = create_engine('dialect+driver://username:password@host:port/database')

# Example
engine = create_engine('postgresql://postgres:password123@localhost/dvdrental')
For the most part, you can just copy and paste what you wrote after the %sql magic command, wrap it in quotes, and pass it as the argument to create_engine().
As long as you see no errors in the output, you should be connected to the database!
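One quick way to sanity-check the engine is to run a trivial query through it. The sketch below uses an in-memory SQLite database purely so it runs anywhere; with PostgreSQL you would reuse the engine created above instead of making a new one:

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the PostgreSQL URL so this sketch is
# self-contained; with PostgreSQL, reuse the engine you created above.
engine = create_engine("sqlite:///:memory:")

# If the connection works, this trivial query returns 1.
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()

print(result)
```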
Writing SQL Commands in Jupyter Notebook
To query the database or run other commands, start a cell with the %%sql magic command and add your SQL code after it. The SQL code should be in its own cell, separate from Python code (thanks Daniel Upton for pointing that out!). Here's sample code that queries the first three rows and all of the columns in the inventory table:
If you'd like to store your query results in a pandas DataFrame, this is where sqlalchemy comes in. Create a DataFrame using pd.read_sql() (this assumes you've already run import pandas as pd). It takes two arguments:
- Your SQL query, encapsulated in quotes.
- The engine object you created earlier using the create_engine() function.
The following is an example of how to build a DataFrame, inventory_dataframe, from the inventory table query described above.
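Here is a runnable sketch of that pattern. It substitutes an in-memory SQLite engine and a tiny made-up inventory table so the example is self-contained; with your PostgreSQL engine from earlier, the pd.read_sql() call is identical:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in engine: SQLite in memory instead of the PostgreSQL engine from earlier.
engine = create_engine("sqlite:///:memory:")

# A tiny made-up inventory table so the query has something to read.
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE inventory (inventory_id INTEGER, film_id INTEGER, store_id INTEGER)"
    ))
    conn.execute(text(
        "INSERT INTO inventory VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 2)"
    ))

# Same two arguments as described above: the query string and the engine.
inventory_dataframe = pd.read_sql("SELECT * FROM inventory LIMIT 3;", engine)
print(inventory_dataframe)
```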
Once you have your query in a DataFrame, feel free to continue with your data analysis!