Bringing native psql magic into Jupyter Notebook

How to get native psql productivity into the Jupyter Notebook environment

Constantinos Antzakas
Greenplum Data Clinics
4 min read · Jan 10, 2019

--

Jupyter Notebook for data access, manipulation, and analysis

The Jupyter Notebook is a web-based interactive environment for creating and sharing documents that contain live code, equations, visualizations, and narrative text. It is an incredibly powerful tool for interactively developing and presenting data science projects: it helps streamline the path from code to full-blown reports, boosts productivity, and makes collaboration easy. For these reasons, the Jupyter Notebook has become one of the most popular tools within the data community.

Jupyter Notebook example

To get the most out of Jupyter notebooks, one should be familiar with programming, most typically in Python (although a few other languages are supported as well). In particular, the pandas Python library offers data structures and operations for manipulating and analyzing numerical tables and time series.

For those who prefer using SQL for their data projects, the Jupyter Notebook offers a few different ways to connect to JDBC/ODBC data sources and manipulate data. My personal favorite is ipython-sql, which introduces the %sql (or %%sql) magic into the Jupyter Notebook.

%sql, %%sql magic
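For instance, a minimal notebook cell using ipython-sql could look like the following sketch; the connection string and table name are placeholders you would replace with your own:

    # load the ipython-sql extension and connect (placeholder credentials)
    %load_ext sql
    %sql postgresql://myuser:mypassword@localhost:5432/mydb

    # run a query with the %sql line magic
    %sql SELECT * FROM my_table LIMIT 5;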

Jupyter Notebook for PostgreSQL users

The primary front-end for PostgreSQL users is the psql command-line program, which can be used to enter SQL queries directly or execute them from a file. In addition, psql provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.
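For example, all of the following are run from a terminal; the connection details and file name are illustrative:

    psql -h localhost -d mydb -U myuser -c "SELECT now();"    # run a query directly
    psql -h localhost -d mydb -U myuser -c "\dt+"              # run a meta-command
    psql -h localhost -d mydb -U myuser -f ./my_script.sql     # execute queries from a file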

The Jupyter Notebook %sql and %%sql magic arrives with some PostgreSQL features packaged, including psql-style "backslash" meta-commands (\d, \dt, etc.) provided by PGSpecial, but this does not cover the complete list of available meta-commands. An alternative for those cases is to run the native psql client from within the Jupyter Notebook environment and capture its output back into the notebook.

Here is how to achieve this:

Run native psql meta-commands from the Jupyter Notebook

  • Ensure the postgresql-client package is installed:
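For example, on a Debian/Ubuntu system (the package name differs on other platforms):

    sudo apt-get install postgresql-client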
  • Ensure psql is installed:
psql version
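A quick way to confirm this from a terminal:

    psql --version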
  • Create a .pgpass password file:

Create a .pgpass file in the user's home directory as described in the PostgreSQL documentation. This file supplies passwords when a connection requires one (and no password has been specified otherwise), and should contain lines of the following format:
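    hostname:port:database:username:password

For example, with illustrative values:

    localhost:5432:mydb:myuser:mysecretpassword

On Unix systems the file must also be protected from group/world access (chmod 0600 ~/.pgpass), otherwise psql will ignore it.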

  • Start a new Jupyter Notebook:
Create a new Jupyter Notebook
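If you prefer the command line, a new notebook server can also be started with:

    jupyter notebook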
  • Add the following code into the Jupyter Notebook cell:
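A minimal sketch of such a cell, assuming psql is on the PATH and using placeholder connection parameters:

    import subprocess
    from IPython.display import HTML, display

    # placeholder connection parameters; update these to match your database
    PGHOST = "localhost"
    PGDATABASE = "mydb"
    PGUSER = "myuser"

    # prompt for a psql command or meta-command, e.g. \dt+
    command = input("psql> ")

    # -H (--html) turns on HTML tabular output so the result renders nicely
    # in the notebook; the password is picked up from ~/.pgpass
    result = subprocess.run(
        ["psql", "-h", PGHOST, "-d", PGDATABASE, "-U", PGUSER, "-H", "-c", command],
        capture_output=True, text=True,
    )

    display(HTML(result.stdout))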

Note: make sure to update the connection parameters (host, database, username) to match your PostgreSQL database.

When prompted, give your psql command just as you would when executing psql from the terminal, e.g. \dt+

Note: to guarantee proper display of the psql output, the code above forces the -H (--html) parameter, which turns on HTML tabular output.

psql “magic” example
  • Execute the code within the cell (⌃↩):
psql “magic” example result

Notice how the HTML output of psql is rendered within the Jupyter Notebook.

More examples: execute psql \copy meta-command

The psql \copy meta-command performs a frontend (client) copy. This operation essentially runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required. \copy ... from stdin | to stdout reads from the command's input and writes to its output respectively; all rows are read from the same source that issued the command, continuing until \. is read or the stream reaches EOF.

Note: the psql \copy operation is not as efficient as the SQL COPY command because all data must pass through the client/server connection, yet it is useful, for example, for populating tables in-line within a SQL script file.

  • Export table data into a CSV file in the local file system:
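For example, when prompted by the cell above, a command along the following lines (the table and file names are illustrative) exports a table to a local CSV file:

    \copy my_table TO '/tmp/my_table.csv' CSV HEADER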
  • Create a new table & load it with data from a local system CSV file:
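Conversely, to create a table and load it from a local CSV file, enter the following two commands in turn (again, the names are illustrative; psql -c accepts only a single backslash command at a time, so they are run as two separate prompts):

    CREATE TABLE my_table_copy (LIKE my_table);
    \copy my_table_copy FROM '/tmp/my_table.csv' CSV HEADER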

Let me know if the steps described above work for you — your feedback would be very much appreciated. Feel free to respond below, reach out to me on Twitter @cantzakas or connect with me on LinkedIn.
