Bringing native psql magic into Jupyter Notebook
How to get native psql productivity into the Jupyter Notebook environment
Jupyter Notebook for data access, manipulation, and analysis
The Jupyter Notebook is a web-based interactive environment for creating and sharing documents that contain live code, equations, visualizations, and narrative text. It's an incredibly powerful tool for interactively developing and presenting data science projects: it helps streamline work from code to full-blown reports and enables greater productivity and easier collaboration. For these reasons, Jupyter Notebook has become one of the most popular tools within the data community.
To get the most out of Jupyter notebooks, one should be familiar with programming, most typically in Python (although a few other languages are supported). In particular, the pandas Python library offers data structures and operations for manipulating and analyzing numerical tables and time series.
For those who prefer using SQL for their data projects, Jupyter Notebook offers a few different ways to connect to JDBC/ODBC data sources and manipulate data; my personal favorite is ipython-sql, which introduces the %sql (or %%sql) magic into Jupyter Notebook.
Jupyter Notebook for PostgreSQL users
The primary front-end for PostgreSQL users is the psql command-line program, which can be used to enter SQL queries directly or execute them from a file. In addition, psql provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.
Jupyter Notebook's %sql and %%sql magics arrive with some PostgreSQL features packaged, including psql-style "backslash" meta-commands (\d, \dt, etc.) provided by PGSpecial, but this does not cover the complete list of available meta-commands. An alternative solution for these cases is to run the native psql client command from within the Jupyter Notebook environment and capture the output back into the environment.
Here is how to achieve this:
Run native psql meta-commands for Jupyter Notebook
- Ensure the postgresql-client package is installed:
$ sudo apt-get install postgresql-client
- Ensure psql is installed:
$ psql --version
psql (PostgreSQL) 10.6 (Ubuntu 10.6-0ubuntu0.18.04.1)
- Create a .pgpass password file:
Create a .pgpass file in your home directory as described in the PostgreSQL documentation. This file contains passwords to be used if a connection requires a password (and no password has been specified otherwise) and should contain lines of the following format:
hostname:port:database:username:password
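For example, a hypothetical entry (the credentials below are placeholders; any of the first four fields may also be the wildcard *):

```
# ~/.pgpass -- hypothetical credentials; replace with your own
localhost:5432:postgres:postgres:secret
```

Note that on Unix systems the file must not be world- or group-readable (chmod 0600 ~/.pgpass), otherwise psql will ignore it.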
- Start a new Jupyter Notebook:
- Add the following code into the Jupyter Notebook cell:
Note: make sure to update the connection parameters (host, database, username) to match your PostgreSQL database.
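A minimal sketch of such a cell, assuming hypothetical connection parameters and using Python's subprocess module to invoke the native psql client:

```python
import subprocess

# Assumed connection parameters -- hypothetical values; update these
# to match your PostgreSQL database.
HOST, DBNAME, USER = "localhost", "postgres", "postgres"

def run_psql(command):
    """Run a command through the native psql client and return its output.

    The -H (--html) flag forces HTML tabular output so that Jupyter
    can render the result as a table.
    """
    completed = subprocess.run(
        ["psql", "-h", HOST, "-d", DBNAME, "-U", USER, "-H", "-c", command],
        capture_output=True,
        text=True,
    )
    return completed.stdout

# In the notebook cell, prompt for a command and render it as HTML:
#   from IPython.display import HTML, display
#   display(HTML(run_psql(input("psql> "))))
```

Passwords are picked up automatically from the .pgpass file created in the previous step, so no password prompt interrupts the notebook.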
When prompted by the environment, enter your psql command the same way you would execute psql from the terminal, e.g. \dt+
Note: to guarantee the proper display of psql output, the code above forces the -H|--html parameter, which turns on HTML tabular output.
- Execute the code within the cell (⌃↩):
Notice how the HTML output of psql is rendered within the Jupyter Notebook.
More examples: execute psql \copy meta-command
The psql \copy meta-command performs a frontend (client) copy. This operation runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required. psql \copy ... from stdin | to stdout reads/writes based on the command's input and output respectively. All rows are read from the same source that issued the command, continuing until \. is read or the stream reaches EOF.
Note: the psql \copy operation is not as efficient as the SQL COPY command because all data must pass through the client/server connection, yet this option is useful, for example, for populating tables in-line from a SQL script file.
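A sketch of that in-line use, assuming a hypothetical table my_table with an integer and a text column:

```sql
-- Populate a table in-line: data rows follow the \copy command
-- until the \. terminator is read.
\copy my_table FROM stdin WITH (FORMAT csv)
1,alpha
2,beta
\.
```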
- Export table data into a CSV file in the local file system:
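A minimal sketch of such an export, assuming a hypothetical table my_table and writing to the current working directory:

```sql
-- Client-side export: the CSV file is written by psql on the
-- local machine, not by the server.
\copy my_table TO 'my_table.csv' WITH (FORMAT csv, HEADER)
```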
- Create a new table & load it with data from a local system CSV file:
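And a sketch of the reverse direction, assuming a local CSV file my_table.csv with an integer and a text column (table and column names are illustrative):

```sql
-- Create the target table, then load it with a client-side copy;
-- the file is read by psql from the local file system.
CREATE TABLE my_new_table (id integer, name text);
\copy my_new_table FROM 'my_table.csv' WITH (FORMAT csv, HEADER)
```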
Let me know if the steps described above work for you — your feedback would be very much appreciated. Feel free to respond below, reach out to me on Twitter @cantzakas or connect with me on LinkedIn.