Query your database over an SSH tunnel with Pandas

If your database is not directly accessible from outside the environment then you may need an SSH tunnel to query it. In this post I am going to show you how to connect to and query a MySQL database over SSH into a Pandas dataframe. The same code can be applied to connect to other databases such as PostgreSQL.

Let’s say your database is hoted on 52.xx.xx.xx and you have the following user and private key:

# ssh variables
host = '52.xx.xx.xx'
localhost = ''
ssh_username = 'ubuntu'
ssh_private_key = '/path/to/key.pem'

And once we SSH into the environment we need a username and password to connect to a database called “database”:

# database variables

To turn all of this into one function that you can call from your Python code we can use the sshtunnel package which you can get using pip:

pip install sshtunnel

Then import SSHTunnelForwarder, MySQLdb, as well as pandas:

from sshtunnel import SSHTunnelForwarder
import MySQLdb as db
import pandas as pd

To import MySQLdb you need to have MySQL-python installed first. You can substitute Psycopg for MySQLdb to do the same thing for PostgreSQL.

Now we’ll use SSHTunnelForwarder to create a tunnel, connect to the database, do the query and return the results as a Pandas dataframe all in one shot. The connection and the SSH tunnel is closed after the query.

def query(q):
with SSHTunnelForwarder(
(host, 22),
remote_bind_address=(localhost, 3306)
) as server:
conn = db.connect(host=localhost,
          return pd.read_sql_query(q, conn)

read_sql_query conveniently takes your SQL query as a string, as well as the connection that was just established to the database, and creates a dataframe out of the tabular data.

Now you can simply query your database:

df = query('select * from table limit 1000')

Here’s the Jupyter notebook.

Show your support

Clapping shows how much you appreciated Amir Ziai’s story.