If your database is not directly accessible from outside the environment then you may need an SSH tunnel to query it. In this post I am going to show you how to connect to and query a MySQL database over SSH into a Pandas dataframe. The same code can be applied to connect to other databases such as PostgreSQL.
Let’s say your database is hosted on 52.xx.xx.xx and you have the following user and private key:
# ssh variables
host = '52.xx.xx.xx'
localhost = '127.0.0.1'
ssh_username = 'ubuntu'
ssh_private_key = '/path/to/key.pem'
And once we SSH into the environment we need a username and password to connect to a database called “database”:
# database variables
To turn all of this into one function that you can call from your Python code we can use the sshtunnel package which you can get using pip:
pip install sshtunnel
Then import SSHTunnelForwarder, MySQLdb, as well as pandas:
from sshtunnel import SSHTunnelForwarder
import MySQLdb as db
import pandas as pd
Now we’ll use SSHTunnelForwarder to create a tunnel, connect to the database, do the query and return the results as a Pandas dataframe all in one shot. The connection and the SSH tunnel is closed after the query.
) as server:
conn = db.connect(host=localhost,
db=database) return pd.read_sql_query(q, conn)
read_sql_query conveniently takes your SQL query as a string, as well as the connection that was just established to the database, and creates a dataframe out of the tabular data.
Now you can simply query your database:
df = query('select * from table limit 1000')
Here’s the Jupyter notebook.