Mini-tutorial: Reading SQL into Pandas
At least for the moment, tutorials and examples are plentiful for capturing a csv
into a pandas DataFrame, then into a SQL table. Here’s a very quick reference/tutorial on the reverse using pandas.read_sql
, pandas.read_sql_table
, and pandas.read_sql_query
.
The official pandas
documentation gives plenty of examples of reading data from a csv
, json
, or filetypes to be loaded into Python memory as a pandas
DataFrame
object. These DataFrames
are often then stored into sql using to_sql
. Examples of the reverse — getting data from SQL into a pandas.DataFrame
to a csv
weren’t as abundant.
We’ll go over the similarities and differences of pandas.read_sql
, pandas.read_sql_table
, and pandas.read_sql_query
usage.
Pandas is commonly used with another Python library named sqlalchemy
, a library that provides a number of methods to more easily interact with SQL.
The following is an example of using sqlalchemy
from within Python for loading query data from postgresql into a pandas.DataFrame
. Interpolating variables in the connection string that are unique to your individual database configurations. The variable table_name
is something you should define on your own as well.
import sqlalchemy as sqla
import pandas as pdconnection_string = f'{db_driver_protocol}://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}'
engine = sqla.create_engine(connection_string)df_from_table = pd.read_sql_table(table_name, engine)
df_from_query = pd.read_sql_query(f'SELECT * from {table_name}', engine)
df_from_generic = pd.read_sql(f'SELECT * from {table_name}', engine)
…and that’s it! You should now have a pandas.DataFrame
object with rows and columns from the data table.
You’ll notice there are three methods for reading SQL, all three of which are demonstrated above. Both pd.read_sql_from_query
and pd.read_sql
should behave identically (at least as of the time of this writing). The three pandas.DataFrames
created should also be identical based on the query we’re using, since we’re simply selecting all columns on table_name
, the same as reading in a table without additional options specific to each method.