SQLAlchemy ORM — a more “Pythonic” way of interacting with your database

A Gordon
DataExplorations
Published in
8 min readNov 29, 2018

I’m in the process of setting up a PostgreSQL database for a python project and initially dove in with executing raw SQL through SQLAlchemy. But then I found out about SQLAlchemy ORM (object relation mapper) which provides a more pythonic and object-oriented wrapper around the database. This sounded like the better tool to use, but it took me a bit of time to figure out exactly how to get it setup properly. So I wanted to document some of my lessons learned for future reference.

For reference, this is the simple database structure I am trying to set up:

points_of_interest is the main table which has a zero-to-many relationship with the other three tables. For example, a given point of interest may have no associated Architects or it may have several Architects.

Option 1 — Raw SQL

As mentioned above, my initial approach was simply to use raw SQL statements, executed through SQLAlchemy.

For this approach, I first had to create the database engine

from sqlalchemy import create_enginedb = create_engine(f'postgresql://{DB_USER}:{DB_PASS}@{IP}:{DB_PORT}/{DB_NAME}')

Then I wrote the PostgreSQL appropriate CREATE TABLE sql statements and executed them through db.execute()

  • To establish a primary key on the poi_id field, I used the BIGSERIAL column type to get SQLAlchemy to auto-generate the id
# create main table
db.execute("""
CREATE TABLE IF NOT EXISTS points_of_interest (poi_id BIGSERIAL PRIMARY KEY,
name text,
build_year text, demolished_year text,
address text, latitude float, longitude float,
source text, external_url text, details text,
image_url text, heritage_status text, current_use text,
poi_type text)
""")
# create architectural styles TABLE
db.execute("""
CREATE TABLE IF NOT EXISTS architectural_styles (poi_id int,style text
)
""")
# create architects TABLE
db.execute("""
CREATE TABLE IF NOT EXISTS architects (poi_id int ,
architect_name text
)
""")
# create categories TABLE
db.execute("""
CREATE TABLE IF NOT EXISTS poi_categories (poi_id int,
category text
)
""")

This code does not establish a formal relationship between the tables (i.e. it is up to me as the developer to maintain the links). So, for example, as I started to load data from a pandas data frame into the database, I had to save the generated primary key from the PointsOfInterest table and use that when submitting entries to the ArchitecturalStyles table

new_id=new_row.inserted_primary_key[0]
if new_id:
db.execute('''INSERT INTO architectural_styles(poi_id, style) VALUES ( {},'{}')'''.format(new_id, row['Style']))

At this point, I learned about the ORM approach and decided to refactor my code to use it instead

Option 2 — SQLAlchemy ORM

ORM (Object Relational Mapper) creates an object-oriented wrapper around the database connection, so you can interact with your tables as python classes. Some of the advantages of this approach include

  • Keeps your code independent of the back-end database used, so theoretically you could swap out the database back end without affecting your code
  • Enforces relations between tables/objects
  • Makes code more readable for non-SQL experts

Import from SQLAlchemy

In order to use this approach, you need a different set of imports. Here are the final ones I ended up using

from sqlalchemy import create_engine, Column, Integer, String, Sequence, Float,PrimaryKeyConstraint, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, backref
from sqlalchemy.sql import *

Create Tables as Classes

Instead of writing SQL INSERT statements, you will define a class to represent each of your tables.

  • Class inherits from the declarative_base() class
  • define __tablename__ (the actual table name that is created in the database
  • define Columns and their types (note: these types are different from the PostgreSQL specific text, float and bigserial types that I originally defined)
Base = declarative_base()class PointsOfInterest(Base):
__tablename__ = "points_of_interest"
poi_id = Column(Integer, primary_key=True)
name = Column(String)
address = Column(String)
latitude = Column(Float)
...

How do I set up an auto-incrementing primary key like BIGSERIAL?

There is no BIGSERIAL-equivalent column type in ORM, but you can accomplish the same thing by explicitly defining a sequence:

poi_id = Column(Integer, Sequence('poi_id_seq'), primary_key=True)

Create the supporting / foreign-key tables

At first, I just tried to create the supporting tables with the same syntax as the main table:

class ArchitecturalStyles(Base):
__tablename__="architectural_styles"
poi_id =Column(Integer)
style = Column(String)

But this resulted in an error when I tried to actually create the table (more on how to do that later):

ArgumentError: Mapper Mapper|ArchitecturalStyles|architectural_styles could not assemble any primary key columns for mapped table ‘architectural_styles’”

I hadn’t defined a primary key on these tables because they’re just acting as look-up tables with supplementary information to the main table. But SQLAlchemy ORM requires you to define a primary key. So the workaround is to define a primary key encompassing both columns . This has the added benefit of preventing you from inserting duplicate architectural styles for the same building (or from inserting null entries)

from sqlalchemy import PrimaryKeyConstraintclass ArchitecturalStyles(Base):
__tablename__="architectural_styles"
__table_args__ = (
PrimaryKeyConstraint('poi_id', 'style'),
)

poi_id =Column(Integer)
style = Column(String)

One-to-Many Relationship Pattern / Foreign Key

You can set up several types of relationship patterns in SQLAlchemy ORM, but I only needed to use the One-To-Many relationship. I found this part confusing and at first only defined the foreign key on the supplementary table. But you can take more advantage of the power of ORM if you also define the relationship on the main table.

The first step is indeed to define your Foreignkey and point to the table.column of interest

class ArchitecturalStyles(Base):
__tablename__="architectural_styles"
__table_args__ = (
PrimaryKeyConstraint('poi_id', 'style'),
)
poi_id =Column(Integer,ForeignKey('points_of_interest.poi_id'))
#Defining the Foreign Key on the Child Table
style = Column(String)

The next step is to define a bidirectional relationship between the two tables (a zero-to-many from PointsOfInterest to ArchitecturalStyles and a many-to-one from ArchitecturalStyles to PointsOfInterest). This makes it easier, for example, when loading data since ORM takes care of automatically using the newly generated poi_id when populating data into the other tables.

To do this, add a relationship to the main PointsofInterest class and use the backref parameter to connect the two

class PointsOfInterest(Base):
__tablename__ = "points_of_interest"
poi_id = Column(Integer, Sequence('poi_id_seq'), primary_key=True)
...
details = Column(String)
#Defining One to Many relationship
styles = relationship('ArchitecturalStyles', backref = 'points_of_interest',lazy=True,cascade="all, delete-orphan")
  • styles = The relationship with the AchitecturalStyles will be named “styles” (this attribute will be exposed when working with the PointsOfInterest class)
  • backref = connects the two classes
  • lazy = determines how the supporting tables get loaded when you query the main table. lazy=Trueis the default option and works for me here
  • cascade = if I delete a poi_id from the main table, the linked rows in architectural styles will be deleted too (so I don’t end up with orphaned entries)

TIP: Since I’m referencing the ArchitecturalStyles class from the PointsOfInterest class and vice versa, I kept running into “not yet defined” errors when running the code. The secret was to define the main class first (PointsOfInterest) and use quotes around the referenced class names when defining the relationship( relationship='ArchitecturalStyles'). This tells SQLAlchemy to create a placeholder for a class that will be defined later

Create the database tables

Run this command to actually create the tables in the database

  • checkfirst = check if table already exists and skip the creation if it already exists
engine = connect_db()
PointsOfInterest.__table__.create(bind=engine, checkfirst=True)

Putting it all together for table creation

Here is the full code for setting up these classes:

from sqlalchemy import create_engine, Column, Integer, String, Sequence, Float,PrimaryKeyConstraint, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, backref
from sqlalchemy.sql import *
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
# load environment variables
SECRET_KEY = os.getenv("SECRET_KEY")
DB_USER=os.getenv("DB_USER")
DB_PASS = os.getenv("DB_PASS")
DEBUG= os.getenv("DEBUG")
DB_NAME=os.getenv("DB_NAME")
DB_SERVICE=os.getenv("DB_SERVICE")
DB_PORT=os.getenv("DB_PORT")
DB_IP=os.getenv("DB_IP")
def connect_db():
#_load_db_vars()
# create db create_engine
db = create_engine(f'postgresql://{DB_USER}:{DB_PASS}@{DB_IP}:{DB_PORT}/{DB_NAME}')
return db
Base = declarative_base()class PointsOfInterest(Base):
__tablename__ = "points_of_interest"
poi_id = Column(Integer, Sequence('poi_id_seq'), primary_key=True)
name = Column(String)
build_year = Column(String)
demolished_year = Column(String)
address = Column(String)
latitude = Column(Float)
longitude = Column(Float)
external_url = Column(String)
image_url = Column(String)
heritage_status = Column(String)
current_use = Column(String)
poi_type = Column(String)
source = Column(String)
details = Column(String)
#Defining One to Many relationships with the relationship function on the Parent Table
styles = relationship('ArchitecturalStyles', backref = 'points_of_interest',lazy=True,cascade="all, delete-orphan")
architects = relationship('Architects', backref = 'points_of_interest', lazy=True,cascade="all, delete-orphan")
categories = relationship('POICategories', backref = 'points_of_interest', lazy=True,cascade="all, delete-orphan")
class ArchitecturalStyles(Base):
__tablename__="architectural_styles"
__table_args__ = (
PrimaryKeyConstraint('poi_id', 'style'),
)
poi_id =Column(Integer,ForeignKey('points_of_interest.poi_id'))
#Defining the Foreign Key on the Child Table
style = Column(String)
class Architects(Base):
__tablename__="architects"
__table_args__ = (
PrimaryKeyConstraint('poi_id', 'architect_name'),
)
poi_id= Column(Integer,ForeignKey('points_of_interest.poi_id'))
architect_name = Column(String)
class POICategories(Base):
__tablename__="poi_categories"
__table_args__ = (
PrimaryKeyConstraint('poi_id', 'category'),
)
poi_id =Column(Integer,ForeignKey('points_of_interest.poi_id'))
category = Column(String)
engine = connect_db()
PointsOfInterest.__table__.create(bind=engine, checkfirst=True)
ArchitecturalStyles.__table__.create(bind=engine, checkfirst=True)
Architects.__table__.create(bind=engine, checkfirst=True)
POICategories.__table__.create(bind=engine, checkfirst=True)

This is saved as models.py (the complete code can be found on GitHub)

Using SQL Alchemy ORM

Inserting Rows

Now that my classes were defined, I could use them in other modules to help load data. As usual, the first step is to import the necessary classes, including our new class definitions (PointsOfInterest etc) and establish a session:

from models import connect_db, PointsOfInterest, ArchitecturalStyles, Architects,POICategoriesdb=connect_db() #establish connection
Session = sessionmaker(bind=db)
session = Session()

The following function saves data from a dataframe to the database.

  • Updates three tables: points_of_interest, architectural_styles, architects
  • In this case, each Point of Interest only has only architectural style, so I define an instance of the ArchitecturalStyles class ( style=ArchitecturalStyles(style=row['Style'])) and then append that to the Point of Interest class ( poi.styles.append(style) . Now when I commit this transaction, the entry in ArchitecturalStyles will automatically be assigned the new poi_id generated in the main table)
  • Similarly, there may be many architects, so I loop through the list and append one or more Architects instances to the Point of Interest
def save_to_database_ORM(session):

for index, row in bld_df.iterrows():
poi_dict={'name': row['Name'],address=row['Street']}

...
poi = PointsOfInterest(**poi_dict ) # define style (in ArchitecturalStyles class)
style=ArchitecturalStyles(style=row['Style'])
poi.styles.append(style)

# architects (can be multiple)
for company in row['Companies']:
architect = Architects(architect_name= company)
poi.architects.append(architect)

session.add(poi)
session.commit()

Accessing data

More details can be found in the help docs, but here are a few quick tips on accessing data through SQLAlchemy ORM:

  • Get Count of rows in table
# get the count
session.query(PointsOfInterest).count()
  • get an object by primary key
poi= session.query(PointsOfInterest).get(30)
  • filter on particular column/value
session.query(PointsOfInterest).filter(PointsOfInterest.build_year=='1905')
  • load to pandas dataframe
df = pd.read_sql(session.query(PointsOfInterest).statement, session.bind)
  • access linked table — because of the defined relationship, you can access a styles and architects list for each PointsOfInterest object
poi= session.query(PointsOfInterest).get(30)
for y in poi.architects:
print(y.architect_name)
  • delete entry (cascades to linked tables, so if this point has an associated entry in the Architects table, it will be deleted too)
poi_to_delete = session.query(PointsOfInterest).filter(PointsOfInterest.poi_id==<id>).first()
session.delete(poi_to_delete)
session.commit()

Resources for learning more

Here are a few helpful resources that I referenced when researching ORM:

The source code for the models.py class can be found on GitHub

--

--