SQLAlchemy in Python (with PostgreSQL)

11 min readAug 1, 2021

最近因為需要用Python操作PostgreSQL資料庫，研究了一些關聯式資料庫的工具。在此簡單整理Python中操作關聯式資料庫的方法，以SQLAlchemy的介紹與程式範例為主。

Outline

ORM
SQLAlchemy (github 4k star)
SQLAlchemy ORM
SQLAlchemy Core
SQLAlchemy Engine Configuration
SQLAlchemy Core 2.0 style APIs
asyncpg (github 5k star)
Insert Speed Test

ORM

物件關係對應，是一種跟資料庫溝通的模式，全名為object-relational mapping。

ORM可以將程式中的objects對應轉換成資料庫中的tables關聯。
一般來說，會建立一個class來代表一個SQL資料庫的table，該class中的每個attribute就代表一個column，而每個instance就代表資料庫的一個row。ORM工具也可以將不同表格的關聯帶入程式中。

舉例來說，可以在程式中使用以下對應關係：

classPet：SQL表格pets
objectorion_cat (Pet的實例)：pets中的資料
orion_cat.type：資料的type欄位，其中的值可能會是"cat"，代表這是一隻貓的資料
orion_cat.owner：這隻貓的owner(來自另一個SQL表格owners)的資料
orion_cat.owner.name：這隻貓的owner名稱(來自SQL表格owners的name欄位)

幾個常見的ORM：Django-ORM、SQLAlchemy ORM、Peewee

SQLAlchemy

在Python中可以使用SQLAlchemy來操作關聯式資料庫，其支援的常見資料庫如下：

PostgreSQL
MySQL
SQLite
Oracle
Microsoft SQL Server, etc.

SQLAlchemy本身提供兩種不同的API，分別為Core和ORM。

Core：SQLAlchemy做為資料庫工具包的基礎，著重在與資料庫的連線與溝通，以及方便建立各種SQL陳述句(不需要寫原生SQL語法)。
ORM：以Core為基底，在其上提供物件關係對應的功能，能將Python類別跟物件直接映射到資料庫中的資料表與數據行。大多情況下，ORM在速度與效能上會比Core差，請評估需求後再決定是否要使用。

截至2021/07，最新release版本為SQLAlchemy 1.4.22，存在兩種使用風格，1.x style和2.0 style，後者相比前者有較大幅度的改動，且讓Core和ORM的用法更加貼近。
如果是在舊有程式版本上做修改，可以繼續採用1.x style；如果是要開發新的程式，可以考慮直接使用2.0 style，未來如果要升級才不會有相容性的問題。
以下範例程式碼主要參照1.x style，官方預計在主版本升到SQLAlchemy 2.0時才會捨棄1.x style的大部分用法。詳情請參考官方文件。

SQLAlchemy ORM

範例程式碼

sql_app/database.py

此處的SQLALCHEMY_DATABASE_URL以PostgreSQL格式為例，如果使用不同種類的資料庫，需要改為相對應的連線字串。

透過create_engine來建立資料庫的連線，sessionmaker則是建立連線會話。

Base是SQLAlchemy model的基礎類別，透過繼承它來建立表格的model。

sql_app/models.py

使用Column來定義SQL表中的欄位；使用ForeignKey來定義與其他SQL表的關聯。

owner_id = Column(Integer, ForeignKey("users.id"))確定了表格之間的關聯，再透過relationship指定放置關聯表資料的欄位。

user.items會是一個list，裡面包含所有owner_id指向它的Item(來自表格items的SQLAlchemy model)；而item.owner會是它指向的User(來自表格users的SQLAlchemy model)。

sql_app/crud.py

透過sqlalchemy.orm的Session，對不同的SQL表(就是前面定義的SQLAlchemy model)進行query，也可以進一步使用filter來對查詢對象做條件限制。

如果只要查詢單一數據行，使用first()；如果要回傳所有符合條件的數據，使用all()。

在create相關的函式中，使用的schemas為FastAPI框架下所定義的Pydantic models(schemas)，跟SQLAlchemy本身並不相關，也可以單純傳入一個dict或其他類別參數。

在create相關的函式中，會進行幾個步驟：

建立一個SQLAlchemy model的實例，代表一個數據行
add此實例物件到資料庫session
commit這個數據行到資料庫(此時才被存入資料庫中)
refresh此實例，讓它包含資料庫新產生的資料(ex.自動生成的ID)

這些定義好的資料庫操作函式，根據不同框架或應用，在需要的地方呼叫(ex. FastAPI的路徑操作函式)

SQLAlchemy Core

範例程式碼

資料庫連線

echo用來設定SQLAlchemy的logging

定義與創建資料表

使用Metadata和Table來建立資料表，再用create_all()指向資料庫創建。

Insert

使用insert()來產生對表格的插入操作語句，再透過conn.execute()執行語句。

1和2為單筆數據插入的用法；3則是一次插入多筆數據(就像一般DBAPI的executemany)，此用法也適用在update()和delete()操作。

Select

使用select來對目標資料表做查詢，查詢結果可以透過iteratio或是使用fetchone()取得。

如果result還沒取完就要捨棄，可以使用result.close()將其關閉。

Operators

在資料庫操作語句中，可以使用Operator來指定查詢的條件

SQLAlchemy Engine Configuration

Engine是SQLAlchemy跟資料庫連線的起始點，讓資料庫與DBAPI能進行溝通。

在建立連線字串時，最常見的格式為dialect+driver://username:password@host:port/database。如果沒有指定driver(DBAPI)，會使用預設值。

以PostgrSQL為例，預設的DBAPI為psycopg2，範例如下

目前支援的DBAPI：

psycopg2
pg8000
asyncpg
psycopg2cffi
py-postgresql
pygresql

SQLAlchemy Core 2.0 style APIs

這裡補充2.0 style API的使用方法，和額外一些特殊用法。

Connection

使用2.0style的engine，要在create_engine()帶入參數future=True。

Table Reflection

使用autoload_with連結資料庫中已存在的數據表，會自動帶入欄位等相關屬性。

Insert

創建insert的實例來產生插入陳述句，可使用values帶入參數。接著透過連線的execute()去執行插入語句，commit()寫入資料庫。

可以在execute階段帶入多個參數，進行批量資料的插入。

Psycopg2 Fast Execution Helpers

如果是使用psycopg2的driver，在進行大量數據的批量插入時，可能會發現速度很慢。因為預設並不會使用加速的extensions。

psycopg2有提供針對批量執行時的加速方法(目前有execute_values和execute_batch兩種)。如果要在SQLAlchemy使用這類加速，要在create_engine()時加上executemany_mode參數。

executemany_mode目前有下列幾種模式

values_only：預設值，一律使用execute_values做批量執行
None：不使用加速extensions，而使用一般的cursor.executemany()
batch：使用execute_batch來做插入、更新、刪除操作
values_plus_batch：插入使用execute_values，更新與刪除使用execute_batch

Upsert in PostgreSQL

PostgreSQL在9.5版本之後，支援等同於upsert功能的ON CONFLICT語法，就是在進行插入操作時，可以指定遇到衝突(主鍵或指定欄位重複)時改為進行更新操作(或什麼都不做)。

在SQLAlchemy之中，兩種情況分別使用on_conflict_do_update()、on_conflict_do_nothing()，注意要使用dialects.postgresql的insert來生成statement。

asyncpg

使用Python的asyncio，高效的PostgreSQL client library。根據測試，比常用的psycopg2快上三倍。

Basic Usage

Insert Speed Test

由於我需要對PostgreSQL做數據的頻繁插入操作，因此決定測試一下使用不同方式做插入的效能。SQLAlchemy目前也有支援使用asyncpg，所以可以順便比較一下不同drivers的效能差異。

總共測試四種插入方式

psycopg2
asyncpg
SQLAlchemy(psycopg2)
SQLAlchemy(asyncpg)

因為我的目的是要比較彼此間相對的插入效能高低，我只插入很單純的三欄位數據行(但有primary_key，所以是進行upsert)，而且是直接在PostgreSQL伺服器上跑測試程式，基本上沒有網路時延，所以秒數只能當作參考。我分別測試連續插入1000跟10000行，都跑十次取平均。

從插入操作的測試結果來看，asyncpg的速度是psycopg2的兩倍以上，如果使用SQLAlchemy，速度上會變慢一些，但使用SQLAlchemy(asyncpg)的速度還是能達到原生psycopg2的將近兩倍。

透過以上結果，可以做出相對的插入效能比較。

PostgreSQL插入效能：asyncpg > SQLAlchemy(asyncpg) >> psycopg2 > SQLAlchemy(psycopg2)

Reference

SQL (Relational) Databases - FastAPI

FastAPI doesn't require you to use a SQL (relational) database. But you can use any relational database that you want…

fastapi.tiangolo.com

SQLAlchemy 1.4 Documentation

About this document This tutorial covers the well known SQLAlchemy Core API that has been in use for many years. As of…

docs.sqlalchemy.org

SQLAlchemy 1.4 Documentation

About this document The new SQLAlchemy Tutorial is now integrated between Core and ORM and serves as a unified…

docs.sqlalchemy.org

GitHub - MagicStack/asyncpg: A fast PostgreSQL Database Client Library for Python/asyncio.

asyncpg is a database interface library designed specifically for PostgreSQL and Python/asyncio. asyncpg is an…

github.com

SQLAlchemy in Python (with PostgreSQL)

Outline

ORM

SQLAlchemy

SQLAlchemy ORM

SQLAlchemy Core

SQLAlchemy Engine Configuration

SQLAlchemy Core 2.0 style APIs

Connection

Table Reflection

Insert

Psycopg2 Fast Execution Helpers

Upsert in PostgreSQL

asyncpg

Basic Usage

Insert Speed Test

PostgreSQL插入效能：asyncpg > SQLAlchemy(asyncpg) >> psycopg2 > SQLAlchemy(psycopg2)

Reference

SQL (Relational) Databases - FastAPI

FastAPI doesn't require you to use a SQL (relational) database. But you can use any relational database that you want…

SQLAlchemy 1.4 Documentation

About this document This tutorial covers the well known SQLAlchemy Core API that has been in use for many years. As of…

SQLAlchemy 1.4 Documentation

About this document The new SQLAlchemy Tutorial is now integrated between Core and ORM and serves as a unified…

GitHub - MagicStack/asyncpg: A fast PostgreSQL Database Client Library for Python/asyncio.

asyncpg is a database interface library designed specifically for PostgreSQL and Python/asyncio. asyncpg is an…

Written by Kevin Wei