SQLAlchemy and multiple joins

If you have a complicated database scheme and you need to fetch data using multiple nested joins then you can encounter performance issues if you use SQLAlchemy incorrectly.

The query I wanted to do looked like this

select * from Wanted_Table join Table_A_1 on <...> join Table_A_2 on <...> join Table_A_final on <...> join Table_A_1 on <...> join Table_B_final on <...> join Table_C_1 on <...> join Table_C_final join Table_A_final on <...> join Table_B_final on <...> join Table_C_final on <...> where <some_clause>

Looks insane, right? Fortunately, Postgresql can handle this monstrous query but when I tried to execute it from Python+SQLAlchemy the query execution time was much longer when I used joinedload()

Thanks to EvaSDK and agronholm from sqlalchemy IRC channel, I managed to speed it up using subqueryload() for tables that appears more than once in the query. Finally, my query became this

results_cursor = session.query(Wanted_Table) \
.options(joinedload(Table_A_1).joinedload(Table_A_2).subqueryload(Table_A_final)) \ .options(joinedload(Table_A_1).subqueryload(Table_B_final)) \
.options(joinedload(Table_C_1).subqueryload(Table_C_final)) \
.options(subqueryload(Table_A_final)) \
.options(subqueryload(Table_B_final)) \
.options(subqueryload(Table_C_final)) \

The issue is that despite Postgresql is optimized for executing single complex query faster that several small queries, SQLAlchemy have some problems due to data transferring and mapping results to Python objects. Multiple joins of the same table make SQLAlchemy looking for duplicates and it happens really slow.

This page contains good explanation about difference between joins in SQLAlchemy so if you have some problems with joins execution time try these suggestions first.