TIQETS.ENGINEERING

Better SQLAlchemy and beyond

What we learned using SQLAlchemy at Tiqets

Gaganpreet Arora
Jul 31, 2020 · 7 min read
Gardens by the Bay in Singapore

Introduction

  • Use efficient code patterns to load data into Python, and checks to detect inefficient queries
  • Improve readability of code, by using less-common SQLAlchemy patterns

A note before starting off — Tiqets uses Flask-SQLAlchemy, so some code samples are specific to that extension, but the underlying concepts still apply to SQLAlchemy.

Understand .get, .first and .scalar

All three first, scalar, and one return one row. The difference is:

  • .first applies a LIMIT 1 to the query, and returns the single row.
  • .scalar fetches all rows matching the query, and throws an exception if the query finds more than one row.
  • .one works like scalar, except that it throws an exception if the query finds zero rows.

Here’s a handy cheat-sheet:

  • Use .get for queries involving the primary key.
  • Use .first only when you don’t actually care about other rows.
  • Use .one when you expect there to be exactly one row.
  • Use .scalar where you expect there to be at most one row.

Specify an explicit loading strategy

As an example, in Tiqets we have a following simplified relationship between Product and ProductImage:

To render a list of 50 products with their images, we’ll use a query similar to this:

The list of images isn’t loaded by default, but loaded lazily the first time we access the attribute. In this case this happens by accessing p.images. This happens 50 times in the following list comprehension, necessitating a query to load the images for each product.

SQLAlchemy provides a few options for a loading strategy to use for relationships. For this particular query for 50 products with images, I ended up with these numbers:

+--------------------+-------------------+-------------------+
| Loading strategy | Number of queries | Time |
+--------------------+-------------------+-------------------+
| default (lazyload) | 51 | 71.6 ms ± 7.16 ms |
| subqueryload | 2 | 15.1 ms ± 1.32 ms |
| joinedload | 1 | 37 ms ± 1.98 ms |
| selectinload | 2 | 12.7 ms ± 1.54 ms |
+--------------------+-------------------+-------------------+

Some takeaways from these numbers:

  • For 50 products, lazyload(one query for each list) produced 55 queries.
  • joinedload(one humongous query to load everything in one go using left outer joins) uses fewer queries, but takes longer due to the extra data being fetched and deduplicated by SQLAlchemy.
  • subqueryload is almost as fast as selectinload. Both loading strategies have a similar idea in the second query they use to retrieve the relationship objects: selectinload uses the primary keys from the first query to load relationships using an IN clause, subqueryload instead reuses the first query as a subquery to filter matching results.

These results were calculated in IPython using:

My personal preference is to explicitly specify a loading strategy wherever possible. selectinload performs well for many use cases, but there are many other scenarios: see the section “What kind of Loading to use?” in the SQLAlchemy documentation.

Loading strategies for debugging

Fetch only the needed columns

Using contains_eager for manually loaded columns

This use case becomes interesting when paired with, for example, loading relationships excluding soft deletes. In that case, our query can look like this:

In this case, Product.images represents a subset of the items in the database. Keep in mind, because this might be a subset of the real collection in the database, it cannot be used to modify the collection.

Get the parameterized SQL query for debugging

Notice that this is not really the raw SQL query, it’s parametrized and can’t be run directly against the database. But that’s possible by integrating with Postgres directly, which is up next.

Get the real SQL query

Once any existing database connections need to be reinitialized by restarting the app, all queries will be logged in Postgres logs (within a Docker container, Postgres logs are sent to stdout).

Set up and enable Flask-DebugToolbar

Cache inefficient queries

  • Cache invalidation is very difficult across a distributed system, especially in microservices. But let’s say we ignore that, and are willing to use cached results with a small delay until the cache expires
  • When caching a function that accepts parameters, we’ll be caching each combination of function name plus parameters separately. These cached results (if not invalidated), will be out of sync until the cache expires for every combination. This creates an unpredictable situation where the same function will return sometimes up-to-date information, and other times stale information, depending on the values of its parameters and how it has been executed before.

Useful ORM features

Using the SQLAlchemy session as a “cache”

Hybrid attributes

In this example User.first_name (class level, notice the upper case U) can be used in SQLAlchemy expressions directly, and user.first_name (instance level, notice the lower case u) can be used on instance level. The SQLAlchemy documentation goes into a lot of detail explaining how hybrid properties can be used.

Ordering list

Conclusion

Author

Tiqets Engineering

Technology and more ways to culture

Tiqets Engineering

Tiqets connects travelers with instant, last-minute, mobile tickets to top museums and attractions all over the world. Read about all the Tech we do at Tiqets and more!

Gaganpreet Arora

Written by

Tiqets Engineering

Tiqets connects travelers with instant, last-minute, mobile tickets to top museums and attractions all over the world. Read about all the Tech we do at Tiqets and more!

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store