Concurrent Python Programming with python-oracledb and asyncio

Published in

Oracle Developers

6 min readNov 30, 2023

The long wait for Python asyncio support in python-oracledb is over! A pre-release is available in python-oracledb 2.0. (Update: python-oracledb 2.1 removed the ‘pre-release’ status for support).

A picture of a hand basin i.e. a sink — A sink — a bad pun. Photo by Alona Gross on Unsplash

We are soliciting your feedback on this latest enhancement — please join the discussion on GitHub.

What’s the background?

python-oracledb is the Python module for Oracle Database access. This open source package conforms to the Python Database API specification with many additions to support advanced Oracle Database features.

Python’s asyncio library makes it easy to write concurrent code. The documentation description is: “asyncio is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web-servers, database connection libraries, distributed task queues, etc.”

What is new in python-oracledb

The new python-oracledb async API is part of the existing python-oracledb module. Any database call which needs a round-trip to the database now has an async counterpart. This means your code can choose whether to use the existing synchronous API or alternatively use the new async API. We suggest applications in general don’t use both at the same time!

Only python-oracledb’s default Thin mode supports asyncio. There are no plans to implement it in Thick mode.

Connection is via new methods connect_async() and create_pool_async(). All database versions that Thin mode connects to can be used, i.e. Oracle Database 12.1+. There are also new classes AsyncConnection, AsyncConnectionPool, AsyncCursor, and AsyncLOB that are similar to the current synchronous classes. More interestingly there are some new shortcut methods on AsyncConnection objects that simplify use:

connection.callfunc() - Call a database stored function. This is a shortcut for creating a cursor, calling the stored function with the cursor, and then closing the cursor.
connection.callproc() - Call a database stored procedure. This is a shortcut for creating a cursor, calling the stored procedure with the cursor, and then closing the cursor.
connection.execute() - Execute a statement. This is a shortcut for creating a cursor, executing a statement with the cursor, and then closing the cursor.
connection.executemany() - Prepare a statement for execution and then execute it against all parameter mappings or sequences found in the sequence parameters. This is a shortcut for creating a cursor, calling executemany() on the cursor, and then closing the cursor.
connection.fetchall() - Executes a query and returns all of the rows. After the rows are fetched, the cursor is closed.
connection.fetchmany() - Executes a query and returns up to the specified number of rows. After the rows are fetched, the cursor is closed.
connection.fetchone()- Executes a query and returns the first row of the result set if one exists (or None if no rows exist). After the row is fetched the cursor is closed.

Collateral

Samples are in samples/*_async.py.

Tests are in tests/*_async.py.

Documentation is here.

Installing python-oracledb

Install as normal. Note you need Python 3.7 or later.

python -m pip install oracledb --upgrade

Example 1

A basic example is:

import asyncio
import oracledb

async def main():
    connection = await oracledb.connect_async(user="hr", password="welcome", 
                                              dsn="localhost/orclpdb1")
    res = await connection.fetchall("select * from locations")
    print(res)
    await connection.close()

asyncio.run(main())

Make sure you use async and await appropriately. The Python manual has some useful tips for asyncio usage.

Example 2

A second example shows execution of multiple coroutines. But first, look at code that doesn’t use asyncio:

# synchronoustest.py

import os
import time

import oracledb

un = os.environ.get('PYTHON_USERNAME')
pw = os.environ.get('PYTHON_PASSWORD')
cs = os.environ.get('PYTHON_CONNECTSTRING')

CONCURRENCY = 5

sql = """SELECT UNIQUE CURRENT_TIMESTAMP AS CT, sid||'-'||serial# AS SIDSER
         FROM v$session_connect_info
         WHERE sid = SYS_CONTEXT('USERENV', 'SID')"""

def init_session(connection, requested_tag):
    res = connection.cursor().execute(sql).fetchone()
    print(res[0].strftime("%H:%M:%S.%f"), '- init_session with SID-SERIAL#', res[1])

def connect_and_query_sync(pool):
    with pool.acquire() as connection:
        with connection.cursor() as cursor:
            cursor.callproc("dbms_session.sleep", [1])
            res = cursor.execute(sql).fetchone()
            print(res[0].strftime("%H:%M:%S.%f"), '- connect_and_query_sync with SID-SERIAL#', res[1])

def main():
    pool = oracledb.create_pool(user=un, password=pw, dsn=cs,
                                min=CONCURRENCY, max=CONCURRENCY,
                                session_callback=init_session)

    start = time.time()
    for i in range(CONCURRENCY):
        connect_and_query_sync(pool)
    elapsed = time.time() - start
    print("{:04.2f} seconds".format(elapsed))

main()

Running this on my laptop gave:

$ python3 synchronoustest.py 
12:09:35.490387 - init_session with SID-SERIAL# 14-36917
12:09:36.514168 - connect_and_query_sync with SID-SERIAL# 14-36917
12:09:37.540810 - connect_and_query_sync with SID-SERIAL# 14-36917
12:09:38.569611 - connect_and_query_sync with SID-SERIAL# 14-36917
12:09:39.592086 - connect_and_query_sync with SID-SERIAL# 14-36917
12:09:40.609052 - connect_and_query_sync with SID-SERIAL# 14-36917
5.30 seconds

This isn’t concurrent at all! The for loop in main() is a giveaway that execution is serial. Confirming this is the logging showing the session initialization function being called just once - meaning only one connection was used by the app. That one connection (identified by the unique Session Identifier and Serial Number) was used and released in each of the serial invocations of connect_and_query_sync(). Also remember that each Oracle connection can only ever do one task at a time, so to improve throughput some kind of parallelism that utilizes multiple connections is needed.

Compare the implementation with this version that uses asyncio for parallelization:

# asynctest.py

import asyncio
import os
import time

import oracledb

un = os.environ.get('PYTHON_USERNAME')
pw = os.environ.get('PYTHON_PASSWORD')
cs = os.environ.get('PYTHON_CONNECTSTRING')

CONCURRENCY = 5

sql = """SELECT UNIQUE CURRENT_TIMESTAMP AS CT, sid||'-'||serial# AS SIDSER
         FROM v$session_connect_info
         WHERE sid = SYS_CONTEXT('USERENV', 'SID')"""

async def init_session(connection, requested_tag):
    res = await connection.fetchone(sql)
    print(res[0].strftime("%H:%M:%S.%f"), '- init_session with SID-SERIAL#', res[1])

async def connect_and_query(pool):
    async with pool.acquire() as connection:
        await connection.callproc("dbms_session.sleep", [1])
        res = await connection.fetchone(sql)
        print(res[0].strftime("%H:%M:%S.%f"), '- connect_and_query with SID-SERIAL#', res[1])

async def main():
    pool = oracledb.create_pool_async(user=un, password=pw, dsn=cs,
                                      min=CONCURRENCY, max=CONCURRENCY,
                                      session_callback=init_session)

    coroutines = [ connect_and_query(pool) for i in range(CONCURRENCY) ]

    start = time.time()
    await asyncio.gather(*coroutines)
    elapsed = time.time() - start
    print("{:04.2f} seconds".format(elapsed))
    await pool.close()

asyncio.run(main())

Running this shows that multiple connections are opened and are used by connect_and_query() :

$ python3 asynctest.py 
12:09:29.711525 - init_session with SID-SERIAL# 36-38096
12:09:29.909769 - init_session with SID-SERIAL# 33-56225
12:09:30.085537 - init_session with SID-SERIAL# 14-31431
12:09:30.257232 - init_session with SID-SERIAL# 285-40270
12:09:30.434538 - init_session with SID-SERIAL# 282-32608
12:09:30.730166 - connect_and_query with SID-SERIAL# 36-38096
12:09:30.933957 - connect_and_query with SID-SERIAL# 33-56225
12:09:31.115008 - connect_and_query with SID-SERIAL# 14-31431
12:09:31.283593 - connect_and_query with SID-SERIAL# 285-40270
12:09:31.457474 - connect_and_query with SID-SERIAL# 282-32608
1.92 seconds

Your results may vary. When I tried with a far-distant database, I saw only a couple of connections being opened. (And, of course, the increase in round-trip cost affected the overall times of both async and synchronous code).

Obviously having multiple connections open means more database load, so when moving to a parallel implementation you need to factor in your application’s impact on the whole system.

An equivalent implementation using raw threading instead of asyncio could be:

thread = []
for i in range(CONCURRENCY):
    t = threading.Thread(target=connect_and_query_sync, args=(pool,))
    t.start()
    thread.append(t)

for i in range(CONCURRENCY):
    thread[i].join()

This will have similar performance profile to the asyncio code, but is arguably more complex even for this simple example.

On a final note, both the asyncio and threaded code will be affected by Python’s Global Interpreter Lock (GIL) so don’t assume you can speed everything up by using asyncio. You may find that multiple processes perform better for some intensive tasks. And the database configuration will have a big impact too.

Open tasks

Some tasks on our list:

[Update: this was done in python-oracledb 2.2] Make use of Oracle Database 23c’s updated network protocol to improve performance with bigger database responses (when connected to Oracle Database 23c!)
Evaluate using Oracle Database 23c’s new pipelining functionality.
[Update: SQLAlchemy added this in SQLAlchemy 2.0.25] Encourage support by SQLAlchemy.

Conclusion

Python-oracledb support for asyncio opens up new architectures, allowing easier use of parallel programming and more efficient applications. Newer Python versions have improved asyncio features, such as task groups that can further help simplify code and give you more control.

Careful benchmarking should be done to evaluate the benefits and overall system impact of parallelizing database work. If your DBA complains about the number of connections being opened, ask them to explore using DRCP (also see the DRCP Technical Brief). Note that your schema configuration may inherently limit parallelism.

Please give us feedback on the design and implementation. The best place is on GitHub but comments here or via email are welcome.

[This post was originally written before python-oracledb 2.0 was released. Now that this version is available, the content has been updated with the latest information.]