Introducing Pipelining and Asynchronous Programming for Oracle Database 23ai

Veronica Dumitriu
8 min read · Nov 15, 2023


The need to build scalable, responsive applications deployed in the cloud using programming languages such as Python, .NET, Java, and R has increased the demand for programming paradigms that help applications improve their resilience, performance, responsiveness, and throughput. One such paradigm is pipelining, a technique used strategically to increase the overall throughput and responsiveness of an application.

Oracle Database 23ai provides pipeline support for the OCI, JDBC and .NET drivers.

OCI pipelining is a 23ai client-side feature that enables an OCI client to issue multiple requests (SQL executions/fetches, commits, rollbacks) to the database server without waiting for each response. Without this feature, an OCI client can issue a request only after the database server has responded to the previous one, and the database server sits idle between requests, waiting for the next one from the client.

For a better understanding of pipelining and the motivation behind introducing it in Oracle Database 23ai, let’s explain some concepts related to request/response flows in client-server network transactions.

Blocking functionality

All OCI APIs that make a network round trip are request-response in nature, as illustrated in the diagram below:

OCI API network round trip: the client sends a request, the server processes it and sends a response, and the client receives (reads) the response

In blocking mode, each request-response cycle must complete before the server accepts another request, resulting in idle time on the client side between requests and on the server side between responses.

Non-blocking functionality

The server's processing does not block the client: the client is not forced to wait until a response is read, allowing the application to choose what to do while the server is processing the request.

Though application responsiveness improves considerably, server idle time does not improve much compared to the blocking approach.

Pipelining

Pipelining allows sending and receiving multiple requests and responses over a single connection. The server stays busy executing the queued requests, and the application can use the interleaving appropriately, waiting less for the server to send a response:

• The application can asynchronously perform other operations (including sending new requests to the server) without reading the response to the previous request(s).

• The server processes the requests in order and sends responses back to the client when ready.

Let’s have a look at a request-response flow in the diagram below.

Pipelining Request-Response Flow
  1. Initially, six requests are sent to the server, so a pipeline of operations is queued on the server side.
  2. The server executes these operations without waiting for the client to send the next one: the first operation has been processed and its response sent back to the client, ready to be read; the second operation has finished executing and its response is being sent to the application; meanwhile, the server is still processing the remaining queued requests (operations 3 to 6) in order, and the client is asynchronously sending request 7.

So when is Pipelining useful?

Pipelining is useful when performing many small operations in rapid succession, e.g., DDLs, DMLs, DMLs with a RETURNING clause, exact fetches, multiple fetches, fetches from two different statement handles, queries, PL/SQL blocks, etc.

Pipelining is more performant when the network latency is high (e.g., the server and the client are on different hosts or in different subnets).

Pipelining is less performant when an operation has a dependency on the result of a previous operation.

Note that the application is responsible for ensuring that dependencies between consecutive operations are resolved or avoided.

OCI Pipeline API

By default, pipelining is not available to an application through the OCI APIs. A new attribute, OCI_PIPELINE_ENABLE, is introduced to enable pipelining. The pipeline functionality is available only when the application sets this attribute on the environment. Any call to an OCI pipeline function without this mode set returns an error.
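As a minimal sketch of what enabling it might look like, assuming OCI_PIPELINE_ENABLE is supplied as part of the environment-creation mode (one possible reading of "sets the attribute on the environment" — verify the exact mechanism against the 23ai OCI documentation):

#include <oci.h>

/* Sketch: create an OCI environment with pipelining enabled.
   Assumption: OCI_PIPELINE_ENABLE is OR-ed into the environment
   creation mode; confirm against the 23ai OCI documentation. */
static OCIEnv *create_pipeline_env(void)
{
    OCIEnv *envhp = NULL;
    sword rc = OCIEnvCreate(&envhp,
                            OCI_THREADED | OCI_PIPELINE_ENABLE,
                            NULL, NULL, NULL, NULL, 0, NULL);
    /* Without OCI_PIPELINE_ENABLE set, any OCIPipeline* call returns an error. */
    return (rc == OCI_SUCCESS) ? envhp : NULL;
}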

For OCI pipelining, we introduced an explicit block to demarcate the pipelined set of operations. As a developer, you must ensure that the pipelined operations are independent of one another.

Two new APIs are introduced to create this explicit pipeline block.

OCIPipelineBegin() — marks the start of the pipeline block

and

OCIPipelineEnd() — marks the end of the pipeline block

An optional API, OCIPipelineProcess(), is used to process server responses for pipelined operations queued on the client side.

Serialization of pipeline operation blocks (i.e., consecutive pipeline blocks) is supported.
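Put together, a skeletal pair of consecutive pipeline blocks might look like the sketch below. The OCIPipelineBegin()/OCIPipelineEnd() signatures shown (service context, error handle, mode) are an assumption based on common OCI conventions, not documented prototypes:

/* Sketch: two consecutive (serialized) pipeline blocks on one service context.
   Assumed signatures: OCIPipelineBegin(svchp, errhp, mode) and
   OCIPipelineEnd(svchp, errhp, mode). */
OCIPipelineBegin(svchp, errhp, OCI_DEFAULT);   /* start of pipeline block 1 */
/* queue independent operations here: OCIStmtExecute, OCIStmtFetch2, ... */
OCIPipelineEnd(svchp, errhp, OCI_DEFAULT);     /* end of block 1; responses drained */

OCIPipelineBegin(svchp, errhp, OCI_DEFAULT);   /* block 2 starts only after block 1 */
/* ... */
OCIPipelineEnd(svchp, errhp, OCI_DEFAULT);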

OCI Pipeline Modes

There are two levels of pipeline mode, and each level can set one of the two modes of execution:

1. Block-level mode — set by OCIPipelineBegin()

2. Operation level — set using OCIAttrSet

The two pipeline modes are:

1. OCI_PIPELINE_CONT_ON_ERROR — this is the default mode of a pipeline. If any of the pipeline operations fails with an error, the server sends a response with the error back to the client and continues executing the next operations in the pipeline.

2. OCI_PIPELINE_ABORT_ON_ERROR — if an operation fails, all operations from the point of the error through OCIPipelineEnd() are dropped (except the piggybacks on the RPC, which are still executed), and an ORA-43610 error is returned.
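As a hedged illustration of the two levels, the sketch below sets the block-level mode through OCIPipelineBegin() and an operation-level mode through OCIAttrSet(); the attribute constant used at the operation level is hypothetical, since the article does not name it:

/* Sketch: block-level mode. If any queued operation fails, the remaining
   operations up to OCIPipelineEnd() are dropped and ORA-43610 is returned.
   OCIPipelineBegin(svchp, errhp, mode) is an assumed signature. */
OCIPipelineBegin(svchp, errhp, OCI_PIPELINE_ABORT_ON_ERROR);
/* op 1 ... op N: if op k errors, ops k+1..N are discarded */
OCIPipelineEnd(svchp, errhp, OCI_DEFAULT);

/* Operation-level mode is set with OCIAttrSet; the attribute name below
   is hypothetical, used only to show the shape of the call. */
ub4 opmode = OCI_PIPELINE_CONT_ON_ERROR;
OCIAttrSet(stmthp, OCI_HTYPE_STMT, &opmode, 0,
           OCI_ATTR_PIPELINE_OP_MODE /* hypothetical constant */, errhp);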

Here’s an OCI Pipelining code example that you might find useful if you wanted to start using the pipelining functionality in your application:

OCI Pipelining sample code

Note that the OCI application enabling pipelining begins the block with OCIPipelineBegin() and ends it with OCIPipelineEnd(). All the independent OCI pipeline-aware APIs in this block will be implicitly pipelined.

Here, the application submits two OCIStmtExecute statements — pipeline operation 1 and operation 2 are sent one after the other and are queued on the server side because, in pipeline mode, the application does not need to wait for the server to respond to operation 1 before sending the request for operation 2 (the full flow is sketched in code after the walkthrough below).

  • While the server is processing the queued operations, the application can execute its own logic before pipelining the next operation. This includes sending another request to the server, which in our example is pipeline operation 3, another execute statement, followed by a fetch — all the data retrieved is then inserted into a table.
  • OCIStmtFetch(stmt 1) is pipeline operation 4, and let's say that the data fetched via stmt 1 needs to be inserted into a table in a later operation (the OCIStmtExecute pipeline operation 5 further down in the code block).
  • Operation 1 has already been processed by the server before operation 4 (the fetch) is sent.
  • To fetch all the data, we need to call the OCIPipelineProcess() API, which reads the responses from the server. We must explicitly specify which pipeline operation to wait for (operation 4 — the fetch).
  • We do this by passing the lastOpID (index) as an argument to the OCIPipelineProcess() call, which means: process all the pending operations from the server up to operation 4, i.e., operations 2, 3, and 4.
  • Pipeline operation 5: OCIStmtExecute inserts all the data fetched by operation 4 into a table.
  • Following the data insert, we proceed with reading LOB data via OCILobRead() — pipeline operation 6.
  • To mark the end of the pipeline block, a call to OCIPipelineEnd() is made.
  • All operations following this call execute in blocking mode until another OCIPipelineBegin() call marks the beginning of another pipeline section.
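Since the sample-code figure may not render everywhere, here is a hedged C sketch of the same flow. The OCIPipeline* signatures and the lastOpID argument of OCIPipelineProcess() are assumptions based on the walkthrough above; the statement handles are presumed to have been prepared with OCIStmtPrepare2():

#include <oci.h>

/* Hedged reconstruction of the walkthrough above; verify the OCIPipeline*
   prototypes against the 23ai OCI headers before use. */
static void pipeline_demo(OCISvcCtx *svchp, OCIError *errhp,
                          OCIStmt *stmt1, OCIStmt *stmt2,
                          OCIStmt *stmt3, OCIStmt *stmt5)
{
    OCIPipelineBegin(svchp, errhp, OCI_DEFAULT);

    /* ops 1 and 2: queued back to back; no waiting on responses */
    OCIStmtExecute(svchp, stmt1, errhp, 0, 0, NULL, NULL, OCI_DEFAULT);
    OCIStmtExecute(svchp, stmt2, errhp, 1, 0, NULL, NULL, OCI_DEFAULT);

    /* ... application logic can run here while the server works ... */

    /* op 3: another execute */
    OCIStmtExecute(svchp, stmt3, errhp, 0, 0, NULL, NULL, OCI_DEFAULT);

    /* op 4: fetch from stmt1 */
    OCIStmtFetch2(stmt1, errhp, 1, OCI_FETCH_NEXT, 0, OCI_DEFAULT);

    /* drain responses up to op 4 (i.e., process ops 2, 3, and 4);
       the lastOpID parameter is an assumption from the walkthrough */
    ub4 lastOpID = 4;
    OCIPipelineProcess(svchp, errhp, lastOpID, OCI_DEFAULT);

    /* op 5: insert the data fetched by op 4 into a table */
    OCIStmtExecute(svchp, stmt5, errhp, 1, 0, NULL, NULL, OCI_DEFAULT);

    /* op 6 would read LOB data via OCILobRead2(); omitted here to avoid
       guessing at its many parameters */

    OCIPipelineEnd(svchp, errhp, OCI_DEFAULT);
    /* subsequent calls run in blocking mode until the next OCIPipelineBegin() */
}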

Testing the Performance of No Pipelining vs. Pipelining

The OCI pipeline feature improves performance by converting the time database servers spend waiting on the "SQL*Net message from client" event into database CPU time. It achieves this by populating the OCI pipeline with enough requests to keep the database servers better utilized.

Internally, we used a series of workloads designed to compare the performance of operations with and without pipelining.

The OCI clients for the workloads are multi-threaded applications and use a session pool.

Read-only (or SELECT) workload

The first workload is a simple OLTP read-only workload implemented using OCI. Each OCI client repeatedly executes a SELECT statement in a loop, without think time, for a specified duration.
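A sketch of one session's loop (not the internal benchmark itself) might look like the following; the batch size, deadline handling, and OCIPipeline* signatures are illustrative assumptions:

#include <oci.h>
#include <time.h>

#define PIPELINE_DEPTH 16  /* illustrative batch size, not a benchmark setting */

/* Sketch: repeatedly execute a prepared SELECT with no think time,
   queuing a batch of execute/fetch pairs per pipeline block. */
static void run_select_loop(OCISvcCtx *svchp, OCIError *errhp,
                            OCIStmt *selhp, time_t deadline)
{
    while (time(NULL) < deadline) {
        OCIPipelineBegin(svchp, errhp, OCI_DEFAULT);   /* assumed signature */
        for (int i = 0; i < PIPELINE_DEPTH; i++) {
            OCIStmtExecute(svchp, selhp, errhp, 0, 0, NULL, NULL, OCI_DEFAULT);
            OCIStmtFetch2(selhp, errhp, 1, OCI_FETCH_NEXT, 0, OCI_DEFAULT);
        }
        OCIPipelineEnd(svchp, errhp, OCI_DEFAULT);     /* drains all responses */
    }
}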

We varied the number of sessions running the read-only workload.

The network between the client and server is 100 Gb/s.

We observed significant performance gains, especially when the number of sessions is low; we could use fewer sessions to drive the same throughput with the OCI pipeline. Depending on the overall CPU load, the transaction profiles on the server side, and the network latency between client and server, the feature resulted in:

• between 20% and 97% performance improvement for the simple read-only workload.

• in terms of CPU usage, we noticed that for this scenario the OCI client consumes about 3 times as much CPU when the OCI pipeline is used: the client spends 12–14 microseconds of CPU per SQL execution without the pipeline, and 36 microseconds of CPU per execution with the pipeline enabled.

DML Workload

This workload consists purely of DMLs: 20% of the workload is INSERT and the rest is UPDATE. The workload executes, in a single thread, a large number of DMLs before it performs a commit, as below:

loop for N times
    insert;
    update;
    update;
    update;
    update;
end loop;
commit;

We adjust the number of iterations based on the pipeline depth.
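In OCI terms, a pipelined version of this loop might look like the sketch below; the OCIPipeline* signatures are assumed as before, and the prepared insert/update statement handles are placeholders:

/* Sketch: queue the whole DML batch (1 INSERT + 4 UPDATEs per iteration)
   in one pipeline block, drain the responses, then commit once. */
static void run_dml_batch(OCISvcCtx *svchp, OCIError *errhp,
                          OCIStmt *inshp, OCIStmt *updhp, int n)
{
    OCIPipelineBegin(svchp, errhp, OCI_DEFAULT);    /* assumed signature */
    for (int i = 0; i < n; i++) {
        OCIStmtExecute(svchp, inshp, errhp, 1, 0, NULL, NULL, OCI_DEFAULT);
        for (int j = 0; j < 4; j++)
            OCIStmtExecute(svchp, updhp, errhp, 1, 0, NULL, NULL, OCI_DEFAULT);
    }
    OCIPipelineEnd(svchp, errhp, OCI_DEFAULT);      /* drain all DML responses */
    OCITransCommit(svchp, errhp, OCI_DEFAULT);      /* single commit for the batch */
}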

As noted earlier, the test clients are multi-threaded and use a session pool, but we run the DML workload using a single thread.

The data shows that the OCI pipeline feature is especially effective on networks with high latency.

Depending on the overall CPU load, the transaction profiles on the server side, and the network latency between client and server, the feature provides:

• between 4x and 500x improvement depending on the DML workload.

• the OCI client CPU consumption stays relatively the same (within 10%).

OLTP Workload

The third workload used for performance testing is the OLTP workload.

This workload is a modified TPC-C workload, where the transaction mix is adjusted to 70% reads and 30% writes to better reflect real-world workloads. There are five different transactions in this workload: three are implemented using anonymous PL/SQL blocks, one is implemented as a PL/SQL procedure, and one is implemented by a single SQL statement invoked from the OCI client.

We varied the number of sessions (up to 80) running the OLTP workload. The network between client and server is 10 Gb/s. The idle ping latency is about 0.2 milliseconds.

Depending on the overall CPU load, the transaction profiles on the server side, and the network latency between client and server, the feature provides:

• between 7% and 24% performance improvement for the OLTP workload; as a note, the performance improvement for this scenario depends on how much time the database servers spend waiting on the "SQL*Net message from client" event.

• the client CPU consumption stays relatively the same (within 10%).

We observed performance gains especially when the number of sessions is low, which means we could use fewer sessions to drive the same throughput using the OCI pipeline.


Veronica Dumitriu

Oracle Product Manager with a strong background in Analytics, currently focused on solving business decision-makers’ challenges using Python and Oracle Database