Revisiting the Performance & Scalability of Java Applications that use RDBMSes
A re-edition of my recent blog post
The literature on Java performance is abundant (books, articles, blogs, websites, and so on); a Google search returns more than 5 million hits. To name a few: the Effective Java programming language guide, Java Performance: The Definitive Guide, and the Java performance tuning newsletter and its associated website.
This blog post revisits the known best practices for speeding up and scaling database operations for Java applications, then discusses new mechanisms such as database proxies and the Asynchronous Database Access (ADBA) proposal.
Speeding up RDBMS operations in Java apps
Optimizing database operations for Java applications includes: speeding up database connectivity, speeding up SQL statements processing, optimizing network traffic, and in-place processing.
Speeding up Database Connectivity
Connection establishment is the most expensive database operation; the obvious optimization that Java developers have been using for ages is connection pooling, which avoids creating connections at runtime (unless you exhaust the pool capacity).
Client-side Connection Pools
Java connection pools such as the Apache Commons DBCP, C3P0, the Oracle Universal Connection Pool (UCP) and so on, are libraries to be used as part of your stand-alone Java applications or as part of the datasource of Java EE containers, e.g., Tomcat, Weblogic, WebSphere and others. Java EE containers embed their own connection pools but also allow 3rd party pools (e.g., using UCP with Tomcat, using UCP with Weblogic).
Most Java applications use client-side or mid-tier connection pools to support small and medium workloads. However, these pools are confined to the JRE/JDK instance (i.e., they can’t be shared beyond its boundaries) and are impractical when deploying tens of thousands of mid-tiers or Web servers. Even with a very small pool size on each web tier, the RDBMS server is overwhelmed by tens of thousands of pre-allocated connections that are idle more than 90% of the time.
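To make the idea concrete, here is a minimal sketch of what a client-side pool does; it is not how DBCP, C3P0, or UCP are implemented. The class TinyPool and its methods are hypothetical names of mine, and a production pool adds validation, timeouts, and sizing policies on top of this.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool: resources are created once, then borrowed and returned. */
final class TinyPool<T> {
    private final BlockingQueue<T> idle;

    TinyPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());   // the expensive creation happens up front
        }
    }

    /** Borrows a resource, waiting up to the timeout when the pool is exhausted. */
    T borrow(long timeout, TimeUnit unit) throws InterruptedException {
        T resource = idle.poll(timeout, unit);
        if (resource == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return resource;
    }

    /** Returns a borrowed resource so that another caller can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    /** Number of resources currently available for check-out. */
    int idleCount() {
        return idle.size();
    }
}
```

With a real database, the factory would open a JDBC connection; the point of the sketch is only that borrow() hands out an existing object instead of creating one.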
Proxy Connection Pools
Database proxies such as MySQL Router, the Oracle Database Connection Manager in Traffic Director Mode (CMAN-TDM), NGINX and others, are proxy servers that sit between the database clients (i.e., Java apps, Web tiers) and the RDBMS. These allow thousands of mid-tiers to share a common connection pool. See the database proxy discussion in the second part of this blog post.
Misc. Connection Optimizations
Other connection optimization features include: deferring connection health check, and de-prioritization of failed nodes.
Deferring Connection Health Check
The ability of a connection pool to defer the health checking of connections for a defined period of time speeds up connection check-out (i.e., getConnection() returns faster).
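As a rough illustration of the principle (not of any specific pool's implementation), the sketch below skips the validation roundtrip when a connection was checked recently. The class DeferredHealthCheck, its trust window, and the 2-second validation timeout are assumptions of mine; only Connection.isValid(int) is standard JDBC.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical policy: trust a connection for a while after its last successful check. */
final class DeferredHealthCheck {
    private final long trustMillis;

    DeferredHealthCheck(long trustMillis) {
        this.trustMillis = trustMillis;
    }

    /** Validates only when the last check is older than the trust window. */
    boolean usable(Connection conn, long lastCheckedMillis, long nowMillis) throws SQLException {
        if (nowMillis - lastCheckedMillis < trustMillis) {
            return true;            // recently checked: skip the roundtrip
        }
        return conn.isValid(2);     // 2-second timeout, an arbitrary choice
    }
}
```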
De-prioritization of Failed Nodes
In a multi-instance clustered database environment such as Oracle RAC, this feature assigns a low priority to a failed instance for a defined period of time (in other words, it avoids attempts to get connections from the failed instance), thereby speeding up connection check-out.
Speeding up SQL Statements Processing
Processing a SQL statement requires several steps including: parsing (at least once), binding variables, executing, fetching resultSets (if a query), and COMMITting or ROLLBACKing the transaction (if a DML i.e., Insert, Update, or Delete).
JDBC furnishes several APIs/knobs for optimizing SQL statements processing including: Prepared Statements, Statements Caching, and ResultSets caching.
Disabling Default COMMIT
Auto-committing each DML is the default/implicit transaction mode in JDBC. Unless this mode is what you want, you should explicitly disable it on the connection object and demarcate your transactions (DML + queries) with explicit COMMIT or ROLLBACK calls.
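A minimal sketch of explicit demarcation using only standard JDBC calls follows. The Transactions class and its Work interface are hypothetical helpers of mine, not part of JDBC.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class Transactions {
    /** A unit of work that may run several DMLs and queries on one connection. */
    @FunctionalInterface
    interface Work {
        void run(Connection conn) throws SQLException;
    }

    /** Runs the work as one transaction: COMMIT on success, ROLLBACK on failure. */
    static void inTransaction(Connection conn, Work work) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false);          // turn off the per-statement COMMIT
        try {
            work.run(conn);
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previous);   // restore the caller's mode
        }
    }
}
```

A caller would pass a lambda that issues its statements on the supplied connection; everything inside the lambda then succeeds or fails as one unit.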
Prepared Statements
Ahead of its execution, a SQL statement must be parsed, if it has not been already. Parsing (i.e., hard parsing) is the most expensive operation when processing a SQL statement. The best practice is to use Prepared Statements, which are parsed once then reused many times on subsequent invocations, after setting new values for the bind variables. A security byproduct of Prepared Statements is the prevention of SQL injection.
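Here is a small sketch of that parse-once, bind-many pattern with standard JDBC. The EmployeeLookup class, the employees table, and its columns are hypothetical, chosen only for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class EmployeeLookup {
    // Hypothetical table and columns, for illustration only.
    private static final String SQL =
        "SELECT last_name FROM employees WHERE department_id = ?";

    /** Prepares the statement once, then re-executes it for each bind value. */
    static List<String> lastNames(Connection conn, int... departmentIds) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (int id : departmentIds) {
                ps.setInt(1, id);                    // only the bind value changes
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        names.add(rs.getString(1));
                    }
                }
            }
        }
        return names;
    }
}
```

Because the value travels as a bind variable rather than being concatenated into the SQL text, user input cannot alter the statement's structure.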
Statement Caching
The JDBC driver may be directed to automatically cache SQL statements (PreparedStatements and CallableStatements) on stmt.close(). On subsequent invocations of the same statements, the driver directs the RDBMS to use an existing statement (i.e., “use statement #2”) without sending the statement string, thereby avoiding soft parsing (lexical analysis, syntactic parsing) and potentially a network roundtrip.
Implicit statement caching is enabled either on the connection object or the datasource object (note: the statement cache is an array per physical connection).
ResultSets Caching with Change Notification — the Hard Way
Caching JDBC result sets avoids re-executing the corresponding SQL query, which can dramatically improve the performance of Java applications. RDBMSes allow caching ResultSets at the server side; however, applications still need roundtrips to the database to retrieve them. This topic is discussed at length in chapter 15 of the Oracle database performance tuning guide.
Optimizing further, these result sets can be pushed to the drivers (Java, C/C++, PHP, C#, and so on) and consumed by the applications without database roundtrips. What if the ResultSets become stale, out of sync with the actual data in the RDBMS table? RDBMSes furnish mechanisms for keeping the cached ResultSets up to date, thereby ensuring their consistency. For example, the Oracle database’s Query Change Notification allows registering a SQL query with the RDBMS and receiving notifications when committed DMLs from other threads render the ResultSets out of sync.
Java applications may explicitly implement ResultSet caching with change notification through the following steps:
0) Prerequisites: server-side ResultSet caching must be enabled and the database user schema must be granted the “CHANGE NOTIFICATION” privilege.
e.g., grant change notification to HR; // might need your DBA’s help.
1) Create “a registration” on the connection object
2) Associate a query with the registration
3) Listen to the notification
See more details in chapter 26 of the Oracle JDBC Developers guide.
ResultSets Caching with Change Notification — the Easier Way
You may, preferably, enable client-side ResultSet caching with invalidation in a much easier way, using the following steps (available with the Oracle JDBC driver release 18.3 and up):
1) Set the following parameters in the database configuration file a.k.a. INIT.ORA.
2) Set the JDBC connection property oracle.jdbc.enableQueryResultCache to true (the default).
3) Add the following hint to the SQL query string: “/*+ RESULT_CACHE */”
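Steps 2 and 3 might look as follows on the Java side. This is only a sketch: the ResultCacheSettings class, the user and password arguments, and the employees query are hypothetical; the property name and the hint text are the ones given in the steps above. Setting the property explicitly is redundant when it already defaults to true, and is shown only to make the step visible.

```java
import java.util.Properties;

final class ResultCacheSettings {
    // Hypothetical query; the hint goes right after the SELECT keyword.
    static final String HINTED_QUERY =
        "SELECT /*+ RESULT_CACHE */ last_name FROM employees WHERE department_id = ?";

    /** Builds connection properties with the client result cache switched on. */
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("oracle.jdbc.enableQueryResultCache", "true");
        return props;
    }
}
```

The resulting Properties object would be handed to the driver when opening the connection, and HINTED_QUERY prepared as usual.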
If changing the Java/JDBC source code to add the SQL hint is not an option, you can instruct the RDBMS to cache the ResultSets of all queries related to a specific table, either at table creation (the default mode) or later (force mode); this is called “Table Annotation”.
The Oracle RDBMS furnishes views such as V$RESULT_CACHE_STATISTICS and a CLIENT_RESULT_CACHE_STATS$ table for monitoring the effectiveness of ResultSet caching. See chapter 15 of the performance tuning guide for more details on configuring the server-side result set cache.
Array Fetch
Array fetching is a must when retrieving a large number of rows from a ResultSet. The fetch size can be specified on the Statement, or the PreparedStatement, or the CallableStatement, or the ResultSet objects.
When using the Oracle database, this array size is capped by the RDBMS’s internal buffer known as the Session Data Unit (SDU). The SDU buffer is used for transferring data from the tables to the client, over the network. The size of this buffer, in bytes, can be specified in the JDBC URL or at the service level in the Net Services configuration files sqlnet.ora and tnsnames.ora. There is a hard limit depending on the RDBMS release: 2MB with DB 12c and up, 64K with DB 11.2, and 32K with DB pre-11.2.
In summary, even if you set the array fetch to a large number, it cannot retrieve more data than the SDU permits, for each roundtrip.
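On the JDBC side, setting the fetch size is a single standard call, sketched below. The FetchSizeExample class is a hypothetical helper, and the value to pass is workload-dependent; per the JDBC API, setFetchSize is a hint that the driver may adjust.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class FetchSizeExample {
    /** Counts the rows of a query, asking the driver to fetch them in arrays of fetchSize rows. */
    static int countRows(Connection conn, String sql, int fetchSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setFetchSize(fetchSize);   // hint: rows to bring back per roundtrip
            try (ResultSet rs = ps.executeQuery()) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        }
    }
}
```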
Array DML (Update Batch)
The JDBC specification allows sending a batch of the same DML operations (i.e., array INSERTs, array UPDATEs, array DELETEs) for sequential execution at the server, thereby reducing network round-trips.
Update Batching consists of explicitly invoking the addBatch method, which adds a statement to an array of operations, then explicitly calling the executeBatch method to send the array to the server.
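A minimal sketch of update batching with standard JDBC follows. The BatchInsert class and the regions table with its two columns are hypothetical, used only for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsert {
    // Hypothetical table, for illustration only.
    private static final String SQL =
        "INSERT INTO regions (region_id, region_name) VALUES (?, ?)";

    /** Queues one INSERT per name, then submits them together; returns the update counts. */
    static int[] insertRegions(Connection conn, List<String> names) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int id = 1;
            for (String name : names) {
                ps.setInt(1, id++);
                ps.setString(2, name);
                ps.addBatch();          // queue the row on the client side
            }
            return ps.executeBatch();   // submit the queued rows as one batch
        }
    }
}
```

For very large inputs, calling executeBatch every few hundred rows keeps client memory bounded; the right chunk size depends on the workload.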
Optimizing Network Traffic
Here are two mechanisms that will help optimize the network traffic between your Java code and the RDBMS: network compression and session multiplexing.
Network Data Compression
The ability to compress data transmitted between the Java applications and the RDBMS over LAN or WAN reduces the volume of data, the transfer time and the number of roundtrips.
Session Multiplexing
The Oracle database Connection Manager, a.k.a. CMAN, furnishes the ability to funnel multiple database connections over a single network connection, thereby saving OS resources. See more details in the Net Services Admin guide.
In-Place Processing
As seen earlier, SQL statements processing involves a number of roundtrips between a database client (i.e., a Java mid-tier or web server) and the RDBMS. If you move the Java code close to, or into, the RDBMS session/process, you cut the network traffic, which constitutes a large part of the latency.
Okay, stored procedures are old-fashioned, so seventies, but modern data processing frameworks such as Hadoop or Spark co-locate the processing and the data for low latency.
If your goal is efficiency, you have to consider using Java stored procedures, here and there, for data-bound modules. I discussed the pros and cons of stored procedures in chapter 1 of my book. I’d add that in a modern micro-services based architecture, REST-wrapped stored procedures are a good design choice for data-bound services.
Scaling Out Java Workloads
In the second part of this blog post, I will discuss the various mechanisms for scaling out Java workloads including Sharded and Multitenant databases, database proxy, and the asynchronous Java database access API proposal.
Horizontal Scaling of Java applications with Sharded Databases
Sharded databases — horizontal partitioning of tables across several databases — have been around for a while. Java applications that use sharded databases must:
(i) define which fields to use as sharding key (and super sharding key)
(ii) set the values and build the key then request a connection to the datasource.
Java SE 9 furnishes the standard APIs for building the sharding and super-sharding keys.
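The two steps above can be sketched with the Java SE 9 interfaces (ShardingKeyBuilder and ConnectionBuilder, obtained from the DataSource). The ShardedConnections class and the choice of an email as sharding key and a region as super sharding key are assumptions of mine; the data source must come from a driver or pool that supports sharding, since the default interface methods throw SQLFeatureNotSupportedException.

```java
import java.sql.Connection;
import java.sql.JDBCType;
import java.sql.SQLException;
import java.sql.ShardingKey;
import javax.sql.DataSource;

final class ShardedConnections {
    /** Builds the keys, then asks the data source for a connection to the matching shard. */
    static Connection connectionFor(DataSource ds, String customerEmail, String region)
            throws SQLException {
        ShardingKey shardingKey = ds.createShardingKeyBuilder()
                .subkey(customerEmail, JDBCType.VARCHAR)   // hypothetical key column
                .build();
        ShardingKey superShardingKey = ds.createShardingKeyBuilder()
                .subkey(region, JDBCType.VARCHAR)          // hypothetical super key column
                .build();
        return ds.createConnectionBuilder()
                .shardingKey(shardingKey)
                .superShardingKey(superShardingKey)
                .build();
    }
}
```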
Without further optimization, all shard-aware connection requests go to a central mechanism which maintains the map or topology of the shard keys thereby incurring one additional hop per request.
The Oracle Universal Connection Pool (UCP) has been enhanced to transparently collect all the keys that map to a specific shard. Once UCP gets the keys range it directs connection requests to the appropriate shard, based on the shard key.
Scaling Multi-Tenant Java Applications
Fully multi-tenant Java applications must use a multi-tenant RDBMS where a group of tenants or each tenant has its own database (its own pluggable database, or PDB, in Oracle’s parlance). With tens of thousands of tenants using (tens of) thousands of databases (or PDBs), a naive approach would allocate a pool per database; we have witnessed naive architectures with a connection per tenant.
The Oracle UCP has been enhanced to use a single pool for all (tens of) thousands of databases. Upon a connection request to a specific database, if there is no free/available connection attached to that database, UCP transparently repurposes an idle connection in the pool that was attached to another database and re-attaches it to this one, thereby allowing a small set of pooled connections to service all tenants.
See the UCP doc for more details on using one datasource per tenant or a single datasource for all tenants.
Database Proxy
Proxies are man-in-the-middle software running between the database and its clients, e.g., Java applications. There are several proxy offerings on the market; to name a few: MySQL Router, the Oracle Database Connection Manager in Traffic Director Mode (CMAN-TDM), ProxySQL, and so on.
The Oracle CMAN-TDM is new in Oracle database 18c; it is an extension of the existing Oracle Connection Manager, a.k.a. CMAN, and furnishes the following new capabilities:
- Fully transparent to applications
- Routes database traffic to the right instance (planned)
- Hides planned and unplanned database outages to support zero application downtime
- Optimizes database session usage and application performance
- Enhances database security
CMAN-TDM is client agnostic; in other words, it supports all database client applications including: Java, C, C++, DotNET, Node.js, Python, Ruby, R.
Java applications connect to CMAN-TDM which, in turn, connects to the database using the latest driver and libraries, then transparently furnishes the Quality of Service that the application would otherwise get only if it were using the latest driver and APIs.
See more details on the CMAN landing page and in the Net Services documentation linked from the landing page.
The Asynchronous Database Access API (ADBA)
The existing JDBC API leads to blocked threads, thread scheduling, and contention; it is not suitable for reactive applications or for high-throughput, large-scale deployments. There are 3rd party asynchronous Java database access libraries, but the Java community needs a standard API where user threads submit database operations and return.
The new API proposal is based on the java.util.concurrent.CompletionStage interface; it is available for download from the OpenJDK sandbox at http://tinyurl.com/java-async-db. The API implementation takes care of executing the operations and completing the CompletableFutures.
You can get a sense of the ADBA API through the latest presentation and examples.
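The sketch below is not ADBA code; it only illustrates the submit-and-return style with the standard CompletionStage and CompletableFuture classes. The AsyncQuerySketch class is hypothetical, and the supplier returning "KING" stands in for a database query. Note that handing a blocking call to an executor merely moves the blocking to another thread, which is the limitation a truly asynchronous driver is meant to remove.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class AsyncQuerySketch {
    /** Submits a lookup to the executor and returns at once with a CompletionStage. */
    static <T> CompletionStage<T> submit(Supplier<T> lookup, Executor executor) {
        return CompletableFuture.supplyAsync(lookup, executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletionStage<String> greeting =
                submit(() -> "KING", executor)               // stands in for a query
                    .thenApply(name -> "Hello, " + name);    // runs when the result arrives
            System.out.println(greeting.toCompletableFuture().join());  // prints "Hello, KING"
        } finally {
            executor.shutdown();
        }
    }
}
```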
ADBA over JDBC (AoJ)
To help the community get a feel for ADBA, a trial/functional version of it (no asynchronous behavior) that runs over JDBC, which we are calling AoJ for ADBA over JDBC, is available at https://github.com/oracle/oracle-db-examples/tree/master/java/AoJ. I encourage the reader to play with the AoJ examples.
With the announcement of Project Loom, which will bring Fibers and Java continuations to the JVM, we will once again revisit the performance and scalability of Java applications that use RDBMSes.
Originally published at db360.blogspot.com on March 29th, 2018.