Choosing the right JDBC Connection Pool…

Harsimar Singh
Javarevisited
Published in
7 min readOct 20, 2021

Improve the performance of your application without really changing anything.

Why do we need a connection pool?

Source

Creating a connection with the database system is an expensive and time-consuming operation when you are running a platform that experiences a large scale of users building and clearing DB connections over and over is simply not feasible. To understand why creating a connection is so expensive, let look at what steps are involved:

  1. Opening connection to the database
  2. Authentication of the user before establishing the connection
  3. Creating a TCP socket for reading/writing data
  4. Sending/receiving data over the socket
  5. Closing the connection
  6. Closing the TCP socket

A connection pool is used to solve the exact problem, it creates a pool of reusable connections ahead of time to improve the application performance. Whenever the application starts a pool of Connections is created with the database. These connections are managed by a Pool Connection Manager. The latter is responsible for managing the lifecycle of a connection.

Source

Whenever the client requests a connection, the Connection Pool Manager takes out a connection from the pool of available connections and return that direct link to the client. The Pool Manager also keeps listening to all the events on the active connections, for any close event it performs a pseudo-close where it takes the connection and puts it back in the pool.

JDBC Connection pooling frameworks

There are a number of connection pooling frameworks available for enterprise use. This also means added confusion as to which connection pooling framework to choose for your application.

TL;DR: If you don't want to invest a lot of time to figure out which connection pool works best for your service you can go with HikariCP, it offers the best balance of features and claims to be the best in performance in several benchmarks.

What do you need from a good connection pooling framework?

Reliability

  1. Easy to configure.
  2. Look out for open bugs in the library.
  3. Special lookout for deadlock issues, some connection pools when not configured correctly result in deadlocks.

Performance

  1. There will be tons of test results online but one should pay extra attention to the settings and the test environment.
  2. The test results are highly dependent on the configuration settings, what works for them might not work for your service.

Community & Support

  1. Well documented.
  2. Large active community base.
  3. Widely use.

Although there are a lot of frameworks available to choose from like C3P0, Apache DBCP, BoneCp, Vibur etc. However, the most popular choices are Tomcat JDBC & HikariCP.

HikariCP 🚀

Hikari (pronounced Hi-Kaa-lee) translates to “Light”, named after its ultra-light nature (130kb only) it was developed by Brett Wooldridge. Interesting read on how Brett Wooldridge wrote the “fastest connection pool”. Hikari is the general go-to choice in most of the companies these days, its website claims it to be better than other frameworks in several tests. Their website offers a lot of test results and different pool analyses. Here are some pros of HikariCP:

  1. Designed to be deadlock-free
  2. Can detect connection leak by itself.
  3. Provides good defaults for all configuration properties. Yes, this is a big plus and the same cant be said for other frameworks.
  4. Finds the perfect balance between not overwhelming users with a lot of configuration to a lot of functional configuration.
  5. Ultra light-weight (130Kb only)
  6. Great benchmark results.
  7. Fewer bugs

Why is it so fast?

JMH Microbench results: source HikariCP
  1. Bytecode level optimizations: Compiled bytecode has a minimum footprint which means the CPU can load more program cache
  2. A lot of micro-optimizations: The library has reduced a lot of code, for example, the statement proxy has only 100 lines of code
  3. Collection framework: Wise use of collections framework to use faster implementations of things wherever possible. For example FastList instead of ArrayList.
  4. Custom implementations: ConcurrentBag, a custom collection type for concurrent read/write provides a high degree of concurrency, extremely low latency and minimized occurrences of false-sharing.

Selecting the right pool size?

people claim achieving 10–20X performance improvement by tweaking the pool size alone!

There is no straight answer as to what value you should set for your application connection pool, however, there are some fundamentals you should know before you start fiddling with it. The main idea is you should have enough connections in order to use all the available resources, but not more than that.

Let's say there are 1000 concurrent transactions happening at any moment at your backend system, what should be the value of the connection pool to get the best response time? 1000 right, that way each transaction can get a connection and you can serve all the requests in minimum time, right? NO!

Photo by John Anvik on Unsplash

To understand this we have to revisit the basics once again, whenever a computer CPU is dealing with multiple threads the actual execution is still sequential (assuming 1 core machine) the multi-execution is in fact due to threads scheduling, the OS switches context and gives code for another thread to the CPU to process and the cycle continues. The more context switching is there the more it will impact the overall performance. To generalise this if you have N+1 threads running on an N core machine, you are basically losing performance and not gaining it! [This is over-simplification in actual scenarios it's not usually the case, for reasons like thread waiting for some resource etc]

So for N core machine pool size should be N, right?

No, we cannot ignore Disk & Network. The data on any database system is stored physically on the disk (SSD mostly these days). To write the data physically there is a cost in time due to various activities that are performed internally (the disk doesn't store data just anywhere for every insert it has to seek to right index and then write). This is called I/O wait time, and during this time the thread is basically blocked and waiting for the I/O to occur, so if we had configured more threads than the machine cores, we would have got more work done. The same concept applies to Networks also, if you have choked your network bandwidth it's going to cause a block for your application threads.

The Formula

There is no direct formula, however, this might give you some numbers to start with:

connections = ((2 * core_count) + no_of_disks)

So, if you have a 4-core machine with 1 hard disk this means you can start with a value of 9 and then play around to see what works best for your application. Please do note that your pool size requirement might not follow this formula, for services with a lot of I/O blocking the starting value would be much higher than 9, this is to get the idea that large pool values should be avoided.

Note: SSDs are used predominantly nowadays and while there is no formula available for SSDs, you should know the number will always be less than the no arrived from the above formula. Why? SSDs are faster — meaning faster seek/read/write time, hence less time for thread blocking

One hack to check if one should reduce the no of connections is to check pg_stat_activity if most of your connections are spending more time IDLE, then that's the clue that one should reduce the pool size.

While the idea is to keep the connections to the minimum, the general pool value to start with on a production server is 100 and then change it as per the testings

Note: The pooling frameworks have min and max connections and your actual connection at any point varies from min to max value set, its not a single value

Conclusion

Creating a connection to the database is expensive can impact the performance of your application. The Connection Pools can help with this by preparing connections ahead of time, a connection is kept alive and can be reused again and again in a connection pool. There are multiple JDBC frameworks for connection pooling the most popular choices being Tomcat JDBC and HikariCP. Whatever framework you choose it's important to configure the properties correctly, (a big plus of HikariCP is that it offers good defaults for optional configs). When deciding on the size of the connection pool, once you have moved beyond the point where the server is busy all the time, adding more connections will actually reduce the performance because the all the overhead in context switching.

If you need any more help feel free to connect with me. Do follow for all the new updates. Thanks for the read. Cheers 🍺

--

--

Harsimar Singh
Javarevisited

all things tech | Engineer @Uber Ex Hotstar, Myntra