Non-blocking SQL in Scala

Grygoriy Gonchar
Oct 29, 2016 · 5 min read
http://www.reactivemanifesto.org/

To be reactive according to The Reactive Manifesto, a system has to be Responsive, Resilient, Elastic and Message Driven. The last criterion in this list has caused a big shift toward asynchronous communication: asynchronous RPC and messaging libraries, database drivers and more. Relational databases remain powerful and useful. The official instruments for database access on the JVM, provided by database vendors, are drivers implementing the JDBC API, and this holds for the Scala world as well. But JDBC is designed to be blocking and occupies a thread for the duration of each database call. The API itself offers no methods or interfaces that let you receive a query result on another thread without waiting for the database response. The question is what to do when building an application on top of an SQL database with reactivity in mind.

Threads overhead

Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead. - The Reactive Manifesto

The CPU can switch to another thread while one thread is waiting for a database response. This is true and scales to some extent. But beyond a certain number of threads, the overhead of each one (stack memory, context switches) starts to hurt: the more threads you have, the less performant you become. On the other side sits the database, which suffers from a similar issue with the number of connections. A large number of connections doesn’t help it scale, for the same reason a large number of threads doesn’t. Check this awesome explanation why. The rule of thumb is to keep the number of threads and the number of connections small on both sides. But if we do so and keep doing blocking IO inside those threads, the CPU sits idle because all threads quickly become blocked. So blocking IO inside threads doesn’t scale well.

Green threads approach

Another reason not to block is the green thread approach. Green threads emulate a multithreaded environment without using OS threads directly. Starting a green thread is faster and cheaper than starting a native OS thread, and switching between green threads is much faster than switching between OS threads. Under the hood, green threads rely on a small number of OS threads. If you block, you block not a green thread but the underlying OS thread. This means you cannot benefit from the green thread approach if you make a lot of blocking calls.

Fork/Join in Scala

In July 2011 the Java SE 7 release introduced the Fork/Join framework for concurrent execution of lightweight non-blocking tasks, a kind of green threads approach. In 2012 Scala brought Fork/Join on board together with support for the Future abstraction. This changed the rules of concurrency in Scala. ForkJoinPool is now used as the default ExecutionContext, and blocking calls should be avoided. Where that is not possible, they should be wrapped in the blocking{} construct, and if there are too many of them, executed on another ExecutionContext backed by a dedicated thread pool. As we know, blocking calls prevent ForkJoinPool from doing its magic (which is explained here by the author of Fork/Join in Java). As for how this can change your performance, I think some of the most impressive results are those shared by the Akka team.
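
The two options mentioned above can be sketched roughly as follows. This is a minimal illustration, not a recommendation for specific settings: `slowLookup` is a hypothetical stand-in for any blocking call, and the pool size of 4 is an arbitrary assumption.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

object BlockingDemo {
  // Hypothetical stand-in for any blocking call (JDBC query, file read, ...).
  def slowLookup(id: Int): String = {
    Thread.sleep(50)
    s"row-$id"
  }

  // Option 1: stay on the global (Fork/Join-backed) context, but mark the
  // call with blocking { } so the pool is told that this task may block.
  def viaBlockingHint(id: Int): Future[String] = {
    import ExecutionContext.Implicits.global
    Future { blocking { slowLookup(id) } }
  }

  // Option 2: a dedicated fixed-size pool reserved for blocking work.
  // The size 4 is an arbitrary value chosen for this sketch.
  private val pool = Executors.newFixedThreadPool(4)
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutor(pool)

  def viaDedicatedPool(id: Int): Future[String] =
    Future(slowLookup(id))(blockingEc)

  // The pool uses non-daemon threads, so shut it down when done.
  def shutdown(): Unit = pool.shutdown()
}
```

The first option suits occasional blocking calls; the second keeps a larger volume of blocking work away from the threads that serve non-blocking tasks.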

Avoid blocking IO in global ExecutionContext

Avoiding blocking IO in the main ExecutionContext is a high-throughput recipe for Scala in general and for frameworks like Akka or Finagle. The main reasons for this, in my opinion, are:

  1. Blocking in threads does not scale to high numbers
  2. Blocking breaks Fork/Join magic

Non-blocking SQL drivers

The way to avoid blocking IO is non-blocking IO. The non-blocking approach releases the thread while IO is in progress and executes a callback when the IO has finished. This allows us to reduce the number of threads to something close to the number of CPU cores and to multiplex IO requests over fewer connections. And this is definitely not a new technology: the J2SE 1.4 release introduced NIO as the non-blocking IO enabler for the JVM in early 2002.

To access an SQL database on the JVM you don’t necessarily need blocking JDBC. This is the idea behind a number of asynchronous database drivers for the JVM which don’t follow the JDBC spec. Some examples include:

https://github.com/mauricio/postgresql-async
https://github.com/finagle/roc

However, you should note that none of these drivers is officially supported by database vendors yet. Also, I was not able to find any benchmarks comparing these drivers with JDBC implementations in terms of performance.

“Non-blocking” JDBC

We know that the JDBC API is blocking. But nothing prevents us from implementing the non-blocking idea on top of JDBC. We could wrap each database call in a Future that runs on a dedicated ExecutionContext for blocking calls. The number of threads allocated to that ExecutionContext should equal the number of connections in the connection pool; we don’t need more. This reduces the overall number of threads and lets the CPU serve non-blocking tasks in the main ExecutionContext while waiting for database responses. An additional benefit of a separate ExecutionContext for blocking calls is better resiliency and failure isolation: you get a layer of protection against unexpected latencies in addition to query or socket timeouts.
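
A minimal sketch of the wrapping described above might look like this. The class name `AsyncJdbc` and its methods are hypothetical, and `poolSize` is assumed to be set by the caller to the maximum size of the connection pool behind the `DataSource`.

```scala
import java.sql.Connection
import java.util.concurrent.Executors
import javax.sql.DataSource
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical helper: runs blocking JDBC work on its own fixed-size pool.
// poolSize is assumed to match the connection pool's maximum size.
final class AsyncJdbc(dataSource: DataSource, poolSize: Int) {
  private val executor = Executors.newFixedThreadPool(poolSize)
  private val jdbcEc: ExecutionContext =
    ExecutionContext.fromExecutorService(executor)

  // Borrows a connection, runs `work` on a JDBC thread, and always returns
  // the connection to the pool, even if `work` throws.
  def withConnection[T](work: Connection => T): Future[T] =
    Future {
      val conn = dataSource.getConnection()
      try work(conn) finally conn.close()
    }(jdbcEc)

  // The pool uses non-daemon threads, so shut it down with the application.
  def shutdown(): Unit = executor.shutdown()
}
```

Callers receive a Future and can keep composing it on the main ExecutionContext, while the blocking JDBC call itself only ever occupies one of the dedicated threads.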

However, there are difficulties with this approach. The main one is transaction management. In JDBC a transaction lives within a single java.sql.Connection, so to run several operations in one transaction they have to share a connection. If we want to do some computation between them, we have to hold on to the connection. This is not very efficient, since one of a limited number of connections sits idle during the computation. Another issue is that the java.sql.Connection documentation does not define any thread-safety requirement. This means you should not compose your database operations as separate Futures that share a connection to run a transaction unless your particular java.sql.Connection implementation is thread safe for that usage.
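
Given the thread-safety caveat above, one conservative option is to keep a whole transaction inside a single Future, so the connection never leaves the thread that borrowed it. The sketch below illustrates that idea only; `Transactions.inTransaction` is a hypothetical helper, and the implicit ExecutionContext is assumed to be the dedicated one for blocking calls.

```scala
import java.sql.Connection
import javax.sql.DataSource
import scala.concurrent.{ExecutionContext, Future}

object Transactions {
  // Runs every statement of one transaction inside a single Future, so the
  // connection is used by exactly one thread from borrow to close.
  def inTransaction[T](ds: DataSource)(work: Connection => T)(
      implicit jdbcEc: ExecutionContext): Future[T] =
    Future {
      val conn = ds.getConnection()
      try {
        conn.setAutoCommit(false)
        val result = work(conn)
        conn.commit()
        result
      } catch {
        case e: Throwable =>
          // Undo partial work, then let the Future fail with the cause.
          conn.rollback()
          throw e
      } finally {
        conn.close()
      }
    }
}
```

The trade-off is the one named in the text: any computation placed inside `work` keeps the connection checked out, so it is worth keeping that function limited to the database statements themselves.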

This idea of an asynchronous JDBC wrapper is implemented in Slick 3, where the only available API is asynchronous and uses a dedicated ExecutionContext for blocking calls. But nothing prevents you from using this approach on top of synchronous Scala JDBC wrappers and implementing the asynchrony yourself.

Finally, non-blocking JDBC may make it onto the Java roadmap. As announced at JavaOne in September 2016, it is possible that we will see it in Java 10.

Design impact

Non-blocking database access influences your design significantly. To be fully non-blocking you have to build the whole execution cycle as a set of functional compositions around concurrent structures such as Futures. Changing an existing blocking application to non-blocking database access might result in a complete refactoring. You must be sure that this effort is really required, and decide which non-blocking implementation option you need.
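
To give a feel for what "functional compositions around Futures" means in practice, here is a small sketch. The domain (`User`, `Order`) and the data-access functions are hypothetical and return canned values; in a real application they would call an asynchronous driver or a wrapper around JDBC.

```scala
import scala.concurrent.{ExecutionContext, Future}

object OrderFlow {
  final case class User(id: Long, name: String)
  final case class Order(userId: Long, total: BigDecimal)

  // Hypothetical asynchronous data-access functions with canned results.
  def findUser(id: Long)(implicit ec: ExecutionContext): Future[User] =
    Future(User(id, "alice"))

  def findOrders(user: User)(implicit ec: ExecutionContext): Future[List[Order]] =
    Future(List(Order(user.id, BigDecimal(10)), Order(user.id, BigDecimal(32))))

  // The whole flow is a composition of Futures: no step waits on a thread
  // for the previous one, and the caller gets a Future back as well.
  def totalSpent(userId: Long)(implicit ec: ExecutionContext): Future[BigDecimal] =
    for {
      user   <- findUser(userId)
      orders <- findOrders(user)
    } yield orders.map(_.total).sum
}
```

Note how the return type changes from `BigDecimal` to `Future[BigDecimal]`, and every caller up the stack has to follow. That is the refactoring cost discussed above.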
