Processing JDBC ResultSets with Java 8 Streams

cyclops-react offers several options for processing JDBC data via an extended Java 8 Stream. The best approach depends on your particular use case, and some of these approaches also work with plain, unextended JDK Streams. cyclops-react has two Stream types: ReactiveSeq and LazyFutureStream. Both have operators that can help you manage open resources. ReactiveSeq is single-threaded in operation (although it can target different threads for execution), while LazyFutureStream is concurrent / parallel.

Discrete concurrent batches, single Stream

If you can split your queries into discrete batches, a single LazyFutureStream may be suitable. The queries are executed concurrently, and each batch is processed and saved on the same thread once its query completes. You can control the concurrency level via the inputs to the LazyReact builder.
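To make the shape of this concrete without depending on the exact LazyReact builder signature, here is a minimal sketch that uses only the plain JDK (ExecutorService and CompletableFuture) as a stand-in. It is not the cyclops-react API. The names `ConcurrentBatches`, `loadBatches` and `runQuery` are hypothetical, and the fake query in `main` stands in for a real JDBC call. Unlike the description above, this sketch gathers the completed batches on the calling thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class ConcurrentBatches {

    // Runs one query per batch key on a fixed-size pool, then gathers the
    // completed batches on the calling thread, in submission order.
    static <K, R> List<R> loadBatches(List<K> batchKeys,
                                      Function<K, List<R>> runQuery,
                                      int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            // Submit every batch up front; the pool size caps how many run at once.
            List<CompletableFuture<List<R>>> pending = batchKeys.stream()
                    .map(key -> CompletableFuture.supplyAsync(() -> runQuery.apply(key), pool))
                    .collect(Collectors.toList());

            List<R> results = new ArrayList<>();
            for (CompletableFuture<List<R>> batch : pending) {
                results.addAll(batch.join()); // waits for this batch; rethrows its failure, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a JDBC query: each batch key is the first id of a range of three rows.
        Function<Integer, List<String>> fakeQuery = start -> {
            List<String> rows = new ArrayList<>();
            for (int id = start; id < start + 3; id++) {
                rows.add("row-" + id);
            }
            return rows;
        };
        System.out.println(loadBatches(Arrays.asList(0, 3, 6), fakeQuery, 4));
    }
}
```

The `concurrency` argument plays the role of the concurrency inputs mentioned above: it can be set well above the number of cores when most of the time is spent waiting on the database.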

This approach of running large numbers of concurrent discrete batches is the one we use for processing large amounts of data in files (stored locally, on network storage or in the cloud), and it works very well for that use case. While some threads are idle during I/O phases, others make use of the CPU to process the incoming data. (There are a couple of examples of others getting some value out of this approach too: https://github.com/aol/cyclops-react/issues/67 and https://github.com/aol/cyclops-react/issues/179.)

This approach is likely to be less effective with a standard (unextended) Java 8 Stream, not least because of the limitations inherent in parallel Streams, which run on the common ForkJoin pool. By default the parallelism of that pool is tied to the number of available cores, which limits how many blocking queries can be in flight at once.
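If you want to check the limit on your own machine, a small sketch like the one below prints the core count next to the parallelism of the common pool. The class and method names are hypothetical. The default can be overridden with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property, but that changes it for every parallel Stream in the JVM.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {

    // Parallelism of the pool that parallel Streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
    }
}
```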

Multiple Streams, each processing a continuous Stream of results on a single thread

On the other hand, if you would like to execute your selects as a continuous Stream, running multiple ReactiveSeq Streams independently may be a better fit. You need to figure out a way to shard your SQL ResultSets evenly, and create (probably a lot) more ReactiveSeq Streams than CPU cores to optimize performance. This also requires writing an Iterator (or Spliterator) for iterating over the ResultSet.
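A minimal sketch of such an Iterator might look like the following. `RowMapper` and `ResultSetIterator` are hypothetical names introduced here for illustration (not part of cyclops-react or the JDK), and wrapping SQLException in an unchecked exception is just one possible choice. Closing the ResultSet, Statement and Connection is left to the caller.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical callback: maps the row the cursor currently points at to a domain object.
interface RowMapper<T> {
    T mapRow(ResultSet rs) throws SQLException;
}

class ResultSetIterator<T> implements Iterator<T> {
    private final ResultSet rs;
    private final RowMapper<T> mapper;
    private boolean advanced; // true if the cursor has already been moved for the pending row
    private boolean hasRow;   // result of that cursor move

    ResultSetIterator(ResultSet rs, RowMapper<T> mapper) {
        this.rs = rs;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
        // Move the cursor at most once per row, so repeated hasNext() calls are safe.
        if (!advanced) {
            try {
                hasRow = rs.next();
            } catch (SQLException e) {
                throw new IllegalStateException("Failed to advance ResultSet", e);
            }
            advanced = true;
        }
        return hasRow;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        advanced = false; // the pending row is consumed by this call
        try {
            return mapper.mapRow(rs);
        } catch (SQLException e) {
            throw new IllegalStateException("Failed to map row", e);
        }
    }
}
```

Usage would be along the lines of `new ResultSetIterator<>(resultSet, rs -> rs.getString(1))`. Because each row is mapped to a detached object inside `next()`, the objects can be handed to other threads safely even though the ResultSet itself cannot.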

It is pretty straightforward to construct a JDK 8 Stream from an Iterator.
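One way to do this with the plain JDK is via Spliterators and StreamSupport, as in the sketch below. `IteratorStreams` and `streamOf` are hypothetical names, and the list-backed Iterator in `main` stands in for an Iterator over a ResultSet.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class IteratorStreams {

    // Wraps an Iterator of unknown size in a sequential, ordered Stream.
    static <T> Stream<T> streamOf(Iterator<T> iterator) {
        Spliterator<T> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false); // false = not parallel
    }

    public static void main(String[] args) {
        Iterator<String> rows = Arrays.asList("alice", "bob", "carol").iterator();
        List<String> upper = streamOf(rows)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        System.out.println(upper); // prints [ALICE, BOB, CAROL]
    }
}
```

When the Iterator is backed by a ResultSet, it is probably worth registering a handler with `Stream.onClose` that closes the underlying JDBC resources, and consuming the Stream inside a try-with-resources block so that the handler actually runs.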

Single Stream, continuous Stream of results, concurrent processing and updates

The final alternative is to mix the two approaches and pass your jdbcResultSetIterator to the LazyReact instance.

This is likely to be less efficient than the other options, as there is only a single thread performing I/O with the DB (although I have not tested this, so I can’t be 100% sure).
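The single-reader, many-workers shape can be sketched with the plain JDK as follows. Again, this is a stand-in for illustration rather than the LazyReact API, and `SingleReaderConcurrentWorkers`, `processConcurrently` and `work` are hypothetical names. The sketch assumes the Iterator hands out detached domain objects rather than the ResultSet itself, and it does not bound the number of in-flight tasks, which a real implementation would probably want to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class SingleReaderConcurrentWorkers {

    // Reads rows on the calling thread and processes each one on a worker pool.
    static <T, R> List<R> processConcurrently(Iterator<T> rows,
                                              Function<T, R> work,
                                              int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<CompletableFuture<R>> inFlight = new ArrayList<>();
            // Only this thread touches the iterator, so the ResultSet behind it
            // is never accessed concurrently.
            while (rows.hasNext()) {
                T row = rows.next();
                inFlight.add(CompletableFuture.supplyAsync(() -> work.apply(row), pool));
            }
            List<R> results = new ArrayList<>();
            for (CompletableFuture<R> task : inFlight) {
                results.add(task.join()); // results come back in row order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for an Iterator over a ResultSet.
        Iterator<String> rows = Arrays.asList("alice", "bob", "carol").iterator();
        System.out.println(processConcurrently(rows, String::length, 4)); // prints [5, 3, 5]
    }
}
```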
 
Alternatively, to avoid having to create your own Iterator and map ResultSets to domain objects by hand, you could take a quick look at Speedment, a commercial tool that can generate a JDK 8 Stream of your domain model via JDBC SQL query execution. You can always decorate the Stream generated by Speedment with either a cyclops-react ReactiveSeq or a LazyFutureStream.