Query Plan Optimization: Join Predicate Ordering
After the improvement of Hyrise’s execution engine that was presented in our last article, this post will again deal with a query plan optimization. This will be the last post for our master’s project. However, there will be a new and fresh master’s project in April 2020, continuing our work on Hyrise Medium.
Join Predicate Ordering
This optimization deals with the reordering of predicates in a multi-predicate join. We will discuss why the order matters and how we can optimize it.
A Closer Look at Query 93
The SQL query for this can be found here.
With a runtime of 2.98 seconds, TPC-DS query 93 was one of the slowest queries. By looking at the physical query plan (PQP) the performance bottleneck became evident - a single hash join was responsible for more than 90% of the runtime.
Figure 1 shows a simplified excerpt from the PQP. The hash join joins two tables on two predicates. In Hyrise, such a multi-predicate hash join is executed in two phases. In the first phase, only the columns referenced in the first predicate are hashed and matched. In the second phase, the intermediate result is filtered using the remaining predicates. An alternative would be to calculate a surrogate key for all columns referenced in any of the predicates and use these keys for hashing. However, this would be more expensive. Let’s look at a concrete, simplified example to better understand the implications of Hyrise’s approach.
Figure 2 shows tables a and b filled with some example values and the result of a join on the first predicate. This intermediate result has four rows, so the other predicate [a.y = b.y] has to be evaluated four times. The final result is empty because no row satisfies the second predicate. If we joined the second predicate in the first step instead, our intermediate result would already have been empty. Therefore, it is not necessary to evaluate the first predicate at all.
The Impact of the Join Predicate Ordering
We can generalize our findings in the following strategy: Sort the predicates by ascending join selectivity to minimize the number of evaluated predicates in a multi-predicate join. The join selectivity of a join predicate is the ratio between the number of qualifying tuples and the number of tuples in the Cartesian product. Thus, when joining on a predicate with a low join selectivity, the result will be small. In our case, joining on the predicate with the smallest join selectivity ensures that the intermediate result, to which we will apply all other predicates, will be as small as possible.
However, to actually use this insight we still need to get the join selectivity for each predicate. Fortunately, an estimation is enough because all that matters is the ordering. Additionally, in case two, predicates have a similar join selectivity their order may be switched without changing the performance much. To achieve the optimal order we can use the same join cardinality estimations that are already used by Hyrise’s join ordering optimization and sort according to these estimations.
Evaluation
For TPC-DS query 93, the estimated cardinality for the predicate [ss_ticket_number = sr_ticket_number] was 46 million rows, for
[ss_item_sk = sr_item_sk] just 3 million rows. By switching the order of these predicates we could reduce the runtime from 2.98 seconds to 0.42 seconds, a performance increase by a factor of 7.1. An inefficient order of predicates in a multi-predicate join can only be found in TPC-DS query 93. Therefore, the Join Predicate Ordering has no significant impact on any of the other queries of this benchmark.
Overall Performance Impact
When we combine the four optimizations done in this master’s project — the Sub-Plan Memoization, the Disjunction Split-Up, the Join Predicate Ordering, and the Validate Parallelization, we achieve a 75% (geometric mean) performance increase for TPC-DS. Figure 3 depicts how these optimizations changed the execution time of 26 TPC-DS queries with the most change in performance.
Specifications of the machine on which the benchmarks were executed:
4x Intel(R) Xeon(R) CPU E7–4880 v2 2.50GHz; 15 cores per CPU (2 hardware threads); 2.2 TB main memory; 32 KB L1, 256 KB L2-cache per core, 38400 KB LLC.
Summary
Even though we wanted to focus on performance improvements, our master’s project first had to extend Hyrise to support TPC-DS. To do so, we integrated the TPC-DS table generator into Hyrise and setup a verification process. When we began working on the project, only 18 TPC-DS queries could be executed by Hyrise. By implementing the SQL features WITH and STDDEV_SAMP and extending the alias functionality, Hyrise now supports 38 TPC-DS queries. At this point, we started looking for performance bottlenecks in long running queries and how to fix them. One of these optimizations was described in this article, but please make sure to check out our previous articles to learn more about the other TPC-DS optimizations we found. The appendix shows the overall performance impact of our master’s project on Hyrise’s TPC-DS performance.