Why are Google BigQuery, Snowflake, Redshift and other cloud data warehouses slower than most expect? — Part 3
Now it is time for the juicy part of this blog series: how Apache Kylin bends the curve so that cost and query performance no longer depend on the exponential growth of data.
If you have not read the previous blogs in this series, please follow these links: Part 0, Part 2. (Yep, there is no Part 1 🥁)
Precomputation Shrinks Big Data
In an MPP query engine, typical query processing goes through five steps, as illustrated below: data scanning, joining, filtering, aggregation, and sorting.
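To make the five steps concrete, here is a toy, in-memory sketch in Python. It is only an illustration of the order of operations, not how any real MPP engine is implemented, and every table, column, and function name in it is made up for this example.

```python
# Toy illustration of the five query-processing steps.
# All data, names, and the run_query helper are hypothetical.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 120.0},
    {"order_id": 2, "customer_id": 11, "amount": 80.0},
    {"order_id": 3, "customer_id": 10, "amount": 50.0},
    {"order_id": 4, "customer_id": 12, "amount": 200.0},
]
customers = {10: "EU", 11: "US", 12: "EU"}  # customer_id -> region


def run_query(orders, customers, min_amount):
    # 1. Scan: read every row of the fact table.
    scanned = list(orders)
    # 2. Join: attach the region from the dimension table.
    joined = [{**o, "region": customers[o["customer_id"]]} for o in scanned]
    # 3. Filter: keep only rows that match the predicate.
    filtered = [r for r in joined if r["amount"] >= min_amount]
    # 4. Aggregate: sum the amount per region.
    totals = {}
    for r in filtered:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    # 5. Sort: order regions by total, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


result = run_query(orders, customers, min_amount=60.0)
print(result)
```

Note that steps 1, 2, and 4 touch every raw row on every query, which is why their cost grows with the data.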
Precomputation, simply put, moves the heavy lifting, namely joining and aggregation, offline. Those two steps are the most time-consuming and labour-intensive parts of query processing. At query runtime, rather than computing over the original raw data on the fly, only a minimal portion of post data processing is expected on…
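The idea can be sketched in a few lines of Python. This is a minimal illustration of precomputation in general, assuming a tiny in-memory dataset; it is not Kylin's actual implementation, and the names `build_precomputed` and `query_total_by_region` are invented for this example.

```python
from collections import defaultdict

# Hypothetical raw fact rows: (customer_id, day, amount).
sales = [
    (10, "2024-01-01", 120.0),
    (11, "2024-01-01", 80.0),
    (10, "2024-01-02", 50.0),
    (12, "2024-01-02", 200.0),
]
customer_region = {10: "EU", 11: "US", 12: "EU"}  # hypothetical dimension table


def build_precomputed(sales, customer_region):
    """Offline step: join and aggregate once, keyed by (region, day)."""
    table = defaultdict(float)
    for customer_id, day, amount in sales:
        table[(customer_region[customer_id], day)] += amount
    return dict(table)


def query_total_by_region(precomputed, day):
    """Query time: only filter and sort the small precomputed table."""
    rows = [(region, total) for (region, d), total in precomputed.items() if d == day]
    return sorted(rows, key=lambda kv: kv[1], reverse=True)


precomputed = build_precomputed(sales, customer_region)
answer = query_total_by_region(precomputed, "2024-01-02")
print(answer)
```

The query function never sees the raw rows. Its work depends on the number of distinct (region, day) combinations rather than on the number of sales records, which is the sense in which precomputation shrinks big data.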