StarRocks millisecond-level point queries on hundreds of billions of data

Kaisen Kang
6 min readJun 17, 2023

--

Supporting point queries with arbitrary field filtering is a common requirement for TP databases, and a common solution is the use of secondary indexes. In recent user scenario tests, we found that StarRocks, in a cluster consisting of 3 servers with 8-core CPUs, 32GB of memory, and 4TB AWS EBS disks, can provide sub-millisecond performance for hot queries on hundreds of billions of data and around 1-second performance for cold queries.

This article briefly analyzes how StarRocks achieves millisecond-level query of tens of billions of data. The main method of StarRocks to accelerate the query of prefix columns is the prefix index, and the main method of accelerating the query of the following non-primary key columns is Bitmap index:

select * from table where A = 1
select * from table where A = 1 and B = 2
select * from table where A = 1 or B = 2

1 How does StarRocks Bitmap index speed up point queries

1.1 StarRocks supports indexed Bitmap indexes

StarRocks Segment File

The segment file of StarRocks is divided into three parts: data, index, and metadata. Data and index are organized according to Page, and the default size of a Page is 1M.

StarRocks Bitmap Index Rationale

The Bitmap Index of StarRocks mainly includes two parts: the dictionary and the Bitmap index itself. The dictionary saves the mapping from the original value to the encoding Id, and the Bitmap index records the mapping from each encoding ID to the Bitmap.

StarRocks Bitmap Index Storage Format

As shown in the figure above, both the dictionary part and the Bitmap index part of StarRocks are stored in the format of Page. In order to reduce memory usage and speed up indexing, StarRocks indexes both the dictionary and Bitmap Page.

Of course, if the cardinality of the Bitmap index column is very low and there is only one Dict Data Page and Bitmap Data Page, we don’t need Dict Index Page and Bitmap Index Page.

1.2 Operation On Encode Data

When StarRocks uses Bitmap Index for filtering, it only needs to load the Dict Index Page and some Dict Data Pages first, and quickly filter according to the dictionary values, without decoding data, and without loading all Bitmap Data Pages at once.

To sum up, since StarRocks supports Bitmap index indexing and supports quick filtering according to dictionary values, even if the cardinality of the Bitmap Index column is high, the overall disk storage of the Bitmap Index is large, the memory usage is still small.

1.3 Or predicate supports Bitmap index

Due to historical reasons, the predicate pushdown of the StarRocks storage layer only supports the CNF conjunctive paradigm, which requires the predicate to be connected by And. For example for the following SQL:

select * from table where A = 1 and B = 2

StarRocks will push down the predicates A = 1 and B = 2 respectively, and the final result is the filtered intersection of the two predicates. A = 1 and B = 2 can use various indexes on column A and column B respectively.

But for the following Or predicate:

select * from table where A = 1 or B = 2

The old version of the StarRocks storage layer cannot push down the OR predicate, so it needs to scan the entire table and then use the A = 1 or B =2 predicate to filter data. In order for the Or predicate to use the Bitmap index, the most ideal way is that the storage layer supports the Or predicate. A = 1 and B = 2 respectively apply the Bitmap index, and then the row numbers can be union, but the code modification is a little complicated, so it is implemented two steps, the first step is to rewrite SQL on the FE side so that the OR predicate can use the Bitmap index, and then the second step will directly support the OR predicate in the storage layer.

The SQL rewrite is as follows:

scan or to union

1.4 Bitmap index memory cache

StarRocks will ensure that all Dict Index Pages and Bitmap Index Pages must be in memory, and keep as many Dict Data Pages and Bitmap Data Pages in memory as possible, and ensure that the Page Cache related to Bitmap Index is not flushed by Column data.

In this way, in the case of a small query result set, even if it is the first Cold Query, StarRocks can achieve only one Disk Seek.

1.5 Adaptive Bitmap Indexing

When StarRocks found that the selectivity of the Bitmap index was not high and needed to seek a lot of data pages, we would give up using the Bitmap index and directly scan in order.

2 StarRocks prefix index speeds up point queries

As shown in the figure above, if the cardinality of a table’s filter condition is very high, in StarRocks, there is no need to use the Bitmap index, set the column as the Sort Key, and use the prefix index to filter. For high-cardinality columns, the prefix index has better point lookup performance and can save a lot of storage space.

3 StarRocks RollUp || Materialized View speeds up point queries

When the filtering degree of the Bitmap index is not large and many data pages need to be Seek, you can consider further exchanging space for time, using RollUp and Materialized View of StarRocks to set different filter columns as Sort Key, and using the prefix index to speed up point queries.

4 StarRocks Generated Column speeds up point queries

As shown in the figure above, when the filtered column is a Key column of type Json or Map, in StarRocks, we can add a Generated Column for the Key column, and then create a Bitmap index for the Key column to speed up the point queries through the Bitmap index.

Of course, when StarRocks supports the direct build the Bitmap indexes for Json or Map type Key columns, there is no need to rely on Generated Column.

5 StarRocks Two-Level Partition & Bucket

StarRocks supports two-level partitioning and bucketing. You can first partition according to columns with time attributes, and then partition according to high-cardinality columns. This ensures good data locality and avoids data skew.

  • The better the data locality, the better the data compression ratio in the columnar storage, and the lower the IO cost of the query.
  • Without data skew, StarRocks can give full play to the capabilities of multi-machine and multi-core.

5.1 StarRocks Partition Pruning

As shown in the figure, if the filter condition of the query contains partition columns, StarRocks can trigger partition pruning, and the amount of Scan data will be greatly reduced.

5.2 StarRocks Bucket Pruning

As shown in the figure, if the filter condition of the query contains bucket columns, StarRocks can trigger bucket pruning, and the amount of Scan data will be greatly reduced.

6 Conclusion

SSB Benchmark
TPC-H Benchmark

When you need high-performance point query, high-performance OLAP query, high-performance AdHoc query and high-performance data lake analysis at the same time, StarRocks is your best choice.

--

--

Kaisen Kang

StarRocks Query Team Leader, StarRocks PMC, Apache Kylin PMC(Inactive), Apache Doris PMC (Retired) , https://www.bcmeng.com/