Trinity Updates and integer codecs benchmarks

Trinity got quite a few updates recently, from performance improvements to new APIs and new features (check out the percolator query). Yesterday, Daniel Lemire announced Stream VByte, a codec he and Nathan Kurz devised that yields impressive decoding speed using SIMD instructions.

Stream VByte is an integer-encoding codec. Such codecs compress arrays of integers into a compact representation that can be decompressed later, and their efficiency is a function of the compression ratio and the encoding/decoding speed.
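
To illustrate the basic idea, here is a minimal scalar variable-byte (VByte) coder, the classic scheme the newer codecs improve on. This is just an illustrative sketch, not Trinity's or any of these libraries' actual implementation: each byte stores 7 payload bits, and the high bit flags whether more bytes follow, so small integers (the common case in delta-encoded posting lists) take a single byte.

```cpp
#include <cstdint>
#include <vector>

// Encode each value as a sequence of 7-bit chunks, least significant
// first; the high bit of a byte is set when more chunks follow.
std::vector<uint8_t> vbyte_encode(const std::vector<uint32_t> &values) {
    std::vector<uint8_t> out;
    for (uint32_t v : values) {
        while (v >= 128) {
            out.push_back(uint8_t(v & 127) | 128);
            v >>= 7;
        }
        out.push_back(uint8_t(v));
    }
    return out;
}

// Reassemble values by accumulating 7-bit chunks until a byte with a
// clear high bit terminates the current value.
std::vector<uint32_t> vbyte_decode(const std::vector<uint8_t> &in) {
    std::vector<uint32_t> out;
    uint32_t v = 0;
    unsigned shift = 0;
    for (uint8_t b : in) {
        v |= uint32_t(b & 127) << shift;
        if (b & 128) {
            shift += 7;
        } else {
            out.push_back(v);
            v = 0;
            shift = 0;
        }
    }
    return out;
}
```

The branch-per-byte decoding loop is exactly what codecs like MaskedVByte and Stream VByte eliminate with SIMD instructions.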

In past projects I have used Jeff Dean’s varint-GB and Jeff Plaisance’s MaskedVByte, as well as other codecs, including Lemire’s FastPFOR. Trinity’s Lucene codec implementation has been using FastPFOR (with support for streaming vbyte implemented and selectable via a macro), so when I read about Stream VByte I was very curious about the performance boost it could provide to Trinity.

Implementing support for it in the default Lucene codec bundled with Trinity was trivial, so it was easy to compare Stream VByte with the other codecs in terms of index size, encoding speed, and decoding speed.

The following figures are based on building and querying an index of 4 million Wikipedia pages.

Encoding time is the time spent building the posting lists; the Lucene codec was selected, and that time is mostly spent encoding integers, so it reflects the relative encoding speed of each codec. A single segment is created and accessed; multiple segments executed in parallel would result in faster query times, but the goal is to judge the performance of the integer codecs, so a single segment is more appropriate here.

The query used (multiple runs; the average time is reported) is:

THE | WEB | OF | 1 | 10 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100

This rather long query matches 1.4M documents and accesses multiple postings lists. The DocumentsOnly execution mode was selected, so the reported time is almost entirely specific to the engine’s postings-list access performance.

FastPFOR

Encoding Time: 16.3s
Index Size: 273MB
Average Query Time: 95ms

MaskedVByte

Encoding Time: 18.1s
Index Size: 290MB
Average Query Time: 117ms

Stream VByte

Encoding Time: 15.4s
Index Size: 329MB
Average Query Time: 62ms

Conclusions

As you can see, Stream VByte decodes over 30% faster than the second fastest, FastPFOR, where it matters the most, and it is also the fastest of the three codecs at encoding (though not by much). On the flip side, the index it generates is larger than with the other two codecs, though not dramatically so (about 20% larger than the smallest index, produced when FastPFOR is selected).

This is quite an impressive improvement in query execution time, which is almost entirely dominated by postings-list access time (i.e., integer decoding speed).