Tuning the indexing process in Manticore Search

Denys Golotiuk
DataDenys
Jul 25, 2022

Manticore is a fork of Sphinx Search, a great solution for full-text search. Indexing can take a long time, especially when we have to deal with hundreds of millions of documents. Let’s find out which performance tuning options can make indexing faster.

We’re going to edit the config file, which is usually located at /etc/manticoresearch/manticore.conf.

mem_limit

The most important indexer option is mem_limit. It limits the amount of RAM available to indexer and defaults to 128M, which is a fairly small value for modern workloads.

Let’s test default indexing performance with the following plain index config (using ClickHouse as the source):

source test_src {
    type = tsvpipe
    tsvpipe_command = clickhouse-client -q "SELECT crc64(text)-1, text FROM test FORMAT TSV"
    tsvpipe_field = text
}
index test_idx {
    type = plain
    source = test_src
    path = /var/lib/search/test
}
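
To make the shape of the data behind tsvpipe_command more concrete, here is a minimal Python sketch of a producer that emits one document per line, with the document ID in the first column and the text field in the second. It is an illustration only: the to_tsv_row helper is hypothetical, and zlib.crc32 is swapped in for the ClickHouse hash used in the query above.

```python
import zlib


def to_tsv_row(text: str) -> str:
    """Build one TSV row: document ID first, then the text field."""
    # Tabs and newlines would break the one-row-per-document layout,
    # so collapse all whitespace runs into single spaces.
    clean = " ".join(text.split())
    # Assumption: CRC32 stands in for the hash used in the article's query;
    # +1 keeps the resulting ID strictly positive.
    doc_id = zlib.crc32(clean.encode("utf-8")) + 1
    return f"{doc_id}\t{clean}"


# Hypothetical sample documents, printed the way a tsvpipe command would emit them.
for doc in ["first sample document", "second one,\twith a tab"]:
    print(to_tsv_row(doc))
```

A script like this could replace the clickhouse-client call when the documents live somewhere else, as long as it prints the same two columns.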

With the default settings, indexer does 118 reads and 3382 writes during the process:

root@desktop:/# sudo -u manticore indexer --all --rotate
...
indexing index 'test_idx'...
collected 1000000 docs, 861.6 MB
creating lookup: 1000.0 Kdocs, 100.0% done
sorted 127.2 Mhits, 100.0% done
total 1000000 docs, 861681456 bytes
total 65.635 sec, 13128252 bytes/sec, 15235.62 docs/sec
total 118 reads, 0.090 sec, 3416.2 kb/call avg, 0.7 msec/call avg
total 3382 writes, 1.039 sec, 439.5 kb/call avg, 0.3 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=303562).
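
When comparing several runs, it can help to pull the key figures out of this output automatically. The following is a rough Python sketch that parses the "total ..." summary lines shown above. The parse_indexer_summary function is a hypothetical helper, and the regular expressions assume the output format stays exactly as in this run.

```python
import re

# Summary lines copied from the default-settings run above.
SAMPLE_OUTPUT = """\
total 1000000 docs, 861681456 bytes
total 65.635 sec, 13128252 bytes/sec, 15235.62 docs/sec
total 118 reads, 0.090 sec, 3416.2 kb/call avg, 0.7 msec/call avg
total 3382 writes, 1.039 sec, 439.5 kb/call avg, 0.3 msec/call avg
"""


def parse_indexer_summary(output: str) -> dict:
    """Extract document count, elapsed time, throughput and I/O call counts."""
    patterns = {
        "docs": (r"total (\d+) docs", int),
        "seconds": (r"total ([\d.]+) sec,", float),
        "docs_per_sec": (r"([\d.]+) docs/sec", float),
        "reads": (r"total (\d+) reads", int),
        "writes": (r"total (\d+) writes", int),
    }
    stats = {}
    for name, (pattern, cast) in patterns.items():
        match = re.search(pattern, output)
        if match:  # skip metrics that are missing from the output
            stats[name] = cast(match.group(1))
    return stats


stats = parse_indexer_summary(SAMPLE_OUTPUT)
print(stats)
```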

If we increase the buffer to 2G:

indexer {
    mem_limit = 2G
}

we can see the difference in the indexing process:

total 60.313 sec, 14284143 bytes/sec, 16580.07 docs/sec
total 4 reads, 0.135 sec, 91577.7 kb/call avg, 33.9 msec/call avg
total 2529 writes, 0.995 sec, 573.1 kb/call avg, 0.3 msec/call avg

That’s because the larger buffer allowed indexer to make bigger read batches and get through the data in 4 reads instead of 118. The number of writes also dropped, from 3382 to 2529. Indexing time improved too, from 65.6 to 60.3 seconds (about 8%), but this was tested on an NVMe disk, so the difference is fairly small. We would expect a bigger difference on HDDs and with larger indexing volumes.
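
The size of each change is easy to check with a few lines of Python. This sketch uses only the figures from the two runs above; the percent_reduction helper is introduced here for illustration.

```python
# Figures copied from the two indexer runs above.
default_run = {"seconds": 65.635, "reads": 118, "writes": 3382}
tuned_run = {"seconds": 60.313, "reads": 4, "writes": 2529}


def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100


for metric, before in default_run.items():
    after = tuned_run[metric]
    change = percent_reduction(before, after)
    print(f"{metric}: {before} -> {after} ({change:.1f}% lower)")
```

The read count shrinks by far the most, while wall-clock time moves only a little, which matches the observation that fast storage hides most of the I/O cost.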

write_buffer

This buffer is used for writing temporary index files. Larger values lead to fewer disk writes, which is exactly what we want when indexing large amounts of data. Since the default value is only 1M, we want a much bigger value here:

indexer {
    ...
    write_buffer = 1G
}

Note that indexer can use up to 4 write buffers, which in our case leads to a maximum RAM consumption of 4 x 1G = 4G. So pick a value 4 times smaller than the total amount of RAM you can dedicate to write buffers. And don’t forget that the mem_limit amount of RAM is used in addition to the write buffers.
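
The budgeting rule above can be written down as a small Python sketch. It treats mem_limit plus 4 write buffers as the upper bound on indexer memory, which is a simplification; both function names are hypothetical, and all sizes are in megabytes.

```python
MAX_WRITE_BUFFERS = 4  # "up to 4" write buffers, per the note above


def peak_indexer_ram(mem_limit_mb: int, write_buffer_mb: int) -> int:
    """Upper bound on RAM: mem_limit plus every write buffer in use."""
    return mem_limit_mb + MAX_WRITE_BUFFERS * write_buffer_mb


def write_buffer_for_budget(total_budget_mb: int, mem_limit_mb: int) -> int:
    """Largest write_buffer that keeps the peak within the total budget."""
    remaining = total_budget_mb - mem_limit_mb
    if remaining <= 0:
        raise ValueError("total budget must be larger than mem_limit")
    return remaining // MAX_WRITE_BUFFERS


# mem_limit = 2G with write_buffer = 1G, expressed in megabytes.
print(peak_indexer_ram(2048, 1024))         # 6144
# A 6G budget with mem_limit = 2G leaves room for 1G write buffers.
print(write_buffer_for_budget(6144, 2048))  # 1024
```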

Summary

When indexing large amounts of data, tune Manticore’s indexing options, because the defaults are too conservative for modern hardware. These two options have the biggest effect:

indexer {
    mem_limit = 2G       # will consume max 2G
    write_buffer = 1G    # will consume max 4*1 = 4G
}

This sample configuration will consume at most 6G of RAM and will improve indexing performance compared to the default settings.
