BloomFilter
We recently had a situation where we had to search a big list of 500 million hashes against a list of 40 million hashes. The 500M hashes were stored in flat, unsorted text files on 5 DVDs, so there was no easy way to search that list. The 40M hashes were stored in a MySQL database. Some benchmarking showed…