Embarrassingly poor performance of regular point sets with std::unordered_set

Matthijs Sypkens Smit
Plaxis
Apr 3, 2017 · 6 min read

On the topic of choosing a set-like container, most experienced C++ developers will tell you: “Use std::unordered_set instead of std::set. It’s more efficient. Certainly for large data sets.”

If they are friendly enough to offer you further advice, they will also tell you about the importance of picking a good hashing function and that computing quality hashes can be expensive. Also, a custom equality operator, as required by std::unordered_set, may increase the cost of using a hash table. Don’t assume that your data set is large enough to make the hash table more efficient than a binary search on sorted data (implemented by std::set); always measure.

Knowing all this, I decided to compare the performance of my existing code based on std::set to an implementation based on std::unordered_set. Much to my surprise the performance of the new code was embarrassingly poor. For larger data sets the program became so slow that it would seem to hang. Did I make some trivial mistake? Had I not understood the advice? Here’s the story.

The objects I was hashing were points in three-dimensional space, which are typically represented by a set of three (Cartesian) coordinates, let’s say
using Point3D = std::array<double, 3>;.
If you want to insert such a point into a hash-based container, you need to provide a hashing function. This function should generate a hash (a value of type size_t) for every point. Note that the number of different points is far greater than the number of values that size_t can take, so not every point can have a unique hash. However, a good hashing function will show no dependencies (predictable patterns) between the bits representing the point (input) and the bits representing the hash (output). From this requirement it follows that points with the same coordinates in a different order, say (1.0, 2.0, 3.0) and (3.0, 2.0, 1.0), should have different hashes.

Knowing this, I decided to use boost::hash_range(), which is designed to produce (almost surely) different hashes for coordinate triples that are identical except for the ordering of the numbers:

const auto hash = boost::hash_range(begin(coordinate_array),
                                    end(coordinate_array));
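
For context, here is a minimal sketch of how such a hash can be plugged into std::unordered_set. The functor name Point3DHash is mine, introduced for illustration; std::array already provides operator==, so no custom equality functor is needed:

#include <array>
#include <cstddef>
#include <unordered_set>
#include <boost/functional/hash.hpp>

using Point3D = std::array<double, 3>;

// Combine the three coordinates into a single hash with boost::hash_range.
struct Point3DHash
{
    std::size_t operator()(const Point3D &point) const
    {
        return boost::hash_range(point.begin(), point.end());
    }
};

// The second template parameter tells std::unordered_set which hash to use.
using PointSet = std::unordered_set<Point3D, Point3DHash>;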

The points I was hashing originated from the lattice points of a point-region octree. To get a visual impression of the kind of point set we are dealing with, look at the following quadtree (the 2D equivalent of an octree) decomposition of space with its lattice nodes marked in red:

The octree lattice points are very similar to the points in the example except that they are spaced regularly in 3D. If the bounding values of the x-coordinate are x_min, x_max, then for all lattice points the value of the x-coordinate follows the pattern

p_x = x_min + (j / 2^i) * (x_max - x_min)

with integers i >= 0 and j in [0, 2^i]. The same holds for the y- and z-coordinates, though with their own independent bounds. Building on this relation, the issue of std::unordered_set being much slower than std::set could be reproduced independently of the original octree code.

Here is the code that we use to randomly generate regular points:

#include <array>
#include <cmath>
#include <random>

std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> intDis1(-1000, 1000);
std::uniform_int_distribution<> intDis2(0, 16);

using Point3D = std::array<double, 3>;

// A regular coordinate of the form a + b / 2^i, with integers a, b and i in [0, 16].
auto genRegularCoordinate = [&gen, &intDis1, &intDis2]()
{
    return double(intDis1(gen)) +
           double(intDis1(gen)) / std::pow(2.0, intDis2(gen));
};

auto genRegularPoint = [&]()
{
    return Point3D{genRegularCoordinate(), genRegularCoordinate(),
                   genRegularCoordinate()};
};

Three examples of regular points:

(-850.98974609375, 756.984680175781, -175.439453125)
(-787.3984375, -1024, -314.46875)
(-593.375, -920.15625, -161.5859375)

As input for the efficiency test we build a vector of 100000 points:

#include <vector>

auto inputPoints = std::vector<Point3D>{};
for (int n = 0; n < 100000; ++n)
{
    inputPoints.emplace_back(genRegularPoint());
}

To test the efficiency we fill the set with these points and count the number of points from the vector that are in the set:

#include <algorithm>
#include <set>
#include <unordered_set>

auto pointSet = std::set<Point3D>(begin(inputPoints),
                                  end(inputPoints));
// With the hash functor from above (illustrative name):
//auto pointSet = std::unordered_set<Point3D, Point3DHash>(begin(inputPoints),
//                                                         end(inputPoints));

// Every input point should be found; the lookup cost is what we are measuring.
auto nFound = std::count_if(begin(inputPoints), end(inputPoints),
                            [&pointSet](const Point3D &point)
                            {
                                return pointSet.count(point) > 0;
                            });

On my machine this takes about 0.15 seconds for std::set and 30 seconds for std::unordered_set. If I double the number of points to 200000 then this becomes 0.30 seconds versus 150 seconds.
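
For reference, such timings can be taken with a small std::chrono helper; this is only a sketch of a measurement harness, not necessarily the one used for the numbers above:

#include <chrono>
#include <iostream>

// Run an arbitrary piece of work and report how long it took in milliseconds.
template <typename Work>
void timeIt(const char *label, Work &&work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    std::cout << label << ": " << ms.count() << " ms\n";
}

Wrapping the set construction and the count_if loop in a lambda and passing it to timeIt() gives the per-container numbers.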

So what is wrong then? The first thing I checked was whether the hashes are unique. This turns out to be the case. I printed some statistics of the std::unordered_set instance:

bucket_count = 131072
max_bucket_count = 131072
load_factor = 0.762939
max_load_factor = 1

Again, nothing out of the ordinary. The load_factor follows directly from the number of elements (100000) divided by the bucket_count. The statistic that is missing is one that tells us how well those 100000 elements are distributed over the 131072 buckets…
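
Both the summary statistics and the missing distribution can be read straight off the container, since std::unordered_set exposes its buckets. A minimal sketch, assuming the pointSet instance from the test above:

#include <cstddef>
#include <iostream>

std::cout << "bucket_count = " << pointSet.bucket_count() << '\n'
          << "max_bucket_count = " << pointSet.max_bucket_count() << '\n'
          << "load_factor = " << pointSet.load_factor() << '\n'
          << "max_load_factor = " << pointSet.max_load_factor() << '\n';

// Print the size of every non-empty bucket to see the distribution.
for (std::size_t i = 0; i < pointSet.bucket_count(); ++i)
{
    if (pointSet.bucket_size(i) > 0)
    {
        std::cout << "bucket " << i << ": "
                  << pointSet.bucket_size(i) << " elements\n";
    }
}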

As a final investigative step I started to loop over all buckets and print their contents. Much to my surprise it turned out that all 100000 points, without exception, were inside the same bucket. But how is that possible? To understand this we need some basic knowledge about how a typical hash map is implemented.

A hash can take as many different values as size_t can represent, so it is impossible to link every hash to a separate memory location. Instead, the hash map creates a limited number of buckets and maps hashes to buckets. All inserted objects are stored in the bucket to which their hash is mapped. In our case we have 131072 buckets, which is 2¹⁷. The obvious way to create this mapping is to divide the hash by the number of buckets and use the remainder as the bucket index:

bucket_idx = hash % bucket_count;

With all our different hashes ending up in the same bucket, we now know that they must have the same remainder after division. Since in our case the number of buckets is a power of two, the remainder consists simply of the least significant bits of the hash: as many bits as the exponent of that power. For example:

731 % 2⁵ = 27

is in binary

1011011011 % 100000 = 11011
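
In code, modulo by a power of two is the same as masking off the low bits, which is easy to verify at compile time (a small illustration, not part of the original program):

// 731 % 2^5 keeps only the five least significant bits of 731.
static_assert(731 % 32 == (731 & 31), "modulo by a power of two == bitwise AND with 2^n - 1");
static_assert((731 & 31) == 27, "0b1011011011 & 0b11111 == 0b11011 == 27");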

This means that in our case the last 17 bits of our hashes must be identical! Printing the hashes in binary indeed confirms this:

6224711552622665454 ==
0101011001100010100111100010110101011011010110000001111011101110
5118867174346727150 ==
0100011100001001110111101001001011111011010110000001111011101110
3439893115510988526 ==
0010111110111100111101001001000000101111010110000001111011101110
12686064959077490414 ==
1011000000001101111101001101110111011011010110000001111011101110
11837512432766951150 ==
1010010001000111010010101101011011111011010110000001111011101110

From this small sample it appears that even the last 24 bits are identical.
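
Printing a hash in this form takes only a line with std::bitset; a small sketch, assuming a 64-bit size_t:

#include <bitset>
#include <cstddef>
#include <iostream>

// Print a hash followed by its binary representation, as in the sample above.
void printHashBits(std::size_t hash)
{
    std::cout << hash << " ==\n" << std::bitset<64>(hash) << '\n';
}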

We now finally understand why our hash map is so slow: the quality of our hashes, or more precisely of their least significant bits, is poor. A simple solution I found is to hash the resulting hash, like this:

const auto hash = boost::hash_range(begin(coordinate_array),
                                    end(coordinate_array));
const auto betterHash = std::hash<size_t>()(hash);
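
Folded into the hash functor sketched earlier (again, Point3DHash is an illustrative name, not from the original code), the fix looks like this:

#include <array>
#include <cstddef>
#include <functional>
#include <boost/functional/hash.hpp>

using Point3D = std::array<double, 3>;

struct Point3DHash
{
    std::size_t operator()(const Point3D &point) const
    {
        // Hash the combined value once more to break the pattern in the low bits.
        const auto hash = boost::hash_range(point.begin(), point.end());
        return std::hash<std::size_t>()(hash);
    }
};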

Rerunning the experiments with this change confirms that std::unordered_set has the performance we would expect. Comparing the timings for std::set, std::unordered_set with the original hash, and std::unordered_set with the extra hash, we see that for a sufficiently large number of elements, std::unordered_set actually becomes faster than std::set.

Another solution would be not to use a power of two for the number of buckets. Unfortunately, with MSVC++ 12.0 we cannot do this: if we request a different size, bucket_count is always rounded up to the nearest power of two. I also tested with the libstdc++ of GCC 4.9.2, which uses prime numbers for the bucket count. My experiments confirmed that this strategy for picking the number of buckets effectively avoids the issue.
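
This behaviour is easy to inspect by requesting a bucket count and checking what the implementation actually gives you (a small sketch; the exact numbers depend on the standard library):

#include <iostream>
#include <unordered_set>

int main()
{
    std::unordered_set<int> s;
    s.rehash(100000); // request at least 100000 buckets
    // MSVC rounds up to the next power of two (131072),
    // while libstdc++ picks a prime from an internal table.
    std::cout << "bucket_count = " << s.bucket_count() << '\n';
}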

Note that what we have discussed here holds for std::unordered_map as well as for any other container built on a hash table.

Plaxis software makes it easy to model complex geo-engineering projects. Visit our website for career opportunities.

Matthijs Sypkens Smit is a scientific software engineer with an interest in mesh generation, computational geometry, scientific computing, FEM, C++, Python, agile/scrum, and software design.