UnifiedSet — The Memory Saver

https://www.eclipse.org/collections/

In my previous blog I explained how Eclipse Collections UnifiedMap works. In this blog, we will see how UnifiedSet in Eclipse Collections works.

Unified Set

UnifiedSet is the Set implementation of Eclipse Collections and is implemented very differently than a JDK HashSet. UnifiedSet is based on principles similar to UnifiedMap.

A JDK java.util.HashSet is backed by a java.util.HashMap. The backing HashMap has a dummy value which is associated with an object in the backing Map: HashMap<E,Object> map ,Object PRESENT = new Object()
This design leads to HashSet inheriting the behavior of the HashMap. A HashMap is backed by a table of Entry objects. The Entry implementation has hashcode, key, value, next as members, HashMap essentially caches the hashcode of keys. Moreover, due to the dummy value, HashSet ends up using more memory than required.

UnifiedSet, on the other hand is implemented as a Set i.e. it does not have any “dummy” empty value objects, it is not backed by table of Entry objects.
UnifiedSet is backed by a flattened array. Each object occupies a slot in the backing array. The flattened array stores only the required objects there by creating a leaner implementation. Having objects in a flattened array, enhances the performance for iterations as well. Collisions in the main array are handled by putting a special object called ChainedBucket. The ChainedBucket is another array where the colliding elements are stored.

Look up patterns like contains in the UnifiedSet use a standard hashcode index algorithm to find the location of the element. If the element at the index is not a ChainedBucket then simply the existence of the element is examined. If the element at the index is a ChainedBucket then the backing array is evaluated linearly to find the required element.

Since UnifiedSet does not cache the hashcode, for each look up, hashcode needs to be computed. So, the performance of UnifiedSet is directly dependent on the hashcode implementation of the object in the UnifiedSet.

Below are a few memory and performance comparisons between JDK 1.8 HashSet and Eclipse Collections 9.2.0 UnifiedSet.

Memory Footprint (lower number the better)

Memory Comparison HashSet<Integer> vs UnifiedSet<Integer>
Memory Comparison HashSet<String> vs UnifiedSet<String>

Performance Tests (higher number the better)

Source code for memory tests and performance tests is available on GitHub.

Summary:

  1. Eclipse Collections UnifiedSet has ~75% smaller memory footprint compared to JDK HashSet.
  2. JDK HashSet performs slightly better than Eclipse Collections UnifiedSet for add() Operation.
  3. Performance for JDK HashSet and Eclipse Collections UnifiedSet is similar for contains() Operation.
  4. Eclipse Collections UnifiedSet performs better than JDK HashSet for forEach() Operation.

Eclipse Collections Resources:
Eclipse Collections comes with it’s own implementations of List, Set and Map. It also has additional data structures like Multimap, Bag and an entire Primitive Collections hierarchy. Each of our collections have a rich API for commonly required iteration patterns.

  1. Website
  2. Source code on GitHub
  3. Contribution Guide
  4. Reference Guide

Show your support star us on GitHub.