Bag — The Counter

https://www.eclipse.org/collections/

I have often times encountered the necessity to count the number of objects. I have experienced the necessity to count in two flavors, first is to count the number of objects which satisfy a certain criteria and second is to find the number of times a particular object is encountered. In this blog we are going to see how to solve the second problem: Find the number of times a particular object is encountered.

Bag (or Multiset):

Bag is a data structure which you use when you are counting objects by putting in a Map<K, Integer> . A Bag is similar to your shopping bag or grocery bag, wherein you can have one or more occurrences or a particular item in no particular order. So, a Bag is an order independent data structure like a Set, however it allows duplicates.

Let us consider a list of items and you want to count the number of each fruit you have in your list. You can simply group the items and count, JDK has Collectors which do that for you. The code looks like this:

Using a Map to count

“Apple”, “Banana” and “Orange” have a valid count, however, “Grapes” which are not a part of the items the assertion has to be for a null. There by making this implementation not null safe.

Now let us solve the same problem by using an Eclipse Collections Bag in this case. Eclipse Collections has the toBag() API available which returns a Bag.

Using a Bag to count

Bag has occurrencesOf() API on it which returns the count. The occurrencesOf() API is null safe as can be seen by the assertion for “Grapes”.

In addition to the rich API available on RichIterable , the Eclipse Collections Bag also has more specific and intuitive API like occurrencesOf(), addOccurrences(), topOccurrences(), bottomOccurrences() to name a few.

The Eclipse Collections Bag implementation is called HashBag. A HashBag is backed by an ObjectIntMap<K> from Eclipse Collections itself. The ObjectIntMap is an open address map which has Objects as a Key but the values are primitive ints. This implementation makes the Bag leaner.

Below are a few memory and performance comparisons between JDK 1.8 HashMap and Eclipse Collections 9.2.0 Bag

Memory Footprint (lower number the better)

This shows the total memory footprint including the constituents of the data structures.

Memory Comparison HashMap<Integer, Integer> vs Eclipse Collections HashBag<Integer>
Memory Comparison HashMap<String, Integer> vs Eclipse Collections HashBag<String>

Performance Tests (higher number the better)

All measurements reported in operations/s.

Source code for memory tests and performance tests is available on GitHub.

Note: Map<K, Integer> is considered for memory and performance tests instead of Map<K, Long> so that the comparisons are comparable since the Eclipse Collections Bag is backed by an ObjectIntMap<K>. I have verified that the memory footprint for Map<K, Integer> and Map<K, Long> for these tests was same.

Summary:

  1. Eclipse Collections HashBag has ~40% smaller memory footprint compared to JDK HashMap.
  2. JDK HashMap performs better for than Eclipse Collections HashBag for add() and look-up operations for sizes less than 40,000 elements.
  3. JDK HashMap and Eclipse Collections HashBag have comparable performance for sizes greater than 40,000 elements.
  4. Eclipse Collections HashBag performs better than JDK HashMap when adding the same element 10 times.
  5. JDK HashMap performs slightly better than Eclipse Collections HashBag for look-up operations.
  6. Eclipse Collections HashBag has API which is helpful for Bag (count) specific operations.
  7. Eclipse Collections HashBag is null safe for cases where a particular object does not exist in the Bag.

Show your support star us on GitHub.

Eclipse Collections Resources:
Eclipse Collections comes with it’s own implementations of List, Set and Map. It also has additional data structures like Multimap, Bag and an entire Primitive Collections hierarchy. Each of our collections have a rich API for commonly required iteration patterns.

  1. Website
  2. Source code on GitHub
  3. Contribution Guide
  4. Reference Guide