Bag — The Counter

Nikhil Nanivadekar
Jun 4, 2018 · 4 min read
https://www.eclipse.org/collections/

I have often times encountered the necessity to count the number of objects. I have experienced the necessity to count in two flavors, first is to count the number of objects which satisfy a certain criteria and second is to find the number of times a particular object is encountered. In this blog we are going to see how to solve the second problem: Find the number of times a particular object is encountered.

Bag (or Multiset):

Bag is a data structure which you use when you are counting objects by putting in a Map<K, Integer> . A Bag is similar to your shopping bag or grocery bag, wherein you can have one or more occurrences or a particular item in no particular order. So, a Bag is an order independent data structure like a Set, however it allows duplicates.

Let us consider a list of items and you want to count the number of each fruit you have in your list. You can simply group the items and count, JDK has Collectors which do that for you. The code looks like this:

Using a Map to count

“Apple”, “Banana” and “Orange” have a valid count, however, “Grapes” which are not a part of the items the assertion has to be for a null. There by making this implementation not null safe.

Now let us solve the same problem by using an Eclipse Collections Bag in this case. Eclipse Collections has the toBag() API available which returns a Bag.

Using a Bag to count

Bag has occurrencesOf() API on it which returns the count. The occurrencesOf() API is null safe as can be seen by the assertion for “Grapes”.

In addition to the rich API available on RichIterable , the Eclipse Collections Bag also has more specific and intuitive API like occurrencesOf(), addOccurrences(), topOccurrences(), bottomOccurrences() to name a few.

The Eclipse Collections Bag implementation is called HashBag. A HashBag is backed by an ObjectIntMap<K> from Eclipse Collections itself. The ObjectIntMap is an open address map which has Objects as a Key but the values are primitive ints. This implementation makes the Bag leaner.

Below are a few memory and performance comparisons between JDK 1.8 HashMap and Eclipse Collections 9.2.0 Bag

Memory Footprint (lower number the better)

This shows the total memory footprint including the constituents of the data structures.

Memory Comparison HashMap<Integer, Integer> vs Eclipse Collections HashBag<Integer>
Memory Comparison HashMap<String, Integer> vs Eclipse Collections HashBag<String>

Performance Tests (higher number the better)

All measurements reported in operations/s.

Source code for memory tests and performance tests is available on GitHub.

Note: Map<K, Integer> is considered for memory and performance tests instead of Map<K, Long> so that the comparisons are comparable since the Eclipse Collections Bag is backed by an ObjectIntMap<K>. I have verified that the memory footprint for Map<K, Integer> and Map<K, Long> for these tests was same.

Summary:

  1. Eclipse Collections HashBag has ~40% smaller memory footprint compared to JDK HashMap.

Show your support star us on GitHub.

Eclipse Collections Resources:
Eclipse Collections comes with it’s own implementations of List, Set and Map. It also has additional data structures like Multimap, Bag and an entire Primitive Collections hierarchy. Each of our collections have a rich API for commonly required iteration patterns.

  1. Website

Oracle Groundbreakers

Aggregation of articles from Oracle and partner’s engineers, Groundbreaker Ambassadors and the developer community on all things Oracle Cloud and its technologies. The views expressed are those of the authors and not necessarily those of Oracle. Contact @jimgris or @brhubart

Nikhil Nanivadekar

Written by

Lead Eclipse Collections: eclipse.org/collections, Java Champion. I enjoy hiking, skiing, reading. All opinions stated by me are my own. Twitter @nikhilnanivade

Oracle Groundbreakers

Aggregation of articles from Oracle and partner’s engineers, Groundbreaker Ambassadors and the developer community on all things Oracle Cloud and its technologies. The views expressed are those of the authors and not necessarily those of Oracle. Contact @jimgris or @brhubart

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade