The missing Java data structures no one ever told you about — Part 3

Donald Raab
Javarevisited
Published in
7 min readAug 24, 2021

Learn about primitive collection support in Eclipse Collections.

The evils of boxing in Java

Java has eight primitive types. They are boolean, byte, char, short, int, float, long and double. Each primitive type has a corresponding Object wrapper. They are Boolean, Byte, Character, Short, Integer, Float, Long and Double. Boxing is the process of taking a primitive value and storing it in an instance of its corresponding wrapper type. Autoboxing was a feature added in Java 5. Autoboxing made it possible to automatically wrap a primitive value in an instance of its corresponding wrapper class without requiring developers to explicitly write any code to perform the wrapping.

Primitive values, wrapper types, autoboxing and explict boxing

For more information on autoboxing and unboxing, please refer to the Java tutorial section on Autoboxing and Unboxing.

“Measuring Object Sizes in the JVM” is a good article and introduction to using Java Object Layout which is an OpenJDK Code Tools project.

Why is boxing evil?

Boxing in Java is almost pure waste. The thing developers care about most often is the primitive value in the wrapper, not the wrapper itself. The wrapper exists primarily so developers can use the primitive value in collections. The convenience of autoboxing is evil because it enables the developer to silently generate wrapper garbage that can have adverse impacts on the performance of their applications.

Thank you, no hidden boxes, please!

Eclipse Collections helps prevent developers from creating hidden boxes in the Java heap, by giving them complete control via a full complement of primitive collections. Eclipse Collections has several primitive data structures today, that work with all eight primitive types.

Primitive Collection high-level view

There is support for converting between Object and primitive collections in the API.

Similar to the implementation on the object collections, return types are co-variant and specialized by subtypes. I will demonstrate this by showing API examples for both mutable and immutable types.

Visualizing the primitive collection type hierarchy

PrimitiveIterable is a parent type for all primitive collection types
The Iterable type hierarchy for a single primitive type — int

Build-time Code Generation

If it seems an impossible task to replicate and maintain the source code for the diagrams above eight times, you would be correct. The trick is that we code generate the primitive collection types from template files. Eclipse Collections uses StringTemplate to generate most of the Java source for its primitive collections. This happens at build time. So you will not see the .java source files for the primitive collections in GitHub. You will however see the templates which have a .stg extension in the eclipse-collections-code-generator module.

StringTemplate template file: primitiveList.stg

Show me the code

I will give examples of all of the primitive data structures for one primitive type — int. There is a very rich primitive collection API, and I will demonstrate quite a few methods in the code. Be prepared to scroll, as there are a lot of examples.

Table of Contents

Click a link below to jump to a section. Click the TOC link at the bottom of each section to return here.

Primitive List

The primitive List implementations in Eclipse Collections are array based. There are both mutable and immutable primitive lists. The following code will demonstrate APIs available on mutable and immutable types. A pattern will emerge in the package hierarchy that will make it easier to discover different types in an IDE.

Note: IntInterval is an ImmutableIntList.

TOC

Primitive Set

The primitive Set implementations in Eclipse Collections are array based, and use open addressing. There are both mutable and immutable primitive sets. The following code will demonstrate APIs available on mutable and immutable sets.

TOC

Primitive Bag

The primitive Bag implementations in Eclipse Collections are backed by primitive Maps. For example, an IntBag will be backed by an IntIntMap. where the first int is the type of Bag, and the second int is the count of each value in the Bag. There are both mutable and immutable primitive bags. The following code will demonstrate APIs available on mutable and immutable bags.

TOC

Primitive Stack

The primitive Stack implementations in Eclipse Collections are backed by a primitive array. There are both mutable and immutable primitive stacks. The following code will demonstrate APIs available on mutable and immutable stacks.

Note: Stack is not a Collection, so Stack does not have extra mutating methods like add, addAll, remove, removeAll or retainAll. In Eclipse Collections, Stack does have a rich iterable set of methods (e.g. select, reject, collect, etc.).

TOC

Primitive Map

The primitive Map implementations in Eclipse Collections support all combinations of key types and value types. In the case where the key and value types are the same, both keys and values are stored in a single array. In the cases where the key and value types are different, they will be stored in separate arrays. There are both mutable and immutable primitive maps. The following code will demonstrate APIs available on mutable and immutable maps.

Note: Map types in Eclipse Collections are Iterable on value. So an IntIntMap is also an IntIterable and supports the full set of methods available on IntIterable.

TOC

Primitive Lazy Iterable

Every primitive collection can return a lazy view on itself by calling asLazy. The following code will show some of the lazy operations available on a LazyIntInterable.

TOC

Primitive Synchronized Collections

Every mutable primitive collection can return a synchronized view on itself by calling asSynchronized. The synchronized collections are not safe to use with iterator without taking an explicit lock. However, there are a large variety of atomic operations available on the type that take locks inside the methods.

TOC

Primitive Unmodifiable Collections

Every mutable primitive collection can return an unmodifiable view on itself by calling asUnmodifiable. The unmodifiable collections are not safe to use with mutating methods like add, remove, etc. as they will throw an UnsupportedOperationException. However, there are a large variety readonly methods available on the collections.

TOC

Primitive String

Eclipse Collections has three primitive String adapters — CharAdapter, CodePointAdapter, CodePointList. These types provide rich primitive protocols for the characters or code points in a String. CharAdapter and CodePointAdapter are pure views on a String. CodePointList actually caches the code points from the String in an ImmutableIntList.

TOC

Memory Costs

Boxing primitives into wrappers comes with a significant memory cost. The following table shows the comparative cost in memory for storing 1 million int values into both primitive and boxed data structures. I may take some time later to put this into a chart, but thought I would share these raw comparisons early, along with the code that I used to calculate them.

Table comparing memory costs of primitive and boxed collections

Note: The easiest garbage to collect is the garbage that was never generated.

Waiting for Valhalla

I am excited about the promise project Valhalla has for Java. You might be waiting for project Valhalla to improve the performance of the code you’re writing. I honestly wish I could have waited for Valhalla. It would have likely saved me and many others who work on Eclipse Collections and other primitive collections libraries a lot of work. Unfortunately, waiting for an advanced capability in a future version of a programming language is not an option for everyone.

We added primitive collections with rich APIs to Eclipse Collections almost nine years ago. The benefits we have seen have been outstanding. Recognition of these benefits have hopefully helped to maintain the continued focus and development of JEP 218: Generics over Primitive Types.

I’m glad we didn’t wait. Lessons we have learned and continue to learn in the primitive collection space can help inform the design and development of project Valhalla.

Thoughts on the missing data structures in Java

Even after we get to Valhalla in Java, there will be a lot of missing data structures in Java. I wrote about the need for a renaissance in the Java Collections space last year. The potential feature space of a comprehensive Java collections library is enormous. That much is evident based on the size of Eclipse Collections. Regardless, we will keep carrying on in Eclipse Collections and helping to evolve the Java collection space.

I hope you enjoyed my three part overview of some of the missing Java data structures no one ever told you about. Now you can say that someone told you about some of them.

Special thanks to Sirisha Pratha for proofreading the draft versions of this blog.

I am a Project Lead and Committer for the Eclipse Collections OSS project at the Eclipse Foundation. Eclipse Collections is open for contributions. If you like the library, you can let us know by starring it on GitHub.

--

--

Donald Raab
Javarevisited

Java Champion. Creator of the Eclipse Collections OSS Java library (https://github.com/eclipse/eclipse-collections). Inspired by Smalltalk. Opinions are my own.