The missing Java data structures no one ever told you about — Part 3
Learn about primitive collection support in Eclipse Collections.
The evils of boxing in Java
Java has eight primitive types. They are boolean
, byte
, char
, short
, int
, float
, long
and double
. Each primitive type has a corresponding Object wrapper. They are Boolean
, Byte
, Character
, Short
, Integer
, Float
, Long
and Double
. Boxing is the process of taking a primitive value and storing it in an instance of its corresponding wrapper type. Autoboxing was a feature added in Java 5. Autoboxing made it possible to automatically wrap a primitive value in an instance of its corresponding wrapper class without requiring developers to explicitly write any code to perform the wrapping.
For more information on autoboxing and unboxing, please refer to the Java tutorial section on Autoboxing and Unboxing.
“Measuring Object Sizes in the JVM” is a good article and introduction to using Java Object Layout which is an OpenJDK Code Tools project.
Why is boxing evil?
Boxing in Java is almost pure waste. The thing developers care about most often is the primitive value in the wrapper, not the wrapper itself. The wrapper exists primarily so developers can use the primitive value in collections. The convenience of autoboxing is evil because it enables the developer to silently generate wrapper garbage that can have adverse impacts on the performance of their applications.
Thank you, no hidden boxes, please!
Eclipse Collections helps prevent developers from creating hidden boxes in the Java heap, by giving them complete control via a full complement of primitive collections. Eclipse Collections has several primitive data structures today, that work with all eight primitive types.
There is support for converting between Object and primitive collections in the API.
Similar to the implementation on the object collections, return types are co-variant and specialized by subtypes. I will demonstrate this by showing API examples for both mutable and immutable types.
Visualizing the primitive collection type hierarchy
Build-time Code Generation
If it seems an impossible task to replicate and maintain the source code for the diagrams above eight times, you would be correct. The trick is that we code generate the primitive collection types from template files. Eclipse Collections uses StringTemplate to generate most of the Java source for its primitive collections. This happens at build time. So you will not see the .java source files for the primitive collections in GitHub. You will however see the templates which have a .stg extension in the eclipse-collections-code-generator
module.
Show me the code
I will give examples of all of the primitive data structures for one primitive type — int
. There is a very rich primitive collection API, and I will demonstrate quite a few methods in the code. Be prepared to scroll, as there are a lot of examples.
Table of Contents
Click a link below to jump to a section. Click the TOC link at the bottom of each section to return here.
Primitive List
The primitive List
implementations in Eclipse Collections are array based. There are both mutable and immutable primitive lists. The following code will demonstrate APIs available on mutable and immutable types. A pattern will emerge in the package hierarchy that will make it easier to discover different types in an IDE.
Note: IntInterval
is an ImmutableIntList
.
Primitive Set
The primitive Set
implementations in Eclipse Collections are array based, and use open addressing. There are both mutable and immutable primitive sets. The following code will demonstrate APIs available on mutable and immutable sets.
Primitive Bag
The primitive Bag
implementations in Eclipse Collections are backed by primitive Maps
. For example, an IntBag
will be backed by an IntIntMap
. where the first int
is the type of Bag
, and the second int
is the count of each value in the Bag
. There are both mutable and immutable primitive bags. The following code will demonstrate APIs available on mutable and immutable bags.
Primitive Stack
The primitive Stack
implementations in Eclipse Collections are backed by a primitive array. There are both mutable and immutable primitive stacks. The following code will demonstrate APIs available on mutable and immutable stacks.
Note: Stack
is not a Collection
, so Stack
does not have extra mutating methods like add
, addAll
, remove
, removeAll
or retainAll
. In Eclipse Collections, Stack
does have a rich iterable set of methods (e.g. select
, reject
, collect
, etc.).
Primitive Map
The primitive Map
implementations in Eclipse Collections support all combinations of key types and value types. In the case where the key and value types are the same, both keys and values are stored in a single array. In the cases where the key and value types are different, they will be stored in separate arrays. There are both mutable and immutable primitive maps. The following code will demonstrate APIs available on mutable and immutable maps.
Note: Map
types in Eclipse Collections are Iterable
on value. So an IntIntMap
is also an IntIterable
and supports the full set of methods available on IntIterable
.
Primitive Lazy Iterable
Every primitive collection can return a lazy view on itself by calling asLazy
. The following code will show some of the lazy operations available on a LazyIntInterable
.
Primitive Synchronized Collections
Every mutable primitive collection can return a synchronized view on itself by calling asSynchronized
. The synchronized collections are not safe to use with iterator
without taking an explicit lock. However, there are a large variety of atomic operations available on the type that take locks inside the methods.
Primitive Unmodifiable Collections
Every mutable primitive collection can return an unmodifiable view on itself by calling asUnmodifiable
. The unmodifiable collections are not safe to use with mutating methods like add
, remove
, etc. as they will throw an UnsupportedOperationException
. However, there are a large variety readonly methods available on the collections.
Primitive String
Eclipse Collections has three primitive String
adapters — CharAdapter
, CodePointAdapter
, CodePointList
. These types provide rich primitive protocols for the characters or code points in a String
. CharAdapter
and CodePointAdapter
are pure views on a String
. CodePointList
actually caches the code points from the String
in an ImmutableIntList
.
Memory Costs
Boxing primitives into wrappers comes with a significant memory cost. The following table shows the comparative cost in memory for storing 1 million int values into both primitive and boxed data structures. I may take some time later to put this into a chart, but thought I would share these raw comparisons early, along with the code that I used to calculate them.
Note: The easiest garbage to collect is the garbage that was never generated.
Waiting for Valhalla
I am excited about the promise project Valhalla has for Java. You might be waiting for project Valhalla to improve the performance of the code you’re writing. I honestly wish I could have waited for Valhalla. It would have likely saved me and many others who work on Eclipse Collections and other primitive collections libraries a lot of work. Unfortunately, waiting for an advanced capability in a future version of a programming language is not an option for everyone.
We added primitive collections with rich APIs to Eclipse Collections almost nine years ago. The benefits we have seen have been outstanding. Recognition of these benefits have hopefully helped to maintain the continued focus and development of JEP 218: Generics over Primitive Types.
I’m glad we didn’t wait. Lessons we have learned and continue to learn in the primitive collection space can help inform the design and development of project Valhalla.
Thoughts on the missing data structures in Java
Even after we get to Valhalla in Java, there will be a lot of missing data structures in Java. I wrote about the need for a renaissance in the Java Collections space last year. The potential feature space of a comprehensive Java collections library is enormous. That much is evident based on the size of Eclipse Collections. Regardless, we will keep carrying on in Eclipse Collections and helping to evolve the Java collection space.
I hope you enjoyed my three part overview of some of the missing Java data structures no one ever told you about. Now you can say that someone told you about some of them.
Special thanks to Sirisha Pratha for proofreading the draft versions of this blog.
I am a Project Lead and Committer for the Eclipse Collections OSS project at the Eclipse Foundation. Eclipse Collections is open for contributions. If you like the library, you can let us know by starring it on GitHub.
Other Java Programming articles you may like:
The Complete Java Developer roadmap
10 Things Java Programmer Should Learn
10 Programming languages You can Learn
10 Tools Every Java Developer Should Know
10 Reasons to Learn Java Programming languages in 2021
My Favorite Free Programming Courses for Beginners
10 Free Data Structure and Algorithms Courses
7 Best Data Structure And Algorithms Courses for Beginners
50+ Data Structure and Algorithms Interview questions