Java Typesafe enum history

alex_ber
Published in Geek Culture
33 min read · Jul 31, 2021

In this story I will give an overview of the evolution of the enum in Java.

You can skip this paragraph; it gives a theoretical definition of what an enum is. Object-oriented view: an enum is a generalization of the Singleton design pattern. Singleton means that we can have at most 1 instance, and enum means that we have exactly n instances, where n is a compile-time constant. This view was dominant in Java until JDK 14/15. Functional view: an enum is a (poor) implementation of a disjoint union. The “real” Java implementation will be sealed classes. Sealed types are about “a finite number of possible types”, whereas enums are about “a finite number of possible instances”. More on this in a different story; for now, let’s get some historical perspective.

An enumerated type specifies a set of related constants as its values. Examples include a week of days, the standard north/south/east/west compass directions, a currency’s coin denominations, and a lexical analyzer’s token types.

Enumerated types have traditionally been implemented as sequences of integer constants, which is demonstrated by the following set of direction constants:

https://www.infoworld.com/article/3543350/how-to-use-typesafe-enums-in-java.html
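The listing that belonged to the quote above is missing here. A minimal sketch of the integer-constant approach, using the DIR_ names that the text mentions later, might look like this. The class names, the DIR_WEST constant and the concrete values are assumptions made for illustration only.

```java
// Hypothetical holder class; the values 0..3 are arbitrary illustrations.
class Directions {
    static final int DIR_NORTH = 0;
    static final int DIR_WEST  = 1;
    static final int DIR_EAST  = 2;
    static final int DIR_SOUTH = 3;

    // Accepts any int, not only the four constants declared above.
    static String describe(int direction) {
        switch (direction) {
            case DIR_NORTH: return "north";
            case DIR_WEST:  return "west";
            case DIR_EAST:  return "east";
            case DIR_SOUTH: return "south";
            default:        return "unknown(" + direction + ")";
        }
    }
}

class DirectionsDemo {
    public static void main(String[] args) {
        System.out.println(Directions.describe(Directions.DIR_EAST));
        // Compiles without complaint although 42 is not a direction.
        System.out.println(Directions.describe(42));
        // Arithmetic on "directions" also compiles, but the result means nothing.
        int nonsense = (Directions.DIR_NORTH + Directions.DIR_EAST) / Directions.DIR_SOUTH;
        System.out.println(nonsense);
    }
}
```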

Originally, in JDK 1.0, the language did not include an enum construct.

The approach described above is what the Java language designers originally advised Java programmers to use to work around Java’s lack of an enumeration feature.

There are several problems with this approach:

* Lack of type safety: Because an enumerated type constant is just an integer, any integer can be specified where the constant is required. Furthermore, addition, subtraction, and other math operations can be performed on these constants; for example, (DIR_NORTH + DIR_EAST) / DIR_SOUTH), which is meaningless.

* Namespace not present: An enumerated type’s constants must be prefixed with some kind of (hopefully) unique identifier (e.g., DIR_) to prevent collisions with another enumerated type’s constants.

* Brittleness: Because enumerated type constants are compiled into class files where their literal values are stored (in constant pools) [see also my side note below], changing a constant’s value requires that these class files and those application class files that depend on them be rebuilt. Otherwise, undefined behavior will occur at runtime.

* Lack of information: When a constant is printed, its integer value outputs. This output tells you nothing about what the integer value represents. It doesn’t even identify the enumerated type to which the constant belongs.

You could avoid the “lack of type safety” and “lack of information” problems by using java.lang.String constants. For example, you might specify static final String DIR_NORTH = "NORTH";. ..Unlike integer comparisons, you cannot compare string values with the == and != operators (which only compare references).

https://www.infoworld.com/article/3543350/how-to-use-typesafe-enums-in-java.html
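The remark about == on strings can be illustrated with a short sketch. The class and method names are hypothetical; only DIR_NORTH comes from the quote.

```java
class StringDirections {
    static final String DIR_NORTH = "NORTH";

    // Reference comparison: true only if both variables point to the same object.
    static boolean isNorthByReference(String s) {
        return s == DIR_NORTH;
    }

    // Value comparison: true whenever the characters match.
    static boolean isNorthByValue(String s) {
        return DIR_NORTH.equals(s);
    }
}

class StringDirectionsDemo {
    public static void main(String[] args) {
        // Simulates a value that arrived at runtime (from a file, a request, ...).
        String fromInput = new String("NORTH");
        System.out.println(StringDirections.isNorthByReference(fromInput));
        System.out.println(StringDirections.isNorthByValue(fromInput));
    }
}
```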

See also https://docs.oracle.com/javase/8/docs/technotes/guides/language/enums.html

Side note: “changing a constant’s value requires that these class files and those application class files that depend on them be rebuilt”. This has actually happened to me a couple of times. I would change a constant’s value in the source code and rely on the IDE’s build-automatically feature, but the code would not do what it was supposed to do. After a couple of hours of debugging I figured out that the constant’s value had not changed at runtime. I concluded that there was some weird linkage error between classes (this had happened a couple of years earlier with my C++ code), so I just did a clean and rebuild. After a couple of hours it happened again. And again. After some research I figured out that the cause was the constant pool: the constant’s literal value is copied into every class file that uses it. I decided to disable the build-automatically feature, and I have never re-enabled it since. With the feature disabled, I am aware of when I change a constant, so I know that I should rebuild all my code, including my client code. When I don’t change any constant (most of the time), I just use an additional shortcut to save and rebuild. It is very rare that I need to clean and rebuild because of some weird behavior.

Quote:

These problems caused developers to invent a class-based alternative known as Typesafe Enum.

https://www.infoworld.com/article/3543350/how-to-use-typesafe-enums-in-java.html

Quote from another source:

In his excellent book Effective Java Programming Language Guide, Joshua Bloch shows how a type safe enumeration can be used in Java to define a set of named values (see Item 21: Replace enum constructs with classes). For example:

Now if a currentState object of type MachineStates is created, it can only take on the values {WAIT, NICKLE, DIME, QUARTER}. An out of range value will be caught at compile time.
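The listing from the quoted source is missing here. A minimal sketch of the class-based pattern, using the constant names from the quote (including its spelling NICKLE), could look like the following. It is not the book's code; the field name and the toString() choice are assumptions, and it is written in present-day Java.

```java
final class MachineStates {
    private final String name;

    // Private constructor: no code outside this class can create more instances.
    private MachineStates(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }

    // The complete, fixed set of values of this type.
    static final MachineStates WAIT    = new MachineStates("WAIT");
    static final MachineStates NICKLE  = new MachineStates("NICKLE");
    static final MachineStates DIME    = new MachineStates("DIME");
    static final MachineStates QUARTER = new MachineStates("QUARTER");
}

class MachineStatesDemo {
    public static void main(String[] args) {
        MachineStates currentState = MachineStates.WAIT;
        // currentState = 3;          // does not compile: an int is not a MachineStates
        // currentState = "QUARTER";  // does not compile either
        System.out.println(currentState);
    }
}
```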

… It would be nice to have a base class that packages the functionality of the type safe enumeration. Then the base class can be used to define a variety of type safe enumeration sub-classes:

And

The TypeSafeEnum base class can be found on this web page or downloaded here.

http://bearcave.com/software/java/misl/enum/type_safe_enum.html

For convenience I’m providing source-code of TypeSafeEnum:

Note:

  • The first edition of the book came out in 2001, so I suspect the code was written for JDK 1.3. It still looks strange: the Collections API has been available since JDK 1.2, yet java.util.Vector is used rather than some implementation of java.util.List (for example, ArrayList). Note that java.util.Enumeration is the “old-style” java.util.Iterator. The required fix is simple.
  • Because this is pre-Java 5.0 code, it is fine that it uses the raw Class type (java.lang.Class was generified in JDK 5.0). The required fix is trivial.
  • Naming convention: why getName() and getValue()? It is sort of funny that enumInfo doesn't follow the JavaBean convention while the TypeSafeEnum API exposes itself as a JavaBean. In the Java 5.0 enum the corresponding methods (intentionally) do not follow the JavaBean convention. Either way, the required fix is trivial.
  • getName() and getValue() should arguably be made final, as is done in the Java 5.0 enum. Personally, I don’t think this is a big problem here. If somebody wants to shoot himself in the foot, nobody should stop him. TypeSafeEnum itself doesn’t rely on the getName() and getValue() methods. Either way, the required fix is trivial.
  • In the Java 5.0 enum, hashCode() was made final and equals() compares by ==:

This ensures that java.lang.Object's hashCode() and equals() are used for enum comparisons. This is very efficient, and it also works with multiple class loaders (more on this below). It is also the right thing to do from the correctness point of view: if we have 2 enum constants with the same data members, we still want to treat them as different instances. In java.lang.Enum the “value” (it is called ordinal there) is supplied by the client code, so we can’t rely on it. In this implementation the “value” is calculated internally and the client code can’t tamper with it, so we can use it for comparison. So there are three acceptable options: provide no implementation of hashCode() and equals() at all, provide the implementation described above, or provide one based on “value” comparison. Once the design decision is made, the change should be simple to implement.
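The listing for the hashCode()/equals() bullet is missing. The sketch below applies the same idea to a hand-written constant class: equals() is final and compares references, and hashCode() is final and delegates to java.lang.Object. The class Planet and its field are hypothetical.

```java
class Planet {
    static final Planet MERCURY = new Planet("rocky");
    static final Planet VENUS   = new Planet("rocky"); // same data, different constant

    private final String kind;

    private Planet(String kind) {
        this.kind = kind;
    }

    String getKind() {
        return kind;
    }

    // Identity comparison: two constants are equal only if they are the same object.
    @Override
    public final boolean equals(Object other) {
        return this == other;
    }

    // Identity hash code from java.lang.Object; final so nobody can change it.
    @Override
    public final int hashCode() {
        return super.hashCode();
    }
}

class PlanetDemo {
    public static void main(String[] args) {
        System.out.println(Planet.MERCURY.equals(Planet.VENUS));
        System.out.println(Planet.MERCURY.equals(Planet.MERCURY));
    }
}
```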

  • In the Java 5.0 enum the finalize() method was made empty and final. Again, if somebody wants to shoot himself in the foot, he may do so. Personally, I don’t think this is required. Either way, the required fix is trivial.
  • The Java 5.0 enum implements java.lang.Comparable. From the design point of view one can argue that every enum has a total order based on “value” (ordinal in Java 5.0 enum terms). Another person may say that such an order does not always make sense. Either way, the required fix is simple.
  • At line 25, infoVec should be declared final. In Java 8+ this variable would be effectively final, so it is kind of OK (I would still argue for changing it to final for readability, to state the intent clearly), but back then it could potentially prevent bugs. The required fix is trivial.
  • Interestingly enough, enumInfo is not written as a JavaBean. First of all, its name starts with a lower-case letter; it would be better to capitalize it. In this particular case I’m actually OK with not defining this class as a JavaBean, and I wouldn’t insist on the fix. Either way, the required fix is trivial.
  • It is not obvious, but this class is actually (almost) thread-safe. After execution leaves the constructor, the class is immutable. The only required fix is at line 45, where we hand the enum constants out to the client. The easiest fix is to make a defensive copy before returning the Enumeration. The required fix is simple.
  • Note that although we change the static infoVec variable inside the constructor, the change is actually thread-safe. infoVec is only read outside of the constructor. Inside the constructor there is an implicit lock on TypeSafeEnum.class, imposed by the class loader of the class. Such a lock is required by the JLS. Quote:

For each class or interface C, there is a unique initialization lock LC. The mapping from C to LC is left to the discretion of the Java Virtual Machine implementation. The procedure for initializing C is then as follows:

Synchronize on the initialization lock, LC, for C. This involves waiting until the current thread can acquire LC.

https://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4.2

  • There is a very subtle issue around enumInfo.hashCode.
    This field is populated with the hashCode of the enum’s class; think of it as getClass().hashCode(). Because the infoVec variable is static, infoVec holds all type-safe enums of the application in one big Vector (list), where the hashCode of the enum’s class is used as the discriminator field (see above).
    You should see a code smell here.
    There are some performance issues here that can, for example, lead to OutOfMemoryError.
    There is also a memory-leak issue in a multi-class-loader environment.
    The required fix is not hard, but it is hard to take all the issues into account. I will get back to this point below (see the discussion of the cache).
  • There are 2 methods missing: findEnumByValue() and findEnumByName(). The client code can get the Enumeration (which takes time proportional to the number of type-safe enums in the application; that number can be big, especially if multiple class loaders are in use, see below) and then has to find the correct constant by comparison (relying on getName() and getValue()!). That step is proportional to the number of constants in the enum. This task is hard, but doable (see the discussion of the cache below). See https://bugs.java.com/bugdatabase/view_bug.do?bug_id=5058132
  • The Java 5.0 enum overrides java.lang.Object’s clone() method to throw CloneNotSupportedException. The implementation above fails to do so. This fix is actually mandatory; otherwise we leave a loophole for creating new constants. The required fix is simple.
  • The Java 5.0 enum provides readObject() and readObjectNoData() methods that throw InvalidObjectException. This blocks the ability to create a new instance of the enum through the serialization mechanism. It can also be implemented another way: replace the deserialized enum with the existing one (based on its getName() and getValue()). The point is to prevent the creation of new instances. The required fix is simple.
  • There is also a subtle question of whether this code is properly synchronized. I think it is, but the reason is not obvious. First of all, by the Java Language Specification, class initialization happens top to bottom, and static initialization is done before non-static.

Note: This part is complicated; you can skip it and continue reading from the paragraph that follows the end-of-part marker below.

Still, the code is very brittle: it can become incorrect if you slightly change the order of the variable declarations (side note: this has also actually happened to me, and it took a lot of time to understand why my code suddenly stopped working correctly), and it is not easy to reason about. Let’s consider a simpler example based on https://stackoverflow.com/questions/2547713/why-static-fields-are-not-initialized-in-time
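The original listing is missing here. The following is a reconstruction that is consistent with the output and the explanation given in the text, not the author's exact code, so the line numbers mentioned in the text refer to the original listing and will not match it. InitTrace is a hypothetical helper added only so that what the constructor observed can be inspected afterwards.

```java
// Hypothetical helper. It lives in its own class so that it is fully
// initialized before it is first used from the MyClass constructor.
class InitTrace {
    static final StringBuilder SEEN = new StringBuilder();
}

class MyClass {
    // Static initializers run top to bottom; each one calls the constructor.
    private static MyClass myClass  = new MyClass();
    private static MyClass myclass2 = new MyClass();

    MyClass() {
        // Runs while the static initializers above are still in progress,
        // so it can observe static fields that have not been assigned yet.
        InitTrace.SEEN.append(describe(myClass)).append(describe(myclass2)).append(';');
        System.out.print(myClass);
        System.out.print(myclass2);
    }

    private static String describe(Object o) {
        return (o == null) ? "null" : "set";
    }

    public static void main(String[] args) {
        // Intentionally empty: all output comes from class initialization.
    }
}
```

The first constructor call sees both fields as null; the second one sees myClass assigned and myclass2 still null. The hash part of the printed object reference varies between runs.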

If we run this code, what will be printed on the screen? If you guessed that the output is:

nullnullMyClass@15db9742null

then congratulations.

The simplified explanation is the following: in order to run main, MyClass has to be initialized first. Initialization goes top to bottom, and static initialization is done before non-static, so the initializer of myClass in line 3 runs first. Its first step is a call to the constructor, so we jump to line 7. We are still inside static initialization, but now we are running in a non-static context (side note: this is the reason why calling methods from a constructor is not recommended: they can observe uninitialized non-static fields). Because the static fields have not been initialized yet, they hold null (they cannot contain garbage, as in C; this is guaranteed by the JVM). So, we see null printed twice on the console. Then we exit the constructor, go back to line 3, and initialize the myClass field with the new object (myclass2 is still null, though). Then we move to line 4. Again, the constructor is called. Inside the constructor we reference the static fields: myClass has been initialized, so we see a non-null value for it, but myclass2 is still null, so we see null printed. Then we exit the constructor, and there is nothing (interesting) left to do.

The detailed explanation is the following:

This is the sequence the JVM goes through when you first reference the class MyClass.

* Load the byte-code into memory.

* Memory for the static storage is cleared (binary zero).

* Initialize the class:

1. Execute each static initializer in the order that it appears, this includes static variables and static { ... } blocks.

2. JVM then initializes [JVM holds initialization lock LC, see above] your myClass static variable to a new instance of MyClass.

3. When this happens, the JVM notices that MyClass is already loaded (byte-code) and in the process of being initialized, so it skips [static] initialization.

4. Allocate memory on heap for object.

5. Execute constructor.

6. Print out value of obj which is still null (since it is not part of the heap and constructor initialized variables).

7. When constructor finishes, execute next static initializer which sets obj to a new instance of Object.

* Class initialization done. From this point, all constructor calls will behave as you presume/expect — that is obj would not be null but a reference to an Object instance.

Note as well, this all happens on the same thread that first references the class. Second, the JVM guarantees that initialization will complete before any other thread is allowed to use this class [JVM holds initialization lock LC, see above].

https://stackoverflow.com/a/2557613/1137529

I will reiterate: the JVM guarantees that initialization will complete before any other thread is allowed to use this class. Specifically, the JVM holds the initialization lock LC during the initialization process. Because of this lock, the code of TypeSafeEnum is thread-safe: infoVec is initialized before any of the TypeSafeEnum instances; infoVec changes only inside the constructor of TypeSafeEnum (that is, under the initialization lock LC); and findInfo() also executes under the initialization lock LC. This statement is not trivial, but it is true.

END OF COMPLICATED PART

Take a look at the list above. While most of the points are simple (I will get back to the hard ones in a while), many of them are not obvious. The fact is that there are many proposed “standard” implementations of TypeSafeEnum, including the one from Joshua Bloch, that miss some of them.

This is the place to mention that sometime in 2006, on JDK 1.4, I wrote my own implementation. When JDK 5 came out, one of the first things I did was to look at Sun’s implementation of enum (Oracle bought Sun later).

The implementations were not identical, but all the issues were covered. My implementation, for example, took the serialization hole into account, not by blocking deserialization entirely but by instance substitution. On the other hand, I didn’t implement the java.lang.Comparable interface.
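Two of the fixes from the list, blocking clone() and closing the serialization hole by instance substitution, can be sketched as follows. This is a minimal illustration under my own assumptions, not the implementation described in the text: the class Suit, the BY_NAME map and the roundTrip() helper are hypothetical, and the substitution uses the standard readResolve() hook of Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class Suit implements Serializable {
    private static final long serialVersionUID = 1L;

    // Declared before the constants so that it exists when their constructors run.
    private static final Map<String, Suit> BY_NAME = new HashMap<>();

    static final Suit HEARTS = new Suit("HEARTS");
    static final Suit SPADES = new Suit("SPADES");

    private final String name;

    private Suit(String name) {
        this.name = name;
        BY_NAME.put(name, this); // only ever written during class initialization
    }

    String getName() {
        return name;
    }

    // Closes the cloning loophole.
    @Override
    protected final Object clone() throws CloneNotSupportedException {
        throw new CloneNotSupportedException();
    }

    // Invoked by the serialization runtime on the freshly deserialized copy;
    // the copy is dropped and the existing constant is returned instead.
    private Object readResolve() throws ObjectStreamException {
        Suit canonical = BY_NAME.get(name);
        if (canonical == null) {
            throw new InvalidObjectException("unknown constant: " + name);
        }
        return canonical;
    }

    // Helper for demonstration: serialize to memory and read the object back.
    static Suit roundTrip(Suit suit) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(suit);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Suit) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this in place, Suit.roundTrip(Suit.HEARTS) is expected to return the very same object as Suit.HEARTS, so == comparisons keep working after deserialization.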

There are 4 issues that are actually difficult to address: cache, GC, class loader, extensibility.

My implementation of the cache was totally different from Sun’s. I couldn’t augment java.lang.Class to store enum constants inside it (which is how some garbage collection (GC) related issues were resolved, more on this below), and changing javac, the Java compiler, was not one of the options I considered. :-)

My implementation of the cache was as follows: TypeSafeEnum has a private static HashMap whose key is a String (YourEnum.class.getName()) and whose value is another HashMap, keyed by String (getName()), whose value is the enum constant. TypeSafeEnum has a public static TypeSafeEnum valueOf(Class enumClass, String name) function with the obvious implementation. There is also a protected addEnum() method that is meant to be called in the constructor in order to put the enum constant into the cache.

The basic idea is that TypeSafeEnum contains a cache of all enums that inherit from it; each enum’s constants are grouped by their class name (as a String). After an enum is constructed, you should access this cache only through the valueOf() function. Potentially you can access some other enum’s constants, but you have to provide the Class object of that enum. If you can access the Class object, it is OK to give you access to the instances of this Class.
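The cache described above can be sketched roughly as follows. This is my reading of the description, not the original code: the generics, the synchronized blocks, the ensureInitialized() helper and the Colour example enum are assumptions. The helper is there because a class literal such as Colour.class does not by itself trigger class initialization, so without it the cache could still be empty on the first lookup.

```java
import java.util.HashMap;
import java.util.Map;

abstract class TypeSafeEnum {
    // Outer key: enum class name. Inner key: constant name.
    private static final Map<String, Map<String, TypeSafeEnum>> CACHE = new HashMap<>();

    private final String name;

    protected TypeSafeEnum(String name) {
        this.name = name;
        addEnum();
    }

    public final String getName() {
        return name;
    }

    // Registers this constant under the name of its class.
    protected final void addEnum() {
        synchronized (CACHE) {
            CACHE.computeIfAbsent(getClass().getName(), k -> new HashMap<>())
                 .put(name, this);
        }
    }

    public static TypeSafeEnum valueOf(Class<? extends TypeSafeEnum> enumClass, String name) {
        ensureInitialized(enumClass);
        synchronized (CACHE) {
            Map<String, TypeSafeEnum> constants = CACHE.get(enumClass.getName());
            TypeSafeEnum found = (constants == null) ? null : constants.get(name);
            if (found == null) {
                throw new IllegalArgumentException(
                        "No constant " + enumClass.getName() + "." + name);
            }
            return found;
        }
    }

    // Forces static initialization so that the constants have registered themselves.
    private static void ensureInitialized(Class<?> c) {
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical enum built on top of the base class.
final class Colour extends TypeSafeEnum {
    static final Colour WHITE = new Colour("WHITE");
    static final Colour BLACK = new Colour("BLACK");

    private Colour(String name) {
        super(name);
    }
}
```

A lookup such as TypeSafeEnum.valueOf(Colour.class, "WHITE") is expected to return the same object as Colour.WHITE. Because the cache is a static field of the base class, it is initialized before any subclass constant is constructed.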

Now, you may have the following questions:

  1. Why is the cache static?
  2. Why is the outer key a String and not a Class?
  3. Why is the inner cache a HashMap and not an ArrayList?

Let’s answer these questions one by one.

  1. You may think that the cache is static so that it can be accessed by the public static TypeSafeEnum valueOf(Class enumClass, String name) function. This is only one of the reasons. There is a more subtle reason related to the initialization order of the enum. To demonstrate what I’m talking about, I will provide some examples from the Java Language Specification.

Example 8.9.2–2. Restriction On Enum Constant Self-Reference

Without the rule on static field access, apparently reasonable code would fail at run time… Here is an example of the sort of code that would fail:

Static initialization of this enum would throw a NullPointerException because the static variable colorMap is uninitialized when the constructors for the enum constants run…. The code can easily be refactored to work properly:

The refactored version is clearly correct, as static initialization occurs top to bottom.

https://docs.oracle.com/javase/specs/jls/se8/html/jls-8.html#jls-8.9
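The two listings from the JLS quote are missing above. The sketch below reconstructs them as far as I recall the specification's example, so details may differ from the official text. The rejected constructor is kept as a comment because it does not compile; the static block is the refactored form.

```java
import java.util.HashMap;
import java.util.Map;

enum Color {
    RED, GREEN, BLUE;

    static final Map<String, Color> colorMap = new HashMap<String, Color>();

    // Rejected at compile time: an enum constructor may not refer to a
    // non-constant static field of its own enum. Without that rule, colorMap
    // would still be null here, because the constants are created first.
    //
    // Color() { colorMap.put(toString(), this); }

    // Refactored form: runs after the constants and colorMap are initialized.
    static {
        for (Color c : Color.values()) {
            colorMap.put(c.toString(), c);
        }
    }
}

class ColorDemo {
    public static void main(String[] args) {
        System.out.println(Color.colorMap.get("RED") == Color.RED);
    }
}
```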

Yes, I have encountered such a situation in practice. It is so bad that Sun added a specific rule to reject code like the first example above at compile time.

With a static field, a simple code reordering and refactoring, as described, is sufficient. With a non-static field it is much harder to keep the code working no matter how TypeSafeEnum is extended: static fields are initialized first, top to bottom, and then non-static fields are initialized, but in the constructor of the enum class we call the addEnum method, which can touch fields that are not yet initialized (for more details see the complicated MyClass example above). It is really a mess. To avoid it, it is better to declare the cache as static, so that it is initialized before the constructor of any enum constant is called.

2. Why is the outer key a String and not a Class?

There are 2 reasons: one related to class loading and one related to GC.

Let’s start with the class-loading problem.

This section is complicated, you can safely skip it and continue to read from GC-related issue.

In my setup TypeSafeEnum belonged to an infra project, so it was packaged in a jar. The Tomcat web server hosted multiple applications that were packaged in separate war files (recall that it was 2006; back then it was normal practice to host multiple applications on the same web server, and to have Tomcat manually installed and configured, with applications deployed to it). So, I had multiple war files, each with its own copy of the jar that contains TypeSafeEnum.

So, inside the JVM we have a class loader per war file and a separate class loader for the jar file; each war class loader has its own jar class loader. For more details, see http://tomcat.apache.org/tomcat-6.0-doc/class-loader-howto.html

Note: It is possible to put the jar into the shared lib directory; that way we get a shared jar class loader and only 1 TypeSafeEnum in memory. However, such an attempt is actually highly non-trivial. You can read more here: https://stackoverflow.com/questions/267953/does-tomcat-load-the-same-library-file-into-memory-twice-if-they-are-in-two-web Just one quote from this link:

Placing libraries in commons directory can be dangerous, and must be used only if you can control which webapps are deployed, and what are the version of libraries used for each webapp…

[So, you can’t have different versions of such shared jars. But there is more.]

In such a scenario you will have 1 TypeSafeEnum with 1 cache that holds enums loaded by different war class loaders. This can cause a problem when redeploying an application. If I want to reload some application A.war, what Tomcat does behind the scenes is throw away the application class loader that loaded all the classes of A.war. This should let the GC collect all those classes, so that they can be reloaded by a newly created dedicated war class loader. But now I have a cache that sits in the shared jar class loader and indirectly holds strong references to the loaded classes, so the old classes cannot be collected and the reload will fail. A possible solution for this situation is to use WeakHashMap.

Note: This should be done carefully: we don’t want enum constants to disappear in the middle of the application’s run. This actually did happen to me, so I switched back from the WeakHashMap I originally used to HashMap.

I will get back to this point below.

We end up with multiple copies of TypeSafeEnum sitting in memory. But they exist in totally independent class loader hierarchies (user-defined classes loaded by the class loader of application A.war can’t access user-defined classes loaded by the class loader of application B.war). So, while we do have some waste, both on disk (multiple copies of the same jar) and in memory (multiple instances of TypeSafeEnum.class living in memory), they can’t interact with each other because of the class loader isolation provided by Tomcat. So, it doesn’t raise any practical issues and, in fact, we could use Class as the key in the map. There is, however, still a GC-related issue, which I’m going to discuss below.

Side note: At my next job we moved our application from Tomcat to JBoss. Initially JBoss was misconfigured (this was done to solve a classpath-hell issue: JBoss uses a logger and we used the same logging jar, but in a different version; we wanted to use our jar for logging inside the war and JBoss’ logger for JBoss itself. Issues of this kind eventually led to the Java Platform Module System in JDK 9). The observed behavior was that we got a ClassCastException on the Logger class on the first user request. The application started normally and initialized the logger, but then on the first user request we got a weird ClassCastException saying that org.apache.log4j.Logger can't be cast to org.apache.log4j.Logger. When this exception was shown to me, I immediately recalled the scenario described above: the same class can be loaded by 2 different class loaders. This helped us track down the issue: the initialization code was run with a different class loader, not the war class loader but its parent. Again, this happened because of the misconfiguration of JBoss, and it was fixed. We solved the logger version conflict in JBoss in a way that doesn’t change class loader usage.

GC-related problem for using String and not Class as outer key in the cache

If you use a HashMap for the cache with Class as the key, you may prevent the enum from being garbage collected.

Quote:

…care should be taken to ensure that value objects do not strongly refer to their own keys, either directly or indirectly, since that will prevent the keys from being discarded.

https://docs.oracle.com/javase/8/docs/api/java/util/WeakHashMap.html

So, using String and not Class removes the possibility that the GC will be unable to discard the enum because we hold its Class as a key.

For example, if our enum was loaded by the war class loader and we want to redeploy our application, Tomcat will throw away the current war class loader, and the GC should discard all the classes it loaded. If TypeSafeEnum was loaded by some different, non-discardable class loader, a Class key would prevent such an enum from being discarded. So, using String and not Class avoids such pitfalls in advance.

Note: There is an extreme case that I recognized and ignored by design. We can have an enum, say com.company.entities.MyEnum, that is defined inside the jar file and redefined in the war file under the very same name com.company.entities.MyEnum. This is done in order to “override” some class definition. Under the regular class loading mechanism this works fine: there will be only 1 java.lang.Class that represents com.company.entities.MyEnum. But I can write code that also loads com.company.entities.MyEnum from the jar file (by explicitly specifying the jar's class loader, and I can get a reference to the jar's class loader pretty easily). In this extreme case I will have 2 distinct java.lang.Class objects that both represent com.company.entities.MyEnum, but if I’m using String and not Class as the key, I can’t store them both. I’m just ignoring this issue; I don’t think it is something that will occur in practice.

It is instructive to look at the JDK 5.0 implementation of java.lang.Enum's cache. The cache is not stored inside java.lang.Enum; it is stored on the enum's java.lang.Class object.

It would be wrong to store the cache as a HashMap inside java.lang.Enum. java.lang.Enum is loaded by the bootstrap class loader, and until JDK 9 it was not even theoretically possible to discard classes loaded by it (modularization of the JDK did provide such a theoretical possibility, see Java Platform Module System). So, if some application A.war defines an enum and its instances are stored inside java.lang.Enum, which was loaded by the bootstrap class loader, I will always have a strong reference to that enum and it can’t be discarded.

So, Sun chose the following way: instead of storing the cache of all enums inside java.lang.Enum, they store it inside … java.lang.Class. Sun actually modified java.lang.Class. java.lang.Enum.valueOf(Class<T> enumType, String name) retrieves the cache from Class<T> enumType. For this to work, JDK 5 enums were made practically non-extensible: you can define an enum’s constants only in the class that directly inherits from java.lang.Enum. I will get back to this point below.

When the cache is stored in the java.lang.Class that represents the enum, and not in one place inside java.lang.Enum, it actually makes GC easy. Because the enum instance and its java.lang.Class are guaranteed to be loaded by the same class loader, we are storing the enum’s cache inside an object that was loaded by the same class loader. So, if the GC wants to discard the enum, the existence of the cache inside its java.lang.Class will not prevent garbage collection.

3. Why is the inner cache a HashMap and not an ArrayList?

This is an interesting question. The main reason I chose HashMap and not ArrayList was performance. I thought that, from a theoretical point of view, it would be much quicker to query a HashMap than to iterate over an ArrayList.

When I later did some measurements, I figured out that up to some threshold it actually takes less time to iterate over an entire ArrayList/array than to use a HashMap. Interestingly enough, java.util.EnumSet uses a threshold of 64 constants to determine whether the enum has a “regular” size or a “big” size, in which case an alternative implementation is used.

java.lang.Enum.valueOf(Class<T> enumType, String name) uses HashMap.

Interestingly enough, java.lang.Class has 3 different flavors of the cache:

  • package-private enumConstantDirectory(), which returns a HashMap. It is specifically intended to be used inside java.lang.Enum.valueOf(Class<T> enumType, String name). The HashMap is initialized on first use (note: the field is also volatile, and transient, which is less interesting).
  • package-private getEnumConstantsShared(), which returns an array. This method uses a clever trick to call the compiler-generated values() method and returns the array of the enum’s constants as is. It is widely used inside the JDK, for example in java.util.EnumMap and java.util.EnumSet (through a helper class that allows a class from another package to access a package-private method). The array is initialized on first use (note: the field is also volatile, and transient, which is less interesting).
  • public getEnumConstants(), which clones the getEnumConstantsShared() array on every call.

So, we can see 3 different usage patterns. java.lang.Enum.valueOf(Class<T> enumType, String name) uses a HashMap that is initialized on first use. For public use the JDK returns a defensively copied array, so if somebody outside of the JDK wants to implement a cache, the intended representation is an array and not a HashMap (but again, the JDK itself prefers HashMap for valueOf). For JDK-internal use other than java.lang.Enum.valueOf(Class<T> enumType, String name), the JDK prefers an array as the cache. It bypasses the defensive copy (my guess: for performance reasons), but nevertheless it uses the array representation.

So, there is some ambiguity here. I think it really depends on how you’re going to use the cache. If you only need to find the correct enum constant (by name, for example), then using a HashMap may make sense (again, it really depends on the number of enum constants you have; if you have, say, 3 constants, using an array will be faster). But if you need to iterate over all the enum’s constants, as in the case of EnumSet, for example, then it is definitely better to use an array as the cache. You will want to bypass the defensive copy. One way to do this is to make one “expensive” call to the public getEnumConstants() method of java.lang.Class and store the result in a field. Of course, you should think about proper synchronization (at the very least, you should declare the field as volatile).
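The idea of paying for the defensive copy only once can be sketched as follows. The Weekday enum and the WeekdayLookup class are hypothetical. Instead of a lazily set volatile field, this sketch stores the array in a static final field, which is one simple way to get safe publication; that choice is mine, not something prescribed by the text.

```java
import java.util.Arrays;

enum Weekday { MON, TUE, WED }

final class WeekdayLookup {
    // One defensive copy, taken once during class initialization. The array is
    // never handed out, so sharing it inside this class is safe.
    private static final Weekday[] CONSTANTS = Weekday.class.getEnumConstants();

    private WeekdayLookup() { }

    // Linear scan over the cached array; for a handful of constants this is cheap.
    static Weekday byName(String name) {
        for (Weekday day : CONSTANTS) {
            if (day.name().equals(name)) {
                return day;
            }
        }
        throw new IllegalArgumentException("No constant Weekday." + name);
    }
}

class WeekdayLookupDemo {
    public static void main(String[] args) {
        Weekday[] first = Weekday.class.getEnumConstants();
        Weekday[] second = Weekday.class.getEnumConstants();
        // Each public call returns a fresh copy with the same contents.
        System.out.println(first != second);
        System.out.println(Arrays.equals(first, second));
        // The JDK's own name-based lookup and the array-based one agree.
        System.out.println(Enum.valueOf(Weekday.class, "TUE") == WeekdayLookup.byName("TUE"));
    }
}
```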

Extensibility

This is actually a hard one. I will first describe the issue from a practical point of view.

I have a main table in some SQL-based DB. It has an attribute, say colour, that holds an id and points (by foreign key) to a Color table (I will refer to it as the lookup table) that has at least two attributes, id and name. So, a row in my main table has, for example, the value 1 in its colour attribute, and in the Color lookup table there is a row with id=1, name=white.

Now, I want to represent the main table in Java code. I can do it using some ORM framework, rely on Spring Repository, or implement my own DAL; it doesn’t matter for the purpose of this story. What options do I have for modelling the colour attribute?

It can be an integer, a java.lang.String, or an enum. You can go back to the beginning of the article to convince yourself that normally you would like to have an enum.

Side note: On the project I’m currently working on, I actually went with the String choice. First of all, my current project is in Python, not Java. Second, I’m using Postgres. One of the reasons for choosing Postgres was its support for enums at the DB level. In my case I don’t have a Colour lookup table at all. Instead I have a maintable_colour enum type, and my main table has a colour attribute of type maintable_colour. When I insert, update or select the colour attribute, I see a string and only a string (occasionally I need an explicit cast, for example if I want an array-of-enum type). Internally, however, each enum constant is represented by 4 bytes (like an int in Java). It is as if Postgres used the value from TypeSafeEnum or the ordinal from java.lang.Enum, plus some Postgres-specific logic for translating between int and string that is conceptually the same as in TypeSafeEnum or java.lang.Enum.

The reason I’m using String: on the backend I don’t do any manipulation of enums. Essentially I get the data (parsed as a string) and pass the enum constant as is to the DB. It goes unchanged to the library I use to talk to the DB (the library does understand enums at the DB level and does some manipulation, such as adding explicit casts; in most cases these manipulations are fully transparent to the code). So, from the backend’s perspective, it receives a string and sends a string to the DB. Everything is readable and debuggable. There is no extra lookup call or join between the main and lookup tables. At the DB storage level the string is converted to 4 bytes, so it is efficient.

Of course, if the string to be stored in the DB doesn’t represent an enum constant, a validation error is raised, which maintains data consistency.

For more information, see What would you use ENUM for in SQL? https://dba.stackexchange.com/questions/231795/what-would-you-use-enum-for-in-sql and Enumerated Types https://www.postgresql.org/docs/13/datatype-enum.html

So, if you want to model the colour attribute of the main table in Java, it is natural to choose java.lang.Enum (provided that your DAL supports enums, which has not been an issue for a long time). The standard way to do it is to create an enum at compile time that duplicates the content of the Colour lookup table.
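As a rough illustration of “duplicating the lookup table”, such an enum could look like the sketch below. Only the row id=1, name=white comes from the example above; the other rows, the dbId field and the fromDbId() method are assumptions made for illustration.

```java
// Hypothetical compile-time mirror of the Colour lookup table.
enum Colour {
    WHITE(1),
    BLACK(2),  // assumed row, not from the article
    RED(3);    // assumed row, not from the article

    private final int dbId;

    Colour(int dbId) {
        this.dbId = dbId;
    }

    // The id stored in the main table's colour attribute.
    int dbId() {
        return dbId;
    }

    // Translates the foreign-key value read from the DB back to a constant.
    static Colour fromDbId(int dbId) {
        for (Colour colour : values()) {
            if (colour.dbId == dbId) {
                return colour;
            }
        }
        throw new IllegalArgumentException("Unknown colour id: " + dbId);
    }
}
```

Every change to the lookup table now requires a matching code change, which is the weak spot of this approach.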

If the data is static, this is a good enough solution. But what if the content of the table is dynamic, or is unknown at the time the code is written?

What do I mean by dynamic content? Going back to our example: the colours that our system supports may change over time. In such a case it is better to model them with a separate table.

What do I mean by unknown content? I mean that we have some generic code that can be used with different specific tables. Not just any table: it must satisfy some specific constraints. The set of supported colours can differ from instance to instance. In such a case the content is static for each specific instance, but the (generic) code that works with that instance (a specific pair of main and lookup tables) doesn’t know the content.

The scenario described above is not theoretical; I actually invented my own enum implementation because of it.

So, in this scenario I can’t “just” create an enum at compile time. What I do want is some type-safe type at the code level that is converted to an int at the DB level.

It is interesting to look at java.lang.Enum: does it support such a use case?

Quote:

An enum declaration is implicitly final unless it contains at least one enum constant that has a class body…

The direct superclass of an enum type E is Enum<E>

An enum type has no instances other than those defined by its enum constants...

https://docs.oracle.com/javase/specs/jls/se8/html/jls-8.html#jls-8.9

As you can see, java.lang.Enum is almost final. You can declare YourEnum, which will implicitly extend java.lang.Enum. An enum constant can have a class body, but it is explicitly prohibited to add new enum constants beyond those defined at compile time in YourEnum.

What is an enum constant that has a class body, anyway? Let’s look at a code example:

https://docs.oracle.com/javase/8/docs/technotes/guides/language/enums.html
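The example in the linked guide is an Operation enum. A sketch along those lines (written from the description here, so treat the details as an approximation of the guide rather than a verbatim copy):

```java
// Each constant has its own class body that implements eval().
enum Operation {
    PLUS   { double eval(double x, double y) { return x + y; } },
    MINUS  { double eval(double x, double y) { return x - y; } },
    TIMES  { double eval(double x, double y) { return x * y; } },
    DIVIDE { double eval(double x, double y) { return x / y; } };

    // Do the arithmetic operation represented by this constant.
    abstract double eval(double x, double y);
}
```

Because eval() is abstract, the compiler forces every constant to supply a body, so a newly added constant cannot silently lack an implementation.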

{ double eval(double x, double y) { return x + y; } },

is an example of an enum constant, namely PLUS, that has a class body (see above).

Conceptually, it is equivalent to the following code:

https://docs.oracle.com/javase/8/docs/technotes/guides/language/enums.html
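The switch-based variant from the linked guide can be sketched roughly as follows (again an approximation written for this article, not a verbatim copy):

```java
// The same operations, dispatched by a switch inside a single method.
enum Operation {
    PLUS, MINUS, TIMES, DIVIDE;

    // Do the arithmetic operation represented by this constant.
    double eval(double x, double y) {
        switch (this) {
            case PLUS:   return x + y;
            case MINUS:  return x - y;
            case TIMES:  return x * y;
            case DIVIDE: return x / y;
        }
        // Reached only if a constant is added without a matching case.
        throw new AssertionError("Unknown op: " + this);
    }
}
```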

Note:

  • You can think of it as overriding some method for every constant. Another way to think about it is that this “extended enum” has a single method with an internal dispatch mechanism, as described above. If this resembles sealed class usage (disjoint union), that is correct, but we will not go in this direction right now.
  • Quote from the link above:

This works fine, but it will not compile without the throw statement, which is not terribly pretty. Worse, you must remember to add a new case to the switch statement each time you add a new constant to Operation. If you forget, the eval method with fail, executing the aforementioned throw statement.

See https://alex-ber.medium.com/java-exception-hierarchy-f6aef08ab9b about why AssertionError is thrown.

  • It is also interesting that when Enum was added in JDK 5.0, the special construct of an enum constant that has a class body was added in order to avoid using “switch on enum”. If you look at this closely, switching on an enum is an example of pattern matching. Funnily enough, JDK 14 introduced a limited form of pattern matching, so essentially it is now admitted that using switch is the preferred way and that the whole idea of an enum constant with a class body was a mistake.
  • I want to reiterate: when JDK 5.0 was released, using an enum constant that has a class body was preferred over switching on the enum (which is a form of pattern matching). From JDK 14 on, Java is moving towards supporting sum and product types (these are concepts from functional programming; I will write a separate big story about them). Here I only want to mention that the enum constant that has a class body was abandoned as a direction. Again, this shows that using the inheritance mechanism to implement a disjoint union is the wrong way to go.

Enum constant that has a class body

I also want to mention that an enum constant that has a class body is a generalization of the concept that an enum can have data and behavior.

For example consider the planets of the solar system. Each planet knows its mass and radius, and can calculate its surface gravity and the weight of an object on the planet. Here is how it looks:

https://docs.oracle.com/javase/8/docs/technotes/guides/language/enums.html

The enum type Planet contains a constructor, and each enum constant is declared with parameters to be passed to the constructor when it is created.
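A shortened sketch of such a Planet enum is shown below. Only four planets are listed, and the numeric values are the ones I believe the linked guide uses, so treat them as illustrative rather than authoritative.

```java
enum Planet {
    MERCURY(3.303e+23, 2.4397e6),
    VENUS  (4.869e+24, 6.0518e6),
    EARTH  (5.976e+24, 6.37814e6),
    MARS   (6.421e+23, 3.3972e6);

    // Universal gravitational constant (m^3 kg^-1 s^-2).
    static final double G = 6.67300E-11;

    private final double mass;   // in kilograms
    private final double radius; // in meters

    // Each constant above passes its own arguments to this constructor.
    Planet(double mass, double radius) {
        this.mass = mass;
        this.radius = radius;
    }

    double surfaceGravity() {
        return G * mass / (radius * radius);
    }

    double surfaceWeight(double otherMass) {
        return otherMass * surfaceGravity();
    }
}
```

With these numbers Planet.EARTH.surfaceGravity() comes out at roughly 9.8 m/s².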

Strategy and Visitor pattern

You can safely skip this.

What if there is no single universal formula for surfaceGravity() or surfaceWeight(), and you need different calculations depending on which kind of Planet you have? What if you have a BlackHole as a Planet? For the Schwarzschild solution, the surface gravity of a black hole of mass M is C*C*C*C / (4*G*mass), where C = 299_792_458 metres per second is the speed of light in vacuum. How can you override the surfaceGravity() method for a single enum constant?

Well, this is again a case for an enum constant that has a class body; see the Operation enum example above. In this case the definition will look like:

where EARTH_BLACK_HOLE is a hypothetical celestial body: a black hole with the mass of Earth.
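A definition along those lines might look like the sketch below. The radius given for EARTH_BLACK_HOLE is my own rough estimate of the Schwarzschild radius for Earth’s mass, and the package-private mass() accessor is an assumption: a constant’s class body cannot read the private mass field directly, so it goes through the inherited method.

```java
enum Planet {
    EARTH(5.976e+24, 6.37814e6),
    // Hypothetical black hole with Earth's mass; radius of about 8.87 mm (assumed).
    EARTH_BLACK_HOLE(5.976e+24, 8.87e-3) {
        @Override
        double surfaceGravity() {
            // Schwarzschild solution: c^4 / (4 * G * M)
            return C * C * C * C / (4 * G * mass());
        }
    };

    static final double G = 6.67300E-11;   // m^3 kg^-1 s^-2
    static final double C = 299_792_458.0; // speed of light in vacuum, m/s

    private final double mass;   // in kilograms
    private final double radius; // in meters

    Planet(double mass, double radius) {
        this.mass = mass;
        this.radius = radius;
    }

    // Not private, so that constant-specific class bodies can call it.
    double mass() {
        return mass;
    }

    // Default formula, used by every constant that does not override it.
    double surfaceGravity() {
        return G * mass / (radius * radius);
    }

    double surfaceWeight(double otherMass) {
        return otherMass * surfaceGravity();
    }
}
```

Note that surfaceWeight() is not overridden: it calls surfaceGravity(), so it automatically picks up the black hole’s formula.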

What if we don’t know the exact behaviour of an enum method (like surfaceGravity()) at the time of writing the enum? What if we want to decouple our enum from the operation we’re providing? In the example above, if the enum represents the nodes of an operation (plus, minus, etc.), we may want to provide the evaluation logic in a different class.

So, we want to leave our enum as is and extract the behaviour into a different class. This is known as the Strategy design pattern.

Code below is based on https://www.baeldung.com/a-guide-to-java-enums

Let’s suppose we have some Pizza class, that looks something like this:

Now we can have a separate PizzaDeliveryStrategy:

The client code can look something like:

The essential part is DeliveryStrategy. Essentially, it is a functional interface: it has exactly one abstract method (which defines the behaviour, in OO terms). We have some “concept” that “knows” how to deliver things (in our example it is parametrized by Pizza), and we have 2 implementations of this “concept”. Their types are extensions of the enum PizzaDeliveryStrategy: the first implementation is stored in PizzaDeliveryStrategy.NORMAL and the second in PizzaDeliveryStrategy.EXPRESS. To reiterate: the enum constant PizzaDeliveryStrategy.NORMAL has a class body, the enum constant PizzaDeliveryStrategy.EXPRESS has a class body, and both classes implement DeliveryStrategy<Pizza>.
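Putting the three pieces together, a minimal sketch could look like this. It is only loosely based on the linked Baeldung guide: the Pizza fields, the String return type of deliver() and the PizzaClient helper are assumptions made to keep the example small and testable.

```java
// The "concept": something that knows how to deliver items of type T.
interface DeliveryStrategy<T> {
    String deliver(T item);
}

// Hypothetical, stripped-down Pizza class.
final class Pizza {
    private final String name;

    Pizza(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// Both constants have a class body that implements DeliveryStrategy<Pizza>.
enum PizzaDeliveryStrategy implements DeliveryStrategy<Pizza> {
    NORMAL {
        @Override
        public String deliver(Pizza pizza) {
            return pizza.name() + " will be delivered in normal mode";
        }
    },
    EXPRESS {
        @Override
        public String deliver(Pizza pizza) {
            return pizza.name() + " will be delivered in express mode";
        }
    };
}

// Client code depends only on the interface, not on a concrete constant.
final class PizzaClient {
    static String order(Pizza pizza, DeliveryStrategy<Pizza> strategy) {
        return strategy.deliver(pizza);
    }
}
```

A call such as PizzaClient.order(new Pizza("Margherita"), PizzaDeliveryStrategy.EXPRESS) then picks the behaviour at the call site, while Pizza itself stays unchanged.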

Now, if you also want to be able to deliver Coca-Cola, you have to create the analogous classes for it.

Now, if you really have multiple products to deliver, you will notice that a lot of code is duplicated. You may also notice that you actually have 2 different but closely related methods: deliver(deliverable) and deliver().

You may use the Visitor design pattern to solve this. You can think of it as implementing a double-dispatched method

public <T extends Deliverable> void deliver(T deliverable, DeliveryStrategy<T> strategy)

where the DeliveryStrategy implementations are enum constants that have a class body.
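The generic entry point can be sketched as below. This is not a full Visitor implementation, only the method shape described above; Deliverable, CocaCola, CocaColaDeliveryStrategy and DeliveryService are hypothetical names, and deliver() returns a String here (instead of void) so that the result can be checked.

```java
interface Deliverable {
    String label();
}

interface DeliveryStrategy<T extends Deliverable> {
    String deliver(T item);
}

final class Pizza implements Deliverable {
    @Override public String label() { return "pizza"; }
}

final class CocaCola implements Deliverable {
    @Override public String label() { return "Coca-Cola"; }
}

enum PizzaDeliveryStrategy implements DeliveryStrategy<Pizza> {
    NORMAL  { @Override public String deliver(Pizza p) { return p.label() + ": normal, thermal bag"; } },
    EXPRESS { @Override public String deliver(Pizza p) { return p.label() + ": express, thermal bag"; } };
}

enum CocaColaDeliveryStrategy implements DeliveryStrategy<CocaCola> {
    NORMAL  { @Override public String deliver(CocaCola c) { return c.label() + ": normal, cooled"; } },
    EXPRESS { @Override public String deliver(CocaCola c) { return c.label() + ": express, cooled"; } };
}

final class DeliveryService {
    // The type parameter ties the item type to a matching strategy type.
    static <T extends Deliverable> String deliver(T item, DeliveryStrategy<T> strategy) {
        return strategy.deliver(item);
    }
}
```

The compiler rejects a mismatched pair such as DeliveryService.deliver(new Pizza(), CocaColaDeliveryStrategy.NORMAL), so the pairing of product and strategy is checked at compile time.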

Side note: In one of my projects I actually prototyped such a use of the Visitor design pattern with enums, but it was too complicated, so it was first reduced to Strategy, and when a new “Pizza” was added it was removed from the prototype entirely. I just have a switch on the enum in a private method that is called wherever it is needed. I feel bad about it (hey, I broke encapsulation), but I told myself that I had really tried to do it the right way and it was overcomplicated. Sealed classes (in JDK 15) were designed precisely to fill this gap. I will write a separate article on this.

My type-safe extensible enum implementation

Let’s go back to my own implementation of enum. Let me remind you that I wrote it on JDK 1.4, which doesn’t support enum, so I had public static final variables instead of enum constants. My implementation was also type-safe. It was pretty close to the JDK 1.5 enum, with 2 significant differences: I didn’t alter the compiler or java.lang.Class, so the “enum cache” was kept inside my base enum class, and my enum was extensible.

As we have seen, we can’t use the standard java.lang.Enum to get a type-safe type at the code level that is converted to an int at the DB level. While it can be extended (in a very limited way), you can’t add enum constants dynamically at runtime. It is interesting to point out that in JDK 15 “sealed classes” were added alongside the existing enum, because enum was too limited. The road not taken was to convert enum into “sealed classes”. When enum was added in JDK 1.5, nobody viewed enums as a “sum type” (in the category theory sense); “sealed classes” are Java’s limited version of such a “sum type”. As I’ve said above, I will write a separate story about this.

So, how did I manage to get a type-safe type at the code level that is converted to an int at the DB level?

I want to be a bit more precise. In my use case I have some enum constants that should be common to every usage. In addition, every (code) user can add new enum constants as they wish.

So, I have a BaseEnum that is my variant of TypeSafeEnum. It has some compile-time enum constants. It also has a protected addEnum() method.

Each (code) user defines their own enum that extends BaseEnum. In my case there was 1 user enum per war classloader; that is, I had different war applications, each with its own version of the enum, which gives good isolation between users’ enums. Depending on the configuration of your web server, the isolation can be different.

Actually, my original architecture was multiple wars, each with its own “enums” (classes that extend BaseEnum). BaseEnum itself was in a jar file that was duplicated across the war files. So I had multiple BaseEnum classes loaded by different (war) class loaders, and each actual enum was stored in a different BaseEnum copy. The isolation was perfect, but memory utilization was bad.

So, in production we changed the architecture to a single war file that consists of the different applications packaged as jars, with BaseEnum packaged in its own jar file. Now we have only 1 copy of BaseEnum, so memory utilization is much better. It is much harder to make things work, though, because isolation is now pretty bad: one “application” sees the code of another “application”. Care should be taken with configuration files such as log4j.xml. What is relevant for us is that all enums from the different applications are now stored in the very same BaseEnum class, which is visible to every “application” (jar classloader). Recall that the enum cache is organized in such a way that enum constants are grouped together by their defining class. It is, however, theoretically possible to access the enum of another “application” (because we use a jar and not a war classloader), so in theory I can load a class from another application and access its enum constants from the cache.

So, each “actual enum” extends BaseEnum and has public static final variables instead of enum constants. Each enum constant has an int data member that holds its int value in the DB (I didn’t rely on the ordinal; I defined the value explicitly). In the code itself I can write something like MyEnum.FIRST.db_value to convert an enum to its int value in the DB. I also have a static method getEnumByDBValue() that converts an int value from the DB into an enum (my original implementation used an additional HashMap for this; later I realized that it would be more efficient just to go over the original cache HashMap and find the enum there).

It was also possible not to define any enum constants at all. You can query some DB table that holds all of your enum constants: their names, ordinal numbers and DB values. Then you can have YourEnum that extends BaseEnum and call the addEnum() method on it with the data fetched from the DB. You just have to be aware of the initialization order of your application: you shouldn’t use YourEnum until the calls to addEnum() are done. To my surprise, this actually worked without any problems.

In the code that uses the enum, I can’t use switch. Recall that back then switch worked only with compile-time constants, and my enum wasn’t a compile-time constant; in fact, I intentionally add enum constants dynamically. So I have to use an if-else construct. Given that there was no built-in support for enum back then, that was totally fine.
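To make the description above more concrete, here is a rough reconstruction in modern Java. It is not the original JDK 1.4 code (which had no generics or lambdas): the signatures, the register() helper, the describe() method and the dbValue field name (db_value in the text) are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

abstract class BaseEnum {
    // The "enum cache": constants grouped by their defining class.
    private static final Map<Class<?>, List<BaseEnum>> CACHE = new ConcurrentHashMap<>();

    final String name;
    final int dbValue; // explicit DB value, deliberately not an ordinal

    protected BaseEnum(String name, int dbValue) {
        this.name = name;
        this.dbValue = dbValue;
    }

    // Registers a constant; usable both in static initializers and at runtime.
    protected static <E extends BaseEnum> E addEnum(E constant) {
        CACHE.computeIfAbsent(constant.getClass(), k -> new CopyOnWriteArrayList<>()).add(constant);
        return constant;
    }

    // Scans the cache for the constant of the given type with this DB value.
    static <E extends BaseEnum> E getEnumByDBValue(Class<E> type, int dbValue) {
        List<BaseEnum> constants = CACHE.get(type);
        if (constants != null) {
            for (BaseEnum candidate : constants) {
                if (candidate.dbValue == dbValue) {
                    return type.cast(candidate);
                }
            }
        }
        throw new IllegalArgumentException("No " + type.getSimpleName() + " with DB value " + dbValue);
    }

    @Override
    public String toString() {
        return name;
    }
}

final class MyEnum extends BaseEnum {
    static final MyEnum FIRST = addEnum(new MyEnum("FIRST", 10));
    static final MyEnum SECOND = addEnum(new MyEnum("SECOND", 20));

    private MyEnum(String name, int dbValue) {
        super(name, dbValue);
    }

    // Hypothetical hook for constants loaded at runtime, e.g. from a DB table.
    static MyEnum register(String name, int dbValue) {
        return addEnum(new MyEnum(name, dbValue));
    }

    // No switch possible: the constants are not compile-time constants.
    static String describe(MyEnum value) {
        if (value == FIRST) {
            return "first";
        } else if (value == SECOND) {
            return "second";
        } else {
            return "added at runtime: " + value;
        }
    }
}
```

The initialization-order caveat shows up here too: the class literal MyEnum.class alone does not initialize MyEnum, so getEnumByDBValue(MyEnum.class, 10) fails unless MyEnum has already been touched (for example by reading MyEnum.FIRST) or populated through register().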

Switch and enum

In JDK 1.4 when defining the cases for a switch statement, we need to adhere to the rules defined in the Java language specification:

  • The case labels of the switch statement require values that are constant expressions.
  • No two of the case constant expressions associated with a switch statement may have the same value.

For example,

public static final int MAXIMUM_NUMBER_OF_USERS = 10;

All class constants of a primitive type are also compile-time constants. Strings are a special case, that will be discussed below. The Java compiler is able to calculate expressions that contain constant variables and certain operators during code compilation:

public static final int MAXIMUM_NUMBER_OF_GUESTS = MAXIMUM_NUMBER_OF_USERS * 10;
//public String errorMessage = ClassConstants.DEFAULT_USERNAME + " not allowed here.";

Expressions like these are called constant expressions, as the compiler will calculate them and produce a single compile-time constant.

A Java variable is a compile-time constant if it’s of a primitive type, declared final, initialized within its declaration, and with a constant expression. Strings are a special case, that will be discussed below.

The term compile-time constants include class constants, but also instance and local variables defined using constant expressions.

Not all static and final variables are constants. If a state of an object can change, it is not a constant:

public static final Logger log = LoggerFactory.getLogger(ClassConstants.class);
public static final List<String> contributorGroups = Arrays.asList("contributor", "author");

Though these are constant references, they refer to mutable objects.
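A small sketch of how this distinction plays out in case labels. The class name SwitchConstants and the BOXED_LIMIT field are made up for illustration.

```java
final class SwitchConstants {
    static final int MAXIMUM_NUMBER_OF_USERS = 10;
    // Still a constant expression: the compiler folds it to 100.
    static final int MAXIMUM_NUMBER_OF_GUESTS = MAXIMUM_NUMBER_OF_USERS * 10;
    // static and final, but NOT a compile-time constant (it is a boxed object).
    static final Integer BOXED_LIMIT = 10;

    static String classify(int n) {
        switch (n) {
            case MAXIMUM_NUMBER_OF_USERS:
                return "user limit";
            case MAXIMUM_NUMBER_OF_GUESTS:
                return "guest limit";
            // case BOXED_LIMIT:  // would not compile: constant expression required
            default:
                return "other";
        }
    }
}
```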

So, my enum implementation, like TypeSafeEnum above, is not a compile-time constant and therefore can’t be used in switch, at least in JDK 1.4.

But what about the JDK 5.0 enum? Does it qualify for use in switch?

Well, the requirements of the switch statement were changed in order to support this. Also, limitations were imposed on java.lang.Enum. Quote:

An enum declaration is implicitly final unless it contains at least one enum constant that has a class body (§8.9.1).

It is a compile-time error if the same keyword appears more than once as a modifier for an enum declaration…

An enum type has no instances other than those defined by its enum constants. It is a compile-time error to attempt to explicitly instantiate an enum type (§15.9.1).

https://docs.oracle.com/javase/specs/jls/se8/html/jls-8.html#jls-8.9

This makes it possible to change the requirements for switch as follows:

When defining the cases for a switch statement, we need to adhere to the rules defined in the Java language specification:

  • The case labels of the switch statement require values that are either constant expressions or enum constants.
  • No two of the case constant expressions associated with a switch statement may have the same value.

On the JDK implementation side, the compiler performs a desugaring step that replaces each enum constant with an int derived from its ordinal and then makes a switch on int. So, again, the Java compiler was changed to support switch on enum. Example:

https://alvinalexander.com/java/using-java-enum-switch-tutorial/

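Because of the limitations of java.lang.Enum (see above), the compiler can replace case MONDAY with case 1, case TUESDAY with case 2, ..., case SUNDAY with case 7. It can also “desugar” switch(theDay) into a switch over an int derived from theDay.ordinal(), so actually a switch on int is performed. (In practice javac does not use the raw ordinal as the case value; it goes through a synthetic lookup array indexed by ordinal, so that reordering the enum’s constants does not silently break already compiled switches.)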
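The desugaring can be imitated by hand. The sketch below is only an approximation of what the compiler generates: the Day enum, the method names and the SWITCH_MAP array are assumptions, and the real synthetic code differs in detail.

```java
enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

final class DaySwitch {
    // What the programmer writes.
    static String describe(Day theDay) {
        switch (theDay) {
            case MONDAY:
                return "start of the week";
            case SATURDAY:
            case SUNDAY:
                return "weekend";
            default:
                return "midweek";
        }
    }

    // Hand-written stand-in for the compiler's lookup array:
    // index = ordinal, value = case number (0 means "no case, use default").
    private static final int[] SWITCH_MAP = new int[Day.values().length];
    static {
        SWITCH_MAP[Day.MONDAY.ordinal()] = 1;
        SWITCH_MAP[Day.SATURDAY.ordinal()] = 2;
        SWITCH_MAP[Day.SUNDAY.ordinal()] = 3;
    }

    // Roughly what is executed: a plain switch on int.
    static String describeDesugared(Day theDay) {
        switch (SWITCH_MAP[theDay.ordinal()]) {
            case 1:
                return "start of the week";
            case 2:
            case 3:
                return "weekend";
            default:
                return "midweek";
        }
    }
}
```

Both methods return the same result for every Day; the second one simply never mentions an enum constant in a case label.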

What about switch on String?

Well, here we have a twist in our story. Technically, a constant expression can contain a String. Strings are a special case on top of the primitive types because they are immutable and live in a String pool. Therefore, all classes running in an application can share String values.

Still, before JDK 1.7 it was forbidden to switch on a String. Actually, the effort needed to implement this feature inside the JDK was pretty small, given that we already had the “desugaring” process for enum. A new kind of “desugaring” was required, but basically that’s it.

Many people wanted switch on String and waited for this feature, but when it was actually released it went mostly unnoticed. Why? Because you can mimic switch on String with switch on enum.

For example, suppose you want to switch on the letters “A”, “B”, “C” and you’re using JDK 6.0. You can write something like:
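One possible shape of that workaround is sketched below. The Letter enum, the handle() method and its return values are made up for illustration; the code deliberately avoids anything newer than JDK 6 syntax.

```java
enum Letter { A, B, C }

final class LetterSwitch {
    static String handle(String input) {
        if (input == null) {
            return "unknown";
        }
        Letter letter;
        try {
            // Translate the String into an enum constant first...
            letter = Letter.valueOf(input);
        } catch (IllegalArgumentException e) {
            // ...valueOf() throws if no constant has that exact name.
            return "unknown";
        }
        // ...and then switch on the enum instead of the String.
        switch (letter) {
            case A:
                return "got A";
            case B:
                return "got B";
            case C:
                return "got C";
            default:
                throw new AssertionError("Unhandled letter: " + letter);
        }
    }
}
```

The valueOf() lookup plays the role of the String comparison, and unknown inputs have to be handled explicitly because valueOf() has no default branch.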

Side note: I have written a lot of code like this. What’s more, I have written only a few switches on Strings.

java.lang.Enum dynamic Enum?

In 2014 I had a small project to do. It was before the microservices era, so I had 1 shared DB and a bunch of Web Services. I wanted to implement a locking mechanism to prevent certain Web Services from running at the same time. So, essentially, I had a Lock table in the DB that held the list of Web Services. You can read the details in Lock High Level Design.

So, can I represent Lock as an enum in the code?

Well, the list of Web Services is semi-static. It is not completely static, it may change over time, but the change happens pretty slowly: you’re not adding Web Services each week (again, these are not microservices, they are full Web Services).

Quote:

Note 2021: We want the ability to dynamically add web-service and locks to the system. So, Lock can be Java’s enum class that contains all locks.
I do prototype some solution to dynamically add instances to enum, but it just was one big hack that can breaks even on minor JDK update. Project Jigsaw was coming in JDK 9, so it was clear to me, that such hack creates significant technical debt.

https://alex-ber.medium.com/lock-high-level-design-3e02bbb8eb7f

So, while it is technically possible to break the inextensibility of Java’s enum (which, again, was introduced intentionally, see above), I preferred to avoid this.

Sealed classes (in JDK 15) would be a better solution for this last use case.
