Uninitialized Behavior in Java (and Thoughts for Other Programming Languages)

Joseph A. Boyle
May 20, 2018

Quiz time: what happens when you declare a variable in Java, but don’t initialize it, and then go on to use the variable? Whether you guessed the variable will be initialized to zero or null, or that the compiler will throw an error, you’re only right sometimes.

The first code sample below will generate a compiler error, whereas the second works fine and prints 0.
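A minimal pair, for concreteness (each class would live in its own file; the class names are mine):

// Sample 1: a local variable, used before it is initialized.
public class LocalCase {
    public static void main(String[] args) {
        int x;
        System.out.println(x); // compile-time error: variable x might not have been initialized
    }
}

// Sample 2: a class field, never explicitly initialized.
public class FieldCase {
    static int x;

    public static void main(String[] args) {
        System.out.println(x); // compiles fine and prints 0
    }
}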

The documented behavior is that class fields are given a default value in each instance of an object (0 for numeric types, false for boolean, '\u0000' for char, and null for references), whereas “the compiler never assigns a default value to an uninitialized local variable”, going so far as to throw a compile-time error when a local variable is used before it is assigned. Simply being documented behavior does not, however, make it logical.

I teach Java to beginner programmers at my university. We take the learning experiences we’ve had in programming for granted, and for that reason I like to understand why we make the choices that we do in the design of the tools we give to unsuspecting college freshmen.

It’s easy to tell a student to initialize all of their variables, or else bad things will happen and the compiler will throw a fit. What isn’t easy is explaining why we lied when they inevitably write their first class, don’t initialize a field, and the world doesn’t blow up, or when they make their first array and ponder why it has these magical default values while their other variables don’t.

The How and the Why of Default Values

Experienced programmers have had it beaten into their heads to always initialize their variables, a habit that serves anyone coming from C or an equivalent systems-level language well. In designing a compiler, one must be aware of the performance trade-off that a default value presents: if there are n uninitialized local variables, setting them all to a default value requires at least n instructions to actually store those values in memory. For a systems language like C, this decision is easy: it’s far better to require the user to explicitly set the memory to what they want, rather than possibly storing a useless value.

In Java, local variables are stored in what is called a stack frame, which for simplicity’s sake we will say looks roughly like this (a sketch, not the exact JVM layout):
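+--------------------------+
|  bar: local variables    |
|  bar: frame pointer      |
|  bar: return address     |
+--------------------------+
|  foo: local variables    |
|  foo: frame pointer      |
|  foo: return address     |
+--------------------------+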

In this example, foo and bar each have their own stack frame. Bar’s return address is the instruction address at which foo called bar, and each of foo and bar stores its own local variables in the “local variables” portion.

To create this stack frame, you must request a sufficient amount of memory from the operating system and then begin filling in the data that you know: the return address, the current frame pointer, and so on. Each variable lives at some position in the “local variables” section, and whatever bytes happen to sit at that position are the variable’s value.

When you request memory from the operating system, it returns a block of memory that some other code may have previously used, without making any attempt to clean it up. That is, if the last user of your memory wrote the value 5 to the first byte, it will still hold the value 5 when you get it. This, then, is where we run into issues with default values: the space where “local variables” go will be occupied by whatever used to be in the memory you requested. For the compiler to set each variable to a default value, it would have to iterate through the stack frame and set those bytes to zero.
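Conceptually, the extra work looks something like this (a sketch only; requestMemory, frameSize, and localsOffset are hypothetical names, and a real compiler would emit individual store instructions rather than a loop):

// Hypothetical sketch of the cost being described, not real JVM behavior.
byte[] frame = requestMemory(frameSize); // returned bytes may hold stale data
for (int i = localsOffset; i < frameSize; i++) {
    frame[i] = 0; // one store per byte of local-variable storage
}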

In Java, arrays and objects are both stored in variables as references (that is, the addresses at which their actual data lives), with the actual memory being a block of the appropriate requested size, which is then overwritten with zeros. That is, it’s not up to the constructor to zero out the data in an object; that’s done for you once the memory is requested. Why, then, don’t we treat objects and arrays the same way we do local variables?
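A quick demonstration (the class and field names here are just for illustration):

import java.util.Arrays;

public class HeapDefaults {
    static class Point {
        int x;        // never explicitly initialized
        String label; // never explicitly initialized
    }

    public static void main(String[] args) {
        int[] a = new int[4];
        System.out.println(Arrays.toString(a)); // prints [0, 0, 0, 0]

        Point p = new Point();
        System.out.println(p.x);     // prints 0
        System.out.println(p.label); // prints null
    }
}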

I explored this design decision quite a bit and came up with a lot of unsatisfying arguments, primarily centered around the idea that we can’t deduce when the various methods of a class will be called, and therefore can’t know whether a field will be read before it is assigned (imagine calling a getter before the corresponding setter). This leaves an unsatisfying taste in my mouth, though, because the ambiguity is equally valid in a single method, as in this sketch (which the compiler duly rejects):
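void ambiguous() {
    int x;
    if (IO.readInt() > 0) {
        x = IO.readInt();      // on this path we write x first
    } else {
        System.out.println(x); // on this path we read x first:
                               // error: variable x might not have been initialized
    }
}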

IO.readInt() will read an integer from the console and return it. In this case, we don’t know until runtime whether we will do a read or a write first, which is exactly the ambiguity we had with objects.

The only other answer I can think of is pure speed, but I’m not sure that’s an appropriate answer for a language like Java, either. It is important to think about how the average programmer works with a language. In my experience reading other people’s code, I’ve noticed two things: 1) classes generally have many (read: too many) fields, and 2) people tend to enjoy creating many more objects (read: ButtonFactoryFactories, BananaFactorySingleton, the works) than they need.

With this in mind, let’s consider that performance hit again: if the number of fields across all of the objects created during the run of the program is greater than or equal to the number of local variables across every function call, you’ve already agreed to take the larger performance hit. Numerically, if there are c objects created with an average of n fields each, and d function calls with an average of m local variables each, we are comparing c*n against d*m values that need to be zeroed out. For instance, a program that creates 10,000 objects averaging 10 fields each already zeroes 100,000 slots on the heap; zeroing the locals of 20,000 calls averaging two variables each would be 40,000 slots, less than half that.

Closing Thoughts

There isn’t a one-size-fits-all solution to the question of “do we use default values” across all languages; surely, it should be decided on a case-by-case basis. My main frustration is with languages that only get their toes wet: if a language is going to set some values by default, it ought to set all values by default.

The bottom line, I feel, is this: if you decide that your language should initialize object and array memory blocks to zero, it’s only natural to go all-in and initialize local variables to zero as well. I’m just not buying that zeroing out the local-variable storage in a stack frame is too expensive when we already happily do it in cases that, for some programmers, are abused far more often.

I appreciate you taking the time to read this article. If you have any questions or have any documentation for this design choice beyond what I’ve discussed, please feel free to join in the discussion below.
