Mirandas, bridges, overpasses

Pietro Braione
15 min read · Sep 7, 2018

As the Java programming language and the JVM specification evolved over time, the need arose to make old and new bugs and features live together in harmony. Efforts to solve this recurring issue produced, from time to time, a number of exotic and often ill-documented entities that puzzle whoever wants, or needs, to delve into the gory details of what happens behind the scenes when a piece of Java software is run. In this article I will discuss three kinds of these entities, namely, mirandas, bridges and overpasses. These are artificially generated methods aimed at handling some situations that arise when linking and invoking other, user-defined methods. I will refer mostly to Java (and OpenJDK) 8.

Let us start with miranda methods. The Miranda warning states, among other things, that if you cannot afford a lawyer, one will be appointed for you. Similarly, a miranda method is a method for which the language runtime provides a default definition to all classes that do not define it on their own. For example, the E programming language has a so-called miranda protocol made of a set of methods that all objects receive together with a default implementation. As far as OpenJDK is concerned, the motivation for miranda methods is quite different. Long before Hotspot there was the Sun JVM. Version 1.1 had a bug: when deciding whether a class declares a method, the Sun JVM walked the superclass hierarchy, but not the superinterface hierarchy. What were the consequences? Let us suppose that we have the following interface:

public interface A {
    void a();
}

and the following abstract class implementing it, together with a concrete subclass:

public abstract class B implements A {
    /* nothing here */
}
public class C extends B {
    public void a() { /* some code */ }
}

Because of the bug the JVM would not figure out that B has an abstract method a(). Thus this code:

public class Main {
    public static void main(String[] s) {
        B b = new C();
        b.a();
    }
}

would be problematic. I did not have the opportunity to experiment with JDK 1.1, but I reckon that this is what would happen: When javac compiles the statement b.a(); of the Main.main(String[]) method, it determines that the variable b has (static) type B and therefore emits the bytecode instruction invokevirtual B.a()V. When executing this bytecode, the Sun JVM would try to resolve the symbolic reference B.a()V, but because of the bug it would erroneously conclude that B has no a() method, and would therefore fail. To rescue the situation, the javac compiler can be instructed to emit miranda declarations for the abstract methods that a class inherits from a superinterface, and that the class (or one of its superclasses) does not redeclare or implement. By default, current version 8 javac implementations do not emit miranda methods, thus if you javac B.java the resulting Java 8 class will have this shape:

$ javap -cp . B
Compiled from "B.java"
public abstract class B implements A {
public B();
}

but if you javac -source 1.2 -target 1.1 B.java then

$ javap -cp . B
Compiled from "B.java"
public abstract class B implements A {
public B();
public abstract void a();
}

Voici a miranda method declaration for a()! The miranda declaration has the public, abstract, and synthetic flags set in the classfile. Sooner or later javac will no longer support ancient versions of the JDK/JRE, but since ancient jars compiled for that platform will still exist, miranda declarations will be here to stay.
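On a current JDK the absence of miranda declarations can also be observed through reflection. The following is only a minimal sketch, assuming a default javac configuration; the wrapper class MirandaCheck and its helper methods are names made up for this illustration, and A and B are nested only so that everything fits in one file.

```java
import java.lang.reflect.Method;

final class MirandaCheck {
    interface A { void a(); }
    static abstract class B implements A { /* nothing here */ }

    /** Number of methods that the classfile of B declares on its own. */
    static int declaredInB() {
        return B.class.getDeclaredMethods().length;
    }

    /** Simple name of the type that declares the a() reachable through B. */
    static String declarerOfA() {
        try {
            // getMethod searches superclasses and superinterfaces as well
            Method m = B.class.getMethod("a");
            return m.getDeclaringClass().getSimpleName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredInB()); // 0 when no miranda declaration was emitted
        System.out.println(declarerOfA()); // A
    }
}
```

If B had been compiled with miranda declarations, one would expect the first number to be 1 and the declaring type to be B instead.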

There is not much documentation on miranda method declarations, the most complete to date being the comments in the JDK source code. The situation is much better for bridge methods, which have a dedicated section in the Java tutorial. Bridge methods are necessary because, according to the JVM specification, a method in a subclass overrides another one in a superclass only if the former has exactly the same signature (name and parameter/return value types) as the latter. This interacts badly with a feature that was added to the Java language much later, that is, generic types, and with the fact that genericity is implemented by means of type erasure for backwards compatibility reasons. Consider for instance this generic class:

public class Sup<A> {
    void m(A a) { /* some code */ }
}

and its subclass:

public class Sub extends Sup<Integer> {
    void m(Integer a) { /* other code */ }
}

Since Sub.m(Integer) has the same signature as its superclass method Sup<Integer>.m(Integer), the former should override the latter, right? Well, let’s perform type erasure and see. Afterwards the two classes become:

public class Sup {
    void m(Object a) { /* some code */ }
}
public class Sub extends Sup {
    void m(Integer a) { /* other code */ }
}

Now Sub.m(Integer) no longer has the same signature as Sup.m(Object) and therefore no longer overrides it. This means that the following code:

Integer a = Integer.valueOf(0);
Sup<Integer> s = new Sub();
s.m(a); //compiled to: invokevirtual Sup.m(Ljava/lang/Object;)V

would invoke Sup.m(Object), not Sub.m(Integer) as one would expect. To avoid this issue javac adds a method that overrides Sup.m(Object) and invokes Sub.m(Integer):

public class Sub extends Sup {
    void m(Integer a) { /* other code */ }
    void m(Object a) { m((Integer) a); } //the bridge method
}

This method is called a bridge method, and thanks to it invoking s.m(a) now has the expected effect. Another issue motivating bridge methods comes from programming language theory, which tells us that in an object-oriented programming language a method in a subclass may override another one with the same name in the superclass if the former is contravariant in the types of the parameters of the latter (“require no more”) and covariant in the type of the return value of the latter (“ensure no less”). This is less restrictive than the overriding semantics prescribed by the JVM specification where, as we have seen, the signatures must be identical. So what if we want to define our own programming language, whose compiler targets the JVM, and we want it to follow the less restrictive overriding rules allowed by the theory? This is not a corner case: Since version 5.0 the Java programming language allows covariance of return values, thus if I have the following classes:

public class Sup2 {
    Object k() { /* some code */ }
}
public class Sub2 extends Sup2 {
    Integer k() { /* other code */ }
}

for the Java Language Specification v.8 it is the case that Sub2.k() overrides Sup2.k(), while for the JVM specification v.8 it is not. To overcome the semantic gap javac generates a bridge method in Sub2 that overrides Sup2.k() and invokes Sub2.k():

public class Sub2 extends Sup2 {
    Integer k() { /* other code */ }
    Object k() { //the bridge method (bytecode, not expressible in Java source)
        invokevirtual Sub2.k()Ljava/lang/Integer;
    }
}

Bridge methods have both the synthetic and the bridge flags set in the classfile. They are notoriously a very bad solution to the problems they aim to solve. One of the issues they raise is that bridge method stack frames pollute the call stack. As a result call stacks become less understandable, and debugging is disturbed. Another issue is that if you modify a class you might need to recompile all its subclasses in order to generate bridges in them. This makes it harder to evaluate the impact of a modification to a class on a Java project, and thus complicates the implementation of incremental build tools like Ant and Maven, or of IDEs like Eclipse.
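Both kinds of bridge can be observed from Java code through reflection. The sketch below is an illustration under a few assumptions: the wrapper class BridgeCheck and its helpers are invented names, the type parameter is renamed T, and the methods return values (instead of being void or empty as in the listings above) so that the dispatch is visible.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BridgeCheck {
    static class Sup<T> {
        String m(T a) { return "Sup.m"; }
    }
    static class Sub extends Sup<Integer> {
        @Override String m(Integer a) { return "Sub.m"; }
    }
    static class Sup2 {
        Object k() { return "Sup2.k"; }
    }
    static class Sub2 extends Sup2 {
        @Override Integer k() { return 42; }
    }

    /** Lists the declared methods of c that carry both the bridge and the synthetic flag. */
    static List<String> bridgesOf(Class<?> c) {
        List<String> found = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isBridge() && m.isSynthetic()) {
                StringBuilder sb = new StringBuilder();
                sb.append(m.getReturnType().getSimpleName()).append(' ');
                sb.append(m.getName()).append('(');
                Class<?>[] params = m.getParameterTypes();
                for (int i = 0; i < params.length; i++) {
                    if (i > 0) sb.append(", ");
                    sb.append(params[i].getSimpleName());
                }
                found.add(sb.append(')').toString());
            }
        }
        Collections.sort(found); // reflection does not guarantee any order
        return found;
    }

    /** Calls m through the supertype, that is, through the erased m(Object). */
    static String callThroughSup() {
        Sup<Integer> s = new Sub();
        return s.m(0);
    }

    public static void main(String[] args) {
        System.out.println(bridgesOf(Sub.class));  // [String m(Object)]
        System.out.println(bridgesOf(Sub2.class)); // [Object k()]
        System.out.println(callThroughSup());      // Sub.m
    }
}
```

The call through the supertype reaches Sub.m only because the bridge m(Object) overrides Sup.m(Object) at the JVM level and forwards to m(Integer).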

Finally, overpasses. While mirandas were introduced to fix a bug in ancient JVM implementations, and bridges were introduced to fix mismatches in method overriding semantics, overpass methods come into play when implementing the semantics of default method invocation. Overpasses are not something that you find in a classfile, unlike mirandas and bridges. Instead, they are internally generated by Hotspot when it loads and links a class, and they are added to the internal representation of the class that Hotspot keeps in its memory. Overpasses are therefore something very specific to Hotspot: A different JVM could implement default method invocation differently, and thus could have no overpasses. What follows is what I managed to understand about them (or better, what I believe I managed to understand) by reading the source code of Hotspot. I cannot guarantee that I didn’t get anything wrong. If you know better feel free to contact me and correct my mistakes. And now get ready to dive into the depths of the source code of one of the most complex pieces of software around. Don’t worry, it will be quite an interesting journey.

To talk about overpasses we need to talk about how virtual method invocation is implemented in Hotspot. Before Java 8 and the advent of default methods, the object model prescribed by the JVM specification was very simple: Only (noninterface) classes contained method code, while interfaces could only contain abstract method declarations, thus no code. Since a class can have only one superclass, performing a virtual method invocation (invokevirtual <signature> bytecode) was relatively easy: First look whether the class of the this object contains code for a method with signature <signature>; if none is found look in its superclass, then in the superclass of its superclass, and so on recursively. If you find an implementation, good, invoke it. If you hit the java.lang.Object class without finding an implementation, throw an AbstractMethodError.

This lookup procedure is simple but still has a cost that is, in the worst case, linear in the depth of the class hierarchy, so Hotspot does its best to perform it as few times as possible. Indeed, it exploits an approach that is quite customary in the implementation of object-oriented languages, that is, it precalculates virtual method lookup upon class loading and stores the result in data structures called virtual method tables, or vtables.

When Hotspot creates a new Java object, it stores in its memory a hidden pointer to an object (a C++ one, since Hotspot is written in C++) that represents its class. This C++ object is of class ArrayKlass if the Java object is an array, or InstanceKlass if it isn’t. We are not really interested in arrays because they can neither add methods to their superclass java.lang.Object nor override its methods, so we will only consider InstanceKlasses. Every InstanceKlass object contains information about a Java class such as whether the class is abstract, whether it is an interface, whether it is final, how many fields it declares, what the name and the visibility of each field are, etc. All this information is read from a classfile. Moreover the InstanceKlass stores a vtable, which is essentially a (C++) array of pointers to (C++) Method objects. A Method object represents a method declaration in a Java class. It is also built by taking information about the method from the classfile, and contains, e.g., the name of the method, its descriptor (i.e., the types of its parameters and of the return value), whether the method is declared abstract or not, and in the latter case a pointer to an array containing the bytecode of the method. When Hotspot reads a classfile and creates its corresponding InstanceKlass object, it builds the vtable as follows:

  • First, it clones the vtable of the superclass;
  • If the class overrides some methods of its superclass, it updates the corresponding slots in the cloned vtable so they point at the Method objects of the overriding methods;
  • Finally, if the class declares some additional methods, it adds slots for them after the end of the vtable, and populates the slots so they point at the appropriate Method objects.

Thanks to this procedure, vtables enjoy an important property: The slot reserved for a method declared in a class has the same position in the vtable of that class and in the vtables of all its subclasses. Let us consider for instance the following class declarations:

class A {
    void b() { /* some code for b */ }
    void c() { /* some code for c */ }
}
class B extends A {
    void a() { /* some code for a */ }
    @Override void b() { /* some different code for b */ }
}
class C extends A {
    @Override void c() { /* some different code for c */ }
    void d() { /* some code for d */ }
}

When loaded into Hotspot, they might generate the following (C++) data structures:

[Figure: the InstanceKlass objects of A, B and C, each with its vtable]

(Note that this is a simplified rendition for the sake of presentation: The actual objects, pointers, and field names used by Hotspot are quite different, and slightly more complex.) As you can see, all the vtables have method b() at slot 0 and method c() at slot 1.
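The three-step construction can be mimicked in a few lines of Java. This is a toy model and in no way Hotspot's actual code: ToyKlass, slotNames and slotTargets are hypothetical names, methods are identified by their bare name, and the slots that every class inherits from java.lang.Object are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a class with a vtable; not Hotspot's real InstanceKlass. */
final class ToyKlass {
    final String name;
    final List<String> slotNames = new ArrayList<>();   // slot index -> method name
    final List<String> slotTargets = new ArrayList<>(); // slot index -> "Class.method" to run

    ToyKlass(String name, ToyKlass superKlass, String... declaredMethods) {
        this.name = name;
        if (superKlass != null) {
            // Step 1: clone the vtable of the superclass.
            slotNames.addAll(superKlass.slotNames);
            slotTargets.addAll(superKlass.slotTargets);
        }
        for (String m : declaredMethods) {
            int slot = slotNames.indexOf(m);
            if (slot >= 0) {
                // Step 2: an overriding method replaces the target of the existing slot.
                slotTargets.set(slot, name + "." + m);
            } else {
                // Step 3: an additional method gets a new slot at the end.
                slotNames.add(m);
                slotTargets.add(name + "." + m);
            }
        }
    }

    int slotOf(String method) {
        return slotNames.indexOf(method);
    }

    public static void main(String[] args) {
        ToyKlass a = new ToyKlass("A", null, "b", "c");
        ToyKlass b = new ToyKlass("B", a, "a", "b");
        ToyKlass c = new ToyKlass("C", a, "c", "d");
        System.out.println(a.slotTargets); // [A.b, A.c]
        System.out.println(b.slotTargets); // [B.b, A.c, B.a]
        System.out.println(c.slotTargets); // [A.b, C.c, C.d]
    }
}
```

In this model b always sits in slot 0 and c in slot 1, whichever of the three classes is considered, which is the property that the dispatch mechanism relies on.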

Thanks to this property, it is easy to determine which method must be invoked when an invokevirtual bytecode is met: Take the InstanceKlass of the this object, and look at the right slot in its vtable. The slot position can be determined once and for all when resolving the symbolic argument of an invokevirtual bytecode instruction the first time it is executed. If for instance we have:

class Main {
    public static void main(String[] s) {
        A a = new /* either A or B or C, who knows */;
        a.b(); //compiled to: invokevirtual A.b()V
    }
}

the resolution procedure for the invokevirtual symbolic argument A.b()V will query the class A to determine the slot position of the method b() in the vtable, 0 in our example, and associate that position with the A.b()V symbol in the Main class. From then on there is no need to resolve the A.b()V symbol in the Main class anymore, as Hotspot knows that this symbol corresponds to slot 0 in the vtable of the this object. By virtue of the fact that classes A, B and C all have pointers to the code of the method b() at position 0 in their vtables, the correct method will be invoked whichever object a points to at runtime. As a consequence, after the first execution of an invokevirtual bytecode, where resolution of the symbolic argument must be performed, virtual method invocation needs only a small, constant number of memory accesses to pick the right method, a number that in the case of Hotspot is three, as this page states (this is the code).
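The idea of resolving once and dispatching by index afterwards can be sketched as follows. Again this is a toy under stated assumptions: CallSiteSketch, CallSite, cachedSlot and the hand-written vtables are hypothetical names and data chosen to match the A, B, C example, not Hotspot structures.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CallSiteSketch {
    // Slot layout fixed by class A: slot 0 is b, slot 1 is c.
    static final List<String> SLOTS_OF_A = Arrays.asList("b", "c");

    // Hand-written vtables: class name -> code reached through each slot.
    static final Map<String, String[]> VTABLES = new HashMap<>();
    static {
        VTABLES.put("A", new String[] {"A.b", "A.c"});
        VTABLES.put("B", new String[] {"B.b", "A.c", "B.a"});
        VTABLES.put("C", new String[] {"A.b", "C.c", "C.d"});
    }

    /** Models one invokevirtual instruction together with its resolution cache. */
    static final class CallSite {
        final String methodName;
        int cachedSlot = -1; // -1 means "not resolved yet"
        int resolutions = 0; // how many times the symbolic reference was resolved

        CallSite(String methodName) {
            this.methodName = methodName;
        }

        String invoke(String receiverClass) {
            if (cachedSlot < 0) {
                // First execution only: ask class A for the slot position.
                cachedSlot = SLOTS_OF_A.indexOf(methodName);
                resolutions++;
            }
            // Every execution: index into the vtable of the receiver's class.
            return VTABLES.get(receiverClass)[cachedSlot];
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite("b");    // models: invokevirtual A.b()V
        System.out.println(site.invoke("A")); // A.b
        System.out.println(site.invoke("B")); // B.b
        System.out.println(site.invoke("C")); // A.b
        System.out.println(site.resolutions); // 1
    }
}
```

The receiver changes from call to call, yet the cached slot stays valid because all three vtables agree on where b lives.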

Note that abstract classes must also have vtables, and the abstract methods declared in them must have a vtable slot. Why? Because, had the abstract class no vtable, or had the vtable no entry for the abstract methods, different subclasses would be free to put the slots for the methods declared in the abstract superclass at different positions in their respective vtables, breaking the invariant that makes the vtable machinery work. Consider for instance the following variation on the previous example:

abstract class A {
    abstract void a();
    void b() { /* some code for b */ }
    void c() { /* some code for c */ }
}
...

Now the vtables of B and C must have a slot for a() at the same position, or the execution of Main.main(String[]) would not be correct. In an abstract InstanceKlass the vtable slots for abstract methods are filled with a pointer to a special Method object with no code. If a subclass does not provide an implementation (in our reworked example, class C does not provide any implementation for the abstract method a()), it will inherit the code-less Method object from the abstract superclass. When we try to invoke it, an AbstractMethodError is thrown.

If you are interested in the nitty-gritty details of how this works, consider that Hotspot executes an invokevirtual bytecode by getting the Method object from the vtable, and then jumping to one of two blocks of code stored in it that are called the entries. A Method object stores two different kinds of entries, and only one is used depending on whether the invokevirtual bytecode is executed by the interpreter or by JIT-ted binary code. In the former case, the interpreter jumps to Method::_from_interpreted_entry, while in the latter case the JIT-ted binary code jumps to Method::_from_compiled_entry. If the target method is JIT-compiled, _from_compiled_entry and _from_interpreted_entry just perform another jump to the first instruction of the binary code of the target method. Otherwise, _from_compiled_entry and _from_interpreted_entry point to trampoline functions that make the interpreter push a frame for the invoked method on the stack and then start interpreting the invoked method. But if the Method object is code-less, _from_compiled_entry and _from_interpreted_entry point to trampoline code that instructs the interpreter to throw an AbstractMethodError.

We said that abstract methods in classes must have vtable entries. Actually, all the abstract methods of the class must have a vtable entry: not just the ones that the class declares, but also all the ones that the class inherits from some interface. If in our previous example we had:

interface I {
    void a();
}
abstract class A implements I {
    ...
}
...

the situation would not be so different, thus the same reasoning explains why class A must have a vtable slot for a(). These slots are called, guess how? Miranda slots. So there is another Miranda law for JVM classes: Either an abstract interface method has a vtable slot in a class implementing the interface by virtue of the fact that the class redeclares or implements the method, or the JVM grants one vtable slot to the method. Moreover, as happens for abstract methods, the vtable slot is populated with a code-less Method object. In our previous example, the vtable of class A has a miranda slot for a() that is populated with a code-less Method object whose entries throw an AbstractMethodError.

To summarize, the revised vtable creation procedure for a class is as follows:

  • Hotspot determines the superclass and creates an initial vtable by cloning the one in the superclass;
  • It parses the method declarations in the class, either abstract or not; it creates a code-full or code-less Method object for each of them; If a vtable slot for a declaration already exists, it sets it to the corresponding Method object; Otherwise, it appends a new vtable slot and sets it to the Method object;
  • It walks the superinterfaces of the class, and if it finds an abstract method declaration that does not already have a slot in the vtable, it adds a miranda slot for it and sets it to a code-less Method object.

(Note that this is another simplification for the sake of presentation. Actually, parsing of classfiles happens at a different time from vtable creation, which takes place in the linking phase.)
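The revised procedure, miranda slots included, can be sketched in Java as well. Everything here is a simplification made for illustration: MirandaSlotSketch, ToyMethod and buildVtable are hypothetical names, a method is identified by its bare name, and a code-less Method is modeled as an entry that throws when called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

final class MirandaSlotSketch {
    /** Toy Method object; a null body models a code-less Method. */
    static final class ToyMethod {
        final String name;
        final Supplier<String> entry;

        ToyMethod(String name, String body) {
            this.name = name;
            if (body == null) {
                this.entry = () -> { throw new AbstractMethodError(name); };
            } else {
                this.entry = () -> body;
            }
        }
    }

    static int slotOf(List<ToyMethod> vtable, String name) {
        for (int i = 0; i < vtable.size(); i++) {
            if (vtable.get(i).name.equals(name)) return i;
        }
        return -1;
    }

    static List<ToyMethod> buildVtable(List<ToyMethod> superVtable,
                                       List<ToyMethod> declared,
                                       List<String> interfaceMethods) {
        List<ToyMethod> vtable = new ArrayList<>(superVtable); // clone the super vtable
        for (ToyMethod m : declared) {                         // declared methods
            int slot = slotOf(vtable, m.name);
            if (slot >= 0) vtable.set(slot, m); else vtable.add(m);
        }
        for (String name : interfaceMethods) {                 // miranda slots
            if (slotOf(vtable, name) < 0) vtable.add(new ToyMethod(name, null));
        }
        return vtable;
    }

    /** Runs the entry found at the given slot, reporting the error case as a string. */
    static String invoke(List<ToyMethod> vtable, int slot) {
        try {
            return vtable.get(slot).entry.get();
        } catch (AbstractMethodError e) {
            return "AbstractMethodError";
        }
    }

    public static void main(String[] args) {
        // abstract class A implements I, where I declares the abstract method a()
        List<ToyMethod> a = buildVtable(Collections.<ToyMethod>emptyList(),
                Arrays.asList(new ToyMethod("b", "A.b"), new ToyMethod("c", "A.c")),
                Arrays.asList("a"));
        List<ToyMethod> b = buildVtable(a,
                Arrays.asList(new ToyMethod("a", "B.a"), new ToyMethod("b", "B.b")),
                Collections.<String>emptyList());
        List<ToyMethod> c = buildVtable(a,
                Arrays.asList(new ToyMethod("c", "C.c"), new ToyMethod("d", "C.d")),
                Collections.<String>emptyList());
        int slot = slotOf(a, "a");
        System.out.println(slot);            // 2
        System.out.println(invoke(b, slot)); // B.a
        System.out.println(invoke(c, slot)); // AbstractMethodError
    }
}
```

Class C never mentions a(), yet its vtable has the slot at the same position as in B, and invoking it fails in the way described for code-less Method objects.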

So this was the situation until Java 8. Since Java 8, method code no longer lives exclusively in classes, but can also be in interfaces in the form of default methods. This forced a rethinking of how virtual method dispatching should be performed. Of course, the idea was to reuse as much as possible the vtable-based machinery that was already in place. So why not exploit the already existing miranda slots that are created for the methods inherited from superinterfaces? Let us consider this example:

interface I {
    default void a() { /* some default code for a */ }
}
class A implements I {
    void b() { /* some code for b */ }
    void c() { /* some code for c */ }
}
...

Were method I.a() abstract, the vtable of class A would receive a miranda slot for it. So the solution to implementing default methods is to create the vtable slot as if I.a() were abstract, but instead of making it point to a code-less Method object, make it point to a Method object containing the code of the default implementation in I. Problem solved!

Well, not completely. The rules for method lookup also change with the introduction of default methods and become way more complicated. Now that interfaces can also contain methods with code, to look for a method implementation it is necessary to walk not only the superclass hierarchy, but also the superinterface hierarchy, which is not a tree but a general directed acyclic graph. Before Java 8 method lookup could have only two possible results: Either exactly one method implementation is found in the superclass hierarchy that overrides all the others, or none is found. With Java 8 the possible situations are three: the previous two, plus a third one where we find two or more different implementations of a method, with none of them overriding the others. Consider for instance this further variation on the previous example:

interface I1 {
    default void a() { /* some default code for a */ }
}
interface I2 {
    default void a() { /* some other default code for a */ }
}
class A implements I1, I2 {
    void b() { /* some code for b */ }
    void c() { /* some code for c */ }
}
...

Should an instruction (new A()).a() be executed, which default implementation should be invoked, I1.a() or I2.a()? The answer is: neither. The JVM specification requires that the lookup procedure finds exactly one maximally-specific (i.e., not overridden) method implementation. If none is found, an AbstractMethodError is thrown, in agreement with previous JVM specifications. If more than one is found, an IncompatibleClassChangeError must be thrown.

So, with the introduction of default methods, lookup of a method implementation may fail in two different ways. How does Hotspot manage this situation? Enter overpass methods: An overpass method is a code-full Method object created by Hotspot, whose code either throws an AbstractMethodError or throws an IncompatibleClassChangeError. The procedure for creating the vtable for a class is modified as follows:

  • Hotspot determines the superclass and creates the initial vtable by cloning the one in the superclass, exactly as before;
  • Exactly as before, it parses the method declarations in the class and modifies/adds slots to the vtable for them;
  • It walks the superinterfaces of the class, and if it finds an abstract or default method declaration that does not already have a slot in the vtable, it adds a slot for it and initializes it with a code-less Method object;
  • For all the empty (i.e., pointing to code-less Methods) slots in the vtable, it walks the superinterface hierarchy and determines whether there is exactly one, none, or more than one maximally-specific default implementation for the associated method signature. In the first case, it sets the slot to a code-full Method object for the unique implementation. In the second case, it creates an overpass Method object whose code throws an AbstractMethodError, and sets the slot to this object. The third case is similar to the second one, with the difference that the created overpass throws an IncompatibleClassChangeError. Note that a vtable slot that inherits an overpass Method object from a superclass must be considered empty for this algorithm to work correctly.
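The decision taken in the last step can be sketched as a small Java function. This is a deliberately reduced model and not Hotspot's algorithm as implemented: OverpassSketch, ToyInterface and fillSlot are hypothetical names, only a single method a() is considered, and the result is a descriptive string rather than a Method object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OverpassSketch {
    /** Toy interface, reduced to what matters for a single method a(). */
    static final class ToyInterface {
        final String name;
        final boolean declaresA;  // declares a() at all (abstract or default)
        final boolean hasDefault; // declares a() with a default body
        final List<ToyInterface> supers;

        ToyInterface(String name, boolean declaresA, boolean hasDefault, ToyInterface... supers) {
            this.name = name;
            this.declaresA = declaresA;
            this.hasDefault = hasDefault;
            this.supers = Arrays.asList(supers);
        }

        boolean isStrictSubinterfaceOf(ToyInterface other) {
            for (ToyInterface s : supers) {
                if (s == other || s.isStrictSubinterfaceOf(other)) return true;
            }
            return false;
        }
    }

    private static void collect(ToyInterface i, Set<ToyInterface> into) {
        if (into.add(i)) {
            for (ToyInterface s : i.supers) collect(s, into);
        }
    }

    /** Decides what an empty vtable slot for a() should point to. */
    static String fillSlot(ToyInterface... directSuperinterfaces) {
        Set<ToyInterface> all = new LinkedHashSet<>();
        for (ToyInterface i : directSuperinterfaces) collect(i, all);

        List<ToyInterface> defaults = new ArrayList<>();
        for (ToyInterface candidate : all) {
            if (!candidate.declaresA) continue;
            boolean overridden = false;
            for (ToyInterface other : all) {
                if (other.declaresA && other.isStrictSubinterfaceOf(candidate)) overridden = true;
            }
            // Keep only maximally-specific declarations that carry code.
            if (!overridden && candidate.hasDefault) defaults.add(candidate);
        }
        if (defaults.size() == 1) return "code of " + defaults.get(0).name + ".a()";
        if (defaults.isEmpty()) return "overpass throwing AbstractMethodError";
        return "overpass throwing IncompatibleClassChangeError";
    }

    public static void main(String[] args) {
        ToyInterface i1 = new ToyInterface("I1", true, true);
        ToyInterface i2 = new ToyInterface("I2", true, true);
        ToyInterface j = new ToyInterface("J", true, true, i1); // J extends I1 and overrides a()
        ToyInterface k = new ToyInterface("K", true, false);    // K declares a() as abstract
        System.out.println(fillSlot(i1));     // code of I1.a()
        System.out.println(fillSlot(i1, i2)); // overpass throwing IncompatibleClassChangeError
        System.out.println(fillSlot(i1, j));  // code of J.a()
        System.out.println(fillSlot(k));      // overpass throwing AbstractMethodError
    }
}
```

The second call corresponds to the class A implements I1, I2 of the example above, where neither default overrides the other.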

An overpass method has the public, synthetic, and bridge flags set. I am not sure why the bridge flag is set, especially in light of what we said before about bridge methods. If anyone has any idea, or can elaborate on the relationship between overpasses and bridges, please write to me.

Pietro Braione

I am an associate professor at the University of Milano-Bicocca. My interests are programming languages, program analysis, software quality and security.