Java: A Child-First Class Loader

“photography of excavators at mining area” by Dominik Vanyi on Unsplash

Java is, and has long been, the leading programming language for enterprise applications. I believe this is mainly because of the maturity of the Java Virtual Machine (JVM). The JVM has been around for more than 20 years, and it is considered one of the most reliable and optimized runtime engines ever created. In a nutshell, it has more field experience than most other application runtimes.

Dealing with Java classloaders is usually considered a topic for advanced users because of their inherent complexity. You may rarely, or never, have to deal with how classloaders work in your coding career. So, what actually is a classloader?

A classloader loads the classes a program needs into the JVM dynamically, on demand.

Why do you need to know about classloaders? When are they being used?

If you want to know more about classloaders and how they work, I recommend reading this wonderful article here.

There are two main scenarios I can think of where a user may need to customize a classloader.

  1. Class instrumentation: Instrumentation means modifying classes at runtime. This sounds scary, but it is very helpful in scenarios like unit testing, debugging, or monitoring Java applications.
  2. Isolation of executions: Classloaders can be used to isolate several execution environments within a single process by making only a subset of classes visible to a particular thread.

Here, I am going to tell you the story of why we needed to modify a classloader for the second reason: isolation of execution.

Recently, I was asked to create a small execution engine for an RPA tool. It was supposed to execute a specified method in jar files uploaded by the user. Uploaded artifacts can take two forms: either an uber jar (a single fat jar containing your program and all its dependencies), or a zip file that contains the primary library plus all of its dependencies together in a folder.

The RPA application also has a set of base runtime libraries, which we call the RPA runtime libraries, that are supposed to be used by all user script libraries as companion libraries. These don’t need to be distributed along with user scripts. A user is still free to use any other external library, without restriction. If they do use such custom libraries, those must be packaged along with the final script bundle.

The challenge was that the so-called RPA runtime libraries have their own set of dependencies as well, such as Google Guava, Apache Commons, etc. So, we needed to carefully design the execution engine so that versions of custom script dependencies would not conflict with versions of runtime dependencies, and when such a conflict did occur, the script dependencies had to take priority.

Now, this opens up a really interesting scenario in the Java classloader world.

Generally, Java classloaders are hierarchical and work according to a parent-first delegation model. (Here, hierarchical does not mean class-level inheritance but an instance-level delegation hierarchy.) That means, when a class needs to be loaded, the classloader first delegates to its parent classloader, and the parent delegates to its own parent, recursively, until the bootstrap classloader is reached. Only if none of the parent classloaders finds the specified class does the current classloader try to find it in its own context, which is the set of URLs it was given (usually jars or directories in the file system) if you are using a URLClassLoader.
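To make the delegation hierarchy concrete, here is a minimal sketch that walks the parent chain of a classloader. The class name LoaderChain and the method chainOf are made up for illustration, and the loader class names it prints differ between Java versions.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {

    // Collects the class names of a loader and all of its parents, child first.
    public static List<String> chainOf(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // A null parent stands for the bootstrap classloader.
        names.add("bootstrap");
        return names;
    }

    public static void main(String[] args) {
        chainOf(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes such as String are loaded by the bootstrap classloader,
        // which the API represents as null.
        System.out.println(String.class.getClassLoader());
    }
}
```

Each entry in the printed chain is asked before the one above it in the list gets a chance, which is exactly the parent-first order described above.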

If we applied this same strategy to the execution engine, it wouldn’t work. Imagine, for example, that a user script uses Guava version 23 and our runtime libraries use Guava version 18. With parent-first classloaders, the user script will end up using our Guava v18, not v23, because by default the parent classloader holding the RPA runtime libraries takes priority over the classloader holding the script artifacts. This is not the desired behavior. There are two things we could do in this scenario. Either we tell users not to use conflicting versions of dependencies, or we fix the classloader to give priority to script dependencies. Clearly the first option is very restrictive, and sometimes not possible at all because of how some scripts work.

So, we decided to create a classloader with the class loading order reversed compared to Java’s default parent-first strategy. In this strategy, whenever a class needs to be loaded, the classloader first searches its own context and, only if the class is not found there, searches the parent context. We can call this child-first delegation. (Strictly speaking, it is wrong to call it child-first, because a classloader keeps no reference to its children. The name comes from the perspective of the parent classloader: its child takes precedence before the request reaches the parent itself.)

In the child-first class loading strategy, a classloader searches its own context before delegating to the parent classloader.

And here’s how we did it.

Child-First Class Loader

We will create a classloader by extending URLClassLoader and overriding its loadClass(String name, boolean resolve) method.

Why did we extend URLClassLoader? Because we don’t want to reinvent the wheel of loading classes from jars in the file system. All the scanning and heavy lifting is done by this class for us. We inherit everything from it except the loadClass method.

import java.net.URL;
import java.net.URLClassLoader;

public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // same lock the default loadClass uses, so two threads never define a class twice
        synchronized (getClassLoadingLock(name)) {
            // has the class been loaded already?
            Class<?> loadedClass = findLoadedClass(name);
            if (loadedClass == null) {
                try {
                    // find the class in the given jar urls first
                    loadedClass = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Hmmm... the class does not exist in the given urls.
                    // Let's try finding it through the default parent-first lookup.
                    // This throws ClassNotFoundException on failure.
                    loadedClass = super.loadClass(name, resolve);
                }
            }

            if (resolve) { // marked to resolve
                resolveClass(loadedClass);
            }
            return loadedClass;
        }
    }
}

If you look at this code carefully, you can see that it is almost identical to the implementation of the loadClass method in Java’s ClassLoader class. The order, however, is different: we have reversed it.

Also note the line loadedClass = super.loadClass(name, resolve). Remember, I mentioned above that class loading is not class-level inheritance but instance-level delegation. Then why did I call loadClass of the superclass instead of using the parent classloader reference? Because the inherited loadClass method already performs parent-first delegation using the provided parent instance, and it also handles a null parent by asking the bootstrap classloader. It also calls findClass again if the class is not found in any parent. At first glance this is inefficient, but it only happens when the requested class is not on the classpath at all. So, it is all right to be inefficient when the program is about to fail anyway because the class cannot be found.
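To see the effect of the reversed order, here is a rough sketch that loads the same class once through a plain URLClassLoader and once through a child-first loader. Everything in it is an assumption made for the demo: the nested Greeter class stands in for a library such as Guava, ChildFirst is a trimmed copy of the loader above, and the "script jar" is simulated by copying Greeter's class file into a temporary directory, which only works when the compiled class files are reachable as classpath resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChildFirstDemo {

    // Stand-in for a library class that exists both in the parent and in the script libs.
    public static class Greeter { }

    // Same idea as ChildFirstClassLoader, reduced to the essentials.
    public static class ChildFirst extends URLClassLoader {
        public ChildFirst(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    try {
                        c = findClass(name);                 // own URLs first
                    } catch (ClassNotFoundException e) {
                        c = super.loadClass(name, resolve);  // then parent-first lookup
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    // Copies Greeter's class file into a fresh directory that plays the role of a script jar.
    public static URL[] copyGreeterToTempDir() throws IOException {
        String resource = Greeter.class.getName().replace('.', '/') + ".class";
        Path dir = Files.createTempDirectory("script-libs");
        Path target = dir.resolve(resource);
        Files.createDirectories(target.getParent());
        try (InputStream in = ChildFirstDemo.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found on the classpath: " + resource);
            }
            Files.copy(in, target);
        }
        return new URL[] { dir.toUri().toURL() };
    }

    public static void main(String[] args) throws Exception {
        URL[] scriptLibs = copyGreeterToTempDir();
        ClassLoader parent = ChildFirstDemo.class.getClassLoader();
        String name = Greeter.class.getName();

        try (URLClassLoader parentFirst = new URLClassLoader(scriptLibs, parent);
             ChildFirst childFirst = new ChildFirst(scriptLibs, parent)) {
            // Parent-first: the parent's copy wins, so this is the very same Class object.
            System.out.println(parentFirst.loadClass(name) == Greeter.class);
            // Child-first: the copy in scriptLibs wins and is defined by the child loader.
            Class<?> fromChild = childFirst.loadClass(name);
            System.out.println(fromChild == Greeter.class);
            System.out.println(fromChild.getClassLoader() == childFirst);
        }
    }
}
```

The two Class objects share a name but are different types to the JVM, which is what lets a script run against its own version of a library.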

You have every right to be confused by so many methods with similar-sounding names, like loadClass, findClass, resolveClass, and findLoadedClass. They all sound like synonyms, right? Don’t worry. Though the names sound similar, they are used in different phases of class loading. Let me explain in simple words what those methods do.

  • findLoadedClass: This method checks whether the class has already been loaded by this classloader. If so, it returns the loaded class; if not, it returns null. We need this check to avoid loading the same class again and again, very expensively, every time the program needs it. Think of this method as a cache lookup.
  • loadClass: This method is the actual orchestrator of how a class gets loaded into the program. It defines the order of the lookup strategy. That’s why we need to override it to achieve what we want.
  • findClass: This method finds the specified class by scanning the jar files. It is the most expensive one, because it involves I/O to read the class from files as well as a call to defineClass, which converts the bytecode into an actual Java Class<?> instance. To avoid calling it unnecessarily, we must call findLoadedClass and check the result before invoking this method.
  • resolveClass: If the boolean parameter resolve of loadClass is set to true, the class is linked: its bytecode is verified and prepared, and classes it references may be loaded. If any of these steps fails, a subclass of LinkageError is thrown. Note that the public loadClass(String name) method passes false for resolve, so in most calls linking is left to the JVM, which does it lazily when the class is first used.

See, they have different responsibilities despite their similar names. If you were the developer who had to create the ClassLoader class for Java, could you come up with more distinctive names for these methods?

Now you can use this classloader in your execution environment. In our application, we were executing all scripts in separate threads. So, we can set this new classloader as the thread context classloader, as shown below.
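If you want to watch these phases happen, a small tracing loader helps. The sketch below is not part of our engine; TracingLoader and the class name com.example.DoesNotExist are made up for illustration. It records each call to loadClass and findClass on a default, parent-first URLClassLoader that has no URLs of its own.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class TracingLoader extends URLClassLoader {

    // Records which loading phase was entered, in order.
    public final List<String> events = new ArrayList<>();

    public TracingLoader(ClassLoader parent) {
        super(new URL[0], parent); // no URLs: findClass can never succeed here
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        events.add("loadClass resolve=" + resolve);
        return super.loadClass(name, resolve);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        events.add("findClass");
        return super.findClass(name);
    }

    // Loads the class through a fresh tracing loader and returns the recorded events.
    public static List<String> trace(String className) {
        TracingLoader loader = new TracingLoader(ClassLoader.getSystemClassLoader());
        try {
            loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            loader.events.add("ClassNotFoundException");
        }
        return loader.events;
    }

    public static void main(String[] args) {
        // Found by a parent, so findClass is never reached.
        System.out.println(trace("java.lang.String"));
        // prints [loadClass resolve=false]

        // No loader knows this class: the parents fail first, then findClass is the last resort.
        System.out.println(trace("com.example.DoesNotExist"));
        // prints [loadClass resolve=false, findClass, ClassNotFoundException]
    }
}
```

In the default order findClass only runs after every parent has failed. A child-first loader moves that call to the front.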

public void beginExecution(Runnable runnable) {
    // runtimeLibs and scriptLibs are URL[] arrays pointing at the runtime and script jars
    Thread t = new Thread(runnable);
    URLClassLoader runtimeCl = new URLClassLoader(runtimeLibs, null);
    t.setContextClassLoader(new ChildFirstClassLoader(scriptLibs, runtimeCl));

    t.start();
}

This is not the best way to start a thread; the above is shown for demonstration purposes only. In practice, we used a thread pool to submit and execute our tasks, and had our own worker implementation to control many things in the execution lifecycle.

public class Worker implements Runnable {

    private final ClassLoader runtimeClassloader;
    private final URL[] scriptLibs; // URLs of the script jars

    public Worker(ClassLoader runtimeClassloader, URL[] scriptLibs) {
        this.runtimeClassloader = runtimeClassloader;
        this.scriptLibs = scriptLibs;
    }

    @Override
    public void run() {
        ClassLoader ctxCL = Thread.currentThread().getContextClassLoader();
        Thread.currentThread().setContextClassLoader(
                new ChildFirstClassLoader(scriptLibs, runtimeClassloader));

        try {
            // execute script
        } finally {
            Thread.currentThread().setContextClassLoader(ctxCL);
        }
    }
}
...
POOL.submit(new Worker(runtimeClassLoader, scriptLibs));

You might ask why we set the context classloader back to the previously kept reference, ctxCL, at the end. It is because a thread pool reuses existing threads as much as possible (*wink*), and we did not want to risk leaking our classloaders after use. So, at the end of the execution, we explicitly restored the original classloader that was set up by the runtime.
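The restore step can be checked with a small experiment. This is only a sketch: ContextLoaderDemo, withContextLoader, and runTwice are names invented here, and an empty URLClassLoader stands in for the script classloader. It runs two tasks on the same pooled thread and records which context classloader each of them sees.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextLoaderDemo {

    // Runs the task with the given context classloader and restores the old one afterwards.
    public static Runnable withContextLoader(ClassLoader loader, Runnable task) {
        return () -> {
            Thread current = Thread.currentThread();
            ClassLoader previous = current.getContextClassLoader();
            current.setContextClassLoader(loader);
            try {
                task.run();
            } finally {
                current.setContextClassLoader(previous);
            }
        };
    }

    // Returns {loader seen by the wrapped task, loader seen by the next task on the same thread}.
    public static ClassLoader[] runTwice(ClassLoader scriptLoader) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(); // one thread, reused
        try {
            ClassLoader[] seen = new ClassLoader[2];
            pool.submit(withContextLoader(scriptLoader, () -> {
                seen[0] = Thread.currentThread().getContextClassLoader();
            })).get();
            pool.submit(() -> {
                seen[1] = Thread.currentThread().getContextClassLoader();
            }).get();
            return seen;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Empty stand-in for the script classloader.
        ClassLoader scriptLoader = new URLClassLoader(new URL[0], null);
        ClassLoader[] seen = runTwice(scriptLoader);
        System.out.println(seen[0] == scriptLoader); // true: the task ran with the script loader
        System.out.println(seen[1] == scriptLoader); // false: the pooled thread got its old loader back
    }
}
```

Without the finally block, the second task would still see the script classloader, and the pool would keep it alive for as long as the thread lives.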

Does it work?

Wait. Doesn’t it work?

I could say that the class above works... but barely. So, what’s wrong with it?

If you looked carefully, you may have noticed that I deliberately left some parts out of the class above.

Yes, it’s true that it loads classes, but what about resources?

Resources are as important as classes. You can’t have an implementation where the class loading strategy differs from the resource loading strategy. If the two strategies differ, you are going to end up in debugging hell, figuring out what went wrong! (I am talking about config files, metadata files, and other resources embedded in the classpath.) Believe me, I was there once.

Some classes are associated with resources. With the classloader above, resources are still loaded parent-first, so you end up with classes paired with mismatched resources, which can fail the whole program. So, we need to override the resource loading methods too, as shown below.

import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class ChildFirstClassLoader extends URLClassLoader {

    ...

    @Override
    public Enumeration<URL> getResources(String name) throws IOException {
        List<URL> allRes = new ArrayList<>();

        // load resources from this classloader first
        allRes.addAll(Collections.list(findResources(name)));

        // then collect resources from the parent classloaders;
        // super.findResources(name) would only search our own urls again
        ClassLoader parent = getParent();
        if (parent != null) {
            allRes.addAll(Collections.list(parent.getResources(name)));
        }

        // the enumeration follows the search order kept in the list
        return Collections.enumeration(allRes);
    }

    @Override
    public URL getResource(String name) {
        // look in our own urls first
        URL res = findResource(name);
        if (res == null) {
            // fall back to the default parent-first lookup
            res = super.getResource(name);
        }
        return res;
    }
}

There are two methods you need to override: one that loads a single resource and another that loads multiple resources at once. Why are there two methods? Sometimes a class needs a single definite resource, and sometimes it needs all resources with a given name, which could be contributed by various artifacts. Remember that you must override both, since we have no idea in advance which method libraries will call at runtime.

Also, when we return an Enumeration<URL> from the getResources method, we preserve the search order of the list as well.

Now the resources load child-first too, and it should work.

Since we extended ChildFirstClassLoader from URLClassLoader, we never had to override the findClass, findResource, or findResources methods, because they are all implemented in the superclass. If you extend the abstract ClassLoader class directly instead, you have to implement these methods as well.
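Here is a rough way to check the resource lookup order. The sketch assumes two temporary directories that each contain a file with the same made-up name, demo.properties: one plays the runtime libraries and the other the script libraries. ChildFirstResources is a trimmed loader written for this demo that handles resources only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class ResourceOrderDemo {

    // Resource lookup only; class loading is left out to keep the sketch short.
    public static class ChildFirstResources extends URLClassLoader {
        public ChildFirstResources(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        public URL getResource(String name) {
            URL res = findResource(name); // own URLs first
            return res != null ? res : super.getResource(name);
        }

        @Override
        public Enumeration<URL> getResources(String name) throws IOException {
            List<URL> all = new ArrayList<>(Collections.list(findResources(name)));
            ClassLoader parent = getParent();
            if (parent != null) {
                all.addAll(Collections.list(parent.getResources(name)));
            }
            return Collections.enumeration(all);
        }
    }

    // Creates a directory holding one resource file with the given single-line content.
    static URL dirWith(String fileName, String content) throws IOException {
        Path dir = Files.createTempDirectory("res-demo");
        Files.write(dir.resolve(fileName), content.getBytes(StandardCharsets.UTF_8));
        return dir.toUri().toURL();
    }

    // Reads the first line of the resource behind the URL.
    static String read(URL url) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    // Returns the content found by getResource, followed by all contents from getResources.
    public static List<String> lookupOrder() throws IOException {
        String name = "demo.properties"; // hypothetical resource name
        URLClassLoader runtime = new URLClassLoader(new URL[] { dirWith(name, "runtime") }, null);
        ChildFirstResources script =
                new ChildFirstResources(new URL[] { dirWith(name, "script") }, runtime);
        List<String> contents = new ArrayList<>();
        contents.add(read(script.getResource(name)));
        for (URL url : Collections.list(script.getResources(name))) {
            contents.add(read(url));
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lookupOrder()); // prints [script, script, runtime]
    }
}
```

Both lookups put the script's copy first, so a class loaded from the script libraries is paired with the resource that ships with it.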

What could possibly go wrong with Child-First class loading strategy?

Looking at this strategy, you might wonder why the creators of Java didn’t make it the default way of loading classes. I mean, why? Did they toss a coin and decide to use the parent-first delegation strategy?

No. There is actually a serious reason that the creators of the Java language took into account. In fact, if you look carefully through the JDK internals, you won’t find a single class loader implementing this strategy. Why? The reason is security.

Just imagine this. In a child-first class loading environment, the child gets priority. A user could then impersonate internal classes from the extension or system classloader, which can have very undesirable effects on the security of your application. Simply by loading their own set of classes, a script may be able to create an insecure environment while your program is executing, or completely crash your application’s JVM. Although normal users won’t deliberately try such things, accidents can happen. We needed to prevent this.

So, considering security, this is not a perfect solution to our problem. How can we improve it, then?

A better Child-First solution

To prevent such impersonation, and to protect some of our precious runtime libraries, we can modify our class loader a little.

Here, ChildFirstClassLoader keeps a reference to another class loader. Which class loader can we give it? We can use the system class loader, obtained by calling getSystemClassLoader() in the constructor. In the other methods, this referenced class loader is consulted first; if it cannot find the class, the child-first strategy is used to search the loader itself and then the parent. See below.

import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class ChildFirstClassLoader extends URLClassLoader {

    private final ClassLoader sysClzLoader;

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
        sysClzLoader = getSystemClassLoader();
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // same lock the default loadClass uses, so two threads never define a class twice
        synchronized (getClassLoadingLock(name)) {
            // has the class been loaded already?
            Class<?> loadedClass = findLoadedClass(name);
            if (loadedClass == null) {
                try {
                    if (sysClzLoader != null) {
                        loadedClass = sysClzLoader.loadClass(name);
                    }
                } catch (ClassNotFoundException ex) {
                    // class not found in system class loader... silently skipping
                }

                try {
                    // find the class in the jar urls given as the first constructor parameter
                    if (loadedClass == null) {
                        loadedClass = findClass(name);
                    }
                } catch (ClassNotFoundException e) {
                    // class is not found in the given urls.
                    // Let's try it in the parent classloader.
                    // If the class is still not found, this throws ClassNotFoundException.
                    loadedClass = super.loadClass(name, resolve);
                }
            }

            if (resolve) { // marked to resolve
                resolveClass(loadedClass);
            }
            return loadedClass;
        }
    }

    @Override
    public Enumeration<URL> getResources(String name) throws IOException {
        List<URL> allRes = new ArrayList<>();

        // load resources from the system class loader
        if (sysClzLoader != null) {
            allRes.addAll(Collections.list(sysClzLoader.getResources(name)));
        }

        // load resources from this classloader
        allRes.addAll(Collections.list(findResources(name)));

        // then collect resources from the parent classloaders;
        // super.findResources(name) would only search our own urls again
        ClassLoader parent = getParent();
        if (parent != null) {
            allRes.addAll(Collections.list(parent.getResources(name)));
        }

        // the enumeration follows the search order kept in the list
        return Collections.enumeration(allRes);
    }

    @Override
    public URL getResource(String name) {
        URL res = null;
        if (sysClzLoader != null) {
            res = sysClzLoader.getResource(name);
        }
        if (res == null) {
            res = findResource(name);
        }
        if (res == null) {
            res = super.getResource(name);
        }
        return res;
    }
}

This has the advantage that internal classes are protected and cannot be impersonated, because the system class loader is always consulted first.

You can make this even more flexible by passing the referenced classloader as a constructor parameter, so it can be chosen at runtime when the classloader is created.

There is another way of doing it. You can keep a list of package name prefixes within the class loader, namely a protected list. Whenever a class needs to be loaded, the loader checks whether the class name matches the protected list; if so, it uses the parent-first strategy, and otherwise the child-first strategy. The disadvantage is that you need to know every package name to be protected in advance, which is not developer friendly: we could miss some classes, or include some we don’t want.
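For comparison, the protected-list variant could look roughly like the sketch below. We did not ship this; the class name PrefixAwareClassLoader and the prefixes used in main are only examples.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

public class PrefixAwareClassLoader extends URLClassLoader {

    private final List<String> protectedPrefixes;

    public PrefixAwareClassLoader(URL[] urls, ClassLoader parent, List<String> protectedPrefixes) {
        super(urls, parent);
        this.protectedPrefixes = protectedPrefixes;
    }

    // True when the class belongs to a package that must always come from the parent.
    public boolean isProtected(String className) {
        for (String prefix : protectedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (isProtected(name)) {
            return super.loadClass(name, resolve); // parent-first for protected packages
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    c = findClass(name); // child-first for everything else
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // Example prefixes only; a real list depends on what your runtime must protect.
        PrefixAwareClassLoader loader = new PrefixAwareClassLoader(
                new URL[0], ClassLoader.getSystemClassLoader(), Arrays.asList("java.", "javax."));
        System.out.println(loader.isProtected("java.lang.String"));                 // true
        System.out.println(loader.isProtected("com.google.common.collect.Lists")); // false
        System.out.println(loader.loadClass("java.lang.String") == String.class);  // true
    }
}
```

The sketch also shows the weakness mentioned above: everything hinges on the prefix list being complete and correct.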

Therefore, I suggest using child-first loading with a referenced classloader instead of a package pattern matching mechanism.

Conclusion

Using a child-first strategy in your classloader does not mean that every parent classloader follows a child-first strategy too. The strategies of all your parents will still be parent-first, unless your parent classloaders are child-first loaders written by you. The JDK’s built-in classloaders are strictly parent-first, and you can’t modify that. What you can do is choose the parents as you wish. You can set the parent to null if you want to isolate a classloader completely from the other classloaders. Setting the parent to null means the bootstrap classloader, which is written in native code, acts as the parent.
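The effect of a null parent is easy to check. In this small sketch (IsolationDemo and canLoad are names made up for the example), a URLClassLoader with no URLs and a null parent can still see core classes through the bootstrap classloader, but not the application's own classes.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolationDemo {

    // Reports whether the given loader can see the named class.
    public static boolean canLoad(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A null parent leaves only the bootstrap classloader above this loader.
        ClassLoader isolated = new URLClassLoader(new URL[0], null);
        System.out.println(canLoad(isolated, "java.lang.String"));            // true
        System.out.println(canLoad(isolated, IsolationDemo.class.getName())); // false
    }
}
```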

Child-first class loaders have their own pros and cons. You might have to use one in some scenarios, as we did. But importantly, you should make this decision together with your tech leads before using this strategy or any alternative. Also, before starting the implementation, I suggest you read about classloaders and how they work in more detail, because working with classpaths and classloaders is harder and more confusing than it looks.