Measuring Java Energy Consumption

Mirko Stocker
Growing Green Software


In a previous post, we discussed software efficiency and ways to measure the energy consumption of software. We used the Pinpoint command line tool to measure the energy consumption when compiling a Java application. This probably wasn’t the most exciting demonstration, but it showed that we can measure the energy consumption of running software. In this post, we will dig deeper and measure the energy consumption of Java programs on an individual method level.

There are several measurable aspects of software, such as execution time, memory usage, network performance, test coverage (how much of our software is covered by automated tests), and energy consumption. These are all runtime measurements that can only be collected by observing the running software. There are also several static measurements of software that we can get without running it, such as adherence to programming language style guides, potential bugs, architectural conformance, or source code complexity.¹

With the current state of the art, we cannot measure software's energy consumption without also running it. There are just too many variables that influence the energy consumption of software, such as the hardware it runs on, the operating system, and so on. If we make enough assumptions, it might be possible to estimate energy consumption statically, but that’s something to explore in the future.² So, let’s get back to runtime measurements, specifically measuring the energy consumption of Java software.

Why Java? I’ve spent the last twenty years programming in Java and related languages (Scala, JRuby) and am most familiar with its ecosystem. It’s also widely used in industry. As an added challenge, measuring it is not as straightforward as measuring C or C++ software. Java runs on a virtual machine, the Java Virtual Machine (JVM), which abstracts the hardware and operating system away from the software. The JVM uses Just-In-Time (JIT) compilation,³ which may compile the Java bytecode to machine code at runtime and optimize it for the specific hardware as the program runs. There is also a Garbage Collector (GC) that reclaims memory by clearing unused objects. The GC can cause unpredictable pauses, affecting the consistency of performance measurements.

Java Microbenchmark Harness

When it comes to measuring the performance of Java applications, one of the most widely used tools is the Java Microbenchmark Harness (JMH). JMH is a versatile benchmarking framework that allows developers to write and execute micro-benchmarks to evaluate the performance of specific methods. JMH provides a standardized and reliable way to conduct performance tests, ensuring accurate results. It also takes care of “warming up” the virtual machine, meaning that it runs the code a few times before measuring it to give the virtual machine a chance to optimize the code. While garbage collection pauses can still affect the results, JMH comes with a GC Profiler to help you understand what is happening under the hood, for example, to detect memory leaks.

As an example, let’s compare two different list data structures, ArrayList and LinkedList, and measure the time it takes to add different amounts of random numbers to each. A JMH benchmark to measure the performance of these operations might look like this:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@Measurement(iterations = 5)
@Warmup(iterations = 5)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 1)
@State(Scope.Benchmark)
public class CollectionAdd {

    @Param({ "100000", "200000", "500000", "1000000" })
    private int collectionSize;
    private int[] valuesToAdd;

    @Setup
    public void iterationSetup() {
        valuesToAdd = new int[collectionSize];
        Random rand = new Random(9032);
        for (int i = 0; i < collectionSize; i++) {
            valuesToAdd[i] = rand.nextInt(collectionSize);
        }
    }

    @Benchmark
    public List<Integer> addToJavaArrayList() {
        var result = new ArrayList<Integer>();
        for (int i = 0; i < valuesToAdd.length; i++) {
            result.add(valuesToAdd[i]);
        }
        return result;
    }

    @Benchmark
    public List<Integer> addToJavaLinkedList() {
        var result = new LinkedList<Integer>();
        for (int i = 0; i < valuesToAdd.length; i++) {
            result.add(valuesToAdd[i]);
        }
        return result;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(CollectionAdd.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}

The full code is available on GitHub. As we can see, JMH uses annotations to parametrize the measurement:

  • @BenchmarkMode(Mode.AverageTime) specifies that the benchmark should measure the average time of the operation. An operation is a method that has the @Benchmark annotation, such as addToJavaArrayList or addToJavaLinkedList.
  • The @Measurement annotation specifies that the benchmark should run five times (see the Cnt column in the output below).
  • @Warmup specifies the number of warm-up iterations. These are thrown away and not counted in the results; they are just there to give the virtual machine a chance to optimize the code, as we discussed earlier.
  • @OutputTimeUnit(TimeUnit.MILLISECONDS) specifies that the output should be in milliseconds (see the Units in the output below).
  • @Fork(value = 1) specifies that the benchmark should run just once, in a single fork.
  • @State(Scope.Benchmark) marks the class that holds the state of the benchmark. In this case, the class has two member variables: an int collectionSize that specifies the collection size and an array valuesToAdd that holds the random numbers to add to the list.
  • We only want to generate the random numbers once, so we use the @Setup annotation to initialize the data for the benchmark. Note that we specify four different sizes for the list: 100'000, 200'000, 500'000, and 1'000'000 elements. The benchmark will run for each of these sizes, with JMH injecting the values into the collectionSize variable.

Note that the code adds primitive int values to collections of Integer objects, which forces Java to wrap these ints into objects (this is called autoboxing). While this adds a constant overhead to all the calls, it should keep the big picture the same.
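To make the autoboxing explicit, here is a minimal sketch of what the compiler does behind the scenes when a primitive int is added to a List&lt;Integer&gt; (the class and variable names are just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of autoboxing: what happens when a primitive int
// is stored in a collection of Integer objects.
public class AutoboxingDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        int primitive = 42;
        // The compiler implicitly rewrites this call to
        // list.add(Integer.valueOf(primitive)), allocating (or reusing
        // from a small cache) an Integer wrapper object.
        list.add(primitive);
        // Unboxing goes the other way: Integer.intValue() is called implicitly.
        int back = list.get(0);
        System.out.println(back);
    }
}
```

Each boxed value is a separate heap object, which is exactly the overhead that the specialized IntArrayList introduced below avoids.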

We have two methods that are benchmarked, addToJavaArrayList and addToJavaLinkedList. When we run this program, a table with the results will be printed to the console:

java -jar target/java-collection-impls-benchmark.jar
# JMH version: 1.37
...
Benchmark                          (collectionSize)  Mode  Cnt   Score   Error  Units
CollectionAdd.addToJavaArrayList             100000  avgt    5   0.298 ± 0.004  ms/op
CollectionAdd.addToJavaArrayList             200000  avgt    5   0.606 ± 0.004  ms/op
CollectionAdd.addToJavaArrayList             500000  avgt    5   4.360 ± 0.151  ms/op
CollectionAdd.addToJavaArrayList            1000000  avgt    5  15.290 ± 1.407  ms/op
CollectionAdd.addToJavaLinkedList            100000  avgt    5   0.342 ± 0.009  ms/op
CollectionAdd.addToJavaLinkedList            200000  avgt    5   0.689 ± 0.015  ms/op
CollectionAdd.addToJavaLinkedList            500000  avgt    5   1.712 ± 0.068  ms/op
CollectionAdd.addToJavaLinkedList           1000000  avgt    5   3.533 ± 0.282  ms/op

Comparing the “collectionSize” and the “Score”, we see that the LinkedList growth is linear in the number of elements. This is because adding an element to the end of a LinkedList always takes the same time, no matter how large it is. The picture is not as clear with the ArrayList, which internally has a fixed-size array that has to be copied into a larger array when it runs out of space (see this article for a detailed comparison of the two implementations).
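The growing-array behavior can be sketched in a few lines. This is a simplified illustration of the idea, not the actual JDK implementation (the real ArrayList grows its backing array by roughly 1.5x using Arrays.copyOf):

```java
import java.util.Arrays;

// Simplified sketch of an array-backed list: when the backing array
// is full, every element is copied into a larger array. This copying
// is what makes the ArrayList numbers above less predictable.
public class GrowableIntList {
    private int[] elements = new int[10];
    private int size = 0;

    public void add(int value) {
        if (size == elements.length) {
            // Out of space: grow by ~1.5x and copy everything over.
            elements = Arrays.copyOf(elements, elements.length + (elements.length >> 1));
        }
        elements[size++] = value;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        GrowableIntList list = new GrowableIntList();
        for (int i = 0; i < 1_000; i++) {
            list.add(i);
        }
        System.out.println(list.size());
    }
}
```

Because the copy cost is amortized over many cheap appends, single adds are fast on average, but the occasional large copy (plus the resulting garbage collection work) shows up in the measurements.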

Let us throw in some additional collections to make the comparison more interesting. Besides collections included with the Java standard library, third-party libraries such as the Eclipse Collections (previously known as the Goldman Sachs Collections due to their roots) exist. For example, the FastList (an optimized array-based list) and the IntArrayList (a specialized list that does not box/wrap int values) are optimized for performance. Adding these to the benchmark results in the following output (note that the source code snippet above does not include the additional benchmarks, but the full example in the GitHub repo does):

Benchmark                               (collectionSize)  Mode  Cnt   Score   Error  Units
CollectionAdd.addToEclipseFastList                100000  avgt    5   0.303 ± 0.011  ms/op
CollectionAdd.addToEclipseFastList                200000  avgt    5   0.620 ± 0.004  ms/op
CollectionAdd.addToEclipseFastList                500000  avgt    5   4.451 ± 0.215  ms/op
CollectionAdd.addToEclipseFastList               1000000  avgt    5  11.562 ± 0.283  ms/op
CollectionAdd.addToEclipseIntArrayList            100000  avgt    5   0.157 ± 0.005  ms/op
CollectionAdd.addToEclipseIntArrayList            200000  avgt    5   0.324 ± 0.006  ms/op
CollectionAdd.addToEclipseIntArrayList            500000  avgt    5   0.794 ± 0.038  ms/op
CollectionAdd.addToEclipseIntArrayList           1000000  avgt    5   1.516 ± 0.056  ms/op
CollectionAdd.addToJavaArrayList                  100000  avgt    5   0.298 ± 0.004  ms/op
CollectionAdd.addToJavaArrayList                  200000  avgt    5   0.606 ± 0.004  ms/op
CollectionAdd.addToJavaArrayList                  500000  avgt    5   4.360 ± 0.151  ms/op
CollectionAdd.addToJavaArrayList                 1000000  avgt    5  15.290 ± 1.407  ms/op
CollectionAdd.addToJavaLinkedList                 100000  avgt    5   0.342 ± 0.009  ms/op
CollectionAdd.addToJavaLinkedList                 200000  avgt    5   0.689 ± 0.015  ms/op
CollectionAdd.addToJavaLinkedList                 500000  avgt    5   1.712 ± 0.068  ms/op
CollectionAdd.addToJavaLinkedList                1000000  avgt    5   3.533 ± 0.282  ms/op

Figure 1: The time it takes to add items to different collections grows linearly or polynomially with the size of the collection.

We can see that they are indeed faster, so if performance is a concern and you have such large collections, you might consider using a FastList or IntArrayList instead of the standard JDK collections. You might ask, why would I ever use ArrayList if it's apparently slower than the others? Keep in mind that we only benchmarked adding elements to the end of the list. We did not measure the time it takes to access elements by index, remove elements, or iterate over the lists. Depending on your use case, ArrayList might still be the best choice and is a safe default. If you're curious, check out the source code on GitHub and run your own benchmarks. You could try even larger or smaller collections, or data structures other than lists.
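One variation worth trying yourself: since our benchmark knows the final collection size up front, we can pre-size the ArrayList so it never has to copy its backing array. In the real benchmark this would be another @Benchmark method in CollectionAdd; the sketch below shows it as a plain method (the class and method names are hypothetical, not part of the repository):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a benchmark variation: pre-sizing the ArrayList with the
// known final capacity avoids all intermediate backing-array copies.
public class PresizedListDemo {

    static List<Integer> addToPresizedArrayList(int[] valuesToAdd) {
        // Capacity is known up front, so no resizing happens during the adds.
        var result = new ArrayList<Integer>(valuesToAdd.length);
        for (int value : valuesToAdd) {
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = new int[1_000_000];
        System.out.println(addToPresizedArrayList(values).size());
    }
}
```

Comparing this against the default-constructed ArrayList would isolate how much of the cost measured above comes from the array copies rather than from the adds themselves.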

Ok, enough with comparing performance. What about energy consumption?

JoularJX

While JMH is great for measuring the performance of Java applications, it does not provide a way to measure energy consumption. For this, we can use JoularJX. JoularJX is a Java agent that hooks into the JVM to measure the energy consumption of applications. Depending on the platform, RAPL (Running Average Power Limit) or other hardware counters are used to measure the energy consumption of the whole JVM. JoularJX can even break the consumption down to individual methods! This level of detail gives us interesting possibilities.

We will pick up our collection benchmarking later, but first, let’s start with a simple example. We will measure the energy consumption of calculating prime numbers. Not just a single prime number, since that would run too quickly; we need something that runs for a while to get meaningful results. We will count the number of prime numbers between 2 and 100'000'000⁴:

package ch.ost.ifs.cal;

import java.util.stream.LongStream;

public class PrimeNumberCounter {

    public static void main(String[] args) {
        var from = 2;
        var to = 100_000_000;
        var numberOfPrimes = countPrimesInRange(from, to);
        System.out.println("There are " + numberOfPrimes + " primes between " + from + " and " + to);
    }

    private static long countPrimesInRange(int from, int to) {
        var result = LongStream.range(from, to)
                .filter(number -> isPrime(number))
                .count();
        return result;
    }

    private static boolean isPrime(long number) {
        // the `factor * factor` is an optimization to reduce the number of checks
        for (long factor = 2; factor * factor <= number; factor++) {
            if (number % factor == 0) {
                return false;
            }
        }
        return true;
    }
}

The countPrimesInRange method counts the prime numbers between from and to using a stream of long numbers over that range. The isPrime method tests whether any smaller factor divides the number without a remainder (if you're unfamiliar with Java, the % operator computes the remainder). If such a divisor is found, the number is not prime, and the method returns false. If no divisor is found, the number is prime, and the method returns true. We can run this code with the JoularJX agent to measure the energy consumption:

java -javaagent:/opt/joularjx/joularjx-2.9.0.jar -classpath target/classes ch.ost.ifs.cal.PrimeNumberCounter
10/07/2024 09:51:04.244 - [INFO] - +---------------------------------+
10/07/2024 09:51:04.245 - [INFO] - | JoularJX Agent Version 2.9.0 |
10/07/2024 09:51:04.245 - [INFO] - +---------------------------------+
10/07/2024 09:51:04.254 - [INFO] - Results will be stored in joularjx-result/20510-1720597864252/
10/07/2024 09:51:04.262 - [INFO] - Initializing for platform: 'mac os x' running on architecture: 'aarch64'
10/07/2024 09:51:04.263 - [INFO] - Please wait while initializing JoularJX...
10/07/2024 09:51:06.765 - [INFO] - Initialization finished
10/07/2024 09:51:06.766 - [INFO] - Started monitoring application with ID 20510
There are 5761455 primes between 2 and 100000000
10/07/2024 09:51:49.329 - [INFO] - Thread CPU time negative, taking previous time + 0 : 40230480000 for thread: 1
10/07/2024 09:51:49.332 - [INFO] - JoularJX finished monitoring application with ID 20510
10/07/2024 09:51:49.332 - [INFO] - Program consumed 95.15 joules
10/07/2024 09:51:49.335 - [INFO] - Energy consumption of methods and filtered methods written to files

The second to last line shows that the “program consumed 95.15 joules”. JoularJX also writes the energy consumption of individual methods to files so that we can analyze them:

jdk.internal.org.objectweb.asm.Type.getArgumentTypes,0.0000
ch.ost.ifs.cal.PrimeNumberCounter.isPrime,91.4710
java.util.stream.LongPipeline$9$1.accept,0.0026
java.lang.ref.Reference.waitForReferencePendingList,0.0006
sun.invoke.util.BytecodeDescriptor.unparse,0.0000
java.util.stream.Streams$RangeLongSpliterator.forEachRemaining,1.4731
java.lang.invoke.MethodHandleNatives.resolve,0.0000
java.lang.ProcessHandleImpl.waitForProcessExit0,0.0004

The isPrime method consumed most of the energy (91.471 joules), which is unsurprising as it is the most computationally intensive part of the program. Our other methods, like countPrimesInRange, consumed so little energy that they were not even listed.
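Since each line of the result file is a simple method-name/joules pair, ranking the hottest methods takes only a few lines. The sketch below inlines a few lines from the output above instead of reading the actual result file (whose path varies per run), so the class name and setup are illustrative:

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch: rank methods by energy from JoularJX's per-method output,
// which consists of lines of the form "fully.qualified.Method,joules".
// A few lines from the run above are inlined here for illustration.
public class TopEnergyMethods {
    public static void main(String[] args) {
        String csv = String.join("\n",
                "ch.ost.ifs.cal.PrimeNumberCounter.isPrime,91.4710",
                "java.util.stream.Streams$RangeLongSpliterator.forEachRemaining,1.4731",
                "java.util.stream.LongPipeline$9$1.accept,0.0026");
        Arrays.stream(csv.split("\n"))
                .map(line -> line.split(","))
                // Sort descending by the joules column.
                .sorted(Comparator.comparingDouble(
                        (String[] parts) -> Double.parseDouble(parts[1])).reversed())
                .limit(3)
                .forEach(parts -> System.out.println(parts[0] + " -> " + parts[1] + " J"));
    }
}
```

In a real analysis you would read the file from the joularjx-result directory instead of an inline string, but the ranking logic stays the same.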

Combining JMH and JoularJX

Now that we’ve seen how to measure the energy consumption of a single Java method, let’s combine JMH and JoularJX to measure our collection benchmark. There are some caveats: as we learned earlier, JMH warms up the JVM before measuring performance, so the energy consumption of the warm-up phase will be included in the JoularJX results. That’s why we turn off the warm-up, accepting that the performance measurements will be less accurate.

More importantly, we also need to change the measurement mode. Instead of having JMH call the methods multiple times and averaging the results, we will use Mode.SingleShotTime (indicated by the ss values in the "Mode" column of the table below), which measures the total time of a method. So now, we can combine the total time of a benchmark method with its energy consumption. For the same reason, we also have to run the benchmark individually for each collection size, so we run it once for 100'000 elements, combine it with the JoularJX measurements, then run it again for 200'000 elements, and so on.

The following diff highlights the changes to the benchmark:

- @BenchmarkMode(Mode.AverageTime)
+ @BenchmarkMode(Mode.SingleShotTime)
- @Measurement(iterations = 5)
- @Warmup(iterations = 5)
+ @Measurement(iterations = 10, batchSize = 1000)

So, we do ten iterations, and in each iteration, we run the same benchmark method 1000 times. Combining JMH and JoularJX, we get the following results for a collectionSize of 1'000'000 elements:

Benchmark                               (collectionSize)  Mode  Cnt      Score      Error  Units     Joules
CollectionAdd.addToEclipseFastList               1000000    ss   10  10652.147 ±  312.464  ms/op  1479.9486
CollectionAdd.addToEclipseIntArrayList           1000000    ss   10   1600.708 ±   61.199  ms/op    52.6131
CollectionAdd.addToJavaArrayList                 1000000    ss   10  13244.062 ±  261.798  ms/op  1879.2856
CollectionAdd.addToJavaLinkedList                1000000    ss   10   4098.937 ± 1140.329  ms/op   175.6389

Note the last column that shows the Joules as measured by JoularJX. Running the benchmark for the other collection sizes, we get the following results (only the ArrayList and LinkedList are shown below; see the repository for the full results):

Figure 2: Comparing the score and the energy consumption of the ArrayList benchmark.
Figure 3: Comparing the score and the energy consumption of the LinkedList benchmark.

As we can see, the scores and energy consumption evolve similarly with collection size. This relationship is not surprising since the energy consumption of a method is directly related to the time it takes to execute, especially for code like this that is not idle waiting for something to happen. But seeing them correlate is nice and gives me confidence in the tools and measurements.
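With ten iterations of batchSize 1000, each benchmark method runs 10,000 times in total, so we can derive a rough per-operation energy figure. This is a back-of-the-envelope sketch that assumes the Joules column reports the total over all those invocations:

```java
import java.util.Locale;

// Back-of-the-envelope: rough energy per list-build, assuming the Joules
// column is the total over all 10 iterations x 1000 batched invocations.
public class EnergyPerOp {
    public static void main(String[] args) {
        int invocations = 10 * 1000; // iterations x batchSize
        double arrayListJoules = 1879.2856;   // from the table above
        double intArrayListJoules = 52.6131;  // from the table above
        System.out.printf(Locale.ROOT, "ArrayList:    %.4f J/op%n",
                arrayListJoules / invocations);
        System.out.printf(Locale.ROOT, "IntArrayList: %.4f J/op%n",
                intArrayListJoules / invocations);
    }
}
```

Under that assumption, building a million-element ArrayList costs on the order of 0.19 joules, while the boxing-free IntArrayList needs only a few millijoules per build.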

Conclusion

This post looked at the energy consumption of Java software. We used the Java Microbenchmark Harness to measure the performance of Java software and JoularJX to measure energy consumption. We combined the two tools to measure the energy consumption of different Java collections.

We learned that:

JMH is a versatile benchmarking framework that allows developers to write and execute microbenchmarks to evaluate the performance of specific methods.

We also learned that:

JoularJX is a Java agent that hooks into the Java virtual machine to measure the energy consumption of applications.

Combining these two tools, we saw that:

The energy consumption of a method is directly related to the time it takes to execute.

Note that we will not be doing any statistical analysis of the data, as this is just a demonstration. If we were to do that, we would need more iterations and benchmarks to get meaningful results and set up the experiments properly.

Where can we go from here? We could study the energy consumption of different algorithms or compare different implementations of the same algorithm. But for such tasks, it’s probably easier to stick to measuring performance and using that as a proxy for energy efficiency. If, however, we are more concerned with understanding the energy consumption of different parts or components of our software than with direct comparisons, for example, determining the energy consumption of individual API operations, then JoularJX is an excellent tool and starting point.

In a future post, we will try to measure the energy consumption of a Spring Boot application. Stay tuned!

Figure 4: Overview of topics that we covered in this post.

Acknowledgements. I thank Olaf and Martin for their valuable feedback.

Further Reading

Asher Toqeer has a nice blog post on performance analysis using JMH that explains the different options and how to get started with JMH. The JMH Playground repository contains a playground for JMH that you can use to get started.

Check out the JVM Performance Benchmarks repository if you’re looking for more benchmarks.

  1. If you studied computer science, you might remember the Big O and Big Theta notations, which describe the complexity of algorithms. These notations give us a way to reason how the running time of software will change depending on the amount of data and to classify algorithms. All without running it.
  2. The GreenVM project published a technical report on “An Instruction Level Energy Characterization of ARM Processors” that came up with energy metrics for the different CPU instructions.
  3. Alternative to Just-In-Time (JIT) compilation, there is also Ahead-Of-Time (AOT) compilation, such as GraalVM, but that’s not (yet) the default for Java.
  4. The code is adapted from my first-semester course on object-oriented programming at the Eastern Switzerland University of Applied Sciences.
