Performance Evaluation of JAVA ArrayList

M Jayasinghe, V Salaka, I Perera and S Perera

image credit

The List interface in JAVA extends Collection and declares the behavior an ordered collection (also known as a sequence). ArrayList is the Resizable-array implementation of the List interface.

An ArrayList has an initial capacity which is simply the size of the array used to store the elements in the list. When you create an ArrayList you can specify the initial capacity. For example:

ArrayList<Integer> arrayList = new ArrayList<>(100);

In this case, the initial capacity of the ArrayList will be 100. As you add elements to an ArrayList, its capacity grows automatically. The initial capacity does not change the logical size of an ArrayList rather it reduces the amount of incremental reallocation.

If we do not specify an initial capacity then an ArrayList object will be created containing an initial array of size ten.

ArrayList<Integer> arrayList = new ArrayList<>();

It is also possible to increase the capacity of an ArrayList instance before adding a large number of elements using the ensureCapacity operation to ensure that it can hold at least the number of elements specified by the minimum capacity argument.

In this article we will investigate the performance of ArrayList add operation. Our main objective investigate the impact of the array list performance on the following:

Initial Capacity (initial size): This is the capacity we specify when we create the array (see above). From now onwards we will call this initial size.

Final Size: This represents the logical size of the ArrayList, i.e. actual number of elements in it.

Performance Evaluation

In this article we will benchmark the performance of JAVA ArrayList add operation using JMH (Java Microbenchmark Hardness). The two main performance metrics considered are

  1. Average latency (measured in nanoseconds)
  2. Throughput (measured in number of operations per millisecond)

The scenario implemented sequentially adds elements to an ArrayList using the add operation. We run the benchmark by varying the following parameters:

  1. Initial capacity (size) of the ArrayList
  2. Final size of the ArrayList (the logical size of the array list)

The following are the parameters specified for JMH. Note that these parameters are common to all runs.

  1. Number of warmup iterations: 10
  2. Numbers of iterations: 10
  3. Number of forks: 2
  4. Number of threads: 1
  5. Time unit ns (nanoseconds)
  6. Mode: benchmark mode

The performance test was run on a 4-core machine with 8 GB RAM and JDK used was Oracle 1.8_131

Performance Results

We ran our benchmark for a number of scenarios. The following table shows the results.

Let us now plot the behaviour for the case where the final size (logical size) is 999. The following graph shows the way in which the average latency varies with the (increasing) initial size (initial capacity).

The following graph shows how the throughput varies with the initial size (initial capacity).

Discussion

Initial size (initial capacity) > Final size (Logical Size)

If the initial capacity is greater than the final size there can be a significant degradation in performance.

For example, when we create an array with an initial capacity of 1000 and use only the first 10 elements, we observed an average latency of 345 ns. On the other hand, when we create an array with an initial capacity of 10 and used the first 10 elements, then we get an average latency of 80 ns.

The main reason for the degradation in performance is the cost of initializing the large array itself. When we profile our benchmark code we noticed array initialization taking a significant amount of processing time.

Initial Size (initial capacity) < Final Size (logical size)

If the initial size is smaller than the final size then there can also be a degradation in performance.

For example, when we create an array with an initial capacity of 10 and use the first 1000 elements, then we observed an average latency of 5797 ns. On the other hand, when we create an array with an initial capacity of 1000 and use 1000 elements, then we get an average latency of 4145 ns.

The main reason behind the degradation in performance is the additional processing needed for copying existing elements into a new array with a large size. When we profile benchmark code, we noticed array copy operation taking a significant amount of processing time.

Over Specifying vs. Under Specifying Initial Size (initial Capacity)

We note that the degradation in performance, as a result of over specifying the initial size, is less compared to that of under specifying. Let’s now discuss this behavior in a bit more detail. Let’s assume that the final size of the array list is 1000. The following figure plots the variation in the throughput with increasing ArrayList size.

We note that the maximum throughput is achieved when the initial array list size is 1000 (i.e. when initial capacity= final size). Let us now assume that TPS_1 and TPS_2 represent the throughput when the initial array size is 500 and 1500 respectively (refer to the figure above). We note that TPS_2 is significantly higher than TPS_1, which validates our claim regarding the performance differences in over specifying vs. under specifying the initial size.

The main reason why the performance degrades more when initial capacity < final size (as opposed to initial capacity > final size) is the additional cost involved in copying of elements into a larger array.

Conclusion

In this article, we have done a performance analysis of the Java ArrayList add operation.

It is clear from the from the results (considering the add operation of ArrayList) that if the required maximum capacity of the ArrayList is known, we can get the optimal performance (both average latency and throughput) by specifying the initial capacity to final size final size (or using ensureCapacity as required)

In doing so, we can get 24% to 34% improvement in average latency and 30% to 50% improvement in throughput. In our future work, we hope to consider benchmarking other operations such as insert, remove, and get in addition to the add operation. In our current tests, although we vary the final size between different tests, for a given test scenario final size is fixed. We hope to extend our work to cater for scenarios where the final size varies. In such cases, the final size can come from probability distribution, which represents (actual) access pattern of users.