Java 11. How String.repeat was implemented and why?

Since Java 11. There is a new method in the String class called repeat. Yes, it repeats string, fast. Now guess. How many implementation lines are inside? One-liner? Stream-fancy one-liner? 5–7 lines? 9? 12? More. A lot more. In this article, I want to explore why. Are you ready?

Let’s start with a quick look at the method source code.

/**
* Returns a string whose value is the concatenation of this
* string repeated {
@code count} times.
* <p>
* If this string is empty or count is zero then the empty
* string is returned.
*
*
@param count number of times to repeat
*
*
@return A string composed of this string repeated
* {
@code count} times or the empty string if this
* string is empty or count is zero
*
*
@throws IllegalArgumentException if the {@code count} is
* negative.
*
*
@since 11
*/
public String repeat(int count) {
if (count < 0) {
throw new IllegalArgumentException("count is negative: " + count);
}
if (count == 1) {
return this;
}
final int len = value.length;
if (len == 0 || count == 0) {
return "";
}
if (len == 1) {
final byte[] single = new byte[count];
Arrays.fill(single, value[0]);
return new String(single, coder);
}
if (Integer.MAX_VALUE / count < len) {
throw new OutOfMemoryError("Repeating " + len + " bytes String " + count +
" times will produce a String exceeding maximum size.");
}
final int limit = len * count;
final byte[] multiple = new byte[limit];
System.arraycopy(value, 0, multiple, 0, len);
int copied = len;
for (; copied < limit - copied; copied <<= 1) {
System.arraycopy(multiple, 0, multiple, copied, copied);
}
System.arraycopy(multiple, 0, multiple, copied, limit - copied);
return new String(multiple, coder);
}

Input parameters validation

The things start simple. Input is validated and some corner cases are handled. We cannot provide negative input. In a case when string length is zero or repeating count is zero, we will receive an empty string. If the count is one, then we will receive the same reference (since String is immutable, this is perfectly safe). We can observe also the final keyword. In high volume usage, every hint for compiler counts.

if (count < 0) {
throw new IllegalArgumentException("count is negative: "+count);
}
if (count == 1) {
return this;
}
final int len = value.length;
if (len == 0 || count == 0) {
return "";
}

Single letter optimization

There is an interesting block when the input string length is exactly one letter.

if (len == 1) {
final byte[] single = new byte[count];
Arrays.fill(single, value[0]);
return new String(single, coder);
}

Authors decided to optimize for this particular case. If we want to repeat a single letter multiple times, then a fill method is used, which is very simple by the way. The fill method is very simple internally.

for (int i = 0, len = a.length; i < len; i++)
a[i] = val;

Finally, the byte array is converted to string:

new String(single, coder)

Ok, but what is the coder? Java is able to compact strings. In other words is able to minimize the size of the characters in memory. LATIN1 is used to store characters if possible, if some characters are outside of the support, then it fallbacks to UTF16. We can observe this logic in valueOf method:

/**
* Returns the string representation of the {
@code char}
* argument.
*
*
@param c a {@code char}.
*
@return a string of length {@code 1} containing
* as its single character the argument {
@code c}.
*/
public static String valueOf(char c) {
if (COMPACT_STRINGS && StringLatin1.canEncode(c)) {
return new String(StringLatin1.toBytes(c), LATIN1);
}
return new String(StringUTF16.toBytes(c), UTF16);
}

This value is always passed from the original String or calculated during the String creation.

Out of memory protection

There is quite interesting OOM protection with the following condition:

if (Integer.MAX_VALUE / count < len) {
throw new OutOfMemoryError("Repeating " + len + " bytes String " + count + " times will produce a String exceeding maximum size.");

The Integer.MAX_VALUE is 0x7fffffff, which is 2³¹-1. This expression can be greatly simplified if we assume that our initial string is exactly one character. It means that the length of the produced string cannot exceed 2³¹-1 characters. However, there is no guarantee that OOM won’t be thrown if we would like to create a string which is smaller (like 2³¹-2). This is just optimization for really out of scope numbers. The creation of such a String might take seconds and this approach will fail very fast, without consuming available memory.

The actual implementation of String.repeat

Now we can see how the actual copy operation is working.

    final int limit = len * count;
final byte[] multiple = new byte[limit];
System.arraycopy(value, 0, multiple, 0, len);
int copied = len;
for (; copied < limit - copied; copied <<= 1) {
System.arraycopy(multiple, 0, multiple, copied, copied);
}
System.arraycopy(multiple, 0, multiple, copied, limit - copied);
return new String(multiple, coder);

We can see that System.arraycopy is the main building block here, let’s have a quick look what is that.

/**
* Copies an array from the specified source array, beginning at the
* specified position, to the specified position of the destination array.
...
*
@param src the source array.
*
@param srcPos starting position in the source array.
*
@param dest the destination array.
*
@param destPos starting position in the destination data.
*
@param length the number of array elements to be copied.
...
*/
@HotSpotIntrinsicCandidate
public static native void arraycopy(Object src, int srcPos,
Object dest, int destPos,
int length);

Ok, so this is a high-speed native method which is copying data between arrays. We just need to specify positions and the length of the copy. So the code like this:

char[] a = "abcdefghi".toCharArray();
char[] b = "123456789".toCharArray();
System.arraycopy(a, 2, b, 4, 3);
System.out.println(a);
System.out.println(b);

will produce output like this:

abcdefghi
1234cde89

Back to our code. First, we are making space for the output String:

final int limit = len * count;
final byte[] multiple = new byte[limit];

Then, the original string is transferred to the new array (the first occurrence):

System.arraycopy(value, 0, multiple, 0, len);

Then, we have a loop which is making 2 tricks.

int copied = len;
for (; copied < limit - copied; copied <<= 1) {
System.arraycopy(multiple, 0, multiple, copied, copied);
}

Only the target array is used. This is base for the second trick. Each loop is copying twice more on each iteration. After the first loop, we have two occurrences, then four, eight, sixteen, etc.

1: copycopy............................
2:
copycopycopycopy....................
3:
copycopycopycopycopycopycopycopy....
4: ?

Each iteration will be faster in copying since the copy operation will be optimized for the bigger chunks. Of course, at some point, we cannot duplicate the data since there is not enough space in the array left. The final step is to fill the remaining space and construct the result:

System.arraycopy(multiple, 0, multiple, copied, limit - copied);
return new String(multiple, coder);

In the end, we have our repeated string.

3: copycopycopycopycopycopycopycopy....
4: copycopycopycopycopycopycopycopycopy