Micro optimizations in Java. String.replaceAll

Dmytro Dumanskiy
Sep 3, 2020 · 6 min read

In this post, we will discuss another prevalent code construct, the String.replaceAll and String.replace methods, investigate how they affect the performance of your code in Java 11, and see what you can do about it.

(Please consider all the code below from a performance point of view.)

(Please don’t focus on the exact numbers; they are just metrics to prove the point.)

String.replaceAll

value = value.contains(".") ? value.replaceAll("\\.","%2E") : value;

Do you see what is wrong here?

All this code is trying to do is encode the dot symbol so that the result can later be passed into an HTTP URL. I was lucky to find this particular snippet in a popular codebase, because it packs several things worth discussing into a single line.

If you’re an experienced developer, you already know that the String.replaceAll method treats its first parameter as a regular expression pattern:

public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this)
                  .replaceAll(replacement);
}

However, in the code above we replace only the dot character. Very often, when you perform a replacement, you don’t need any pattern matching. You can use another, very similar but lighter method, String.replace:

public String replace(CharSequence target, CharSequence replacement)

It fits, for example, when you need to replace a single word or a single character, as in our example.
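
To make the difference concrete, here is a minimal sketch. The class and method names are made up for illustration; only the String calls come from the JDK. Both helpers produce the same result for the dot case, but the regex version depends on the dot being escaped.

```java
final class ReplaceVsReplaceAll {

    // Regex-based: "." is a metacharacter in a pattern, so it has to be escaped.
    static String encodeDotsWithRegex(String value) {
        return value.replaceAll("\\.", "%2E");
    }

    // Literal replacement: the target is taken as-is, no escaping needed.
    static String encodeDotsLiterally(String value) {
        return value.replace(".", "%2E");
    }

    public static void main(String[] args) {
        String host = "a.b.c";
        System.out.println(encodeDotsWithRegex(host));  // a%2Eb%2Ec
        System.out.println(encodeDotsLiterally(host));  // a%2Eb%2Ec
        // Forgetting the escape changes the meaning: "." matches any character.
        System.out.println(host.replaceAll(".", "x"));  // xxxxx
    }
}
```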

Note: In Java 8, the String.replace method used a Pattern internally, just like String.replaceAll. Since Java 9, that has changed, which gives us a lot of room for optimization in existing codebases.

Also, the String.replace and String.replaceAll pair seems error-prone by design. When you start typing in the IDE and see both methods side by side, you might think that String.replace replaces only the first occurrence while String.replaceAll replaces all of them. In fact, both replace every occurrence. Still, intuitively, you will choose String.replaceAll over String.replace.
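
A quick sketch of what the names actually mean (the helper class is hypothetical; the String methods are standard): String.replace already replaces every occurrence, and the method that touches only the first match is String.replaceFirst, which again takes a regular expression.

```java
final class ReplaceNaming {

    // Literal target, every occurrence is replaced.
    static String replaceEvery(String value) {
        return value.replace(".", "/");
    }

    // Regex target, only the first match is replaced.
    static String replaceOnlyFirst(String value) {
        return value.replaceFirst("\\.", "/");
    }

    public static void main(String[] args) {
        System.out.println(replaceEvery("a.b.c"));      // a/b/c
        System.out.println(replaceOnlyFirst("a.b.c"));  // a/b.c
    }
}
```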

Another interesting thing is that the code above already contains a micro-optimization: the value.contains(".") call. The string is checked for the dot symbol first, to avoid pattern matching when there is nothing to replace.

Okay, let’s fix the example above:

value = value.replace(".", "%2E");

We can also keep the value.contains(".") check (String.contains delegates to String.indexOf) and see whether it helps with the String.replace method in Java 11:

value = value.contains(".") ? value.replace(".", "%2E") : value;

Let’s write the benchmark:

Results (lower score means faster):

It looks like combining the contains(".") check (an indexOf search under the hood) with String.replaceAll does, in fact, make sense: even for an empty input string, compiling and matching the pattern takes too much time. A ~50x difference is huge. The same applies to an input string without any dot symbols. And when an actual replacement has to be performed, String.replace is about three times faster than String.replaceAll.

Also, it seems that the String.indexOf optimization no longer makes sense with the String.replace method in Java 11, while it was needed in Java 8, when String.replace did pattern matching inside. Now it even makes the code a bit slower. That’s because the String.replace implementation already performs a String.indexOf search itself, so we do the same work twice.
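
To illustrate why the extra check is redundant, here is a simplified sketch of a literal replacement with an indexOf fast path. This is not the JDK source: replaceLiteral is a hypothetical helper, and it assumes a non-empty target to keep the loop short.

```java
final class LiteralReplaceSketch {

    // Hypothetical helper: literal replacement built on indexOf.
    static String replaceLiteral(String value, String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("this sketch assumes a non-empty target");
        }
        int found = value.indexOf(target);
        if (found < 0) {
            return value; // fast path: nothing to replace, nothing allocated
        }
        StringBuilder result = new StringBuilder(value.length() + replacement.length());
        int from = 0;
        while (found >= 0) {
            // Copy the text before the match, then the replacement.
            result.append(value, from, found).append(replacement);
            from = found + target.length();
            found = value.indexOf(target, from);
        }
        // Copy whatever is left after the last match.
        result.append(value, from, value.length());
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("a.b.c", ".", "%2E")); // a%2Eb%2Ec
        System.out.println(replaceLiteral("abc", ".", "%2E"));   // abc
    }
}
```

With a search like this already inside the method, a value.contains(".") guard in front of it only repeats the first indexOf call.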

“But you can precompile the regular expression pattern and reuse it to improve the performance of String.replaceAll,” you might say. Agreed. In fact, we do that a lot at Blynk.
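
A minimal sketch of the precompiled variant could look like the following. The class and constant names are assumptions for illustration; Pattern.compile and Matcher.replaceAll are the standard java.util.regex API.

```java
import java.util.regex.Pattern;

final class PrecompiledDot {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern DOT = Pattern.compile("\\.");

    static String encodeDots(String value) {
        // Only the matching work is done per call, the compilation is not repeated.
        return DOT.matcher(value).replaceAll("%2E");
    }

    public static void main(String[] args) {
        System.out.println(encodeDots("a.b.c")); // a%2Eb%2Ec
    }
}
```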

Let’s check how a precompiled pattern changes the numbers:

Results (lower score means faster):

Yes, the numbers are better now. However, they are still not as good as with the String.replace method. We could try to optimize the regular expression even further, but there are already enough posts about that.

You might think that the initial example is an isolated case and that it’s pretty rare. Let’s look at GitHub:

GitHub just indexed some repositories, and on the first screen of results, five out of six String.replaceAll usages could be replaced with String.replace! Yes, many projects are still on Java 8, where the switch won’t make any difference. However, once most developers migrate to Java 11, there will be a lot of slow legacy code out there. We can start improving it right away.

StringUtils.replace

An example of a custom replace method can be found, for instance, in the Spring Framework: StringUtils.replace.

Let’s look at another Spring code snippet:

String internalName = StringUtils.replace(className, ".", "/");

Do you see what is wrong here?

Let’s check Spring (latest source code), Apache Commons (latest version 3.11 of commons-lang3), and Java methods in our benchmark:

Results (lower score means faster):

Hm, it looks like all the methods are pretty close. Apache Commons is a bit slower, but that’s because it has additional logic for handling case-insensitive replacement. So everything makes sense.

And now that the performance is similar, we no longer need a custom method or a third-party library to perform a fast String.replace in Java 11.

But something is still not OK with this line:

return value.replace(".", "/");

Do you see what is wrong here?

Unlike the first example, where an actual string replacement happens, here both the search target and the replacement are single characters. And as we know, Java has a specialized overload for character replacement:

String replace(char oldChar, char newChar)

Let’s add it to our benchmark as well:

@Benchmark
public String replaceChar() {
    return value.replace('.', '/');
}

The Apache Commons library also has StringUtils.replaceChars, but it uses String.replace(char, char) inside, so we’ll skip it. And we are one step closer to eliminating this third-party library from your project.

Results (lower score means faster):

The character-specialized Java version is four times faster for single-char replacement than the overloaded String.replace(String, String) and the custom Spring approach. The funny thing is that even in Java 8, String.replace(char, char) is optimized well enough, so Spring could safely use String.replace(char, char).
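
As a small illustration, the class-name conversion from the Spring snippet can be written with either overload, and the results are identical. The helper class and method names below are hypothetical; only the String.replace overloads are standard.

```java
final class CharVsStringReplace {

    // String-based overload: works, but goes through the general replacement path.
    static String internalNameViaString(String className) {
        return className.replace(".", "/");
    }

    // char-based overload: specialized for swapping one character for another.
    static String internalNameViaChar(String className) {
        return className.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(internalNameViaString("java.lang.String")); // java/lang/String
        System.out.println(internalNameViaChar("java.lang.String"));   // java/lang/String
    }
}
```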

String.remove

value = value.replace(".", "");

Do you see what is wrong here?

Unfortunately, Java still doesn’t have a String.remove method. As an alternative, we could try the String.replace(char, char) method, but Java doesn’t have an empty character literal either, so we can’t write code like this:

value.replace('.', '');

So, instead, we have to use the “hack” above.

Fortunately, there are many third-party implementations out there, such as Apache Commons StringUtils.remove(String, char). Spring, for example, uses its own implementation for that, built on its custom replace method:

public static String delete(String inString, String pattern) {
    return StringUtils.replace(inString, pattern, "");
}

Let’s check Spring, Apache Commons, and Java methods in our benchmark again for the remove operation:

Results (lower score means faster):

The specialized Apache Commons version wins by a factor of almost three. Interestingly, even this specialized and optimized char removal is slower than char replacement with String.replace(char, char). There is definitely room for further improvement.
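
For readers who prefer not to pull in a library, a custom char-removal helper might look like the sketch below. It is an illustration only, with hypothetical names, not the Apache Commons or Spring source, and it assumes a non-null input.

```java
final class CharRemoval {

    // Hypothetical helper: removes every occurrence of a single character.
    static String remove(String value, char unwanted) {
        if (value.indexOf(unwanted) < 0) {
            return value; // nothing to remove, return the original string
        }
        char[] chars = value.toCharArray();
        int kept = 0;
        for (int i = 0; i < chars.length; i++) {
            // Compact the array in place, skipping the unwanted character.
            if (chars[i] != unwanted) {
                chars[kept++] = chars[i];
            }
        }
        return new String(chars, 0, kept);
    }

    public static void main(String[] args) {
        System.out.println(remove("a.b.c", '.')); // abc
        System.out.println(remove("abc", '.'));   // abc
    }
}
```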

Hopefully, someday we’ll see that in Java.

Conclusions

  • Use String.replace over String.replaceAll when possible
  • If you have to use String.replaceAll, try to precompile the regular expression in hot paths
  • Go with the specialized String.replace(char, char) instead of String.replace(String, String) when you can
  • For hot paths, you’ll still need to consider third-party libraries or custom methods instead of the String.replace(value, "") code pattern

Here is the source code of the benchmarks so that you can try them yourself.

Thank you for your attention, and stay tuned.

Previous post: Micro optimizations in Java. Good, nice and slow Enum
