Wix Engineering
Published in

Wix Engineering

Benchmarking String.regionMatches

Benchmarking String.regionMatches vs startsWith/endsWith/equals
Cover image by jarmoluk from Pixabay.

Disclaimer: Yet another post about performance and microbenchmarks. Beware of the results.

There is a method in String class, called regionMatches. Basically, it tests if the string has another string as a substring at the specific place:

"one_two_three".regionMatches(4, "two", 0, 3) // true

By some reason I wondered about its performance comparing to other methods: startsWith and endsWith. And just for fun adding another comparison with the functional equivalent:

"one_two_three".substring(4, 3).equals("two") // true

Obviously, the regionMatches should outperform the latter, just because of the substring (which has extra memory allocation). I was actually shocked by the results:

regionMatches             avgt    3  26.565 ±   1.306  ns/op
substringEquals avgt 3 15.663 ± 11.616 ns/op

WHAT?! This can’t be right! However, no matter how many times or how long I ran this benchmark, results were more or less the same: substring+equals outperformed regionMatches… Then I wrote a little bit more comprehensive test (however, it has only one input data — a string of 51 characters).

Benchmarks

Ok, now we need to go deeper, so let me show all the different benchmarks I did.

Test data: a search string uri of 51 characters, needles at the beginning, in the middle and at the end of the string.

val uri = "/site/section/blog/2022-04-17/the-title-of-the-post"
val begin = "/site/section/blog/2"
val middle = "/blog/2022-04-17/the"
val end = "he-title-of-the-post"

Testing at the beginning of the string:

@Benchmark
def begin_startsWith(): Boolean =
uri.startsWith(begin)
@Benchmark
def begin_regionMatches(): Boolean =
uri.regionMatches(0, begin, 0, begin.length)
@Benchmark
def begin_substringEquals(): Boolean =
uri.substring(0, begin.length) == begin

At the end:

@Benchmark
def end_endsWith(): Boolean =
uri.endsWith(end)
@Benchmark
def end_regionMatches(): Boolean =
uri.regionMatches(endIndex, end, 0, end.length)
@Benchmark
def end_substringEquals(): Boolean =
uri.substring(endIndex) == end

In the middle I did 2 tests, one to find exact substring (val middle = “/blog/2022–04–17/the”), another is to find part of the substring (val middle2Sides = s”-$middle-”). The idea of the second case is to make 2 substrings, not one.

@Benchmark
def middle_regionMatches(): Boolean =
uri.regionMatches(middleIndex, middle, 0, middle.length)
@Benchmark
def middle_substringEquals(): Boolean =
uri.substring(middleIndex, middleIndex + middle.length) == middle
@Benchmark
def middle2Sides_regionMatches(): Boolean =
uri.regionMatches(middleIndex, middle2Sides, 1,
middle2Sides.length - 2)
@Benchmark
def middle2Sides_substringEquals(): Boolean =
uri.substring(middleIndex, middleIndex + middle.length) ==
middle2Sides.substring(1, middle2Sides.length - 1)

JVM

After a while I decided to run benchmarks against different JVMs, because I couldn’t explain what’s happening there (as I showed in the beginning of the post, substring+equals won!). And I ran all these benchmarks on 3 JVMs: openjdk-8, openjdk-11 and openjdk-17.

Benchmarks for all JVMs

Interesting, right? Looks like JDK8 is faster or the same for all cases, and then performance deteriorating, and in openjdk-17 regionMatches significantly slower than substring+equals. I don’t know how to explain it apart from some internal JVM optimizations for equals method that aren’t applied for regionMatches.

I did a test for smaller needle, and it shows that there is a some sweet spot (length), when performance of regionMatches becomes better than substring+equals, but it’s in low single digits (on my computer).

My guess, is that this loop in regionMatches isn’t optimized:

while (len-- > 0) {
if (tv[toffset++] != ov[ooffset++]) {
return false;
}
}

equals method uses internal method StringLatin1.equals, which also does the loop, but maybe length check before removes bound checks:

@IntrinsicCandidate
public static boolean equals(byte[] value, byte[] other) {
if (value.length == other.length) {
for (int i = 0; i < value.length; i++) {
if (value[i] != other[i]) {
return false;
}
}
return true;
}
return false;
}

Conclusion

It seems like a bug in JVM, regionMatches isn’t supposed to behave this way. And sometimes it’s worth benchmarking something basic: our assumptions aren’t always correct. Also, I’d like to mention a couple of blog posts about benchmarking around this area: one and two — sadly, all were done for JDK8 and didn’t reveal anything strange.

Play with charts here. Source code is on GitHub.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Dmitry Komanov

Dmitry Komanov

Software developer, moved to Israel from Russia, trying to be aware of things.