How a Bug In Java String API Messed Up Our Data

Erim Tali
Published in Trendyol Tech
3 min read · Aug 3, 2022

A couple of days ago, our team encountered strange behavior in one of our apps. Some Turkish characters in a JSON string were being converted into meaningless, different characters by the app. Our stored JSON objects, which have Turkish characters in various fields, came out changed after processing. For example, ‘Ş’ turned into ‘^’, ‘ş’ turned into ‘_’, and ‘İ’ turned into ‘0’.

This was a big incident for us: some of our documents were corrupted and we had no idea why it was happening. We could not reproduce the case in a local environment at first, and my teammates suspected the Docker image, because the code seemed to work fine. To act fast, a colleague of mine changed our JDK Docker image, and it worked. The problem was resolved, but we still had no idea what had been going on.

We had to dive in. My teammate Cagatay and I paired up and started working out how it had happened. Our first goal was to reproduce the case in our local environment. I set up my local computer with the same Java version as the Docker image we suspected was the buggy one.

It was adoptopenjdk/openjdk11-openj9:jre-11.0.4_11_openj9-0.15.1-alpine

Still, the code seemed to work fine; Turkish characters were not turning into different ones by themselves. As I mentioned earlier, the case was all about JSON objects: when the code deserialized a JSON string into a Java POJO, we were losing some Turkish characters. We use the Gson library for deserialization, so we thought it might be the cause and started to debug. Gson turned out to be working fine too. However, something was off around the Java String class: somehow our characters were being converted into different ones.

Cagatay suggested that environment variables might be causing the issue and that we should check them. That is how we finally found a change on our side: our app had a new environment variable that it did not have before, “-XX:+CompactStrings”.
That's why we had never faced this kind of issue before.
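One quick way to confirm which options a running JVM actually received is to ask the runtime itself. The sketch below is only an illustration: the class name `JvmFlagCheck` and the helper `hasFlag` are made up for this example, and whether options injected through an environment variable appear in this list may depend on the JVM and on how the variable is passed.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagCheck {
    // Hypothetical helper: true if the option appears verbatim in the list.
    public static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Options the JVM was started with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-XX:+CompactStrings passed explicitly: "
                + hasFlag(jvmArgs, "-XX:+CompactStrings"));
    }
}
```

Running this inside the container and on a local machine and comparing the two outputs would have shown the difference in options right away.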

With this particular image and this particular parameter, we were able to reproduce the error in our local environment. I finally saw ‘_’ instead of ‘ş’ on my computer screen.
After digging further, I found the bug in the String class: a mistake in a for loop. As shown below in the ‘compressible’ function, if `start` is greater than or equal to `length`, the loop body never executes and the function returns true even when the range contains non-compressible characters such as `ş`. That's why some of our Turkish characters had been changed.

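static boolean compressible(char[] c, int start, int length) {
    // The loop bound is `length`, so nothing is checked when start >= length.
    for (int i = start; i < length; ++i) {
        // A char above 255 does not fit in a single byte.
        if (c[i] > 255) {
            return false;
        }
    }

    return true;
}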
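To make the effect concrete, here is a small standalone sketch. It copies the loop from the article, adds a variant with the bound `start + length` (my assumption about the intent of the parameters, not the actual OpenJ9 patch), and shows what happens if a character is stored keeping only its low 8 bits. That truncation is also an assumption, but it matches the substitutions we observed. The class and method names are invented for this example.

```java
public class CompressibleSketch {
    // Same loop as in the article: the bound is `length`.
    public static boolean compressibleAsShown(char[] c, int start, int length) {
        for (int i = start; i < length; ++i) {
            if (c[i] > 255) {
                return false;
            }
        }
        return true;
    }

    // Assumed correction: treat `length` as a count starting at `start`.
    public static boolean compressibleChecked(char[] c, int start, int length) {
        for (int i = start; i < start + length; ++i) {
            if (c[i] > 255) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the low 8 bits of every char, as a one-byte store would.
    public static String truncateToLowByte(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "abc" followed by Ş (U+015E), ş (U+015F), İ (U+0130)
        char[] chars = "abc\u015E\u015F\u0130".toCharArray();
        // Range start=3, length=3 covers only the Turkish characters.
        System.out.println(compressibleAsShown(chars, 3, 3)); // true: loop is skipped
        System.out.println(compressibleChecked(chars, 3, 3)); // false
        System.out.println(truncateToLowByte("\u015E\u015F\u0130")); // ^_0
    }
}
```

The Turkish characters are written as Unicode escapes so that the result does not depend on the source file encoding.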

After a Google search, I found out that the OpenJ9 maintainers already knew about the issue and had a fix for it. That's why updating our Docker image solved our case.

I guess it was the kind of incident that could only happen in the software world, and it happened to us.
