How to sort Umlaute in Java correctly?

Jens Goldhammer
Published in fme DevOps Stories
Nov 6, 2018

Have you ever had to sort strings in Java, e.g. to show documents and files by name in lexicographical order in a user interface? I had a similar requirement in my current project.

Imagine we have the following list of strings:

import com.google.common.collect.Lists; // Google Guava
import java.util.List;

List<String> names = Lists.newArrayList("1 Introduction", "Adele", "1.2 Sorting in Java",
        "1. unbelievable point", "andere", "anders", "zone", "zippy", "Ändern",
        "1.1 motiviation", "ängstlich");

String-based sorting

The first step was to sort the list with Collections.sort(), which uses the natural ordering of String. I think everybody knows that one…

@Test
public void stringSorting() {
    Collections.sort(names);
}

The result is the following:

1 Introduction
1. unbelievable point
1.1 motiviation
1.2 Sorting in Java
Adele
andere
anders
zippy
zone
Ändern
ängstlich

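The problem is that String.compareTo() knows nothing about language rules: it simply compares the UTF-16 code units (the char values) one by one. All uppercase letters therefore come before all lowercase letters, and umlauts such as Ä (U+00C4) and ä (U+00E4) end up after z.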
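A minimal sketch makes this visible. It assumes a hypothetical class name, CodeUnitOrderDemo, prints the char values that String.compareTo() effectively compares, and then sorts a few of the example words with the natural String order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CodeUnitOrderDemo {

    // Sorts a copy of the input with String's natural ordering,
    // which compares UTF-16 code units one by one.
    public static List<String> naturalOrder(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // The numeric values behind the comparison
        for (char c : new char[] {' ', '.', '1', 'A', 'Z', 'a', 'z', 'Ä', 'ä'}) {
            System.out.printf("'%c' = U+%04X%n", c, (int) c);
        }
        // 'z' (U+007A) is smaller than 'Ä' (U+00C4)
        System.out.println("zone".compareTo("Ändern") < 0); // true
        System.out.println(naturalOrder(
                Arrays.asList("ängstlich", "Ändern", "zone", "andere", "Adele")));
        // prints [Adele, andere, zone, Ändern, ängstlich]
    }
}
```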

Advantages / disadvantages

  • numbers are sorted as expected
  • German umlauts are not sorted correctly (they end up after z)
  • sorting is case-sensitive: all uppercase letters come before all lowercase letters

Collator sorting

A colleague told me about the java.text.Collator class, which compares strings according to the rules of a given locale. It implements the Comparator interface, so you can pass it directly to Collections.sort(). So I tried it with the German collator:

@Test
public void collatorSorting() {
    Collections.sort(names, Collator.getInstance(Locale.GERMAN));
}

The result is the following:

1.1 motiviation
1.2 Sorting in Java
1. unbelievable point
1 Introduction
Adele
andere
Ändern
anders
ängstlich
zippy
zone

Advantages / disadvantages

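  • German umlauts are sorted correctly (Ä is treated like A, ä like a)
  • upper- and lowercase letters are sorted together; case only decides between otherwise equal strings
  • numbered entries are not sorted the way you would expect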
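The "sorted together" effect comes from the collation strength. By default a collator uses Collator.TERTIARY, where a case difference only counts when the strings are otherwise equal, and an umlaut differs from its base letter only at the secondary level. A small sketch (the class name StrengthDemo is hypothetical) that makes these levels visible with the German collator:

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {

    // Compares a and b with a fresh German collator at the given strength.
    public static int compare(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Case is a tertiary difference: it is ignored below TERTIARY
        System.out.println(compare("Adele", "adele", Collator.TERTIARY));  // non-zero
        System.out.println(compare("Adele", "adele", Collator.SECONDARY)); // 0

        // The umlaut is a secondary difference: it is ignored at PRIMARY
        System.out.println(compare("Ändern", "andern", Collator.SECONDARY)); // non-zero
        System.out.println(compare("Ändern", "andern", Collator.PRIMARY));   // 0
    }
}
```

With PRIMARY strength the collator would treat "Ändern" and "andern" as equal, so for sorting the default TERTIARY strength is usually the better fit.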

Rule-based Collator sorting

OK, the simple collator sorting was fine except for the numbers. The reason seems to be that the JDK's default collation rules treat the space as ignorable (it only counts as a secondary difference), so at the primary level “1 Introduction” is compared as if it were “1Introduction”. So I searched the documentation and found out that Java has a RuleBasedCollator implementation which can be extended with your own sorting rules! How sweet is that? I only had to tell the collator that a space sorts before a period, so that “1 Introduction” comes before “1.1 Motivation”… After playing around for half an hour, I ended up with this:

// my own rules: space < period < digits < letters
private static final String EXT_RULES = "< ' ' < '.'"+
"<0<1<2<3<4<5<6<7<8<9<a,A<b,B<c,C<d,D<ð,Ð<e,E<f,F<g,G<h,H<i,I<j"+
",J<k,K<l,L<m,M<n,N<o,O<p,P<q,Q<r,R<s, S & SS,ß<t,T& TH, Þ &TH,"+
"þ <u,U<v,V<w,W<x,X<y,Y<z,Z&AE,Æ&AE,æ&OE,Œ&OE,œ";

@Test
public void extendedCollatorSorting() throws ParseException {
    // In the standard JDK, Collator.getInstance() returns a RuleBasedCollator
    RuleBasedCollator germanCollator = (RuleBasedCollator)
            Collator.getInstance(Locale.GERMAN);

    // Append the custom rules to the complete German rule set
    RuleBasedCollator extGermanCollator = new RuleBasedCollator(
            germanCollator.getRules() + EXT_RULES);

    Collections.sort(names, extGermanCollator);
}

So you create a normal German collator, take its complete rule set (germanCollator.getRules()), append your own rules (EXT_RULES) and build a new RuleBasedCollator from the combined string. The constructor throws a ParseException if the rules are malformed. The new collator can then be used to sort the strings.
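For experimenting outside a test class, here is a self-contained sketch of the same approach. The class name ExtendedCollatorDemo and its helper methods are hypothetical; EXT_RULES is copied verbatim from the rules above. The ParseException is wrapped into an unchecked exception so that callers do not have to handle it:

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ExtendedCollatorDemo {

    // Custom rules: space < period < digits < letters (same string as in the article)
    static final String EXT_RULES = "< ' ' < '.'" +
            "<0<1<2<3<4<5<6<7<8<9<a,A<b,B<c,C<d,D<ð,Ð<e,E<f,F<g,G<h,H<i,I<j" +
            ",J<k,K<l,L<m,M<n,N<o,O<p,P<q,Q<r,R<s, S & SS,ß<t,T& TH, Þ &TH," +
            "þ <u,U<v,V<w,W<x,X<y,Y<z,Z&AE,Æ&AE,æ&OE,Œ&OE,œ";

    // Builds a collator from the German rules followed by EXT_RULES.
    public static Collator extendedGermanCollator() {
        RuleBasedCollator german = (RuleBasedCollator) Collator.getInstance(Locale.GERMAN);
        try {
            return new RuleBasedCollator(german.getRules() + EXT_RULES);
        } catch (ParseException e) {
            // Only happens if the rule string is malformed
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Sorts a copy of the input with the extended collator.
    public static List<String> sortExtended(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(extendedGermanCollator());
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("1 Introduction", "Adele", "1.2 Sorting in Java",
                "1. unbelievable point", "andere", "anders", "zone", "zippy", "Ändern",
                "1.1 motiviation", "ängstlich");
        sortExtended(names).forEach(System.out::println);
    }
}
```

Building a RuleBasedCollator parses the whole rule string, so it is worth creating the collator once and reusing it rather than rebuilding it for every sort.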

The result is the following:

1 Introduction
1. unbelievable point
1.1 motiviation
1.2 Sorting in Java
Adele
andere
Ändern
anders
ängstlich
zippy
zone

Advantages / disadvantages

  • umlauts are sorted correctly
  • upper- and lowercase letters are sorted together
  • numbered entries are sorted the way you would expect

Additional hints

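  • The Collator API does not promise thread safety. (In OpenJDK, RuleBasedCollator synchronizes compare(), so a shared instance gives correct results, but all threads then contend for the same lock.) If you want to use a collator from several threads, create a factory that hands out a new instance (by cloning a prototype collator) instead of sharing one.
  • The JDK ships built-in collation rules for various locales; you can inspect them with getRules() and adapt them to your needs with custom rules.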
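One way to implement the factory from the first hint is a per-thread copy. The following sketch assumes a hypothetical holder class, CollatorHolder, that keeps one prototype collator and gives every thread its own clone via ThreadLocal.withInitial() (Java 8+), so threads never share an instance:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public final class CollatorHolder {

    // Prototype that is configured once and only used as a template for clones
    private static final Collator PROTOTYPE = Collator.getInstance(Locale.GERMAN);

    // Each thread lazily receives its own clone of the prototype
    private static final ThreadLocal<Collator> PER_THREAD =
            ThreadLocal.withInitial(() -> (Collator) PROTOTYPE.clone());

    private CollatorHolder() {
    }

    // Returns the collator owned by the calling thread
    public static Collator get() {
        return PER_THREAD.get();
    }

    // Sorts a copy of the input with the calling thread's collator
    public static List<String> sorted(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(get());
        return copy;
    }
}
```

To use custom rules, the prototype can just as well be a RuleBasedCollator built from the German rules plus EXT_RULES; the cloning logic stays the same.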

Summary

Strings can be sorted in different ways, but I highly recommend using the Collator class (extended with your own rules where needed), e.g. if the customer wants sorting like in Windows Explorer.
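One caveat when aiming for Explorer-like results: a collator, with or without the extra rules, still compares digits one character at a time, so it does not compare multi-digit numbers by their value. As far as I know, Windows Explorer compares runs of digits numerically by default and would list “2 Basics” before “10 Appendix”. A quick check with hypothetical chapter titles (the class name DigitByDigitDemo is also just for illustration):

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class DigitByDigitDemo {

    // Sorts a copy of the input with the plain German collator.
    public static List<String> sortGerman(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Collator.getInstance(Locale.GERMAN));
        return copy;
    }

    public static void main(String[] args) {
        // '1' sorts before '2' at the first character, so "10" precedes "2"
        System.out.println(sortGerman(Arrays.asList("2 Basics", "10 Appendix")));
        // prints [10 Appendix, 2 Basics]
    }
}
```

If numbered entries can have more than one digit per level, numeric comparison of digit runs has to be added on top of the collator.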

References:

  1. http://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#sort-java.util.List-
  2. https://docs.oracle.com/javase/8/docs/api/java/text/Collator.html
  3. https://docs.oracle.com/javase/8/docs/api/java/text/RuleBasedCollator.html


Software Engineer with focus on Cloud, Java and Typescript — working for fme AG — dad of 2 little boys and one sweet girl — loving new technologies