Custom String Comparison in Java
A hidden gem in the standard Java library is the RuleBasedCollator class. It lets you define custom collation (a set of rules for comparing the characters in a string) in a flexible way. Typical use cases include:
- you need to sort strings so that uppercase characters rank higher or lower than lowercase characters;
- you want to set higher or lower precedence for accented characters (like à) or for non-Latin characters;
- you need to specify any other non-standard rules for character precedence that would be too complicated to define in a Comparator implementation.
The RuleBasedCollator class lets you set the rules conveniently, just by writing them as a string expression. For instance, if you sort the following list in the usual order, you get the expected result (I'm using the Stream.toList() method from Java 16 here; in earlier versions of Java you can just use .collect(Collectors.toList())):
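A minimal sketch of such a natural-order sort follows. The class name, the method name, and the sample list are illustrative assumptions rather than the original listing, and the sketch uses .collect(Collectors.toList()) so that it also compiles on Java versions before 16.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the name and sample data are assumptions.
class NaturalOrderDemo {

    // Sorts using String's natural ordering (String.compareTo).
    static List<String> sortNaturally(List<String> input) {
        return input.stream()
                .sorted()
                .collect(Collectors.toList()); // on Java 16+, .toList() works as well
    }

    public static void main(String[] args) {
        List<String> words = List.of("ab", "ba", "aa", "bb");
        System.out.println(sortNaturally(words)); // prints [aa, ab, ba, bb]
    }
}
```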
If you now want the letter b to sort before the letter a but keep all the other comparison rules the same, you could write the rule as follows: < b < a. Here it is in practice:
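One way this might look is sketched below. It assumes the rule string is passed on its own to the RuleBasedCollator constructor, and the sample strings contain only a and b so that the result depends on nothing but the rule itself. The class and helper names are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; names and sample data are assumptions.
class CustomOrderDemo {

    // Builds a collator from a rule string. The constructor throws a checked
    // ParseException for malformed rules, rethrown here as an unchecked exception.
    static RuleBasedCollator collatorFor(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules: " + rules, e);
        }
    }

    // A Collator is a Comparator<Object>, so it can be passed straight to sorted().
    static List<String> sortWithRules(List<String> input, String rules) {
        return input.stream()
                .sorted(collatorFor(rules))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("ab", "ba", "aa", "bb");
        // With "< b < a", b is ordered before a.
        System.out.println(sortWithRules(words, "< b < a")); // prints [bb, ba, ab, aa]
    }
}
```

Wrapping the ParseException keeps the stream pipeline free of checked exceptions. Whether to wrap it or declare it is a design choice.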
There are more complicated usage examples of the RuleBasedCollator class in the JavaDocs; check them out to learn more.
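The first use case listed at the top, ranking uppercase against lowercase, can be sketched with the comma operator, which the JavaDocs describe as a tertiary (case-level) difference. This is only an illustration under the assumption that the collator keeps its default strength. The class name, helper name, and the two rule strings are hypothetical, and only the letters a and b are covered.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; names, rules and sample data are assumptions.
class CasePriorityDemo {

    // Sorts the input with a collator built from the given rule string.
    static List<String> sortWithRules(List<String> input, String rules) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules: " + rules, e);
        }
        return input.stream().sorted(collator).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> letters = List.of("b", "A", "a", "B");

        // '<' separates letters (primary difference); ',' separates the two
        // cases of one letter (tertiary difference).
        String lowercaseFirst = "< a, A < b, B";
        String uppercaseFirst = "< A, a < B, b";

        System.out.println(sortWithRules(letters, lowercaseFirst)); // prints [a, A, b, B]
        System.out.println(sortWithRules(letters, uppercaseFirst)); // prints [A, a, B, b]
    }
}
```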