In C# it appears that Grüsse and Grüße are considered equal in most circumstances, as is explained by this nice webpage. I’m trying to find similar behavior in Java, obviously not in java.lang.String.
I thought I was in luck with java.util.regex.Pattern in combination with Pattern.UNICODE_CASE. The Javadoc says:
UNICODE_CASE enables Unicode-aware case folding. When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard.
Yet the following code:

    Pattern p = Pattern.compile(Pattern.quote("Grüsse"), Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
    System.out.println(p.matcher("Grüße").matches());

yields false. Why? And is there an alternative way of reproducing the C# case folding behavior?
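For experimenting, here is a self-contained version of the snippet above. It is only a sketch: the class PatternFoldDemo and the helper matchesIgnoringCase are hypothetical names, and the second check (against the all-caps "GRÜSSE") is an extra comparison that is not in the original question. The contrast suggests that UNICODE_CASE does fold characters that have a one-to-one case counterpart (ü and Ü), while ß, which would have to match the two characters "ss", is not folded.

```java
import java.util.regex.Pattern;

public class PatternFoldDemo {
    // Quotes `literal` so it is matched verbatim, then compiles it with
    // Unicode-aware case-insensitive matching, exactly as in the question.
    static boolean matchesIgnoringCase(String literal, String input) {
        Pattern p = Pattern.compile(Pattern.quote(literal),
                Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+00DF (sharp s) would have to match the two characters "ss"
        System.out.println(matchesIgnoringCase("Gr\u00fcsse", "Gr\u00fc\u00dfe")); // false
        // Every character here has a one-to-one case counterpart (U+00FC / U+00DC)
        System.out.println(matchesIgnoringCase("Gr\u00fcsse", "GR\u00dcSSE"));     // true
    }
}
```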
--- edit ---
As @VGR pointed out, String.toUpperCase will convert ß to SS, which may or may not be case folding (maybe I’m confusing concepts here). However, other characters in the German locale are not “folded”; for instance, ü does not become UE. So, to make my initial example more complete: is there a way to make Grüße and Gruesse compare equal in Java?
I was thinking the java.text.Normalizer class could be used to do just that, but (in the decomposed forms) it converts ü to u followed by a combining diaeresis (U+0308) rather than to ue. It also doesn’t have an option to take a Locale, which confuses me even more.
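To see what Normalizer actually produces, here is a small sketch. It assumes the decomposed form NFD is the one being used, and NormalizerDemo and stripMarks are hypothetical names. Stripping the combining marks afterwards yields a plain u, and ß, which has no decomposition, is left alone, so neither step gets from Grüße to Gruesse.

```java
import java.text.Normalizer;

public class NormalizerDemo {
    // Decomposes s (NFD), then deletes every combining mark (\p{M}).
    static String stripMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // NFD splits the precomposed U+00FC into 'u' + U+0308 COMBINING DIAERESIS
        String d = Normalizer.normalize("\u00fc", Normalizer.Form.NFD);
        System.out.println(d.length());              // 2
        System.out.println(d.charAt(0) == 'u');      // true
        System.out.println(d.charAt(1) == '\u0308'); // true
        // U+00DF has no decomposition, so it survives stripping unchanged:
        // the result is "Gru" + U+00DF + "e", not "Gruesse"
        System.out.println(stripMarks("Gr\u00fc\u00dfe").equals("Gru\u00dfe")); // true
    }
}
```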
Answer
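For reference, the following facts:

- Character.toUpperCase() cannot do case folding, as one character must map to exactly one character (so ß stays ß).
- String.toUpperCase() can, because it applies full (one-to-many) case mappings such as ß to SS.
- String.equalsIgnoreCase() compares the strings character by character using Character.toUpperCase() and Character.toLowerCase(), so it doesn’t do case folding either.
- Pattern with CASE_INSENSITIVE and UNICODE_CASE also compares one character at a time, which is why the regex in the question returns false.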
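The first three facts can be checked with a few lines of Java. CaseMappingDemo is a hypothetical class name, and passing Locale.ROOT to toUpperCase is an assumption made here so the output does not depend on the JVM’s default locale. The expected outputs in the comments follow from the Unicode case mappings for ß (U+00DF).

```java
import java.util.Locale;

public class CaseMappingDemo {
    public static void main(String[] args) {
        // Character.toUpperCase maps one char to one char; U+00DF (sharp s)
        // has no single-char uppercase, so it comes back unchanged
        System.out.println(Character.toUpperCase('\u00df') == '\u00df'); // true

        // String.toUpperCase applies the full mapping sharp s -> "SS"
        System.out.println("\u00df".toUpperCase(Locale.ROOT));           // SS

        // equalsIgnoreCase compares char by char (and here the lengths even
        // differ), so the two spellings are not considered equal
        System.out.println("Gr\u00fc\u00dfe".equalsIgnoreCase("Gr\u00fcsse")); // false
    }
}
```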
Conclusion (as @VGR pointed out): if you need case-insensitive matching with case folding, you need to do:

    foo.toUpperCase().equals(bar.toUpperCase())

and not:

    foo.equalsIgnoreCase(bar)
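The recommended comparison can be wrapped in a small helper. This is a sketch: foldEquals is a hypothetical name, and passing Locale.ROOT (instead of relying on the default locale, as the bare toUpperCase() call does) is an assumption added here, because some locales, such as Turkish, upper-case i differently. Note that it still does not make ü equal to ue.

```java
import java.util.Locale;

public class FoldEquals {
    // Hypothetical helper: compares two strings after full upper-casing.
    // Locale.ROOT keeps the result independent of the JVM default locale
    // (a Turkish default locale would map "i" to a dotted capital I).
    static boolean foldEquals(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(foldEquals("Gr\u00fc\u00dfe", "Gr\u00fcsse")); // true
        System.out.println(foldEquals("Stra\u00dfe", "STRASSE"));          // true
        System.out.println(foldEquals("Gr\u00fc\u00dfe", "Gruesse"));     // false
    }
}
```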
As for the ü and ue equality, I managed to do it with a RuleBasedCollator and my own rules (one would expect Locale.GERMAN to have that built in, but alas). It looked really silly and over-engineered, and since I needed only equality, not sorting/collation, in the end I settled for a simple set of String.replace calls prior to comparison. It sucks, but it works and is transparent/readable.
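A minimal sketch of the String.replace approach described above. The exact set of replacements is not given, so the four mappings here (ä, ö, ü to ae, oe, ue and ß to ss), the NFC normalization step, and the names GermanEquals, transliterate and germanEquals are all assumptions. Lower-casing first means the capital umlauts need no separate rules.

```java
import java.text.Normalizer;
import java.util.Locale;

public class GermanEquals {
    // Hypothetical helper: NFC-normalizes and lower-cases s, then spells out
    // the German umlauts and sharp s as two-letter sequences.
    static String transliterate(String s) {
        // NFC recombines a decomposed 'u' + U+0308 into the single char U+00FC
        String t = Normalizer.normalize(s, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
        return t.replace("\u00e4", "ae")  // a with umlaut
                .replace("\u00f6", "oe")  // o with umlaut
                .replace("\u00fc", "ue")  // u with umlaut
                .replace("\u00df", "ss"); // sharp s (capital U+1E9E lower-cases to it)
    }

    static boolean germanEquals(String a, String b) {
        return transliterate(a).equals(transliterate(b));
    }

    public static void main(String[] args) {
        System.out.println(germanEquals("Gr\u00fc\u00dfe", "Gruesse")); // true
        System.out.println(germanEquals("GR\u00dcSSE", "gruesse"));     // true
        System.out.println(germanEquals("Gr\u00fc\u00dfe", "Grusse"));  // false
    }
}
```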