
How to make a Java containsIgnoreCase that works with all human languages

For example, I have this simple containsIgnoreCase method:

public static boolean containsIgnoreCase(String a, String b) {
    if (a == null || b == null) {
        return false;
    }
    return a.toLowerCase().contains(b.toLowerCase());
}

But it fails for some comparisons, such as ΙΧΘΥΣ & ιχθυσ.

So I switched to the Apache Commons Lang library:

import org.apache.commons.lang3.StringUtils;

which has its own method StringUtils.containsIgnoreCase:

public static boolean containsIgnoreCase2(String a, String b) {
    if (a == null || b == null) {
        return false;
    }

    return StringUtils.containsIgnoreCase(a, b);
}

Now it works for ΙΧΘΥΣ & ιχθυσ, but it fails for weiß & WEISS, tschüß & TSCHÜSS, ᾲ στο διάολο & Ὰͅ Στο Διάολο, and ﬂour and water & FLOUR AND WATER (the first string starts with the ligature ﬂ, U+FB02).

Is it possible to write something that works for all languages, or am I missing some configuration in the Apache library?

I also saw that the ICU4J library could be used, but I could not find an example:

<dependency>
    <groupId>com.ibm.icu</groupId>
    <artifactId>icu4j</artifactId>
    <version>72.1</version>
</dependency>

Any help or recommendation is appreciated 🙂


Answer

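toLowerCase() and toUpperCase() are not always symmetric. For example, toUpperCase() expands ß to SS, but toLowerCase() never turns SS back into ß, and toLowerCase() maps a word-final Σ to ς rather than σ. Your examples work if you uppercase both strings instead: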

public static boolean containsIgnoreCase(String a, String b) {
    if (a == null || b == null) {
        return false;
    }
    return a.toUpperCase().contains(b.toUpperCase());
}
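
One caveat with the method above: toUpperCase() with no argument uses the JVM's default locale, so on a Turkish locale "i" uppercases to "İ" and ASCII comparisons can break. Below is a minimal sketch of the same uppercase approach with an explicit Locale.ROOT. The class name CaseInsensitiveContains is hypothetical, and choosing Locale.ROOT is an assumption that locale-independent matching is what you want.

```java
import java.util.Locale;

// Hypothetical helper class; the name is illustrative only.
final class CaseInsensitiveContains {

    private CaseInsensitiveContains() {
    }

    // Upper-cases both strings with a fixed locale before searching.
    // toUpperCase applies full case mappings (for example, ß becomes SS),
    // which is why it handles pairs that toLowerCase misses.
    static boolean containsIgnoreCase(String a, String b) {
        if (a == null || b == null) {
            return false;
        }
        // Assumption: Locale.ROOT, so results do not depend on the JVM default.
        return a.toUpperCase(Locale.ROOT).contains(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("weiß", "WEISS"));     // true
        System.out.println(containsIgnoreCase("tschüß", "TSCHÜSS")); // true
        System.out.println(containsIgnoreCase("ΙΧΘΥΣ", "ιχθυσ"));    // true
        System.out.println(containsIgnoreCase(null, "x"));           // false
    }
}
```

This is still only an approximation of Unicode caseless matching. It does not normalize the strings, so a precomposed character and its decomposed equivalent (base letter plus combining mark) may not match each other.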
User contributions licensed under: CC BY-SA