
Merging two regexes that allow only English and Arabic characters

I have a string from which I want to remove all other characters, such as (0..9!@#$%^&*()_., …), and keep only alphabetic characters.

After looking things up and doing some tests, I came up with two regexes:

String str = "123hello!#$% مرحبا. ok";
str = str.replaceAll("[^a-zA-Z]", "");
str = str.replaceAll("\\P{InArabic}+", "");
System.out.println(str);

I want this to return “hello مرحبا ok”.

Of course, this returns an empty string: the first regex removes every non-Latin character, and the second then removes every non-Arabic character, so nothing survives.
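A minimal sketch that reproduces the problem. The class and method names are hypothetical, and the Arabic word is written with Unicode escapes (an assumption made only so the file compiles under any source encoding). After the first call only "hellook" is left, which contains nothing from the Arabic block, so the second call deletes everything.

```java
class ChainedRegexDemo {
    // Same input as in the question; the Arabic word is written as Unicode escapes
    // so the file compiles regardless of the source encoding.
    static final String INPUT = "123hello!#$% \u0645\u0631\u062D\u0628\u0627. ok";

    static String chained(String s) {
        // Step 1 keeps only ASCII letters, leaving "hellook"
        String latinOnly = s.replaceAll("[^a-zA-Z]", "");
        // Step 2 removes every run of non-Arabic characters, i.e. all remaining letters
        return latinOnly.replaceAll("\\P{InArabic}+", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + chained(INPUT) + "]"); // prints "[]"
    }
}
```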

My question is: how can I merge these two regexes into one that keeps only Arabic and English characters?


Answer

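Put both sets into a single negated character class. Use a lowercase p, since the ^ at the start of the class already handles the negation. No quantifier is needed because replaceAll removes every match anyway (though adding + wouldn't hurt):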

String str = "123hello!#$% مرحبا. ok";
str = str.replaceAll("[^a-zA-Z \\p{InArabic}]", "");
System.out.println(str);

Prints:

hello مرحبا ok
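Why the lowercase p matters can be checked directly. The sketch below (hypothetical class and method names; the precompiled java.util.regex.Pattern is just a common choice when cleaning many strings, not something the answer requires) runs the answer's pattern next to the same class written with an uppercase P. Inside [^...] the negation covers the whole class, so [^a-zA-Z \P{InArabic}] matches only characters that are in the Arabic block, and replaceAll deletes the Arabic word instead of keeping it.

```java
import java.util.regex.Pattern;

class MergedRegexDemo {
    // Same input as in the question, with the Arabic word as Unicode escapes.
    static final String INPUT = "123hello!#$% \u0645\u0631\u062D\u0628\u0627. ok";

    // The answer's pattern: one negated class listing everything to keep.
    static final Pattern NOT_LATIN_SPACE_OR_ARABIC =
            Pattern.compile("[^a-zA-Z \\p{InArabic}]");

    // Uppercase P inside [^...]: "not (Latin, space or non-Arabic)" means "Arabic",
    // so this pattern matches exactly the Arabic-block characters.
    static final Pattern WITH_UPPERCASE_P =
            Pattern.compile("[^a-zA-Z \\P{InArabic}]");

    static String keepLatinAndArabic(String s) {
        return NOT_LATIN_SPACE_OR_ARABIC.matcher(s).replaceAll("");
    }

    static String withUppercaseP(String s) {
        return WITH_UPPERCASE_P.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(keepLatinAndArabic(INPUT)); // Latin words, Arabic word and spaces kept
        System.out.println(withUppercaseP(INPUT));     // 123hello!#$% . ok
    }
}
```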

Note: based on your expected result, you want to keep the spaces, so a space is included in the character class.
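A small sketch (hypothetical class and method names) comparing the pattern with and without that space. It also checks one caveat worth testing against your own data: \p{InArabic} matches the whole Arabic Unicode block (U+0600 to U+06FF), which also contains Arabic-Indic digits and Arabic punctuation, so a digit like U+0663 or the Arabic question mark U+061F is kept rather than removed.

```java
class SpaceInClassDemo {
    // Same input as in the question, with the Arabic word as Unicode escapes.
    static final String INPUT = "123hello!#$% \u0645\u0631\u062D\u0628\u0627. ok";

    // No space in the class: spaces are removed like any other symbol.
    static String withoutSpace(String s) {
        return s.replaceAll("[^a-zA-Z\\p{InArabic}]", "");
    }

    // Literal space in the class: the word separators are kept.
    static String withSpace(String s) {
        return s.replaceAll("[^a-zA-Z \\p{InArabic}]", "");
    }

    public static void main(String[] args) {
        System.out.println(withoutSpace(INPUT)); // the three words run together
        System.out.println(withSpace(INPUT));    // the three words, separated by spaces

        // Caveat: Arabic-Indic digit three (U+0663) and the Arabic question mark (U+061F)
        // both lie inside the Arabic block, so the pattern does not remove them.
        String arabicDigitAndMark = "\u0663\u061F";
        System.out.println(withSpace(arabicDigitAndMark).equals(arabicDigitAndMark)); // true
    }
}
```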

User contributions licensed under: CC BY-SA