
Java Regex to match Chinese and/or ordinary numbers

The regex I have matches everything except the Chinese characters, but it also matches the numbers inside the Chinese sentences, which I don't want. As you can see in the regex demo here, the number 45 in the Chinese sentence is matched, but I need it to be excluded too.

https://regex101.com/r/XNtD12/1

The current regex is: (?!\p{IsHan}\n)[^\p{IsHan}\n?。,?!]+
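To see what the current pattern actually does, here is a minimal sketch that lists every match with Matcher.find() over the example text from the question. The class and helper names (CurrentRegexMatches, findAll) are hypothetical, and the source file is assumed to be compiled as UTF-8 (the default since JDK 18) because it contains Chinese literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CurrentRegexMatches {

    // The question's current pattern as a Java string literal
    // (every regex backslash is doubled).
    static final String CURRENT = "(?!\\p{IsHan}\\n)[^\\p{IsHan}\\n?。,?!]+";

    // Collects every substring the pattern matches, scanning left to right.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
        // Prints [He is 45 today, 45, Ok I see]
        System.out.println(findAll(CURRENT, example));
    }
}
```

The second entry, 45, comes from 你今天45岁了, which is exactly the partial match the question wants to avoid.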

Desired output:

He is 45 today <- matched 100%
你今天45岁了 <- not matched at all
这个句子没有数字 <- not matched at all
Ok I see <- matched 100%

Java code being used:

String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
System.out.println(example.replaceAll("^[^\\p{IsHan}\\n?。,?!]+$", ""));
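For what it's worth, as written (without any multiline flag) this call should leave the string unchanged: ^ and $ then anchor only to the start and end of the whole input, and the character class cannot match the \n between lines. A small sketch to check that, using a hypothetical class name WithoutMultiline and assuming UTF-8 source:

```java
public class WithoutMultiline {

    // The anchored pattern from the question, with no (?m) flag.
    static final String ANCHORED = "^[^\\p{IsHan}\\n?。,?!]+$";

    public static void main(String[] args) {
        String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
        // Without MULTILINE, ^ only matches at the start of the input and $ only
        // at its end (or before a final line terminator), so a match would have
        // to span all four lines; the class excludes newlines, so nothing matches.
        String result = example.replaceAll(ANCHORED, "");
        System.out.println(result.equals(example)); // true
    }
}
```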


Answer

In your pattern you can omit the lookahead (?!\p{IsHan}\n), as the negated character class that directly follows it already does not match \p{IsHan}.
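A quick way to convince yourself of this is to compare the matches of both versions on the example text. This is a sketch with hypothetical names (LookaheadCheck, findAll), assuming UTF-8 source:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadCheck {

    static final String WITH_LOOKAHEAD = "(?!\\p{IsHan}\\n)[^\\p{IsHan}\\n?。,?!]+";
    static final String WITHOUT_LOOKAHEAD = "[^\\p{IsHan}\\n?。,?!]+";

    // Collects every substring the pattern matches, scanning left to right.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
        // Both patterns yield the same list of matches.
        System.out.println(findAll(WITH_LOOKAHEAD, example)
                .equals(findAll(WITHOUT_LOOKAHEAD, example))); // true
    }
}
```

The lookahead can only reject a position whose first character is a Han character, and the negated class rejects every such position anyway, so the equivalence does not depend on this particular input.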

If you don't want partial matches, you can add anchors to the start and the end of the pattern and enable multiline mode with the inline modifier (?m), so that ^ and $ match at the start and end of each line:

String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
System.out.println(example.replaceAll("(?m)^[^\\p{IsHan}\\n?。,?!]+$", ""));
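Here is a self-contained sketch of the same approach that also lists the matched lines with Matcher.find(). The names MultilineFix and matchedLines are hypothetical; the pattern is the one from the answer, and the file is assumed to be compiled as UTF-8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineFix {

    // (?m) makes ^ and $ match at the start and end of every line, so only
    // lines made up entirely of allowed (non-Han) characters can match.
    static final String LINE_WITHOUT_HAN = "(?m)^[^\\p{IsHan}\\n?。,?!]+$";

    // Returns the whole lines matched by the pattern.
    static List<String> matchedLines(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(LINE_WITHOUT_HAN).matcher(input);
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
        System.out.println(matchedLines(example)); // [He is 45 today, Ok I see]
        // Blanks the matched lines; the Chinese lines are left untouched.
        System.out.println(example.replaceAll(LINE_WITHOUT_HAN, ""));
    }
}
```

Note that replaceAll blanks the matched lines but keeps their line breaks, so the replaced string here is "\n你今天45岁了\n这个句子没有数字\n".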

See a regex demo and a Java demo.

If you also want to remove the optional trailing newline after each matched line using replaceAll, append \R? to the pattern (keeping the (?m) flag):

^[^\p{IsHan}\n?。,?!]+$\R?
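A sketch of that variant in Java, assuming it is combined with the same (?m) flag as the replaceAll call above (RemoveWithNewline is a hypothetical class name; \R needs Java 8 or later, and the source is assumed to be UTF-8):

```java
public class RemoveWithNewline {

    // Same line pattern with (?m); \R? additionally consumes the line break
    // that follows a matched line, when there is one.
    static final String LINE_AND_BREAK = "(?m)^[^\\p{IsHan}\\n?。,?!]+$\\R?";

    public static void main(String[] args) {
        String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
        String result = example.replaceAll(LINE_AND_BREAK, "");
        System.out.print(result);
    }
}
```

With the example text the result should be "你今天45岁了\n这个句子没有数字\n": the break after the first line is consumed, but the last line has no break of its own, so the newline in front of it stays.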
User contributions licensed under: CC BY-SA