The regex I have matches everything except the Chinese characters, but it also matches the numbers inside the Chinese lines, which I don’t want. As you can see in the regex demo here, the number 45 is matched, but I need it to be excluded too.
https://regex101.com/r/XNtD12/1
Current regex is: (?!\p{IsHan}\n)[^\p{IsHan}\n?。,?!]+
Desired output:
He is 45 today <- matched 100%
你今天45岁了 <- not matched at all
这个句子没有数字 <- not matched at all
Ok I see <- matched 100%
Java code being used:
String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
System.out.println(example.replaceAll("^[^\\p{IsHan}\\n?。,?!]+$", ""));
Answer
In your pattern you can omit the lookahead (?!\p{IsHan}\n), as the directly following negated character class already does not match \p{IsHan}.
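As a quick check of that claim, here is a minimal sketch that collects the matches of both variants on the sample text and compares them. The class name LookaheadCheck and the helper findAll are made up for this illustration and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadCheck {
    // Sample text from the question (hypothetical constant name).
    static final String EXAMPLE = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";

    // In Java source every regex backslash has to be doubled.
    static final String WITH_LOOKAHEAD = "(?!\\p{IsHan}\\n)[^\\p{IsHan}\\n?。,?!]+";
    static final String WITHOUT_LOOKAHEAD = "[^\\p{IsHan}\\n?。,?!]+";

    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> a = findAll(WITH_LOOKAHEAD, EXAMPLE);
        List<String> b = findAll(WITHOUT_LOOKAHEAD, EXAMPLE);
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

The lookahead can only fail when the next character is a Han character, and in that case the negated character class cannot start a match there anyway.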
If you don’t want partial matches, you can add anchors to the start and the end of the pattern, and enable multiline mode using the inline modifier (?m), so that ^ and $ match at the start and end of every line:
String example = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";
System.out.println(example.replaceAll("(?m)^[^\\p{IsHan}\\n?。,?!]+$", ""));
See a regex demo and a Java demo
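To make the effect of the anchors visible, the following sketch lists the matches of the character class with and without (?m)^ ... $. It is only an illustration under the assumption that the sample text is the one from the question; the class name AnchorDemo and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Sample text from the question (hypothetical constant name).
    static final String EXAMPLE = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";

    // Negated character class from the question, backslashes doubled for Java.
    static final String UNANCHORED = "[^\\p{IsHan}\\n?。,?!]+";

    // Same class, but forced to span a whole line.
    static final String ANCHORED = "(?m)^" + UNANCHORED + "$";

    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Partial match "45" inside the Chinese line is reported.
        System.out.println(findAll(UNANCHORED, EXAMPLE)); // [He is 45 today, 45, Ok I see]
        // With anchors only whole lines qualify, so "45" is gone.
        System.out.println(findAll(ANCHORED, EXAMPLE));   // [He is 45 today, Ok I see]
    }
}
```

In the second line of the sample, ^ only matches directly before 你, which the character class rejects, so no match can begin on that line.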
If you want to remove optional trailing newlines using replaceAll:
^[^\p{IsHan}\n?。,?!]+$\R?
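A minimal sketch of how that pattern might be used from Java, assuming the (?m) modifier from the previous snippet is kept so that ^ and $ still work per line. The class name RemoveNonChineseLines and the method removeLines are invented for this example.

```java
class RemoveNonChineseLines {
    // Sample text from the question (hypothetical constant name).
    static final String EXAMPLE = "He is 45 today\n你今天45岁了\n这个句子没有数字\nOk I see";

    // Removes every line that contains no Han character and none of the listed
    // punctuation marks. \R? additionally consumes one optional line break
    // directly after such a line (\R requires Java 8 or later).
    static String removeLines(String input) {
        return input.replaceAll("(?m)^[^\\p{IsHan}\\n?。,?!]+$\\R?", "");
    }

    public static void main(String[] args) {
        // Prints the two Chinese lines, each followed by a line break.
        System.out.print(removeLines(EXAMPLE));
    }
}
```

Note that \R? only removes the line break after a removed line. The break in front of the last line, "Ok I see", belongs to the preceding Chinese line and therefore stays in the result.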