Canonical equivalence in Pattern

I am referring to the test harness listed here http://docs.oracle.com/javase/tutorial/essential/regex/test_harness.html

The only change I made to the class is that the pattern is created as below:

Pattern pattern = 
        Pattern.compile(console.readLine("%nEnter your regex(Pattern.CANON_EQ set): "),Pattern.CANON_EQ);

As the tutorial at http://docs.oracle.com/javase/tutorial/essential/regex/pattern.html suggests, I entered a\u030A as the regex and \u00E5 as the string to search, but the result is "No match found." As far as I can tell, both strings represent a lowercase 'a' with a ring above.

Have I not understood the use case correctly?

Answer

The behavior you’re seeing has nothing to do with the Pattern.CANON_EQ flag.

Input read from the console is not the same as a Java string literal. When the user (presumably you, testing out this flag) types \u00E5 into the console, the string returned by console.readLine is equivalent to the literal "\\u00E5" (six characters: a backslash, u, 0, 0, E, 5), not "å". See for yourself: http://ideone.com/lF7D1
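To make the difference concrete, here is a minimal sketch that simulates the typed input with an escaped backslash rather than reading from a real console. The class and field names are illustrative assumptions, not part of the tutorial's harness.

```java
public class ConsoleVsLiteral {
    // Simulates what readLine returns when the user types a backslash
    // followed by u, 0, 0, E, 5: six separate characters.
    static final String TYPED = "\\u00E5";

    // Here the compiler translates the escape into the single character U+00E5.
    static final String LITERAL = "\u00E5";

    public static void main(String[] args) {
        System.out.println(TYPED.length());        // 6
        System.out.println(LITERAL.length());      // 1
        System.out.println(TYPED.equals(LITERAL)); // false
    }
}
```

Because the compiler never sees console input, nothing translates the typed escape, and the matcher is handed six ordinary characters.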

As for Pattern.CANON_EQ, it behaves exactly as described:

Pattern withCE = Pattern.compile("^a\u030A$", Pattern.CANON_EQ);
Pattern withoutCE = Pattern.compile("^a\u030A$");
String input = "\u00E5";

System.out.println("Matches with canon eq: "
    + withCE.matcher(input).matches()); // true
System.out.println("Matches without canon eq: "
    + withoutCE.matcher(input).matches()); // false

http://ideone.com/nEV1V
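If you want to keep experimenting from the console, one option is to translate typed \uXXXX sequences into real characters before matching. The following is only a sketch under that assumption. The class UnicodeUnescape and its unescape method are hypothetical helpers, not part of the tutorial's harness or of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {
    // Matches a typed backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces each typed escape sequence with the character it names.
    static String unescape(String typed) {
        Matcher m = ESCAPE.matcher(typed);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or '\' being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = unescape("\\u00E5"); // simulated console input
        System.out.println(input.length()); // 1
    }
}
```

In the harness, you could call unescape on the string read for the input before passing it to pattern.matcher, so that the typed escape becomes the single character å.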

User contributions licensed under: CC BY-SA