Pattern-Matcher is not looking for all matches for some reason

I have what seems to be a very easy task, yet Matcher for some reason does not cope with it.

The task is this: find and print all substrings of the string “AAABBBB” that are made up of pairs AA or BB. The pairs may repeat or alternate, as in “AABBBBBBAA”, or appear on their own, as in “AAAA” or “BBBB”.

To solve this problem, I decided to use Pattern and Matcher, which should search for all matches in a string. I used the pattern “(AA|BB)+”, which in my understanding should find all substrings that consist of at least one pair AA or BB. Here is the code:

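import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "aaabbbb";
        Matcher matcher = Pattern.compile("(aa|bb)+").matcher(s);
        // Print every match that find() reports, in order.
        while (matcher.find())
            System.out.println(matcher.group());
    }
}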

Everything seems to be correct, and all the matching substrings should be printed, but I see this output:

Output:
    aa
    bbbb

It seems to me that the answer should be completely different, because the string is “AAABBBB”. Start with the first character: it is A, the next one is also A, and together they form a pair. Now start with the second character: it is A, and the third character is also A, which gives a second suitable pair. Now take the whole substring that starts at the second character: this is AABBBB, and it also fits the condition, as do the other substrings AABB, BB and BB.

Why is this happening?

UPD: To show why I believe my theory is correct, here is some more code. Every line prints true:

public class Main {
    public static void main(String[] args) {
        System.out.println("AA".matches("(AA|BB)+"));
        System.out.println("AABBBB".matches("(AA|BB)+"));
        System.out.println("AABB".matches("(AA|BB)+"));
        System.out.println("BBBB".matches("(AA|BB)+"));
        System.out.println("BB".matches("(AA|BB)+"));
    }
}

Answer

The javadoc for Matcher.find() says:

This method starts at the beginning of this matcher’s region, or, if a previous invocation of the method was successful and the matcher has not since been reset, at the first character not matched by the previous match.

If the match succeeds then more information can be obtained via the start, end, and group methods, and subsequent invocations of the find() method will start at the first character not matched by this match.

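So in your main example, the first find() call matches the first two characters, "aa". The second find() then starts at the 3rd character, so the pair formed by the 2nd and 3rd characters is never considered. No match begins at the 3rd character, so the next match reported is "bbbb".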
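
To make that visible, here is a small sketch that records where each match starts and ends. The class name FindTrace and the helper trace are made up for this illustration; only Pattern, Matcher, find(), group(), start() and end() come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindTrace {
    // Hypothetical helper: records each match with its [start, end) indices.
    static List<String> trace(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        // Each find() resumes at the first character not consumed by the previous match.
        while (m.find()) {
            out.add(m.group() + " [" + m.start() + ", " + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        //   aa [0, 2)
        //   bbbb [3, 7)
        trace("(aa|bb)+", "aaabbbb").forEach(System.out::println);
    }
}
```

The indices are zero-based, so the second search resumes at index 2 (the 3rd character) and the first position where a match can begin after that is index 3.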


To get the matching to work the way you wanted, you would need to write something like this:

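    String s = "aaabbbb";
    Matcher matcher = Pattern.compile("(aa|bb)+").matcher(s);
    int pos = 0;
    // find(int) resets the matcher and searches from the given index.
    while (matcher.find(pos)) {
        System.out.println(matcher.group());
        // Retry one character after the start of this match, so overlaps are found.
        pos = matcher.start() + 1;
    }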

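But note that even this won’t find all possible matches. Because the + quantifier is greedy, each find(pos) call reports only the longest match at the first position where a match begins, so a shorter alternative such as "aabb" is never printed. For that, the solution would probably be to write a special purpose regex engine.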
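
For short inputs, a brute-force alternative is to test every substring against the pattern. The following is only a sketch under that assumption: the class AllMatches and the helper allMatches are hypothetical names, and the approach checks O(n²) substrings, so it is not suited to long strings. It relies on Matcher.region(int, int) and Matcher.matches(), which checks whether the whole region matches. With the default region bounds, lookarounds and boundary matchers do not see text outside the region, so patterns that use them may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllMatches {
    // Hypothetical helper: returns every substring of s that fully matches regex,
    // ordered by start index and then by length. Duplicates are kept.
    static List<String> allMatches(String regex, String s) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                // region() resets the matcher and restricts it to [start, end).
                if (m.region(start, end).matches()) {
                    result.add(s.substring(start, end));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [aa, aa, aabb, aabbbb, bb, bbbb, bb, bb]
        System.out.println(allMatches("(aa|bb)+", "aaabbbb"));
    }
}
```

Unlike the find(pos) loop, this also reports the shorter matches "aabb" and "bb" that start at the same position as a longer one.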

User contributions licensed under: CC BY-SA