
Java regex Matcher.matches function does not match entire string

I am trying to match an entire string against a regex, but Matcher.matches() returns true even when the entire string does not appear to match.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        final String string = "\"query1\" \"query2\" \"query3\"";
        // Unescaped pattern: (\+?".*?[^\\]")(\s+[aA][nN][dD]\s+\+?".*?[^\\]")*
        final Pattern QPATTERN = Pattern.compile("(\\+?\".*?[^\\\\]\")(\\s+[aA][nN][dD]\\s+\\+?\".*?[^\\\\]\")*", Pattern.MULTILINE);
        Matcher matcher = QPATTERN.matcher(string);

        System.out.println(matcher.matches());
        matcher = QPATTERN.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

You can see from the while loop that the regex matches only parts of the string ("query1", "query2" and "query3") but not the whole string. Yet matcher.matches() returns true.

Where am I going wrong?

I checked the pattern on https://regex101.com/ as well and the entire string is not matched.


Answer

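matches() returns true because your regex can match the full string. matches() succeeds only when the whole input is consumed, so the engine backtracks and lets the lazy .*? expand across the inner quotes until the pattern reaches the end of the string. You say you tested the regular expression on regex101.com, but you did not add anchors (^ at the start and $ at the end) to simulate the matches() behavior.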

With the anchors added, your regex matches the whole string.
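The difference between matches() and find() can be checked with a small sketch like the one below. It reuses the pattern from the question; the class name MatchesVsFind and its helper methods are made up for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The pattern from the question, written as a Java string literal.
    static final String REGEX =
            "(\\+?\".*?[^\\\\]\")(\\s+[aA][nN][dD]\\s+\\+?\".*?[^\\\\]\")*";
    static final Pattern QPATTERN = Pattern.compile(REGEX);

    // matches() must consume the whole input, so the lazy .*? is forced to expand.
    static boolean matchesWhole(String input) {
        return QPATTERN.matcher(input).matches();
    }

    // find() with explicit anchors simulates matches().
    static boolean anchoredFind(String input) {
        return Pattern.compile("^(?:" + REGEX + ")$").matcher(input).find();
    }

    // Unanchored find() stops at the first (shortest) successful match.
    static String firstFind(String input) {
        Matcher m = QPATTERN.matcher(input);
        return m.find() ? m.group(0) : null;
    }

    // Group 1 after a successful matches() call.
    static String groupOneOfWholeMatch(String input) {
        Matcher m = QPATTERN.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"query1\" \"query2\" \"query3\"";
        System.out.println(matchesWhole(input));         // true
        System.out.println(anchoredFind(input));         // true
        System.out.println(firstFind(input));            // "query1"
        System.out.println(groupOneOfWholeMatch(input)); // "query1" "query2" "query3"
    }
}
```

Group 1 of the full match spans all three quoted terms, which shows that .*? ran over the inner quotes.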

If you do not want this expression to match the entire string, do not use .*?: it can match far more than intended, including the quotes between the terms.

Use

(?s)(\+?"[^"\\]*(?:\\.[^"\\]*)*")(\s+[aA][nN][dD]\s+\+?"[^"\\]*(?:\\.[^"\\]*)*")*

Escaped version:

String regex = "(?s)(\\+?\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")(\\s+[aA][nN][dD]\\s+\\+?\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")*";
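As a rough check of the escaped version, the sketch below runs it through matches() on a few inputs. The class name StrictQuotedPattern, the helper isValid, and the sample inputs are assumptions added for illustration.

```java
import java.util.regex.Pattern;

public class StrictQuotedPattern {
    // The escaped regex from the answer: quoted terms joined by AND (any case).
    static final Pattern STRICT = Pattern.compile(
            "(?s)(\\+?\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"
            + "(\\s+[aA][nN][dD]\\s+\\+?\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")*");

    // True only when the whole input is a valid sequence of quoted terms.
    static boolean isValid(String input) {
        return STRICT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Terms separated by spaces only: rejected, because [^"\\]* cannot cross a quote.
        System.out.println(isValid("\"query1\" \"query2\" \"query3\""));        // false
        // Terms joined by AND / and, with an optional leading '+': accepted.
        System.out.println(isValid("\"query1\" AND \"query2\" and +\"query3\"")); // true
        // Escaped quotes inside a term are consumed by \\. : accepted.
        System.out.println(isValid("\"say \\\"hi\\\"\""));                      // true
    }
}
```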

EXPLANATION

--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \+?                      '+' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"', '\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \\                       '\'
--------------------------------------------------------------------------------
      .                        any character
--------------------------------------------------------------------------------
      [^"\\]*                  any character except: '"', '\\' (0 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2 (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    [aA]                     any character of: 'a', 'A'
--------------------------------------------------------------------------------
    [nN]                     any character of: 'n', 'N'
--------------------------------------------------------------------------------
    [dD]                     any character of: 'd', 'D'
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \+?                      '+' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"', '\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \\                       '\'
--------------------------------------------------------------------------------
      .                        any character
--------------------------------------------------------------------------------
      [^"\\]*                  any character except: '"', '\\' (0 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
  )*                       end of \2 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \2)
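
The note about the quantified capture group can be seen in practice with the sketch below: after a full match, group 2 holds only the final "AND" clause. One possible way to recover every term is to validate with matches() first and then scan with find() using a single-term pattern. The class name LastRepetitionDemo, the TERM constant, and both helper methods are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastRepetitionDemo {
    // One quoted term, optionally prefixed with '+', with backslash escapes allowed.
    static final String TERM = "\\+?\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";
    // Same structure as the regex in the answer, assembled from TERM.
    static final Pattern FULL = Pattern.compile(
            "(?s)(" + TERM + ")(\\s+[aA][nN][dD]\\s+" + TERM + ")*");
    static final Pattern SINGLE_TERM = Pattern.compile("(?s)" + TERM);

    // Returns group 2 of a full match: only the LAST repetition is kept.
    static String lastRepetition(String input) {
        Matcher m = FULL.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    // Validates the whole input, then collects every quoted term with find().
    static List<String> allTerms(String input) {
        List<String> terms = new ArrayList<>();
        if (!FULL.matcher(input).matches()) {
            return terms; // invalid input: no terms reported
        }
        Matcher m = SINGLE_TERM.matcher(input);
        while (m.find()) {
            terms.add(m.group());
        }
        return terms;
    }

    public static void main(String[] args) {
        String input = "\"a\" and \"b\" AND \"c\"";
        System.out.println(lastRepetition(input)); // prints: AND "c" (with a leading space)
        System.out.println(allTerms(input));       // prints: ["a", "b", "c"]
    }
}
```
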
User contributions licensed under: CC BY-SA