Regular expression with variable number of groups?

Is it possible to create a regular expression with a variable number of groups?

After running this, for instance…

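import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A single capturing group, repeated by the * quantifier
Pattern p = Pattern.compile("ab([cd])*ef");
Matcher m = p.matcher("abcddcef");
m.matches(); // attempt to match the entire input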

… I would like to have something like

  • m.group(1) = "c"
  • m.group(2) = "d"
  • m.group(3) = "d"
  • m.group(4) = "c".

(Background: I’m parsing some lines of data, and one of the “fields” is repeating. I would like to avoid a matcher.find loop for these fields.)


As pointed out by @Tim Pietzcker in the comments, Perl 6 and .NET do have this feature.

Answer

According to the documentation, Java regular expressions can’t do this:

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string “aba” against the expression (a(b)?)+, for example, leaves group two set to “b”. All captured input is discarded at the beginning of each match.

(emphasis added)
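
To make the quoted behaviour concrete, here is a minimal sketch of what the pattern from the question actually yields, together with one common workaround: capture the whole repeated run in a single group, then split that run with a second, much smaller matcher. This still uses a find loop, but only over the captured field rather than the whole line. The class and method names (RepeatedGroupDemo, lastCapture, allItems) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {

    // Outer pattern: group 1 spans the whole repeated run, not a single item.
    private static final Pattern LINE = Pattern.compile("ab([cd]*)ef");
    // Inner pattern: one repeated item.
    private static final Pattern ITEM = Pattern.compile("[cd]");

    /** Returns what group 1 holds when the question's pattern is used as-is. */
    public static String lastCapture(String input) {
        Matcher m = Pattern.compile("ab([cd])*ef").matcher(input);
        // Only the most recent iteration of the quantified group is kept.
        return m.matches() ? m.group(1) : null;
    }

    /** Returns every repeated item, or an empty list if the line does not match. */
    public static List<String> allItems(String input) {
        List<String> items = new ArrayList<>();
        Matcher line = LINE.matcher(input);
        if (line.matches()) {
            // Second pass over the captured run only.
            Matcher item = ITEM.matcher(line.group(1));
            while (item.find()) {
                items.add(item.group());
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("abcddcef")); // prints c
        System.out.println(allItems("abcddcef"));    // prints [c, d, d, c]
    }
}
```

The number of groups in a java.util.regex pattern is fixed when the pattern is compiled, so the list returned by allItems stands in for the m.group(1) … m.group(4) values the question asks for.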