
Split a Java string around brackets, keeping the brackets, but only if there is no space between them

I need to be able to turn a string, for instance "This and <those> are.", into a string array of the form ["This and ", "<those>", " are."]. I have been trying to use String.split(), and I’ve come up with this regex:

"(?=[<>])"

However, this just gets me ["This and ", "<those", "> are."]. I can’t figure out a regex that keeps the whole bracketed part in one element, and the text between the brackets must also not contain any spaces. So, for instance, "This and <hey there> are." should not be split at all; the result should be ["This and <hey there> are."]. Ideally I’d like to rely solely on split for this. Can anyone point me in the right direction?
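To see where the space rule bites, here is a small sketch comparing the regex above with the obvious next attempt, (?=<)|(?<=>), which splits before every < and after every >. That second regex is my own illustration, not from the question, and the class name SplitAttempts is hypothetical. It fixes where the brackets land, but it also splits "<hey there>", because nothing in it looks at what sits between the brackets.

```java
import java.util.Arrays;

class SplitAttempts {
    public static void main(String[] args) {
        String ok = "This and <those> are.";
        String spaced = "This and <hey there> are.";

        // Original attempt: split before every '<' or '>'.
        // The '>' ends up at the start of the following piece.
        System.out.println(Arrays.toString(ok.split("(?=[<>])")));
        // prints [This and , <those, > are.]

        // Split before '<' (lookahead) and after '>' (lookbehind):
        // the brackets now stay with the word...
        System.out.println(Arrays.toString(ok.split("(?=<)|(?<=>)")));
        // prints [This and , <those>,  are.]

        // ...but bracketed text containing a space is split too.
        System.out.println(Arrays.toString(spaced.split("(?=<)|(?<=>)")));
        // prints [This and , <hey there>,  are.]
    }
}
```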


Answer

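Not really possible with split alone. Since the ‘separator’ has to match 0 characters, it must be built entirely from lookahead/lookbehind. Lookahead is fine for the split point before the <, but the split point after the > needs a lookbehind that checks all the way back to the matching < for spaces, and Java requires a lookbehind to have a bounded maximum length. The bracketed text can be arbitrarily long, so a general split regex for what you want? Not possible.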
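For completeness, a hedged sketch of the one workaround Java does allow: lookahead may be any length, and a lookbehind may use a bounded quantifier such as {1,100}. If you accept an arbitrary cap on the token length (the 100 here is purely an assumption, not something from the question), split can be made to do this. The class and method names are hypothetical. Tokens longer than the cap silently stop being split off after the >, which is why finding tokens, rather than splitting on them, is still the better choice.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class BoundedLookbehindSplit {
    // Split before a space-free "<word>" (lookahead, which may be unbounded)
    // and after one (lookbehind, which Java requires to be bounded).
    // ASSUMPTION: tokens are at most 100 word characters; the cap is arbitrary.
    private static final Pattern SPLITTER =
            Pattern.compile("(?=<\\w+>)|(?<=<\\w{1,100}>)");

    static String[] split(String in) {
        // Like String.split: no leading empty piece for a zero-width match
        // at index 0 (Java 8+), and trailing empty pieces are dropped.
        return SPLITTER.split(in);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("This and <those> are.")));
        // prints [This and , <those>,  are.]
        System.out.println(Arrays.toString(split("This and <hey there> are.")));
        // prints [This and <hey there> are.]
        System.out.println(Arrays.toString(split("3<edgecase>")));
        // prints [3, <edgecase>]
    }
}
```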

Instead, write a regex that FINDS the construct you want; that’s a lot simpler. Simply Pattern.compile("<\\w+>") (this takes a few liberties with what a thing-in-brackets looks like, since \w only matches letters, digits and underscore. If it truly can be ANYTHING except spaces and the closing bracket, "<[^ >]+>" is what you want).
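A quick sketch of how the two patterns differ (the class name TokenPatterns and the sample strings are just for illustration): \w only covers [a-zA-Z_0-9], so something like <foo-bar> is only accepted by the looser pattern, while neither accepts <hey there>.

```java
import java.util.regex.Pattern;

class TokenPatterns {
    // Strict: only word characters between the brackets.
    static final Pattern WORD_TOKEN = Pattern.compile("<\\w+>");
    // Looser: anything except a space or '>' between the brackets.
    static final Pattern NO_SPACE_TOKEN = Pattern.compile("<[^ >]+>");

    public static void main(String[] args) {
        for (String s : new String[] {"<those>", "<foo-bar>", "<hey there>"}) {
            // matches() requires the whole string to be one token.
            System.out.println(s + " word=" + WORD_TOKEN.matcher(s).matches()
                    + " noSpace=" + NO_SPACE_TOKEN.matcher(s).matches());
        }
        // prints:
        // <those> word=true noSpace=true
        // <foo-bar> word=false noSpace=true
        // <hey there> word=false noSpace=false
    }
}
```

Note that [^ >] only excludes the literal space character. If tabs or newlines should also disqualify a token, "<[^\\s>]+>" is probably closer to what you want.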

Then, just loop through, finding as you go:

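import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketSplitter { // class name is arbitrary
  // In Java source the regex \w has to be written as \\w.
  private static final Pattern TOKEN_FINDER = Pattern.compile("<\\w+>");

  static List<String> parse(String in) {
    Matcher m = TOKEN_FINDER.matcher(in);
    if (!m.find()) return List.of(in); // no tokens: the whole input is one piece

    var out = new ArrayList<String>();
    int pos = 0; // where the text after the previous token starts
    do {
      int s = m.start();
      if (s > pos) out.add(in.substring(pos, s)); // plain text before this token
      out.add(m.group());                          // the token itself, brackets included
      pos = m.end();
    } while (m.find());
    if (pos < in.length()) out.add(in.substring(pos)); // trailing plain text
    return out;
  }
}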

Let’s try it:

System.out.println(parse("This and <those> are."));
System.out.println(parse("This and <hey there> are."));
System.out.println(parse("<edgecase>"));
System.out.println(parse("<edgecase>2"));
System.out.println(parse("3<edgecase>"));

prints:

[This and , <those>,  are.]
[This and <hey there> are.]
[<edgecase>]
[<edgecase>, 2]
[3, <edgecase>]

That seems like what you wanted.
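On Java 9 or later, the same find-and-slice loop can also be written over Matcher.results(), which returns the matches as a Stream<MatchResult>. This is a sketch of a variant, not a replacement: the class name ResultsSplitter, the method name splitAround, and passing the token pattern in as a parameter are my own choices, so you can plug in either "<\\w+>" or "<[^ >]+>".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ResultsSplitter {
    // Splits `in` into plain-text pieces and whole tokens matched by `token`.
    static List<String> splitAround(String in, Pattern token) {
        // Snapshot every match up front (Matcher.results() is Java 9+).
        List<MatchResult> hits = token.matcher(in).results().collect(Collectors.toList());
        List<String> out = new ArrayList<>();
        int pos = 0; // start of the text not yet emitted
        for (MatchResult r : hits) {
            if (r.start() > pos) out.add(in.substring(pos, r.start())); // text before the token
            out.add(r.group());                                        // the token itself
            pos = r.end();
        }
        if (pos < in.length()) out.add(in.substring(pos));            // trailing text
        return out;
    }

    public static void main(String[] args) {
        Pattern noSpace = Pattern.compile("<[^ >]+>");
        System.out.println(splitAround("This and <those> are.", noSpace));
        // prints [This and , <those>,  are.]
        System.out.println(splitAround("This and <hey there> are.", noSpace));
        // prints [This and <hey there> are.]
        System.out.println(splitAround("a<foo-bar>b", noSpace));
        // prints [a, <foo-bar>, b]
    }
}
```

One behavioral difference: for an empty input this returns an empty list, whereas parse returns [""].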

User contributions licensed under: CC BY-SA