Complex splitting of a String using REGEX, only discarding spaces

In Java (JDK 11), consider the following string:

String hello = "333+444 5qwerty5 006 -7";

I am trying to come up with a RegEx that will split anything that isn’t a digit, whilst keeping the separators except space. So in the above example, I would like to end up with the following array:

["333" , "+" , "444" , "5" , "q" , "w" , "e" , "r" , "t" , "y" , "5" , "006" , "-7"]

Do note the leading zeroes in 006, and -7. The code I am using is the following:

String[] splited = s.split("((?<=[^0-9]+)|(?=[^0-9]+)|(\\s+))");

However, I can see that my array is keeping the spaces. I can't for the life of me figure out my mistake. Any thoughts?

EDIT: Turns out the requirement kept getting more complicated. Eventually I had to obtain the following collection, based on the sample input from above:

["333+444" , "5" , "q" , "w" , "e" , "r" , "t" , "y" , "5" , "006" , "-7"]

So if there is no space between an integer and operators + - * / % ^, then do not split them. I have issues implementing this rule along with the fact that leading zeroes and negative numbers should not be split.

Based on that, it turns out that it is much simpler to work with The fourth bird's sample, where matcher() is used instead of split(). The RegEx syntax is simpler to understand, troubleshoot and build upon.

Perhaps I could have asked another question to cater for the additional complexity, but I do not think it is right to use StackOverflow to keep asking very similar questions because one got stuck.

Answer

Instead of using split, you could also match all the parts:

-?\d+|\S

The pattern matches:

  • -? Optionally match a hyphen
  • \d+ Match 1+ digits
  • | Or
  • \S Match a single non-whitespace char
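
A small sketch of why the order of the two alternatives matters: Java tries alternatives from left to right, so -?\d+ has to come before \S, otherwise the hyphen and every digit would be matched one character at a time. The class and method names below are illustrative only and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to compare two orderings of the same alternatives.
class AlternationOrderDemo {

    // Collects every match of the given regex in the input, in order of appearance.
    static List<String> tokens(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "006 -7";
        // Number alternative first: whole numbers stay together.
        System.out.println(tokens("-?\\d+|\\S", input)); // [006, -7]
        // Single-character alternative first: it always wins, so numbers fall apart.
        System.out.println(tokens("\\S|-?\\d+", input)); // [0, 0, 6, -, 7]
    }
}
```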

See a regex demo and a Java demo.

Example

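import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Backslashes must be doubled inside a Java string literal.
String regex = "-?\\d+|\\S";
String string = "333+444 5qwerty5 006 -7";

List<String> allMatches = new ArrayList<>();

// find() steps over whitespace because neither alternative can match it.
Matcher m = Pattern.compile(regex).matcher(string);
while (m.find()) {
    allMatches.add(m.group());
}

System.out.println(allMatches);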

Output

[333, +, 444, 5, q, w, e, r, t, y, 5, 006, -7]
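
For the extended requirement in the question's edit (keep an integer and an operator together when no space separates them), the same matcher() approach can be built upon. The following is only a sketch under one reading of that rule: an operator from + - * / % ^ stays attached only when it sits directly between two runs of digits. The pattern and the names ExpressionTokenizer and tokenize are assumptions made for illustration; they do not come from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer for the edited requirement; the name is illustrative.
class ExpressionTokenizer {

    // Assumed pattern: an optionally negative integer, followed by any number of
    // "operator + digits" groups with no whitespace in between, or else a single
    // non-whitespace character.
    private static final Pattern TOKEN =
            Pattern.compile("-?\\d+(?:[-+*/%^]\\d+)*|\\S");

    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [333+444, 5, q, w, e, r, t, y, 5, 006, -7]
        System.out.println(tokenize("333+444 5qwerty5 006 -7"));
    }
}
```

Under this reading, an operator that is not followed directly by digits (for example a trailing +) falls through to the \S alternative and becomes its own token; a different interpretation of the rule would need a different pattern.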
User contributions licensed under: CC BY-SA