Regex to match any number unless it is part of a specific string


Sorry if this is a dupe, I did search but couldn’t seem to find something that matched my query.

I have a replacer function in Java that runs multiple regexes to find and replace specific strings.

One of them looks at numbers, and if it finds a number it adds spaces around it, for example:

test123 > test 123

The regex used is “([0-9]+)” and it is replaced with “ $1 ”
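
A minimal sketch of that replacement, assuming it boils down to a single String.replaceAll call with the balanced pattern ([0-9]+). The class and method names are hypothetical and not taken from the post; the brackets in the output only make the trailing space visible.

```java
class NaiveSpacer {
    // Hypothetical helper: wraps every run of digits in spaces.
    static String spaceNumbers(String input) {
        return input.replaceAll("([0-9]+)", " $1 ");
    }

    public static void main(String[] args) {
        System.out.println("[" + spaceNumbers("test123") + "]"); // [test 123 ]
        System.out.println("[" + spaceNumbers("sha256") + "]");  // [sha 256 ]
    }
}
```

The second call shows the problem case: the hash name is split just like any other number.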

I have now hit an issue, though: in a few edge cases I need to not split the number from a specific string, such as a hash name. So I need to update my regex to wrap any run of digits with spaces, UNLESS it matches a specific sequence.

For example, I want the following results:

  • test123 > test 123
  • 84test > 84 test
  • test md5 > test md5
  • sha256 > sha256
  • word two sha1 > word two sha1
  • w0rd > w 0 rd
  • aisha256 > aisha 256
  • word md 5 > word md 5 etc

I’ve tried using negative lookbehind to match the words like md5, sha1, sha256, etc. but it still seems to split the numbers. I’m sure it’s something simple I am doing wrong: “((?!md5)(\d+))”

So the basic rules are: any digit found in the string should be surrounded by spaces UNLESS it is preceded by the word sha or md. If there is already whitespace between the number and md or sha, the whitespace should remain. sha or md could be at the start of the string OR be preceded by whitespace or an underscore, but cannot be the end of a longer word or in the middle of a word.

Thanks

Answer

As an alternative, you might also use

(?<!\d|^)(?<!(?<![^\W_])(?:sha|md))(?=\d)|(?<=\d)(?!\d|$)|_

It will match either the position between a digit and a non-digit, or an underscore.

If there is a digit on the right, what comes before the digit cannot be sha or md, where that sha or md is itself not preceded by a letter or digit (a word char other than an underscore).
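
The inner guard can be tried on its own. The sketch below (class and method names are hypothetical) only checks whether a string contains sha or md that is not glued to a preceding letter or digit; it does not do any replacing.

```java
import java.util.regex.Pattern;

class PrefixGuardDemo {
    // sha or md, not immediately preceded by a letter or digit
    private static final Pattern STANDALONE_PREFIX =
            Pattern.compile("(?<![^\\W_])(?:sha|md)");

    static boolean hasStandalonePrefix(String input) {
        return STANDALONE_PREFIX.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasStandalonePrefix("sha256"));    // true
        System.out.println(hasStandalonePrefix("ai_sha256")); // true
        System.out.println(hasStandalonePrefix("aisha256"));  // false
    }
}
```

In the full pattern this guard sits inside a negative lookbehind, so a digit that follows such a standalone sha or md is not treated as a split point.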

Explanation

  • (?<!\d|^) If not looking back at a digit or the start of the string
  • (?<! If not looking back at
    • (?<![^\W_]) If not looking back at a word char other than an underscore (a letter or digit)
    • (?:sha|md) Match sha or md
  • ) Close lookbehind
  • (?=\d) Assert a digit directly to the right
  • | Or
  • (?<=\d)(?!\d|$) If looking back at a digit and not looking forward to a digit or the end of the string
  • | Or
  • _ Match an underscore
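
The same breakdown can be written as a precompiled pattern assembled from its three alternatives, which may be handy if the replacer runs often. This is only a sketch: the class, constant and method names are hypothetical, and the regex itself is unchanged.

```java
import java.util.regex.Pattern;

class HashAwareSpacer {
    // Position before a run of digits, unless the digits follow a standalone sha or md.
    private static final String BEFORE_DIGITS =
            "(?<!\\d|^)"                    // not inside a number, not at the start
          + "(?<!(?<![^\\W_])(?:sha|md))"   // not right after sha/md that starts a word
          + "(?=\\d)";                      // a digit follows
    // Position after a run of digits, unless the string ends there.
    private static final String AFTER_DIGITS = "(?<=\\d)(?!\\d|$)";
    // A literal underscore, which is replaced by a space as well.
    private static final String UNDERSCORE = "_";

    private static final Pattern SPLIT_POINTS =
            Pattern.compile(BEFORE_DIGITS + "|" + AFTER_DIGITS + "|" + UNDERSCORE);

    static String space(String input) {
        return SPLIT_POINTS.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(space("aisha256")); // aisha 256
        System.out.println(space("test_md5")); // test md5
    }
}
```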


Example

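// Backslashes must be doubled inside a Java string literal.
String regex = "(?<!\\d|^)(?<!(?<![^\\W_])(?:sha|md))(?=\\d)|(?<=\\d)(?!\\d|$)|_";
String[] strings = {"Aisha256", "ai_sha256", "test123", "84test", "test md5", "sha256", "word two sha1", "w0rd", "test_md5", "sha256", "md5"};
for (String str : strings) {
    // Replace every split point (and every underscore) with a single space.
    System.out.println(str.replaceAll(regex, " "));
}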

Output

Aisha 256
ai sha256
test 123
84 test
test md5
sha256
word two sha1
w 0 rd
test md5
sha256
md5


Source: stackoverflow