
Differences between \W, \\W, [^a-zA-Z0-9_] in regular expression

I am trying to find all characters that are not letters (upper/lowercase), digits, or underscores, and remove them.

stringA.replaceAll("[^a-zA-Z0-9_]","")   // works perfectly fine

However, the following code could not even compile in Java:

stringA.replaceAll("\W","");
// or also
stringA.replaceAll("[\W]","");
// or also
stringA.replaceAll("[\W]","");

If I use only "\\W" rather than "\W", the above code turns out to be correct.
So, what are the differences between \W and \\W, and when should I use brackets like [^a-zA-Z0-9_]?

Answer

However, the following code could not even compile in Java

Java has no idea that the string is going to the regex engine. Anything in double quotes is a string literal to the Java compiler, so it tries to interpret \W as a Java escape sequence, which does not exist. This triggers a compile-time error.

If I use only \\W rather than \W, the above code turns out to be correct.

This is because \\ is a valid escape sequence, which means “a single backslash”. When you put two backslashes inside a string literal, the Java compiler removes one of them, so the regex engine sees \W, not \\W.
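A minimal sketch of that escaping. The class name EscapeDemo, the helper stripNonWord, and the sample input are made up for illustration. The literal "\\W" is only two characters long at run time, and those two characters are what the regex engine receives:

```java
class EscapeDemo {
    // The Java literal "\\W" holds two characters: a backslash and 'W'.
    static final String NON_WORD = "\\W";

    // Removes every character that the regex \W matches.
    static String stripNonWord(String input) {
        return input.replaceAll(NON_WORD, "");
    }

    public static void main(String[] args) {
        System.out.println(NON_WORD.length());         // 2
        System.out.println(NON_WORD);                  // \W
        System.out.println(stripNonWord("a-b_c! 1"));  // ab_c1
    }
}
```

Writing "\W" in the same place is rejected by the compiler before any regex code runs, which is why it cannot be shown in a runnable example.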

So, what are the differences between \W and \\W, and when should I use brackets like [^a-zA-Z0-9_]?

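The third one is a longer spelling of the second one: with Java's default settings, \W matches exactly the characters that [^a-zA-Z0-9_] matches. The first one does not compile. Brackets are only needed when you want a set that differs from this predefined one.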
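As a rough illustration, assuming default regex flags (no UNICODE_CHARACTER_CLASS) and hypothetical names WordClassDemo, viaShorthand, viaExplicitClass and lettersAndDigitsOnly, the shorthand and the bracket form give the same result. The bracket form becomes useful once the set has to differ, for example when the underscore should be removed too:

```java
class WordClassDemo {
    // Shorthand: the regex engine receives \W.
    static String viaShorthand(String s) {
        return s.replaceAll("\\W", "");
    }

    // Explicit negated character class, equivalent to \W by default.
    static String viaExplicitClass(String s) {
        return s.replaceAll("[^a-zA-Z0-9_]", "");
    }

    // A custom set: same as above, but the underscore is removed as well.
    static String lettersAndDigitsOnly(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        String sample = "Hello, World_42!";
        System.out.println(viaShorthand(sample));          // HelloWorld_42
        System.out.println(viaExplicitClass(sample));      // HelloWorld_42
        System.out.println(lettersAndDigitsOnly(sample));  // HelloWorld42
    }
}
```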

User contributions licensed under: CC BY-SA