I am trying to find all characters that are not letters (upper or lowercase), digits, or underscores, and remove them.
stringA.replaceAll("[^a-zA-Z0-9_]","") // works perfectly fine
However, the following code could not even compile in Java:
stringA.replaceAll("\W",""); // or also stringA.replaceAll("[\W]","");
If I use "\\W" rather than "\W", the above code turns out to be correct.
So, what is the difference between \W and \\W, and when should I use brackets like [^a-zA-Z0-9_]?
Answer
However, the following code could not even compile in Java
Java has no idea that the string is going to the regex engine. Anything in double quotes is a string literal to the Java compiler, so it tries to interpret \W as a Java escape sequence, which does not exist. This triggers a compile-time error.
If I use only \\W rather than \W, the above code turns out to be correct.
This is because \\ is a valid escape sequence, which means “a single backslash”. When you put two backslashes inside a string literal, the Java compiler removes one of them, so the regex engine sees \W, not \\W.
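To make the two layers of escaping visible, here is a minimal sketch. The class name BackslashDemo and the helper stripNonWord are made up for illustration; only String.replaceAll and String.length come from the standard library.

```java
class BackslashDemo {
    // Hypothetical helper: removes every character that is not a letter,
    // digit, or underscore.
    static String stripNonWord(String input) {
        // In source code this literal is written with two backslashes,
        // but at runtime the string holds only two characters: \ and W.
        return input.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        String pattern = "\\W";
        System.out.println(pattern.length());            // 2
        System.out.println(pattern);                     // \W
        System.out.println(stripNonWord("a-b c_d!9"));   // abc_d9
    }
}
```

Printing the pattern shows what the regex engine actually receives after the compiler has processed the literal.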
So, what is the differences between \W, \\W, and when to use brackets like [^a-zA-Z0-9_]
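The bracketed class [^a-zA-Z0-9_] is a longer way of writing the regex \W, which has to be typed as "\\W" in Java source. A single-backslash "\W" inside a string literal does not compile.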
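A small comparison may help here. This is only a sketch: the class and method names are invented for the example, and it assumes the default regex flags (no UNICODE_CHARACTER_CLASS), under which \W and [^a-zA-Z0-9_] match the same characters.

```java
class WordClassDemo {
    // Shorthand: the regex \W, written with a doubled backslash in source.
    static String viaShorthand(String s) {
        return s.replaceAll("\\W", "");
    }

    // Explicit negated character class; no backslash is needed.
    static String viaExplicitClass(String s) {
        return s.replaceAll("[^a-zA-Z0-9_]", "");
    }

    // No backslash at all: this is just the literal letter W.
    static String viaLiteralW(String s) {
        return s.replaceAll("W", "");
    }

    public static void main(String[] args) {
        String sample = "Wow, snake_case 42!";
        System.out.println(viaShorthand(sample));      // Wowsnake_case42
        System.out.println(viaExplicitClass(sample));  // Wowsnake_case42
        System.out.println(viaLiteralW(sample));       // ow, snake_case 42!
    }
}
```

The first two calls give the same result, while the third only strips the capital letter W, which is why dropping the backslash entirely compiles but does something different.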