Replace ASCII codes and HTML tags in Java

Question

How can I achieve the expected results below without using StringEscapeUtils?

Current results:

Expected results:

Already checked: How to unescape HTML character entities in Java?

PS: This is just a sample; the input may vary.

Accepted Answer

Your regexp matches HTML tags: <something> would be matched, but HTML entities will not be. Entities follow a pattern like &.*?; which you are not replacing. This should solve your problem:

    str = str.replaceAll("\\<.*?\\>|&.*?;", "");

If you want to experiment with this in a sandbox, try regxr.com and use (<.*?>)|(&.*?;). The brackets make the two capturing groups easy to identify in the tool and are not needed in your code. Note that the backslash does not need to be escaped in that sandbox, but it does in your code, since it is inside a Java string literal.
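For anyone who wants to try the pattern outside a sandbox, here is a minimal sketch. The class name StripMarkup, the helper strip, and the sample input are all made up for illustration and are not part of the original question. It leaves out the backslashes, since < and > are not regex metacharacters in Java, so the pattern should behave the same as the escaped version.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample input are assumptions for illustration.
final class StripMarkup {
    // Matches either an HTML tag (<...>) or an entity-like token (&...;).
    // The reluctant quantifier .*? stops at the first closing '>' or ';'.
    private static final Pattern TAG_OR_ENTITY = Pattern.compile("<.*?>|&.*?;");

    // Removes every match, keeping only the text between tags and entities.
    static String strip(String input) {
        return TAG_OR_ENTITY.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "<p>Hello&nbsp;<b>World</b> &#39;quoted&#39;</p>";
        System.out.println(strip(sample)); // prints: HelloWorld quoted
    }
}
```

One thing to keep in mind: &.*?; also matches from a bare ampersand up to the next semicolon, so ordinary text such as "Tom & Jerry; done" would lose "& Jerry;". If the input can contain plain ampersands, a stricter entity pattern is probably safer.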

Answer