Java XSS Sanitization for nested HTML elements



I am using the JSoup library in Java to sanitize input and prevent XSS attacks. It works well for simple inputs such as <script>alert('vulnerable')</script>.

Example:

String data = "<script>alert('vulnerable')</script>";
data = Jsoup.clean(data, Whitelist.none());
data = StringEscapeUtils.unescapeHtml4(data); //StringEscapeUtils from apache-commons lib
System.out.println(data);

Output: ""

However, if I tweak the input to the following, JSoup cannot sanitize the input.

String data = "<<b>script>alert('vulnerable');<</b>/script>";
data = Jsoup.clean(data, Whitelist.none());
data = StringEscapeUtils.unescapeHtml4(data);
System.out.println(data);

Output: <script>alert('vulnerable');</script>

This output is obviously still prone to XSS attacks. Is there a way to fully sanitize the input so that all HTML tags are removed from it?

Answer

I'm not sure this is the best solution, but a temporary workaround is to parse the raw text into a Document and then clean the combined text of the document element and all its children:

String unsafe = "<<b>script>alert('vulnerable');<</b>/script>";
Document doc = Jsoup.parse(unsafe);
String safe = Jsoup.clean(doc.text(), Whitelist.none());
System.out.println(safe);
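A possible reading of the second example in the question is that the unescapeHtml4 call, rather than JSoup itself, is what turns the escaped leftover text back into a live tag. If that is the case, another option is to leave the text escaped at the point where it is written into a page. The following is a minimal, dependency-free sketch of that idea, not the method used above; EscapeOnOutput and escapeHtml are hypothetical names and are not part of JSoup or commons-lang.

```java
public class EscapeOnOutput {

    // Hypothetical helper (an assumption for illustration, not a JSoup or
    // commons-lang API): escapes the five characters that matter when text
    // is placed in HTML element content or a quoted attribute value.
    public static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The text left over once the <b> tags are stripped from the
        // nested-tag input in the question.
        String leftover = "<script>alert('vulnerable');</script>";

        // In escaped form a browser shows these characters literally
        // instead of interpreting them as a script element.
        System.out.println(escapeHtml(leftover));
    }
}
```

This only covers text placed in element content or quoted attribute values; other contexts, such as inline JavaScript or URLs, need their own encoding rules.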

Someone else may come up with a better solution.



Source: stackoverflow