I am using the JSoup library in Java to sanitize input and prevent XSS attacks. It works well for simple inputs such as <script>alert('vulnerable')</script>.
Example:
String data = "<script>alert('vulnerable')</script>";
data = Jsoup.clean(data, Whitelist.none());
data = StringEscapeUtils.unescapeHtml4(data); // StringEscapeUtils is from the Apache Commons library
System.out.println(data);
Output: ""
However, if I tweak the input as follows, JSoup no longer sanitizes it:
String data = "<<b>script>alert('vulnerable');<</b>/script>";
data = Jsoup.clean(data, Whitelist.none());
data = StringEscapeUtils.unescapeHtml4(data);
System.out.println(data);
Output: <script>alert('vulnerable');</script>
This output is obviously still prone to XSS attacks. Is there a way to fully sanitize the input so that all HTML tags are removed from it?
Answer
Not sure if this is the best solution, but a temporary workaround would be to parse the raw text into a Document and then clean the combined text of the Document element and all its children:
String unsafe = "<<b>script>alert('vulnerable');<</b>/script>";
Document doc = Jsoup.parse(unsafe);
String safe = Jsoup.clean(doc.text(), Whitelist.none());
System.out.println(safe);
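The workaround above amounts to a second cleaning pass, and it also drops the StringEscapeUtils.unescapeHtml4 call from the question (unescaping cleaned output turns &lt; back into <). A rough sketch of the same idea taken one step further is to repeat a sanitizing step until the output stops changing. The names StableSanitizer, cleanUntilStable, stripSimpleTags and MAX_PASSES are made up for this illustration, and the regex step is only a dependency-free stand-in so the snippet runs on its own; it is not a real sanitizer and not how JSoup works internally.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

public class StableSanitizer {
    // Arbitrary cap (an assumption) so a misbehaving step cannot loop forever.
    private static final int MAX_PASSES = 10;

    // Stand-in step for illustration only: removes anything shaped like <...>
    // that contains no nested angle brackets. NOT a real HTML sanitizer.
    private static final Pattern SIMPLE_TAG = Pattern.compile("<[^<>]*>");

    public static String stripSimpleTags(String input) {
        return SIMPLE_TAG.matcher(input).replaceAll("");
    }

    // Applies the given step repeatedly until its output equals its input.
    public static String cleanUntilStable(String input, UnaryOperator<String> step) {
        String current = input;
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            String next = step.apply(current);
            if (next.equals(current)) {
                return current; // nothing left for the step to change
            }
            current = next;
        }
        throw new IllegalStateException(
                "Input did not stabilise after " + MAX_PASSES + " passes");
    }

    public static void main(String[] args) {
        String unsafe = "<<b>script>alert('vulnerable');<</b>/script>";

        // A single pass removes <b> and </b>, which reassembles the script tags:
        // prints <script>alert('vulnerable');</script>
        System.out.println(stripSimpleTags(unsafe));

        // Repeating until stable removes the reassembled tags as well:
        // prints alert('vulnerable');
        System.out.println(cleanUntilStable(unsafe, StableSanitizer::stripSimpleTags));
    }
}
```

With JSoup on the classpath, the step passed in would presumably be something like s -> Jsoup.clean(Jsoup.parse(s).text(), Whitelist.none()) instead of the stand-in; that pairing is not exercised by the snippet above.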
Hopefully someone else can come up with a better solution.