I want to sanitize user input to help prevent XSS attacks. We don’t necessarily need an HTML whitelist, as our users shouldn’t need to post any HTML or CSS.
Looking at the alternatives out there, which would be better: [Apache Commons Text’s StringEscapeUtils][1] or [JSoup Cleaner][2]?
Thanks!
Update:
I went with JSoup after writing some unit tests for both it and Apache Commons Text.
I like how JSoup won’t mess with single quotation marks (i.e. “Alan’s mom” is left unchanged, whereas Apache Commons Text escapes the apostrophe).
And the whitelist wasn’t a problem at all. It didn’t require any configuration; rather, JSoup ships with some built-in options, which may come in handy if we choose to allow some subsets of HTML tags.

[1]: https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/StringEscapeUtils.html
[2]: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
Answer
“Better”? I don’t think it matters. Cleaner has a Whitelist.none(), which strips all markup and keeps only the text, while the escape utils will escape everything.
It depends on how you want the “cleaned” input to render: do you want just the text nodes, or do you want the escaped HTML to show up?
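
To make the two rendering outcomes concrete, here is a minimal sketch using only the JDK. The class `RenderModes` and its methods `escapeAll` and `textOnly` are hypothetical stand-ins written for illustration; they are not the libraries' implementations, and the regex-based tag stripper is not a safe sanitizer. In real code you would presumably call `StringEscapeUtils.escapeHtml4(input)` or `Jsoup.clean(input, Whitelist.none())` instead.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the two rendering modes; not production sanitizing code.
final class RenderModes {
    // Naive tag matcher, for illustration only. A real HTML parser (e.g. JSoup) is needed for safety.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Escape-everything mode: the markup survives as visible text. */
    static String escapeAll(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Text-only mode: tags are dropped and the text between them is kept. */
    static String textOnly(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<b>Hi</b> there";
        System.out.println(escapeAll(input)); // &lt;b&gt;Hi&lt;/b&gt; there
        System.out.println(textOnly(input));  // Hi there
    }
}
```

With the escaping approach, a browser displays the literal `<b>Hi</b> there`; with the text-only approach, it displays `Hi there`. Unlike this stand-in, the real libraries also handle cases such as script contents, malformed tags, and special characters left in the remaining text.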