I have an org.w3c.dom.Document and want to serialize it with the function below, but I get a SAXException. How can I fix this?
    public static String serializeXmlDocument(Document document) throws Exception {
        // set up a transformer
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer trans = transformerFactory.newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        DOMSource source = new DOMSource(document);

        // create string from xml tree
        StringWriter stringWriter = new StringWriter();
        StreamResult stringResult = new StreamResult(stringWriter);
        trans.transform(source, stringResult);
        return stringWriter.toString();
    }
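This results in the following error (the JVM runs with a German locale; "E/A-Fehler" means "I/O error" and "Ungültige UTF-16-Ersetzung festgestellt" means "Invalid UTF-16 surrogate detected"):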
    2014-07-20 03:03:36,451 ERROR [XXX] XXX main job error:
    javax.xml.transform.TransformerException: org.xml.sax.SAXException: E/A-Fehler
    java.io.IOException: Ungültige UTF-16-Ersetzung festgestellt: d835 20 ?
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:758)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:359)
        at mypackage.handler.XmlHandler.serializeXmlDocument(XmlHandler.java:226)
        at mypackage.subpackage.buildSolrXml(MyJob.java:213)
        at mypackage.subpackage.doJob(MyJob.java:113)
        at mypackage.MyWorkstation.main(MyWorkstation.java:27)
    Caused by: org.xml.sax.SAXException: E/A-Fehler
    java.io.IOException: Ungültige UTF-16-Ersetzung festgestellt: d835 20 ?
        at com.sun.org.apache.xml.internal.serializer.ToStream.cdata(ToStream.java:1290)
        at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1395)
        at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:814)
        at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:348)
        at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:122)
        at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230)
        at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230)
        at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:230)
        at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:136)
        at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:98)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:702)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:746)
        ... 5 more
    Caused by: java.io.IOException: Ungültige UTF-16-Ersetzung festgestellt: d835 20 ?
        at com.sun.org.apache.xml.internal.serializer.ToStream.writeUTF16Surrogate(ToStream.java:973)
        at com.sun.org.apache.xml.internal.serializer.ToStream.writeNormalizedChars(ToStream.java:1110)
        at com.sun.org.apache.xml.internal.serializer.ToStream.cdata(ToStream.java:1267)
        ... 16 more
Answer
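The Document contained invalid Unicode characters, such as the unpaired surrogate U+D835 (http://www.fileformat.info/info/unicode/char/d835/index.htm). The trace shows d835 followed by 20 (a space) instead of a low surrogate, and that is the sequence the serializer rejects.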
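To check whether a given string has this problem before it goes into the Document, something like the following could help. It is only a minimal sketch: the class name `SurrogateCheck` and the method `firstUnpairedSurrogate` are made up for illustration and are not part of the original code.

```java
class SurrogateCheck {
    /**
     * Returns the index of the first unpaired UTF-16 surrogate in s,
     * or -1 if every surrogate is part of a valid high/low pair.
     */
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair: skip the low half
                } else {
                    return i; // high surrogate not followed by a low surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no preceding high surrogate
            }
        }
        return -1;
    }
}
```

For the sequence from the stack trace (d835 followed by a space) this reports the position of the d835 char, while a complete pair such as d835 dc00 passes.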
I fixed it with the solution from "Removing invalid XML characters from a string in Java":
    // remove illegal unicode characters (anything outside the XML 1.0 character range)
    String xml10pattern = "[^"
            + "\u0009\r\n"
            + "\u0020-\uD7FF"
            + "\uE000-\uFFFD"
            + "\ud800\udc00-\udbff\udfff"
            + "]";
    stringValue = stringValue.replaceAll(xml10pattern, " ");
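If the Document is already built and the individual strings are no longer at hand, the same replacement could be applied to the tree itself before calling the serializer. The following is a sketch under that assumption, not the code I actually ran: `XmlSanitizer`, `stripInvalidXmlChars` and `sanitize` are hypothetical names, and the pattern is written with `\x{...}` code point escapes (Java 7+), which should match the same set of characters as the pattern above.

```java
import java.util.regex.Pattern;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class XmlSanitizer {
    // Any code point outside the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\t\\n\\r\\x{20}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]");

    /** Replaces every character that XML 1.0 does not allow with a space. */
    static String stripInvalidXmlChars(String value) {
        return INVALID_XML10.matcher(value).replaceAll(" ");
    }

    /** Recursively cleans text, CDATA, comment and attribute values below node. */
    static void sanitize(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE
                || type == Node.CDATA_SECTION_NODE
                || type == Node.COMMENT_NODE) {
            node.setNodeValue(stripInvalidXmlChars(node.getNodeValue()));
        }
        NamedNodeMap attributes = node.getAttributes(); // null unless node is an Element
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                attribute.setNodeValue(stripInvalidXmlChars(attribute.getNodeValue()));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            sanitize(child);
        }
    }
}
```

With this in place, calling `XmlSanitizer.sanitize(document)` right before `serializeXmlDocument(document)` should avoid the exception. Note that it modifies the Document in place and replaces each offending character with a space, so cleaning the strings at the point where they enter the Document is still the less intrusive option when that is possible.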