Read a Word (.docx) file in Java

Question

I have a Word document which was generated with docx4j. When I unzip the docx file, I can see the contents of the folder, the contents of ./word/document.xml, and the relationships XML. When I unzip chunk.docx, it has its own file contents, its own ./word/document.xml, and its own relationships XML. Similarly, when I unzip the

Accepted Answer

Your docx contains altChunks of type docx. They are there because whoever created the file added them explicitly with docx4j, using code such as https://github.com/plutext/docx4j/blob/VERSION_11_4_7/docx4j-samples-docx4j/src/main/java/org/docx4j/samples/AltChunkAddOfTypeDocx.java

Ordinarily you wouldn't do that.

Generally, if you want to handle such a docx using approaches like XPath, you'd first convert those altChunks into normal content. Word can do this, as can Docx4j Enterprise.

But if you control the generating application, the best approach would be to revisit it and change it so it doesn't create altChunks. At least understand why they wrote it that way.
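To check whether a given docx actually contains altChunks before deciding how to handle it, something like the following may help. It is only a rough sketch that uses the JDK's zip and DOM classes rather than the docx4j API, and it only detects altChunks; it does not convert them into normal content. The class and method names (`AltChunkFinder`, `readEntry`, `altChunkRelIds`) are made up for illustration, and it assumes the main part lives at the usual `word/document.xml` path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of docx4j.
final class AltChunkFinder {
    // WordprocessingML main namespace and the relationships namespace used for r:id.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Returns the bytes of the named zip entry, or null if the entry is absent. */
    static byte[] readEntry(InputStream docx, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (e.getName().equals(name)) {
                    return zip.readAllBytes(); // reads to the end of the current entry
                }
            }
        }
        return null;
    }

    /** Returns the r:id of every w:altChunk element, in document order. */
    static List<String> altChunkRelIds(byte[] documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(documentXml));
        NodeList nodes = doc.getElementsByTagNameNS(W_NS, "altChunk");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(((Element) nodes.item(i)).getAttributeNS(R_NS, "id"));
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AltChunkFinder <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] xml = readEntry(in, "word/document.xml");
            if (xml == null) {
                System.err.println("word/document.xml not found");
                return;
            }
            System.out.println("altChunk relationship ids: " + altChunkRelIds(xml));
        }
    }
}
```

Each returned id can then be looked up in `word/_rels/document.xml.rels` to find which embedded part (for example a chunk docx) it points to. An empty list suggests the document has no altChunks and can be processed directly.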

Answer