I need to extract the content of the first page of a .docx file and save it as a separate document. Everything on the first page (images, tables, text) must be kept exactly as it is in the new .docx file.
What I tried: I unzipped the .docx file and looked at its XML. Because a Word document is reflowable, document.xml does not contain a page break at the end of each page, so I could not find where each page ends.
Is there any way to get the XML content of only the first page of the document using a Java XML DOM parser?
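One possible DOM-only approach is sketched below, under several assumptions. document.xml does not store page layout, but Word writes two kinds of markers that approximate page boundaries: explicit page breaks (`<w:br w:type="page"/>`) and `<w:lastRenderedPageBreak/>`, which records where a page started the last time Word laid out the file. The sketch treats the first body-level paragraph or table containing either marker as the start of page 2, removes it and everything after it (keeping the final `w:sectPr` page settings), and copies every other part of the .docx unchanged, so images and relationships used on page 1 still resolve. Assumptions: Java 9+, the file was last saved by Word (other producers may not write `lastRenderedPageBreak`), and a paragraph or table that straddles the boundary is dropped as a whole. `FirstPageExtractor` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the names are illustrative only.
public class FirstPageExtractor {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS / getLocalName
        return factory.newDocumentBuilder().parse(in);
    }

    static byte[] serialize(Document doc) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // True if the block contains an explicit page break or a break Word recorded at its last layout.
    static boolean containsPageBreak(Element block) {
        if (block.getElementsByTagNameNS(W, "lastRenderedPageBreak").getLength() > 0) {
            return true;
        }
        NodeList breaks = block.getElementsByTagNameNS(W, "br");
        for (int i = 0; i < breaks.getLength(); i++) {
            if ("page".equals(((Element) breaks.item(i)).getAttributeNS(W, "type"))) {
                return true;
            }
        }
        return false;
    }

    // Removes the first block containing a page-break marker and every block after it,
    // keeping the body's w:sectPr (page size, margins). Returns the number of blocks kept.
    static int trimToFirstPage(Document doc) {
        Element body = (Element) doc.getElementsByTagNameNS(W, "body").item(0);
        List<Element> blocks = new ArrayList<>();
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && !"sectPr".equals(n.getLocalName())) {
                blocks.add((Element) n);
            }
        }
        int kept = 0;
        boolean pastBoundary = false;
        for (Element block : blocks) {
            pastBoundary = pastBoundary || containsPageBreak(block);
            if (pastBoundary) {
                body.removeChild(block);
            } else {
                kept++;
            }
        }
        return kept;
    }

    // Copies every zip entry unchanged except word/document.xml, which is trimmed.
    static void extractFirstPage(Path in, Path out) throws Exception {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(in));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(out))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes(); // bytes of the current entry only
                if (entry.getName().equals("word/document.xml")) {
                    Document doc = parse(new ByteArrayInputStream(data));
                    trimToFirstPage(doc);
                    data = serialize(doc);
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java FirstPageExtractor input.docx first-page.docx");
            return;
        }
        extractFirstPage(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Usage: `java FirstPageExtractor input.docx first-page.docx`. Content from later pages that lives in other parts (unused images in word/media, footnotes) is left in the package; it is no longer referenced from the body but still takes up space.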
Answer
Do not write a new parser; there are plenty of existing tools for that (for example, what if your input changes from XML to binary Word .doc files?).
Use Apache POI, for example, as @JFB suggested.
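A minimal Apache POI sketch, assuming POI 5.x (`poi-ooxml`) is on the classpath: it loads the file into an `XWPFDocument`, finds the first paragraph that has `pageBreakBefore`, an explicit page break, or a `lastRenderedPageBreak` marker, removes that paragraph and every body element after it with `removeBodyElement`, and writes the result to a new file. Because the original package is reused, images, styles, headers and footers stay intact. POI does not lay out pages itself, so this relies on page-break markers saved in the file, and breaks inside tables are not inspected. `PoiFirstPage` and its methods are hypothetical names.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STBrType;

// Hypothetical helper class; the names are illustrative only.
public class PoiFirstPage {
    // True if the paragraph starts on a new page or contains a page-break marker.
    static boolean hasPageBreak(XWPFParagraph paragraph) {
        if (paragraph.isPageBreak()) { // <w:pageBreakBefore/>
            return true;
        }
        for (XWPFRun run : paragraph.getRuns()) {
            CTR ctr = run.getCTR();
            if (!ctr.getLastRenderedPageBreakList().isEmpty()) { // break from Word's last layout
                return true;
            }
            for (CTBr br : ctr.getBrList()) {
                if (br.getType() == STBrType.PAGE) { // explicit <w:br w:type="page"/>
                    return true;
                }
            }
        }
        return false;
    }

    // Removes the first paragraph with a page-break marker and every body element after it.
    // Tables are kept or removed as a whole. Returns the number of body elements kept.
    static int trimToFirstPage(XWPFDocument doc) {
        List<IBodyElement> elements = doc.getBodyElements();
        int boundary = elements.size();
        for (int i = 0; i < elements.size(); i++) {
            IBodyElement element = elements.get(i);
            if (element instanceof XWPFParagraph && hasPageBreak((XWPFParagraph) element)) {
                boundary = i;
                break;
            }
        }
        for (int i = elements.size() - 1; i >= boundary; i--) {
            doc.removeBodyElement(i); // always removes the current last element
        }
        return boundary;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java PoiFirstPage input.docx first-page.docx");
            return;
        }
        try (InputStream in = new FileInputStream(args[0]);
             XWPFDocument doc = new XWPFDocument(in);
             OutputStream out = new FileOutputStream(args[1])) {
            trimToFirstPage(doc);
            doc.write(out);
        }
    }
}
```

Trimming a copy of the whole document, rather than copying page-1 elements into a new `XWPFDocument`, avoids having to re-create image relationships, numbering and styles by hand.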