Extract the first page content from a docx file by XML parsing

Question

I need to extract the first page content from a docx file and save it as a separate document. I need everything from the first page (images, tables, text) to be saved as is in a new docx file. What I tried: I looked into the XML of the unzipped docx file. Since a Word document is reflowable i

Accepted Answer

Do not write a new parser; there are plenty of existing tools for that (e.g., what if your input changes from XML to binary Word files?). Use Apache POI, for example, as @JFB suggested.
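
To show what "the first page" looks like inside the file, here is a minimal sketch in Python that uses only the standard library's existing zip and XML modules (it is not the POI approach suggested above). It assumes the page boundary is marked in word/document.xml either by an explicit `<w:br w:type="page"/>` or by a `<w:lastRenderedPageBreak/>` hint left by Word at the last save. Because the document is reflowable, neither marker is guaranteed to be present or up to date. The names `first_page_boundary_index` and `make_docx` are hypothetical helpers made up for this illustration, and `make_docx` writes only the one part the sketch reads, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _has_page_marker(block):
    """Return True if the block contains an explicit or last-rendered page break."""
    for br in block.iter(f"{{{W}}}br"):
        if br.get(f"{{{W}}}type") == "page":
            return True
    return block.find(f".//{{{W}}}lastRenderedPageBreak") is not None


def first_page_boundary_index(docx_bytes):
    """Index of the first body-level block (paragraph or table) holding a page marker.

    Blocks before this index lie wholly on page one, assuming the markers are
    present and accurate. Returns None if no marker is found.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(f"{{{W}}}body")
    for index, block in enumerate(body):
        if _has_page_marker(block):
            return index
    return None


def make_docx(body_xml):
    """Hypothetical test helper: zip a bare word/document.xml around body_xml."""
    xml = f'<w:document xmlns:w="{W}"><w:body>{body_xml}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    sample = make_docx(
        "<w:p><w:r><w:t>Title</w:t></w:r></w:p>"
        "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>"
        '<w:p><w:r><w:t>End of page one</w:t><w:br w:type="page"/></w:r></w:p>'
        "<w:p><w:r><w:t>Page two</w:t></w:r></w:p>"
    )
    print(first_page_boundary_index(sample))
```

Finding the boundary is the easy part. Saving those blocks as a new docx also means carrying over styles, numbering, and the image relationships they reference, which is where a library such as Apache POI is likely to save effort.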

Answer