Trying to understand how changes to a Word doc are saved from Apache POI

I have a Word document (docx); I want to make changes to that document and save the result as another file, leaving the original in place. I have the following code illustrating my current problem:

package sandbox.word.doccopy;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class CopyTest
{
  public static void main(String[] args) throws Exception
  {
    String sourceFilename      = "CopyTestSource.docx";
    String destinationFilename = "CopyTestResult.docx";
    
    CopyTest docCopy = new CopyTest();
    docCopy.copyTesting(sourceFilename, destinationFilename);
    System.out.println("done");
  }
  
  public void copyTesting(String source, String destination)
      throws IOException, InvalidFormatException
  {
    XWPFDocument doc = new XWPFDocument(OPCPackage.open(source));
    // for each paragraph that has runs, 
    // put an exclamation at the end of the first run.
    for (XWPFParagraph par : doc.getParagraphs())
    {
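      List<XWPFRun> runs = par.getRuns();
      if (!runs.isEmpty())
      {
        XWPFRun run = runs.get(0);
        String text = run.getText(0);
        // getText(0) returns null when the run has no text element,
        // which would otherwise turn into the literal string "null!"
        if (text != null)
        {
          run.setText(text + "!", 0);
        }
      }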
    }
    
//    FileOutputStream fos = new FileOutputStream(destination);
//    doc.write(fos);
//    fos.close();
    doc.close();
  }
  
}

There are three ways I’ve run this, changing commented lines at the bottom of the class file. As you see, there are three lines that create a file output stream with the destination filename, write to it, and close it, and one line that just closes the current document.

If I comment out the 3 lines and leave the 1 line, no changes are written to the current document (and, of course, the copy document is not created).

If I leave all 4 lines uncommented, the copy document is created with changes, and the changes are also written to the source document.

If I comment out the 4th line, I get a destination document with changes, and the source document is left unchanged.

The last one is what I want, and I can write my code to do that. But I would expect that closing the document after it is changed would either always change the source file or never change it, and that the outcome wouldn’t depend on whether I had written the changes to another file.

Can anyone shed any light on this?

Answer

The culprit is this line: XWPFDocument doc = new XWPFDocument(OPCPackage.open(source)); and specifically the call OPCPackage.open(source).

The static method OPCPackage.open(java.lang.String path) opens the OPCPackage from the file at the given path with read/write access. In addition, the package stays directly connected to that underlying file. This saves some memory but has disadvantages too, as you will see now.

All changes made through the XWPFDocument go into that OPCPackage, but at first only in memory.

When you call doc.write, which is POIXMLDocument.write(java.io.OutputStream stream), the underlying OPCPackage is updated first. The changed OPCPackage is then saved to the destination document through the given OutputStream. So without a call to doc.write, nothing changes in any file; the changes stay in memory only.

Then, when doc.close() is called, OPCPackage.close is called as well. This closes the open, writable package and saves its content. Since the OPCPackage is directly connected to the underlying file, it saves the content into that file. That’s why the changes are also written to the source document.

This should explain your observations.
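To make that sequence concrete, here is a toy model in plain Java. It is not POI code: ToyDocument, ToyDemo, and all their methods are hypothetical names invented for this illustration, and a text file stands in for the .docx. The only things modeled are the two assumptions from the explanation above: edits are committed into the package when write is called, and close saves the package back to the file it was opened from.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ToyDemo {
    /** Runs one scenario and returns the source file's content afterwards. */
    static String run(boolean callWrite, boolean callClose) {
        try {
            Path src = Files.createTempFile("toy-source", ".txt");
            try {
                Files.write(src, "Hello".getBytes(StandardCharsets.UTF_8));
                ToyDocument doc = new ToyDocument(src);
                doc.append("!");
                // The byte array stands in for the destination file.
                if (callWrite) doc.write(new ByteArrayOutputStream());
                if (callClose) doc.close();
                return new String(Files.readAllBytes(src), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(src);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("close only      -> " + run(false, true));
        System.out.println("write and close -> " + run(true, true));
        System.out.println("write only      -> " + run(true, false));
    }
}

/** Hypothetical stand-in for a document whose package stays bound to its source file. */
class ToyDocument {
    private final Path source;      // file the "package" stays connected to
    private String packageContent;  // what the package currently holds
    private String editedContent;   // edits made through the document, not yet committed

    ToyDocument(Path source) throws IOException {
        this.source = source;
        this.packageContent = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        this.editedContent = packageContent;
    }

    void append(String text) {
        editedContent += text;
    }

    /** Models doc.write(out): commit edits into the package, then save the package to out. */
    void write(OutputStream out) throws IOException {
        packageContent = editedContent;
        out.write(packageContent.getBytes(StandardCharsets.UTF_8));
    }

    /** Models doc.close(): the file-backed package saves its current content to its own file. */
    void close() throws IOException {
        Files.write(source, packageContent.getBytes(StandardCharsets.UTF_8));
    }
}
```

Run as written, the three lines print Hello, Hello!, and Hello for the source content, which matches the three cases in the question: only the combination of write followed by close pushes the edit back into the source.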

XWPFDocument also provides the constructor XWPFDocument(java.io.InputStream is). Internally, this calls OPCPackage.open(java.io.InputStream in), which opens the OPCPackage from the InputStream. The OPCPackage then lives in memory only and is independent of the source file. That uses somewhat more memory, since the whole OPCPackage has to be held in memory, but OPCPackage.close will not lead to changes in the source file.

So what I would do is:

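...
// needs: import java.io.FileInputStream;
// Opening from an InputStream loads the whole package into memory,
// detached from the source file.
try (FileInputStream in = new FileInputStream(source);
     XWPFDocument doc = new XWPFDocument(in))
{
  ...
  try (FileOutputStream fos = new FileOutputStream(destination))
  {
    doc.write(fos);
  }
} // doc.close() runs here and no longer writes to the source file
...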