Skip to content
Advertisement

Java Apache POI Bug

While working with the Apache POI Implementation I ran into a strange behaviour. I cannot explain the cause, so if anybody can give some hints, I would love to hear them. It is not even a big blocker for the problem that I was solving – at this point it is more a curiosity thing. So here it goes:

public static void main(String[] args) throws EcatException, SQLException, IOException, Exception {

    long ts = System.currentTimeMillis();
    SXSSFWorkbook wb = new SXSSFWorkbook();
    SXSSFSheet test = wb.createSheet("Test");
    SXSSFRow r = test.createRow(0);
    Cell c = r.createCell(0);
    c.setCellValue("TEST");

    wb.write(new FileOutputStream("D:/wb-" + ts + ".xlsx"));
    wb.close();

    XSSFWorkbook wb2 = new XSSFWorkbook("D:/wb-" + ts + ".xlsx");
    XSSFSheet s = wb2.getSheet("Test");

    s.getRow(0).getCell(0).setCellType(CellType.STRING);
    System.out.println(s.getRow(0).getCell(0).getStringCellValue());
    wb2.close();

  }

As you can see, this will create a SXSSFWorkbook with one row and one cell with the value “TEST”. Then opening the workbook againg, and print the content of that one cell to the console.

My expectation is to see “TEST” on the console, but I do not. The output is empty.

  • If I remove the line
s.getRow(0).getCell(0).setCellType(CellType.STRING);

the output is as expected.

  • If I switch from using a SXSSFWorkbook to XSSFWorkbook the output is as expected.

  • And most curious, if I open the resulting xlsx file, save it and close it again, then running the read part of the above code, the output is as expected.

Is that something that someone has an explanation for? Btw. I tried with different version of POI, it had the same results everytime.

Advertisement

Answer

The problem is that SXSSFWorkbook uses inline strings per default because this is better for the streaming approach. But XSSFWorkbook expects strings to be stored in a shared strings table when cell type is CellType.STRING.

So after creating the SXSSFWorkbook your cell XML in sheet1.xml looks like

<c r="A1" t="inlineStr">
 <is>
  <t>TEST</t>
 </is>
</c>

The type t is inlineStr. And the cell value is the string TEST directly.

But after Cell.setCellType(CellType.STRING) the type t is set s. And this expects the value to be a number which is the index of the string in the shared strings table. But there is not a such. That’s why System.out.println(s.getRow(0).getCell(0).getStringCellValue()); cannot print anything.

You could do SXSSFWorkbook wb = new SXSSFWorkbook(null, 100, true, true) to force the SXSSFWorkbookto use shared strings table too. But that will cost performance in streaming approach because then all strings needs to be stored in that shared strings table instead of storing them directly in the cells.

The benefit of the shared strings table is save memory because all strings only are stored once there and only their indexes are stored in the cells when multiple cells use the same string.

Excel itself never stores workbooks using inline strings instead of using shared strings table. So after opening and resaving in Excel the inline strings are replaced by indexes to strings in the shared strings table and cell types always are s instead of inlineStr. That’s why Cell.setCellType(CellType.STRING) will not have that effect anymore.

Advertisement