Skip to content
Advertisement

Java and org.apache.poi while reading excel line 47 is skipped

Using Java 1.8, org.apache.poi 5.1.0 and org.apache.poi.ooxml 5.1.0. I have an excel file that consist of 54 rows. I read through this file in blocks of 5 lines. If I get to line 47 it skips that line and gives me the first line of the new block while it should give me the first empty line above the now block.

Using the debugger I can see it go from line 46 to line 48 while I would expect line 47. Add a breakpoint at line 51 (See the comment in the java code for the location). And you can see how currentRow attribute ‘r’ skips from line 46 to 48.

I do not know why this happens but it is ruining my day and renders my program useless.

Below you can find my files. I brought it down to the bare minimum while still making the error reproducible.

My build.gradle file

plugins {
    id 'java'
    id 'application'
}

group 'nl.karnhuis'

sourceCompatibility = 1.8

application {
    mainClass = 'nl.karnhuis.test.Testfile'
}

repositories {
    mavenCentral()
    maven {
        url "https://mvnrepository.com/artifact"
    }
}

dependencies {
    implementation 'org.apache.poi:poi:5.1.0'
    implementation 'org.apache.poi:poi-ooxml:5.1.0'
}

my gradle.settings file

rootProject.name = 'testfile'

My java code

package nl.karnhuis.test;

import java.io.*;
import java.util.*;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;

public class Testfile {

    public void run() {
        File inputFile = new File("schema.xlsx");
        handleFile(inputFile);
    }

    private void handleFile(File inputFile) {
        try {
            // Create Workbook instance holding reference to .xlsx file
            XSSFWorkbook workbook = new XSSFWorkbook(inputFile);

            // Get first/desired sheet from the workbook
            Sheet datatypeSheet = workbook.getSheetAt(0);
            Iterator<Row> iterator = datatypeSheet.iterator();
            Row currentRow = null;

            // Go over all rows
            while (iterator.hasNext()) {

                if (checkForLastLine(currentRow)) {
                    break;
                }

                currentRow = iterator.next();
                // First two rows can be skipped.
                if ((currentRow.getRowNum()) < 2) {
                    continue;
                }

                currentRow = iterator.next();
                // do something important

                currentRow = iterator.next();
                // do something important

                currentRow = iterator.next();
                // do something important

                // The next row is empty, so it can be skipped.
                currentRow = iterator.next();
                System.out.println(currentRow.getRowNum()); //Add breakpoint here 
            }

        } catch (IOException | InvalidFormatException e) {
            e.printStackTrace();
        }
    }

    private boolean checkForLastLine(Row currentRow) {
        if (currentRow == null) {
            return false;
        } else {
            for (Cell currentCell : currentRow) {
                // Reached end of file? Get out of da loop!
                return currentCell.getColumnIndex() == 0
                        && (currentCell.getStringCellValue().trim().startsWith("primaire")
                        || currentCell.getStringCellValue().trim().startsWith("secondaire"));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Testfile mc = new Testfile();
        mc.run();
    }
}

The excel file can be downloaded from https://www.karnhuis.nl/schema.xlsx

Advertisement

Answer

It seems that the empty rows in the Excel weren’t created the same way. Try writing something in first cell of row 47 and running again. The row will be listed correctly in your class. Even after deleting the content and having an empty row again, it will work.

Apache POI has the notion of logical rows (that have or previously had content) and won’t return lines that were always empty. If you don’t have control over how the Excel files are generated, don’t use counting rows. You could for example look for text in first column and then count 4 lines.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement