
PDFBox search for text on specific page in new PDF

I'm looking for a way to check every page of my new PDF for a specific string. The idea is to go through each page and, if the project name is missing from that page, add it before saving the PDF with doc.save(new FileOutputStream(new File(pathToFile)));

I already tried:

document.save(new FileOutputStream(new File(pathToFile)));

PDDocument document = PDDocument.load(new File(pathToFile));

// PDFTextStripper page numbers are 1-based, so the loop must include the last page
for (int i = 1; i <= document.getNumberOfPages(); i++) {
    PDFTextStripper reader = new PDFTextStripper();
    reader.setStartPage(i);
    reader.setEndPage(i);
    String pageText = reader.getText(document);
    System.out.println(pageText);
}

The result is "Hello World", which is correct.

However, this only works if the document has already been saved and is then loaded again.

In my case the document has not been saved yet:

// same loop, run on the in-memory document before document.save(...)
for (int i = 1; i <= document.getNumberOfPages(); i++) {
    PDFTextStripper reader = new PDFTextStripper();
    reader.setStartPage(i);
    reader.setEndPage(i);
    String pageText = reader.getText(document);
    System.out.println(pageText);
}

The result is an empty string.


Answer

I could not find a way to read the text before saving the document, so I took a different approach.

int oldPagesCount = document.getNumberOfPages();
// may append pages automatically when the table does not fit on one page
addTableInformation(informationToAdd);
if (oldPagesCount < document.getNumberOfPages()) {
    // pages were auto-generated, so add the project name/number to each of them
    for (int i = oldPagesCount; i < document.getNumberOfPages(); i++) {
        page = document.getPage(i); // getPage uses 0-based indexes
        addProjectInfo(project);    // writes to the current page
    }
}

With this approach, if the table information spills over onto additional pages, the code visits every newly added page and adds the project information to it. I hope this helps anyone who needs to do something similar.
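
The index bookkeeping in that approach can be tried out without PDFBox. The following is only a minimal sketch under stated assumptions: Page is a hypothetical stand-in for a PDF page (just a list of text lines), addTableInformation here is a simplified imitation of the real helper, and stampNewPages and rowsPerPage are names made up for illustration. It only shows how recording the page count before the table is written lets you touch exactly the pages that were added afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewPageStamper {

    // Hypothetical stand-in for a PDF page: only the text lines placed on it.
    static final class Page {
        final List<String> lines = new ArrayList<>();
    }

    // Simplified imitation of a table writer: puts rows on the last page and
    // appends a new page whenever rowsPerPage rows have been written to one.
    static void addTableInformation(List<Page> document, List<String> rows, int rowsPerPage) {
        if (document.isEmpty()) {
            document.add(new Page());
        }
        Page current = document.get(document.size() - 1);
        int rowsOnCurrent = 0;
        for (String row : rows) {
            if (rowsOnCurrent == rowsPerPage) {
                current = new Page();
                document.add(current);
                rowsOnCurrent = 0;
            }
            current.lines.add(row);
            rowsOnCurrent++;
        }
    }

    // Adds the project name to every page whose index is >= oldPagesCount
    // and returns how many pages were stamped.
    static int stampNewPages(List<Page> document, int oldPagesCount, String project) {
        int stamped = 0;
        for (int i = oldPagesCount; i < document.size(); i++) {
            document.get(i).lines.add(0, project);
            stamped++;
        }
        return stamped;
    }

    public static void main(String[] args) {
        List<Page> document = new ArrayList<>();
        Page first = new Page();
        first.lines.add("Project X"); // the first page already carries the project name
        document.add(first);

        int oldPagesCount = document.size(); // remember the count before writing the table
        addTableInformation(document, Arrays.asList("row 1", "row 2", "row 3", "row 4", "row 5"), 2);
        int stamped = stampNewPages(document, oldPagesCount, "Project X");
        System.out.println(stamped + " new page(s) stamped");
    }
}
```

In a real PDFBox program the list would be the PDDocument and the stamping step would draw on the page instead of adding a string; those parts are left out here on purpose.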

User contributions licensed under: CC BY-SA