How to search for a string in a pdf document [closed]

Question

I have a PDF document which contains images, hyperlinks, words

Accepted Answer

You can use the Apache PDFBox library (https://pdfbox.apache.org/download.cgi). Here is an example:

    import java.io.File;
    import java.io.IOException;
    import java.util.Scanner;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    public class Main {
        public static void main(String[] args) throws IOException {
            Scanner scan = new Scanner(System.in);
            System.out.println("Type the path of the PDF file: ");
            String pdfPath = scan.nextLine();
            System.out.println("Input the phrase to find: ");
            String phrase = scan.nextLine();

            // PDFBox 2.x API; in PDFBox 3.x, load with Loader.loadPDF(file) instead.
            try (PDDocument doc = PDDocument.load(new File(pdfPath))) {
                // Extract all text from the document, then search it.
                String pdfContent = new PDFTextStripper().getText(doc);
                String result = pdfContent.contains(phrase) ? "Yes" : "No";
                System.out.println(result);
            }
        }
    }

Remember that you will have to download the PDFBox JAR file and add it to your project.

Edit: You can also count how many times the phrase occurs in the PDF:

    // Place this inside the try block, after result has been computed.
    if (result.equals("Yes") && !phrase.isEmpty()) {
        int counter = 0;
        int index = 0;
        // indexOf matches the phrase literally; replaceFirst would treat it as a regex.
        while ((index = pdfContent.indexOf(phrase, index)) != -1) {
            counter++;
            index += phrase.length();
        }
        System.out.println(counter);
    }
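One caveat with a plain `contains` check: text extracted from a PDF typically keeps the line breaks of the page layout, so a phrase that wraps across two lines will not match, and the check is also case-sensitive. Below is a minimal sketch of one way to work around that, assuming you already have the extracted text as a `String`. The class and method names (`PhraseSearch`, `normalize`, `countOccurrences`) are made up for illustration and are not part of PDFBox; it uses only the Java standard library.

```java
import java.util.Locale;

final class PhraseSearch {

    // Collapse every run of whitespace (spaces, tabs, line breaks) into one space
    // and lower-case the result, so matching ignores layout and case.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    // Count non-overlapping, literal occurrences of phrase in text,
    // after normalizing both. An empty phrase counts as zero matches.
    static int countOccurrences(String text, String phrase) {
        String haystack = normalize(text);
        String needle = normalize(phrase);
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by the text extraction step.
        String extracted = "Terms and\nConditions apply.\nSee the terms   and conditions.";
        System.out.println(countOccurrences(extracted, "terms and conditions")); // prints 2
    }
}
```

To use it with the answer above, you would pass the string returned by `getText(doc)` as the first argument. Normalizing both sides is a design choice: it trades exact-case matching for robustness against line wrapping, so skip the lower-casing if you need a case-sensitive search.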

Answer