How to search for a string in a pdf document [closed]

I have a PDF document that contains images, hyperlinks, words, and many other things.

I want to search for a string in the words only, i.e. images and hyperlinks are excluded. How can I write Java code for that? Could someone help?


You can use the Apache PDFBox library. Here is an example:

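import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class Main {
    public static void main(String args[]) throws IOException {
        Scanner scan = new Scanner(System.in);
        System.out.println("Type the path of the PDF file: ");
        String PDFdir = scan.nextLine();
        System.out.println("Input the phrase to find");
        String phrase = scan.nextLine();
        File file = new File(PDFdir);
        // PDDocument.load(File) is the PDFBox 2.x API; PDFBox 3.x loads via Loader.loadPDF(file) instead
        PDDocument doc = PDDocument.load(file);
        // PDFTextStripper extracts only the text content, so images are not searched
        PDFTextStripper findPhrase = new PDFTextStripper();
        String text = findPhrase.getText(doc);
        doc.close();
        String PDF_content = text;
        // "Yes" if the phrase occurs anywhere in the extracted text
        String result = PDF_content.contains(phrase) ? "Yes" : "No";
        System.out.println(result);
    }
}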

Remember that you will have to download the PDFBox JAR file and add it to your project's classpath.
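One caveat with the plain contains check: text extracted from a PDF often has line breaks or extra spaces in the middle of a phrase, and the case may differ from what the user typed. Below is a minimal sketch of a more tolerant check, assuming that ignoring case and whitespace layout is acceptable for your search. TextSearch, normalize, and containsPhrase are hypothetical names for illustration, not part of PDFBox, and the sample string stands in for the text returned by PDFTextStripper.

```java
// Hypothetical helper (not part of PDFBox): whitespace- and case-tolerant phrase check.
final class TextSearch {
    // Collapse every run of whitespace (spaces, tabs, line breaks) into one space
    // and lower-case the result, so comparisons ignore layout and case.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim().toLowerCase(java.util.Locale.ROOT);
    }

    // True if the normalized phrase occurs in the normalized text.
    // An empty or whitespace-only phrase never matches.
    static boolean containsPhrase(String text, String phrase) {
        String needle = normalize(phrase);
        return !needle.isEmpty() && normalize(text).contains(needle);
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by PDFTextStripper.getText(doc)
        String extracted = "Search for a\nString in a PDF   document";
        System.out.println(containsPhrase(extracted, "string in a pdf document")); // true
        System.out.println(containsPhrase(extracted, "missing phrase"));           // false
    }
}
```

In the example above you would call TextSearch.containsPhrase(text, phrase) in place of PDF_content.contains(phrase).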



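You can also count how many times the phrase occurs in the PDF. Place this inside main, after result is computed. Note that replaceFirst treats its first argument as a regular expression, so the phrase is wrapped in Pattern.quote to match it literally; the empty-phrase check avoids an endless loop:

if (result.equals("Yes") && !phrase.isEmpty()) {
    int counter = 0;
    while (PDF_content.contains(phrase)) {
        PDF_content = PDF_content.replaceFirst(java.util.regex.Pattern.quote(phrase), "");
        counter++; // one occurrence removed and counted
    }
    System.out.println("Number of occurrences: " + counter);
}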

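As an alternative to repeatedly rewriting the string, the count can be sketched with String.indexOf, which scans the text once and involves no regular expressions. This is a minimal sketch, not the answer's original method: PhraseCounter and countOccurrences are hypothetical names, matches are counted without overlap, and the sample string stands in for the extracted PDF text.

```java
// Hypothetical helper (not part of PDFBox): count literal, non-overlapping occurrences.
final class PhraseCounter {
    static int countOccurrences(String text, String phrase) {
        // An empty phrase would match at every position; treat it as zero occurrences.
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        // indexOf returns -1 once no further match exists at or after 'from'.
        while ((from = text.indexOf(phrase, from)) != -1) {
            count++;
            from += phrase.length(); // skip past this match so matches do not overlap
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by PDFTextStripper.getText(doc)
        String extracted = "a.b and a.b but not axb";
        System.out.println(countOccurrences(extracted, "a.b")); // 2
    }
}
```

Because the phrase is compared literally, characters such as "." or "(" in the search phrase need no escaping here.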

Source: stackoverflow