How to convert a PDF to a JSON/EXCEL/WORD file?

Question

I need to get data from a PDF file together with its headers, so that I can compare it with data in a DB. I tried PDFBox, Google Vision OCR, and iText, but every library gave me a single row of text without structure or headers. Example: Date\nNumber\nStatus\n12122020\n442334delivered I will try converting the PDF to Excel/Word and getting the data from those files instead, but for this realisation I

Accepted Answer

I did not find an answer to my question. I'm using this code for my task:

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.text.PDFTextStripperByArea;

    import java.awt.Rectangle;
    import java.io.File;
    import java.io.IOException;

    public class ExtractTextByArea {

        // Returns the text found inside the given rectangle on the first page,
        // or an empty string if the document is encrypted or cannot be read.
        public String getTextFromCoordinate(String filepath, int x, int y, int width, int height) {
            String result = "";
            // PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF instead.
            try (PDDocument document = PDDocument.load(new File(filepath))) {
                if (!document.isEncrypted()) {
                    PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                    stripper.setSortByPosition(true);
                    // Example region: new Rectangle(260, 35, 70, 10)
                    Rectangle rect = new Rectangle(x, y, width, height);
                    stripper.addRegion("class1", rect);
                    PDPage firstPage = document.getPage(0);
                    stripper.extractRegions(firstPage);
                    result = stripper.getTextForRegion("class1");
                }
            } catch (IOException e) {
                System.err.println("Exception while trying to read pdf document - " + e);
            }
            return result;
        }
    }
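If the text returned for a region comes back as one cell per line, as in the example from the question, it may be possible to rebuild the rows with their headers before comparing them with the DB. The following is only a rough sketch under that assumption: the class RegionTextTable and its toRows method are hypothetical names, not part of PDFBox, and the code assumes that the first columnCount lines are the headers and that every cell sits on its own line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of PDFBox): turns newline-separated cell text
// into rows keyed by the header names.
class RegionTextTable {

    // Assumption: the first `columnCount` non-blank lines are the headers and
    // each following group of `columnCount` lines is one data row.
    static List<Map<String, String>> toRows(String text, int columnCount) {
        if (columnCount <= 0) {
            throw new IllegalArgumentException("columnCount must be positive");
        }

        // Collect the non-blank lines as individual cells.
        List<String> cells = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String cell = line.trim();
            if (!cell.isEmpty()) {
                cells.add(cell);
            }
        }

        List<Map<String, String>> rows = new ArrayList<>();
        if (cells.size() < columnCount) {
            return rows; // not even a complete header line
        }

        List<String> headers = cells.subList(0, columnCount);
        // Walk the remaining cells one full row at a time;
        // a trailing incomplete row is ignored.
        for (int start = columnCount; start + columnCount <= cells.size(); start += columnCount) {
            Map<String, String> row = new LinkedHashMap<>();
            for (int col = 0; col < columnCount; col++) {
                row.put(headers.get(col), cells.get(start + col));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

The string returned by getTextFromCoordinate could be passed to RegionTextTable.toRows together with the number of columns in the table. Each resulting map keeps the header order, so it can be compared field by field with a DB record or handed to a JSON library. If the PDF puts several cells on one line, the splitting step would need to change.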

Answer