
Tag: pdf-parsing

Extract all text with string positions from a PDF

This may seem like an old question, but I couldn’t find an exhaustive answer after spending half an hour searching Stack Overflow. I am using PDFBox and would like to extract all of the text from a PDF file, along with the coordinates of each string. I am using their PrintTextLocations example (http://pdfbox.apache.org/apidocs/org/apache/pdfbox/examples/util/PrintTextLocations.html) but with the kind of PDF
