How can tabula (JAR) be called from Java?

Question

Tabula looks like a great tool for extracting tabular data from PDFs. There are plenty of examples of how to call it from the command line or use it in Python, but there doesn't seem to be any documentation for using it from Java. Does anyone have a worked example? Note: tabula does provide source code, but it s…

Accepted Answer

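You can use the following code to call tabula from Java. Hope this helps:

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;

import technology.tabula.ObjectExtractor;
import technology.tabula.Page;
import technology.tabula.RectangularTextContainer;
import technology.tabula.Table;
import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;

public class TabulaExample {
    public static void main(String[] args) throws IOException {
        final String FILENAME = "../test.pdf";
        // PDDocument.load is the PDFBox 2.x API that tabula-java 1.0.x builds on;
        // try-with-resources closes the document when we are done
        try (PDDocument pd = PDDocument.load(new File(FILENAME))) {
            int totalPages = pd.getNumberOfPages();
            System.out.println("Total Pages in Document: " + totalPages);

            ObjectExtractor oe = new ObjectExtractor(pd);
            SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
            Page page = oe.extract(1); // page numbers are 1-based

            // detect the tables on the page, then extract their text
            List<Table> tables = sea.extract(page);
            for (Table table : tables) {
                List<List<RectangularTextContainer>> rows = table.getRows();
                for (int i = 0; i < rows.size(); i++) {
                    List<RectangularTextContainer> cells = rows.get(i);
                    for (int j = 0; j < cells.size(); j++) {
                        // print each cell's text, separated by "|"
                        System.out.print(cells.get(j).getText() + "|");
                    }
                    System.out.println();
                }
            }
        }
    }
}
```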
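If you want to save the extracted cells rather than print them, one option is to collect each row's `cells.get(j).getText()` values into a `List<String>` and format the rows as CSV yourself. The sketch below uses only the JDK; `CsvSketch`, `toCsv` and `escape` are hypothetical helpers (not part of tabula-java), the quoting follows RFC 4180 style (quote fields containing commas, quotes or line breaks, and double embedded quotes), and `\n` is used as the line separator for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvSketch {

    // Hypothetical helper (not part of tabula-java): quote a field if it
    // contains a comma, a double quote or a line break, doubling any
    // embedded double quotes.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Hypothetical helper: join rows of cell text into CSV, one line per row.
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            List<String> escaped = new ArrayList<>();
            for (String cell : row) {
                escaped.add(escape(cell));
            }
            sb.append(String.join(",", escaped)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for text collected via cells.get(j).getText()
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Name", "Amount"),
                Arrays.asList("Widget, large", "12"),
                Arrays.asList("Say \"hi\"", "3"));
        System.out.print(toCsv(rows));
    }
}
```

Building the whole string in memory is fine for a page or two of tables; for large documents you would probably write each row straight to a `java.io.Writer` instead.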


Answer