I am adding Apache Tika, to extract text from documents and images (with TikaOcr), to an existing Azure Functions service that runs on top of App Service. Apache Tika requires Tesseract to be installed locally on the machine. To work around that, I SSH-ed into the server and installed it with apt-get, but (from what I understand)
Tag: apache-tika
Apache Solr – Indexing ZIP files
My web app is an e-mail service. It stores email messages in a MySQL database and keeps email attachments on disk. The database is similar to: I index it with the following data-config.xml: This works well for all files except compressed files such as .zip. For .zip files, the attach_content field gets filled only with the file names
Read Content from Files which are inside Zip file
I am trying to create a simple Java program that reads and extracts the content of the files inside a ZIP file. The ZIP file contains 3 files (txt, pdf, docx). I need to read the contents of all of these files, and I am using Apache Tika for this purpose. Can somebody help me achieve this? I have
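One possible approach to the task described above is to walk the archive entry by entry and hand each entry's bytes to a text extractor. The sketch below uses only java.util.zip, so it runs without Tika on the classpath. The names ZipEntryReader, Extractor, readEntries and utf8 are hypothetical and are not part of Tika or of the original question. The utf8 extractor only handles plain text; for pdf and docx entries, a Tika-backed extractor would presumably be plugged in at the marked spot.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, named here for illustration only.
class ZipEntryReader {

    // Hypothetical hook: converts one entry's raw bytes into text.
    // Assumption: a Tika-backed version could call something like
    // new Tika().parseToString(new ByteArrayInputStream(data)) here.
    interface Extractor {
        String extract(String entryName, byte[] data) throws IOException;
    }

    // Reads every non-directory entry and maps its name to the extracted text.
    static Map<String, String> readEntries(InputStream zipStream, Extractor extractor)
            throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Buffer the entry so the extractor never touches (or closes)
                // the shared ZipInputStream.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int read;
                while ((read = zis.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                result.put(entry.getName(),
                        extractor.extract(entry.getName(), buffer.toByteArray()));
            }
        }
        return result;
    }

    // Plain-text extractor: decodes the bytes as UTF-8. Not suitable for pdf or docx.
    static String utf8(String entryName, byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}
```

Calling ZipEntryReader.readEntries(inputStream, ZipEntryReader::utf8) returns the entries in archive order. Buffering each entry into a byte array is a deliberate choice for small archives; very large entries would call for a streaming variant instead.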