Send a PDF instead of a TextSnippet in Google AutoML entity extraction

I have created a custom processor using the Google AutoML entity extractor and trained it on a few PDFs. The PDFs contain photo identity cards. I was able to test it in the UI, and it extracted the entities correctly. Now I'm using the Java client library to do the same from code, based on this sample:

https://github.com/googleapis/java-automl/blob/b4c760c01efbd2174d93af85c5fbab3c09eee9f2/samples/snippets/src/main/java/com/example/automl/LanguageEntityExtractionPredict.java

In that sample, the text content is passed to the library; I want to send the PDF content instead. I don't want to use a Google Cloud Storage bucket. I want to load the file locally and send it to the entity extractor. I tried using the Document class as follows:

Document.parseDelimitedFrom(FileInputStream("test.pdf")) but it gives me an error.

Any help is highly appreciated.

Answer

Document.parseDelimitedFrom(FileInputStream("test.pdf")) throws an error because parseDelimitedFrom() expects a stream containing a length-delimited, serialized protobuf message, not the raw bytes of a local PDF file. That said, there is currently no provision for sending local files for prediction, as the REST API documentation shows: the DocumentInputConfig parameter supports only a GCS source.
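To make the parsing failure concrete, here is a minimal sketch that uses only the Java standard library. It relies on two facts: a delimited protobuf stream begins with a base-128 varint giving the message length, and a PDF file begins with the ASCII header "%PDF-". The class DelimitedPrefixDemo and the method readVarint32 are hypothetical names written for this illustration; they are not part of the protobuf or AutoML libraries, and the exact exception the real parser raises afterwards is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the protobuf or AutoML client libraries.
final class DelimitedPrefixDemo {

    // Decodes a base-128 varint, the encoding used for the length prefix
    // of a delimited protobuf message.
    static int readVarint32(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new IOException("stream ended inside a varint");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result; // high bit clear: this was the last varint byte
            }
        }
        throw new IOException("malformed varint");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the first bytes of a PDF; no real file is needed.
        byte[] pdfHeader = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(pdfHeader);

        // The '%' byte (0x25) is consumed as the length prefix.
        int claimedLength = readVarint32(in);
        System.out.println("Length prefix read from the PDF header: " + claimedLength);
    }
}
```

The '%' character is therefore treated as a message length, and the bytes that follow ("PDF-1.7" and so on) are then interpreted as protobuf field tags, which they are not. Reading the file into a Document this way cannot work regardless of how the stream is opened.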


Feature Request

I have raised this requirement as a feature request in Google's Issue Tracker: Issue #218865096. You can star the issue to receive automatic updates and give it traction. Please note that feature requests carry no timeline or implementation guarantee. All communication regarding this feature request will take place on the Issue Tracker.