Tag: document

Fetching specific fields from an S3 document

I am using the AWS Java SDK in my application to talk to one of my S3 buckets, which holds objects in JSON format. A document may look like this: Now, for a certain document, let's say document1, I need to fetch the values corresponding to fields a and b instead of fetching the entire document. This sounds like something that

How to save a Jsoup Document to an HTML file?

document java jsoup

I have used this method to retrieve a webpage into an org.jsoup.nodes.Document object: myDoc = Jsoup.connect(myURL).ignoreContentType(true).get(); How should I write this object to an HTML file? The methods myDoc.html(), myDoc.text() and myDoc.toString() don’t output all elements of the document. Some information in a JavaScript element can be lost in parsing it. For example, “timestamp” in the source of an Instagram
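For the file-writing part of this question, a minimal sketch using only the Java standard library might look like the following. It assumes the markup is already available as a String (for instance from myDoc.outerHtml()); the class name HtmlSaver, the helper saveHtml, and the sample markup are hypothetical and used for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HtmlSaver {
    // Hypothetical helper: writes the given markup to the target file as UTF-8.
    static Path saveHtml(String html, Path target) throws IOException {
        return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a string such as myDoc.outerHtml(); no network access is needed here.
        String html = "<html><head><script>var timestamp = 1;</script></head>"
                + "<body><p>hi</p></body></html>";
        Path out = Files.createTempFile("page", ".html");
        saveHtml(html, out);

        // Read the file back to check that the content survived the round trip.
        String readBack = new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
        System.out.println(readBack.equals(html));
    }
}
```

This only covers writing a string to disk. If content is being lost during parsing, one option that may help is to save the unparsed response body instead of the parsed Document, for example the string returned by Jsoup.connect(myURL).execute().body(), but whether that recovers the missing script data depends on what the server actually sends.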

Extract the first page content from docx file by XML parsing

document domparser java xml

I need to extract the first page content from the docx file and save it as a separate document. I need everything from the first page (images, tables, text) to be saved as it is in a new docx file. What I tried is: I looked into the XML of the unzipped docx file. Since a Word document is reflowable, I