Skip to content
Advertisement

Accessing a COSArray for PDF fields with Apache PDFBox

I’m trying to access all form fields in a PDF file – so I can use code to fill them in – and this is as far as I’ve gotten:

PDDocumentCatalog pdCatalog = pdf.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

List<PDField> fieldList = pdAcroForm.getFields(); // fieldList.size() = 1

PDField field = fieldList.get(0);

COSDictionary dictionary = field.getCOSObject();
System.out.println("dictionary size = " + dictionary.size());

// my attempt to iterate through fields
for ( Map.Entry<COSName,COSBase> entry : dictionary.entrySet() )
{
    COSName key = entry.getKey();
    COSBase val = entry.getValue();

    if ( val instanceof COSArray )
    {
        System.out.println("COSArray size = " + ((COSArray)val).size());
    }
    System.out.println("key = " + key);
    System.out.println("val = " + val);
}

which gives an output of:

dictionary size = 3
COSArray size = 2
key = COSName{Kids}
val = COSArray{[COSObject{110, 0}, COSObject{108, 0}]}
key = COSName{T}
val = COSString{form1[0]}
key = COSName{V}
val = COSString{}

Does anyone know how I can access the two COSObjects in the COSArray? I also don’t know what the notation COSObject{x, y} means, and can’t find any documentation on this. If those are dictionary or array values elements, I also want to know how to access those.

Advertisement

Answer

You get the object with get(index) to get the COSObject (an indirect reference) or getObject(index) to get the dereferenced object referenced by the COSObject.

COSObject{110, 0} is the object number and the generation number (usually 0). Open your PDF file with NOTEPAD++ and look for “110 0 obj” to find it, or “110 0 R” to see who references this object.

Advertisement