AvroParquetOutputFormat – Unable to Write Arrays with Null Elements

I’m using v1.11.1 of the parquet-mr library in a Java application that takes Avro records and writes them to Parquet files using AvroParquetOutputFormat. Some of the Avro records have array-typed fields whose elements can be null, e.g.

[ "Foo", "Bar", null, "Baz"].

Here’s an example Avro schema:

{
  "type": "record",
  "name": "NullLists",
  "namespace": "com.test",
  "fields": [
    { "name": "KeyID", "type": "string" },
    {
      "name": "NullableList",
      "type": [
        "null",
        { "type": "array", "items": ["null", "string"] }
      ],
      "default": null
    }
  ]
}

I’m trying to write the following record:

{
  "KeyID": "0",
  "NullableList": [ ... ]
}

I thought I could use the 3-level list writer to support this; however, it results in the following exception:

Caused by: java.lang.ClassCastException: repeated binary array (STRING) is not a group
        at org.apache.parquet.schema.Type.asGroupType(
        at org.apache.parquet.avro.AvroWriteSupport$ThreeLevelListWriter.writeCollection(
        at org.apache.parquet.avro.AvroWriteSupport$ListWriter.writeList(
        at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(
        at org.apache.parquet.avro.AvroWriteSupport.writeValue(
        at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(
        at org.apache.parquet.avro.AvroWriteSupport.write(
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(

Is this kind of record supported? I have also tried setting the parquet.avro.add-list-element-records option to false, with no luck.
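For context on why the list layout matters here, below is a minimal sketch in plain Java (no Parquet dependency) of how definition levels would be assigned under the two list layouts. As far as I understand it, parquet-avro writes the older 2-level layout by default and switches to the 3-level one when parquet.avro.write-old-list-structure is set to false. The class and method names (ListLevels, threeLevel, twoLevel) are hypothetical and only for illustration; this is not how AvroWriteSupport is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListLevels {

    // 3-level layout:
    //   optional group NullableList (LIST) {
    //     repeated group list {
    //       optional binary element (STRING);
    //     }
    //   }
    // Definition levels: 0 = list is null, 1 = list is empty,
    //                    2 = element is null, 3 = element is present.
    static List<Integer> threeLevel(List<String> list) {
        if (list == null) return Arrays.asList(0);
        if (list.isEmpty()) return Arrays.asList(1);
        List<Integer> levels = new ArrayList<>();
        for (String s : list) {
            levels.add(s == null ? 2 : 3);
        }
        return levels;
    }

    // 2-level (old) layout, matching the type named in the exception:
    //   optional group NullableList (LIST) {
    //     repeated binary array (STRING);
    //   }
    // Definition levels: 0 = list is null, 1 = list is empty,
    //                    2 = element is present.
    // There is no level left over to mark a null element.
    static List<Integer> twoLevel(List<String> list) {
        if (list == null) return Arrays.asList(0);
        if (list.isEmpty()) return Arrays.asList(1);
        List<Integer> levels = new ArrayList<>();
        for (String s : list) {
            if (s == null) {
                throw new IllegalArgumentException(
                    "2-level list layout cannot represent a null element");
            }
            levels.add(2);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("Foo", "Bar", null, "Baz");
        System.out.println(threeLevel(values)); // prints [3, 3, 2, 3]
        try {
            twoLevel(values);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The exception above mentions "repeated binary array (STRING)", which is the 2-level shape, while the stack trace goes through ThreeLevelListWriter. That suggests the writer and the Parquet schema disagreed about which layout was in use.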

EDIT: I have created a Parquet JIRA for tracking this issue.


For those interested – this required a patch, which has since been merged to master. There is a corresponding Parquet JIRA.