
Dataflow: writing a PCollection of GenericRecords to Parquet files

In an Apache Beam step I have a PCollection of KV<String, Iterable<KV<Long, GenericRecord>>>. I want to write all the records in each Iterable to the same Parquet file. My code snippet is given below.


Now I want to write all the records in the Iterable to the same Parquet file, deriving the file name from the key of the KV.


Answer

I found a solution to the problem. After the step

.apply(GroupByKey.create()) // PCollection<KV<String, Iterable<KV<Long, GenericRecord>>>>

I apply another transform that returns only the Iterable as the output PCollection: `.apply(ParDo.of(new GetIterable()))`. The result is a PCollection of Iterables of KV pairs, where the key is the name of the file I have to write to. The remaining snippet is:

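As an illustration only (this is not the original snippet), here is a minimal sketch of one way to route grouped records to one Parquet file per key with `FileIO.writeDynamic()` and `ParquetIO.sink()`. It swaps `GetIterable` for a hypothetical `KeyByFileName` DoFn that emits individual records keyed by file name. The class name, the `write` method, and the `schema`, `recordCoder`, and `outputDir` parameters are all assumptions, and the sketch presumes the Beam Parquet IO module is on the classpath.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.transforms.Contextful;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

final class WriteGroupsToParquet {

  /** Hypothetical DoFn: re-keys every record in a group by the group key (the file name). */
  static class KeyByFileName
      extends DoFn<KV<String, Iterable<KV<Long, GenericRecord>>>, KV<String, GenericRecord>> {
    @ProcessElement
    public void processElement(
        @Element KV<String, Iterable<KV<Long, GenericRecord>>> group,
        OutputReceiver<KV<String, GenericRecord>> out) {
      for (KV<Long, GenericRecord> entry : group.getValue()) {
        // Drop the Long, keep the record, and tag it with the target file name.
        out.output(KV.of(group.getKey(), entry.getValue()));
      }
    }
  }

  /**
   * Writes each group's records under a file name derived from the group key.
   * recordCoder is assumed to be supplied by the caller (for example an Avro
   * coder built from the same schema), since GenericRecord has no default coder.
   */
  static void write(
      PCollection<KV<String, Iterable<KV<Long, GenericRecord>>>> grouped,
      Schema schema,
      Coder<GenericRecord> recordCoder,
      String outputDir) {
    grouped
        .apply("KeyByFileName", ParDo.of(new KeyByFileName()))
        .setCoder(KvCoder.of(StringUtf8Coder.of(), recordCoder))
        .apply(
            "WriteParquet",
            FileIO.<String, KV<String, GenericRecord>>writeDynamic()
                // Destination = the key, i.e. the file name.
                .by(KV::getKey)
                .withDestinationCoder(StringUtf8Coder.of())
                // Only the GenericRecord is handed to the Parquet sink.
                .via(
                    Contextful.<KV<String, GenericRecord>, GenericRecord>fn(KV::getValue),
                    ParquetIO.sink(schema))
                .to(outputDir)
                // The key becomes the file name prefix; Beam appends shard information.
                .withNaming(key -> FileIO.Write.defaultNaming(key, ".parquet"))
                // One shard per destination, so each key's records land in a single file per window.
                .withNumShards(1));
  }
}
```

With `withNumShards(1)`, every record that shares a key ends up in the same file; the default naming still adds a shard suffix to the prefix, so a custom `FileIO.Write.FileNaming` would be needed for an exact file name.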
User contributions licensed under: CC BY-SA