I have a Dataset gathering information about French cities, and the field that is troubling me is the department one (codeDepartement). When that function runs and I don't attempt to partition the Dataset by this String field (the statements required for partitioning are commented out here), everything works well. The content…
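As a rough illustration of what partitioning by codeDepartement does to the output, here is a plain-Java sketch (no Spark involved) of the Hive-style layout that partitioned writers such as Spark's `partitionBy` produce: one sub-directory per distinct value, named `codeDepartement=<value>`, with the partition value carried by the path rather than stored inside the data files. Department codes such as `01` or `2A` (Corse-du-Sud) are a good reason to keep the column a String. The class and method names and the sample cities are illustrative only, and real writers also escape special characters in values, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative sketch (hypothetical names) of a Hive-style partitioned layout:
 * one directory per distinct partition value, named column=value.
 * Simplification: no escaping of special characters in values.
 */
class PartitionLayoutSketch {

    /** Builds the directory name for one partition value. */
    static String partitionDir(String column, String value) {
        return column + "=" + value;
    }

    /**
     * Groups city names by the directory their department code maps to.
     * Each city is {name, departmentCode}. TreeMap keeps the order deterministic.
     */
    static Map<String, List<String>> layout(String column, List<String[]> cities) {
        Map<String, List<String>> dirs = new TreeMap<>();
        for (String[] city : cities) {
            dirs.computeIfAbsent(partitionDir(column, city[1]), k -> new ArrayList<>())
                .add(city[0]);
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String[]> cities = List.of(
            new String[] {"Ajaccio", "2A"},
            new String[] {"Bourg-en-Bresse", "01"},
            new String[] {"Oyonnax", "01"});
        // Each key is a directory name; each value lists the rows written under it.
        System.out.println(layout("codeDepartement", cities));
    }
}
```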
Tag: parquet
AvroParquetOutputFormat – Unable to Write Arrays with Null Elements
I'm using v1.11.1 of the parquet-mr library as part of a Java application that takes Avro records and writes them into Parquet files using AvroParquetOutputFormat. Some of the Avro records have array-type fields that contain null elements, and I have an example Avro schema and a record of that kind that I'm trying to write. I thought I could use the 3-level list…
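One way to see why the 3-level list layout matters: in the Parquet format's 3-level LIST form, elements live in a repeated group that contains an optional `element` field, so a null element still gets a definition level of its own, whereas the legacy 2-level form stores elements directly in a repeated field, and a repeated field cannot hold a null. The sketch below is a simplified, hypothetical encoder (not parquet-mr's writer code) that computes repetition and definition levels for one list column, here called `tags` purely for illustration. In parquet-avro, to my knowledge, the 3-level form is chosen by setting the Hadoop property `parquet.avro.write-old-list-structure` to `false`, and the Avro array's `items` must be a union with `"null"`; both points are worth checking against the 1.11.1 sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified sketch of repetition/definition levels for a 3-level list:
 *
 *   optional group tags (LIST) {   // defined (level 1) when the list is not null
 *     repeated group list {        // defined (level 2) when the list has an entry
 *       optional binary element;   // defined (level 3) when the element is not null
 *     }
 *   }
 */
class ThreeLevelListLevels {

    /** One encoded slot: repetition level, definition level, value (may be null). */
    static final class Entry {
        final int rep;
        final int def;
        final String value;

        Entry(int rep, int def, String value) {
            this.rep = rep;
            this.def = def;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(r=" + rep + ", d=" + def + ", v=" + value + ")";
        }
    }

    /** Encodes one record's list field: a null list, an empty list, or elements that may be null. */
    static List<Entry> encode(List<String> tags) {
        List<Entry> out = new ArrayList<>();
        if (tags == null) {                  // the optional group itself is absent
            out.add(new Entry(0, 0, null));
            return out;
        }
        if (tags.isEmpty()) {                // group present but no repeated entries
            out.add(new Entry(0, 1, null));
            return out;
        }
        for (int i = 0; i < tags.size(); i++) {
            int rep = (i == 0) ? 0 : 1;      // 0 starts a new record, 1 continues the list
            String v = tags.get(i);
            int def = (v == null) ? 2 : 3;   // a null element still occupies a slot at level 2
            out.add(new Entry(rep, def, v));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(encode(Arrays.asList("a", null, "b")));
        System.out.println(encode(new ArrayList<>()));
        System.out.println(encode(null));
    }
}
```

With only two levels there is no definition level between "entry exists" and "value present", which is why null elements cannot be represented in that layout.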
Dataflow writing a PCollection of GenericRecords to Parquet files
In an Apache Beam step I have a PCollection of KV<String, Iterable<KV<Long, GenericRecord>>>. I want to write all the records in each Iterable to the same Parquet file, deriving the file name from the key of the KV. Answer I found…
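In Beam itself, one approach (to my knowledge; check the FileIO Javadoc for your version) is to flatten the groups into KV<String, GenericRecord> and write with `FileIO.writeDynamic()`, using `.by(KV::getKey)`, `ParquetIO.sink(schema)` as the sink, a key-based `withNaming(...)`, and `withNumShards(1)` so each key ends up in a single file. The plain-Java sketch below (no Beam dependency) only models the routing logic: `Map.Entry` stands in for Beam's `KV`, a type parameter `R` stands in for `GenericRecord`, and the `key + ".parquet"` naming rule and all helper names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of routing KV<String, Iterable<KV<Long, R>>> groups to one file per key. */
class PerKeyFileRouting {

    /** Hypothetical naming rule: the file name is derived from the key. */
    static String fileNameFor(String key) {
        return key + ".parquet";
    }

    /** Builds one inner KV<Long, R> (Map.Entry standing in for Beam's KV). */
    static <R> Map.Entry<Long, R> kv(long id, R record) {
        return new SimpleEntry<>(id, record);
    }

    /** Builds one grouped element KV<String, Iterable<KV<Long, R>>>. */
    static <R> Map.Entry<String, Iterable<Map.Entry<Long, R>>> group(
            String key, List<Map.Entry<Long, R>> records) {
        return new SimpleEntry<>(key, records);
    }

    /** Collects every record of each group into the single file named after its key. */
    static <R> Map<String, List<R>> route(
            List<Map.Entry<String, Iterable<Map.Entry<Long, R>>>> groups) {
        Map<String, List<R>> files = new LinkedHashMap<>();
        for (Map.Entry<String, Iterable<Map.Entry<Long, R>>> g : groups) {
            List<R> target = files.computeIfAbsent(fileNameFor(g.getKey()), k -> new ArrayList<>());
            for (Map.Entry<Long, R> inner : g.getValue()) {
                target.add(inner.getValue());   // keep the record, drop the Long
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Iterable<Map.Entry<Long, String>>>> groups = new ArrayList<>();
        groups.add(group("keyA", List.of(kv(1L, "rec-a"), kv(2L, "rec-b"))));
        groups.add(group("keyB", List.of(kv(3L, "rec-c"))));
        System.out.println(route(groups));
    }
}
```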
Data type mismatch while transforming data in a Spark Dataset
I created a Parquet structure from a CSV file using Spark. I'm reading the Parquet structure back and trying to transform the data in a Dataset, but I get a data type mismatch error. Do I have to assign data types explicitly?
17/04/12 09:21:52 INFO SparkSqlParser: Parsing command: SELECT *, md5(station_id) as hashkey FROM tmpview
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve
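The exception message is cut off, so the exact cause isn't visible. A likely explanation, assuming `station_id` was inferred as a numeric column when the CSV was read, is that Spark's `md5` expects binary input: a string argument can be implicitly converted to binary, a numeric one cannot. If so, casting first, as in `SELECT *, md5(CAST(station_id AS STRING)) AS hashkey FROM tmpview`, should resolve. The plain-Java sketch below mirrors what that expression computes per row, namely the lowercase hex MD5 of the UTF-8 string form of the id; `Md5HashKey`, `md5Hex` and `hashKey` are hypothetical names, not Spark APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Mirrors md5(CAST(station_id AS STRING)) for an integer station id. */
class Md5HashKey {

    /** Lowercase 32-character hex MD5 of the UTF-8 bytes of s. */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");  // MD5 ships with every Java platform
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));      // mask to avoid sign extension
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Converts the numeric id to its string form first, as the CAST does, then hashes it. */
    static String hashKey(int stationId) {
        return md5Hex(Integer.toString(stationId));
    }

    public static void main(String[] args) {
        System.out.println(hashKey(123));
    }
}
```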