
Tag: parquet

A Parquet file of a dataset with a String field containing leading zeroes returns that field without its leading zeroes if the dataset is partitioned by it

I have a Dataset gathering information about French cities, and the field that is troubling me is the department one (CodeDepartement). When the Dataset isn't partitioned by this String field codeDepartement, everything works well. When that function runs, if I don't attempt to partition the dataset (the statements required for partitioning are commented out here), everything goes fine: The content
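The excerpt stops before any answer. One likely explanation, assumed here rather than confirmed by the post: when Spark reads back a dataset partitioned by codeDepartement, the column's value comes from the directory name (for example `codeDepartement=01`), and Spark's partition column type inference may read `01` as an integer, dropping the zero. In Spark itself, setting `spark.sql.sources.partitionColumnTypeInference.enabled` to `false`, or reading with an explicit schema that declares the column as a string, are the usual ways to keep the text. The sketch below is only a simplified, stdlib-only model of that inference step, not Spark's code; `splitSegment` and `inferValue` are hypothetical helpers.

```java
public class PartitionValueInference {

    /** Splits a Hive-style directory name such as "codeDepartement=01" into {name, rawValue}. */
    static String[] splitSegment(String segment) {
        int eq = segment.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Not a partition directory: " + segment);
        }
        return new String[] { segment.substring(0, eq), segment.substring(eq + 1) };
    }

    /**
     * Simplified model of partition type inference: when inference is on and the raw
     * directory value parses as an int, it becomes an Integer (so "01" turns into 1);
     * otherwise the raw text is kept as a String.
     */
    static Object inferValue(String raw, boolean inferTypes) {
        if (inferTypes) {
            try {
                return Integer.parseInt(raw);
            } catch (NumberFormatException ignored) {
                // not an integer: fall through and keep the text
            }
        }
        return raw;
    }

    public static void main(String[] args) {
        String[] kv = splitSegment("codeDepartement=01");
        System.out.println(kv[0] + " -> " + inferValue(kv[1], true));  // codeDepartement -> 1
        System.out.println(kv[0] + " -> " + inferValue(kv[1], false)); // codeDepartement -> 01
    }
}
```

The point of the model is that the zero is not lost when writing: the directory still says `01`, and it is the type chosen on read that decides whether the text survives.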

AvroParquetOutputFormat – Unable to Write Arrays with Null Elements

I'm using v1.11.1 of the parquet-mr library as part of a Java application that takes Avro records and writes them into Parquet files using the AvroParquetOutputFormat. Some Avro records have array-type fields that will contain null elements. Here's an example Avro schema: I'm trying to write the following record: I thought I could use the 3-level list
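The excerpt is cut off before the details. As background: in the Parquet format's 3-level list layout the element field is `optional`, whereas in the older 2-level layout the repeated field is the element itself, so no definition level means "element is null". In parquet-avro, the layout is chosen by the `parquet.avro.write-old-list-structure` setting (`AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE`), which would need to be `false` to get the 3-level form; the Avro array's `items` presumably also need to be a union with `"null"`. The sketch below is a simplified, stdlib-only model of how repetition and definition levels would be assigned to one nullable list under the 3-level layout; it is not parquet-mr code, and the field name `tags` and the helper `levels` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreeLevelListLevels {

    /** One striped value: repetition level, definition level and the value (null if absent). */
    static final class Entry {
        final int rep;
        final int def;
        final String value;

        Entry(int rep, int def, String value) {
            this.rep = rep;
            this.def = def;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(r=" + rep + ", d=" + def + ", " + value + ")";
        }
    }

    /**
     * Levels for one record's list under the assumed 3-level schema
     *   optional group tags (LIST) {
     *     repeated group list {
     *       optional binary element (STRING);
     *     }
     *   }
     * Definition levels: 0 = list null, 1 = list empty, 2 = element null, 3 = element present.
     * Repetition levels: 0 = first value of the record, 1 = next element of the same list.
     */
    static List<Entry> levels(List<String> tags) {
        List<Entry> out = new ArrayList<>();
        if (tags == null) {
            out.add(new Entry(0, 0, null));
            return out;
        }
        if (tags.isEmpty()) {
            out.add(new Entry(0, 1, null));
            return out;
        }
        for (int i = 0; i < tags.size(); i++) {
            String v = tags.get(i);
            int rep = (i == 0) ? 0 : 1;
            // the optional "element" field is what gives a null element its own level (2)
            int def = (v == null) ? 2 : 3;
            out.add(new Entry(rep, def, v));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [(r=0, d=3, a), (r=1, d=2, null), (r=1, d=3, b)]
        System.out.println(levels(Arrays.asList("a", null, "b")));
    }
}
```

Under the 2-level layout (`repeated binary array (STRING)` directly inside the list group) the maximum definition level is 2, and its values are already used for "list null", "list empty" and "element present", which is why a null element cannot be written there.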

Data type mismatch while transforming data in Spark Dataset

I created a Parquet structure from a CSV file using Spark: I'm reading the Parquet structure and trying to transform the data in a Dataset: Unfortunately, I get a data type mismatch error. Do I have to assign data types explicitly?
17/04/12 09:21:52 INFO SparkSqlParser: Parsing command: SELECT *, md5(station_id) as hashkey FROM tmpview
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve
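The error message is truncated, so the cause here is an assumption: Spark's `md5` function expects a binary argument (strings are cast implicitly), so if `station_id` ended up as a numeric column in the Parquet schema, the call cannot be resolved. Casting first, as in `SELECT *, md5(CAST(station_id AS STRING)) AS hashkey FROM tmpview`, is a common workaround. The stdlib sketch below only illustrates what that cast-then-hash yields, a 32-character lowercase hex MD5 of the id's decimal text; `StationHashKey`, `md5Hex` and `hashKey` are hypothetical names, and `station_id` is assumed to be an `int`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StationHashKey {

    /** Lowercase hex MD5 of the UTF-8 bytes of the given text. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Mirrors md5(CAST(station_id AS STRING)) for an assumed int station id. */
    static String hashKey(int stationId) {
        return md5Hex(Integer.toString(stationId));
    }

    public static void main(String[] args) {
        System.out.println(hashKey(42));
    }
}
```

Hashing the decimal text rather than the raw integer bytes keeps the key identical to what the cast-based SQL would produce, assuming the cast renders the id without padding or formatting.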
