I have a dataset similar to this. My Spark code is as follows. I am trying to replace the people view by calling createOrReplaceTempView, but I get the following error. How do I replace the view in Spark? Answer: I got the solution to the above question with the following code:
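The original code and error are not quoted above, but a common cause of this error is calling createTempView (which throws if the view already exists) rather than createOrReplaceTempView. A minimal Scala sketch, with made-up data and the view name people taken from the question:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("replace-view").getOrCreate()
import spark.implicits._

// First version of the view.
Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age").createOrReplaceTempView("people")

// Registering a second DataFrame under the same name simply replaces the view;
// plain createTempView would throw TempTableAlreadyExistsException here.
Seq(("Carol", 41)).toDF("name", "age").createOrReplaceTempView("people")

val rows = spark.sql("SELECT * FROM people").count()
```

After the second call, queries against people see only the second DataFrame's single row.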
Tag: apache-spark-sql
Spark Java: how to select a newly added column using withColumn
I am trying to create a Java Spark program and to add a new column using withColumn, but when I try to select it, I get "Cannot resolve column name newColumn". Can someone please help me with how to do this in Java? Answer: qdf is the DataFrame as it was before you added newColumn, which is why you are unable to select the new column from it.
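Put differently, withColumn does not mutate the original DataFrame; it returns a new one, and the added column exists only on the returned reference. A minimal Scala sketch (the same applies to the Java API) with hypothetical column names:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[1]").appName("with-column").getOrCreate()
import spark.implicits._

val qdf = Seq(("a", 1), ("b", 2)).toDF("id", "value")

// withColumn returns a NEW DataFrame; qdf itself is left unchanged.
val withNew = qdf.withColumn("newColumn", lit(0))

// Selecting from the returned DataFrame works;
// qdf.select("newColumn") would fail with "cannot resolve column name".
val selected = withNew.select("newColumn")
```

In Java the fix is the same: keep the DataFrame returned by withColumn and select from that reference, not from the original.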
Compare schema of dataframe with schema of other dataframe
I have schemas from two datasets read from an HDFS path, defined below: val df = spark.read.parquet("/path") df.printSchema() Answer: Since your schema file seems like a CSV: use isSchemaMatching for further logic
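The isSchemaMatching helper from the original answer is not shown, but since StructType implements structural equality, one plausible version can be sketched in Scala as follows. Field order matters for direct equality, so an order-insensitive variant is shown too; all data and names here are invented for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.DataFrame

val spark = SparkSession.builder().master("local[1]").appName("schema-compare").getOrCreate()
import spark.implicits._

// Order-sensitive: StructType equality compares fields in declared order.
def isSchemaMatching(a: DataFrame, b: DataFrame): Boolean =
  a.schema == b.schema

// Order-insensitive: compare (name, type, nullable) triples as a set.
def isSchemaMatchingIgnoreOrder(a: DataFrame, b: DataFrame): Boolean =
  a.schema.fields.map(f => (f.name, f.dataType, f.nullable)).toSet ==
    b.schema.fields.map(f => (f.name, f.dataType, f.nullable)).toSet

val df1 = Seq((1, "a")).toDF("id", "name")
val df2 = Seq((2, "b")).toDF("id", "name")
val df3 = Seq(("x", 3)).toDF("name", "id") // same fields, different order
```

The order-insensitive form is usually what you want when the reference schema comes from an external file whose column order is not guaranteed.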
Spark – Transforming Complex Data Types
Goal: The goal I want to achieve is to read a CSV file (OK) and encode it to Dataset<Person>, where the Person object has a nested Address[] (this throws an exception). The Person CSV file: a file called person.csv contains the following data describing some persons. The first line is the schema, and address is a nested structure. Data classes:
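The actual person.csv and the exception are not shown above, but the usual reason this throws is that CSV is a flat format: Spark cannot derive an encoder that turns a flat row into a nested Address[]. One workaround, sketched in Scala with assumed column names (the real file layout is not shown), is to read the flat rows and build the nested objects in a typed map:

```scala
import org.apache.spark.sql.SparkSession

case class Address(street: String, city: String)
case class Person(name: String, age: Int, address: Array[Address])

val spark = SparkSession.builder().master("local[1]").appName("nested-csv").getOrCreate()
import spark.implicits._

// Stand-in for spark.read.option("header", "true").csv("person.csv");
// the flat columns here are assumptions, since the real file is not shown.
val flat = Seq(("Ann", 30, "Main St", "Oslo")).toDF("name", "age", "street", "city")

// Build the nested structure explicitly instead of calling .as[Person] on flat rows.
val people = flat.as[(String, Int, String, String)].map {
  case (name, age, street, city) => Person(name, age, Array(Address(street, city)))
}
```

The map step gives Spark a concrete Person per row, so the nested encoder is only needed on the output side, where it works.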
Data type mismatch while transforming data in spark dataset
I created a parquet structure from a CSV file using Spark: I'm reading the parquet structure and I'm trying to transform the data in a dataset: Unfortunately I get a data type mismatch error. Do I have to assign data types explicitly? 17/04/12 09:21:52 INFO SparkSqlParser: Parsing command: SELECT *, md5(station_id) as hashkey FROM tmpview Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve
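The mismatch arises because md5 only accepts string or binary input, while station_id was written to parquet as a numeric column. Casting explicitly in the query resolves it; a minimal Scala sketch with made-up rows:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("md5-cast").getOrCreate()
import spark.implicits._

// station_id is an Int here, mirroring the parquet column that md5 rejects.
Seq((1, "north"), (2, "south")).toDF("station_id", "name").createOrReplaceTempView("tmpview")

// Cast to string before hashing; md5(station_id) alone raises the AnalysisException.
val hashed = spark.sql(
  "SELECT *, md5(CAST(station_id AS STRING)) AS hashkey FROM tmpview")

val key = hashed.where("station_id = 1").head().getAs[String]("hashkey")
```

So yes: the data type has to be made explicit, either with a CAST in the SQL or by casting the column before registering the view.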
How to use join with gt condition in Java?
I want to join two DataFrames based on the following condition: df1.col("name") == df2.col("name") and df1.col("starttime") is greater than df2.col("starttime"). The first part of the condition is fine; I use the "equalTo" method of the Column class in Spark SQL. But for the "greater than" condition, when I use the following syntax in Java, it does not work; it seems "gt"
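Since the failing Java snippet is not quoted, here is one way the combined condition is typically written, sketched in Scala with sample rows invented for illustration. The Java equivalent builds the same single Column: df1.col("name").equalTo(df2.col("name")).and(df1.col("starttime").gt(df2.col("starttime"))).

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("gt-join").getOrCreate()
import spark.implicits._

val df1 = Seq(("a", 10), ("b", 5)).toDF("name", "starttime")
val df2 = Seq(("a", 7), ("b", 9)).toDF("name", "starttime")

// Both predicates go into ONE Column expression passed to join;
// gt is an ordinary Column method, just like equalTo.
val joined = df1.join(df2,
  df1("name") === df2("name") && df1("starttime") > df2("starttime"))
```

With these rows only ("a", 10) joins ("a", 7), since 10 > 7, while ("b", 5) fails the gt check against ("b", 9).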
Spark – Divide int with column?
I'm trying to divide a constant by a column. I know I can do it the other way around, but how can I do (90).divide(df.col("col1")) (obviously this is incorrect)? Thank you! Answer: Use o.a.s.sql.functions.lit: or o.a.s.sql.functions.expr:
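Spelled out in Scala (the answer's actual snippets are elided above): wrap the constant in lit so it becomes a Column with a divide method, or push the whole expression into expr. Sample data invented for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, lit}

val spark = SparkSession.builder().master("local[1]").appName("lit-divide").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("col1")

// lit(90) turns the constant into a Column, so divide works:
val viaLit = df.select(lit(90).divide(df("col1")).as("result"))

// Equivalently, as a SQL expression string:
val viaExpr = df.select(expr("90 / col1").as("result"))

val first = viaLit.head().getDouble(0)
```

Division in Spark SQL returns a double column, so the first row of either result is 90.0.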