I am having trouble running a Spark application that reads data from Cassandra in Spark 2.0.0. My code works as follows: DataFrameReader readerCassandra = SparkContextUtil.getInstance().read() …
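For comparison, a read like this is usually wired through the DataStax spark-cassandra-connector data source. A minimal sketch, assuming a plain SparkSession (SparkContextUtil is the asker's own helper) and placeholder host, keyspace, and table names:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CassandraReadSketch {
    public static void main(String[] args) {
        // Placeholder connection host; in the question this is hidden
        // behind SparkContextUtil.getInstance().
        SparkSession spark = SparkSession.builder()
                .appName("cassandra-read")
                .config("spark.cassandra.connection.host", "127.0.0.1")
                .getOrCreate();

        // The connector registers the "org.apache.spark.sql.cassandra" source;
        // the keyspace/table names below are assumptions, not from the question.
        Dataset<Row> df = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "my_keyspace")
                .option("table", "my_table")
                .load();

        df.show();
    }
}
```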
Tag: apache-spark
Apache Spark Streaming with Java & Kafka
I'm trying to run the Spark Streaming example from the official Spark website. These are the dependencies I use in my pom file: This is my Java code: When I try to run it from Eclipse I get the following exception: I run this from my IDE (Eclipse). Do I have to create and deploy the JAR into Spark to make it
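No JAR deploy is needed for local testing: with a local master and the Spark/Kafka dependencies on the compile classpath (not scope "provided"), the job runs straight from Eclipse. A minimal runnable sketch against the spark-streaming-kafka-0-10 Java API, with broker and topic names assumed:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        // local[*] master lets the job run straight from the IDE
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("kafka-stream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumed broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "example-group");           // assumed group id

        // Direct stream: one Kafka partition maps to one Spark partition
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("my-topic"), kafkaParams));

        stream.map(ConsumerRecord::value).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```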
How to specify different user library paths for different actions in an Oozie workflow
I have a Spark action and a Java action. How can I specify different library paths for the two actions? I have conflicting jars in the two assembly jars. Answer Making each action a sub-workflow helps resolve the jar inconsistencies.
Data type mismatch while transforming data in a Spark dataset
I created a parquet structure from a CSV file using Spark: I'm reading the parquet structure and trying to transform the data in a dataset: Unfortunately I get a data type mismatch error. Do I have to explicitly assign data types? 17/04/12 09:21:52 INFO SparkSqlParser: Parsing command: SELECT *, md5(station_id) as hashkey FROM tmpview Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve
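Spark SQL's md5 expects a string or binary argument, so if station_id was inferred as a numeric type from the CSV, the analyzer raises exactly this mismatch; an explicit CAST resolves it. A sketch, with the parquet path assumed:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Md5CastSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("md5-cast").getOrCreate();

        // "stations.parquet" is a placeholder for the asker's parquet structure
        Dataset<Row> df = spark.read().parquet("stations.parquet");
        df.createOrReplaceTempView("tmpview");

        // md5() needs a string/binary argument; cast the (presumably numeric)
        // station_id explicitly instead of relying on implicit coercion
        Dataset<Row> withHash = spark.sql(
                "SELECT *, md5(CAST(station_id AS STRING)) AS hashkey FROM tmpview");
        withHash.show();
    }
}
```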
How to use join with gt condition in Java?
I want to join two dataframes based on the following condition: df1.col("name") == df2.col("name") and df1.col("starttime") is greater than df2.col("starttime"). The first part of the condition is fine; I use the "equalTo" method of the Column class in Spark SQL. But for the "greater than" condition, when I use the following syntax in Java: It does not work. It seems "gt"
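Column exposes gt() for exactly this, and the two predicates chain with and(). A sketch, assuming df1 and df2 exist with the name and starttime columns described in the question:

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class GtJoinSketch {
    // Joins on equal names where df1's starttime is strictly later than df2's
    static Dataset<Row> joinOnNameAndTime(Dataset<Row> df1, Dataset<Row> df2) {
        Column cond = df1.col("name").equalTo(df2.col("name"))
                .and(df1.col("starttime").gt(df2.col("starttime")));
        return df1.join(df2, cond);
    }
}
```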
NoSuchMethodError in shapeless seen only in Spark
I am trying to write a Spark connector to pull AVRO messages off a RabbitMQ message queue. When decoding the AVRO messages, a NoSuchMethodError occurs, but only when running in Spark. I could not reproduce the Spark code exactly outside of Spark, but I believe the two examples are sufficiently similar. I think this is the smallest
How to convert JavaRDD<Row> to JavaRDD<List<String>>?
I tried to make it using this code, but I get a WrappedArray. How do I make it correctly? Answer You can use the getList method: where lemmas is the name of the column with the lemmatized text. If there is only one column (it looks like this is the case) you can skip the select. If you know the index of the column you can
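Concretely, Row.getList returns a java.util.List instead of a Scala WrappedArray. A sketch, assuming a Dataset<Row> with an array<string> column named lemmas:

```java
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class LemmaConversion {
    // Assumes ds has an array<string> column named "lemmas".
    static JavaRDD<List<String>> toLemmaLists(Dataset<Row> ds) {
        return ds.select("lemmas")
                 .javaRDD()
                 // index 0 because the select projected a single column
                 .map(row -> row.<String>getList(0));
    }
}
```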
Spark SASL not working on EMR with YARN
So first, I want to say the only thing I have seen that addresses this issue is here: Spark 1.6.1 SASL. However, when I add the configuration for Spark and YARN authentication, it still does not work. …
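For reference, the Spark-side SASL switches are the spark.authenticate family of settings; on YARN, the external shuffle service on each NodeManager must also run with spark.authenticate enabled, or the handshake fails. A sketch of the driver-side configuration only:

```java
import org.apache.spark.SparkConf;

public class SaslConfSketch {
    static SparkConf saslConf() {
        // spark.authenticate turns on the shared-secret handshake;
        // enableSaslEncryption additionally encrypts the RPC traffic.
        return new SparkConf()
                .set("spark.authenticate", "true")
                .set("spark.authenticate.enableSaslEncryption", "true")
                // require encryption on the shuffle-service side as well
                .set("spark.network.sasl.serverAlwaysEncrypt", "true");
    }
}
```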
Spark – Divide int by column?
I'm trying to divide a constant by a column. I know how to divide a column by a constant, but how can I do (90).divide(df.col("col1")) (obviously this syntax is incorrect)? Thank you! Answer Use o.a.s.sql.functions.lit: or o.a.s.sql.functions.expr:
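Concretely, the two suggestions look roughly like this (column name taken from the question, everything else assumed):

```java
import static org.apache.spark.sql.functions.expr;
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class DivideConstantByColumn {
    static Dataset<Row> divide(Dataset<Row> df) {
        // lit(90) wraps the constant in a Column, so Column.divide applies:
        return df.select(lit(90).divide(df.col("col1")).alias("ratio"));
        // equivalently: df.select(expr("90 / col1").alias("ratio"));
    }
}
```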
Spark (Java) – dataframe groupBy with multiple aggregations?
I'm trying to write a groupBy on Spark with Java. In SQL this would look like But what is the Spark/Java equivalent of this query? Let's say the variable table is a dataframe, to keep the relation to the SQL query clear. I'm thinking something like: Which is obviously incorrect, since you can't use aggregate functions like .count or .max
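In the Dataset API the equivalent is groupBy(...).agg(...), where the aggregate functions come from org.apache.spark.sql.functions rather than being methods on the grouped columns. A sketch with assumed column names:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.count;
import static org.apache.spark.sql.functions.max;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class GroupByAggSketch {
    // Roughly: SELECT key, COUNT(id), MAX(value) FROM table GROUP BY key
    // (key/id/value are assumed names, not from the question)
    static Dataset<Row> aggregate(Dataset<Row> table) {
        return table.groupBy(col("key"))
                .agg(count(col("id")).alias("cnt"),
                     max(col("value")).alias("max_value"));
    }
}
```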