
Tag: apache-spark

Apache Spark Streaming with Java & Kafka

I’m trying to run the Spark Streaming example from the official Spark website. These are the dependencies I use in my pom file: This is my Java code: When I try to run it from Eclipse I get the following exception: I run this from my IDE (Eclipse). Do I have to create and deploy the JAR into Spark to make it…
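The excerpt omits the original pom and code, so as a rough reconstruction, here is a minimal sketch of a Java Spark Streaming consumer using the spark-streaming-kafka-0-10 integration; the broker address, topic name, and group id are placeholder assumptions:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        // local[2] or higher is required: one thread to receive, one to process.
        SparkConf conf = new SparkConf()
                .setAppName("KafkaStreaming")
                .setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "example-group");           // assumption
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("test-topic"), kafkaParams));

        // Print the message payloads of each micro-batch.
        stream.map(ConsumerRecord::value).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

As for the question itself: you don’t necessarily need to build and deploy a JAR. A frequent cause of exceptions when launching from Eclipse is that the Spark dependencies are declared with provided scope in the pom; with compile-scoped dependencies and a local[*] master, running straight from the IDE is usually fine for experimentation.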

Data type mismatch while transforming data in spark dataset

I created a parquet structure from a CSV file using Spark: I’m reading the parquet structure and trying to transform the data in a dataset: Unfortunately I get a data type mismatch error. Do I have to assign the data types explicitly? 17/04/12 09:21:52 INFO SparkSqlParser: Parsing command: SELECT *, md5(station_id) as hashkey FROM tmpview Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve…
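The AnalysisException suggests md5 is being applied to a non-string column: in Spark SQL, md5 expects a binary (or string) argument, so an integer column such as station_id needs an explicit cast before hashing. A minimal sketch of that fix (the parquet path is a hypothetical placeholder):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Md5HashkeySketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Md5Hashkey")
                .master("local[*]")
                .getOrCreate();

        // Assumption: the parquet output from the CSV conversion lives here.
        Dataset<Row> df = spark.read().parquet("/tmp/stations.parquet");
        df.createOrReplaceTempView("tmpview");

        // Cast station_id to string so md5 receives a type it can hash.
        Dataset<Row> hashed = spark.sql(
                "SELECT *, md5(CAST(station_id AS STRING)) AS hashkey FROM tmpview");
        hashed.show();

        spark.stop();
    }
}
```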

How to use join with a gt condition in Java?

I want to join two dataframes based on the following condition: df1.col("name") equals df2.col("name") and df1.col("starttime") is greater than df2.col("starttime"). The first part of the condition is fine; I use the "equalTo" method of the Column class in Spark SQL. But for the "greater than" condition, when I use the following syntax in Java: it does not work. It seems "gt"…
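Column objects in the Java API can’t be compared with operators, but the Column class provides gt for exactly this, and conditions are chained with and. A minimal sketch of the combined join condition, using the column names from the question:

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class GreaterThanJoinSketch {
    // Joins on equal name and df1.starttime > df2.starttime.
    // gt() compares the column *values*; comparing Column objects
    // with > is not possible in Java.
    public static Dataset<Row> joinWithGt(Dataset<Row> df1, Dataset<Row> df2) {
        Column condition = df1.col("name").equalTo(df2.col("name"))
                .and(df1.col("starttime").gt(df2.col("starttime")));
        return df1.join(df2, condition);
    }
}
```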

NoSuchMethodError in shapeless seen only in Spark

I am trying to write a Spark connector to pull Avro messages off a RabbitMQ message queue. When decoding the Avro messages, a NoSuchMethodError occurs, but only when running in Spark. I could not reproduce the Spark code exactly outside of Spark, but I believe the two examples are sufficiently similar. I think this is the smallest…
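A NoSuchMethodError that appears only inside Spark usually points to a classpath conflict: Spark’s runtime classpath can carry a different version of shapeless (pulled in transitively) than the one the connector was compiled against, and Spark’s copy wins. One quick diagnostic, as a sketch (the class name is an assumption; substitute the one from your stack trace):

```java
// Prints which jar a class was actually loaded from -- useful for
// confirming that Spark's classpath shadows your own dependency.
public class WhichJarSketch {
    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> clazz = Class.forName("shapeless.HList"); // assumption
        // getCodeSource() can be null for JDK bootstrap classes,
        // but library classes report their jar location.
        System.out.println(
                clazz.getProtectionDomain().getCodeSource().getLocation());
    }
}
```

If the printed jar belongs to Spark rather than your build, the usual workarounds are shading/relocating the conflicting dependency in your build, or setting spark.driver.userClassPathFirst=true (and the executor equivalent) so your version is preferred.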

How to convert JavaRDD&lt;Row&gt; to JavaRDD&lt;List&lt;String&gt;&gt;?

I tried to do it with this code, but I get a WrappedArray. How do I do it correctly? Answer You can use the getList method: where lemmas is the name of the column with the lemmatized text. If there is only one column (it looks like this is the case) you can skip the select. If you know the index of the column you can…
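A sketch of the getList approach the answer describes, assuming the dataframe has a single array-typed column named lemmas as in the question:

```java
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class RowToListSketch {
    // getList unwraps the underlying Scala WrappedArray into a
    // java.util.List, which is what the question is after.
    public static JavaRDD<List<String>> toLists(Dataset<Row> df) {
        return df.select("lemmas")          // optional if lemmas is the only column
                 .javaRDD()
                 .map(row -> row.<String>getList(0)); // 0 = index of the column
    }
}
```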

Spark (Java) – dataframe groupBy with multiple aggregations?

I’m trying to write a groupBy on Spark with Java. In SQL this would look like: But what is the Spark/Java equivalent of this query? Let’s say the variable table is a dataframe, to see the relation to the SQL query. I’m thinking of something like: which is obviously incorrect, since you can’t use aggregate functions like .count or .max…
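In the DataFrame API, multiple aggregations go inside a single agg call after groupBy, using the static functions from org.apache.spark.sql.functions rather than methods on the grouped data. A minimal sketch, with key and value as placeholder column names standing in for the omitted SQL query:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.*;

public class GroupByAggSketch {
    // Rough equivalent of:
    //   SELECT key, COUNT(*) AS cnt, MAX(value) AS max_value
    //   FROM table GROUP BY key
    public static Dataset<Row> aggregate(Dataset<Row> table) {
        return table.groupBy(col("key"))
                .agg(count(lit(1)).alias("cnt"),      // COUNT(*)
                     max(col("value")).alias("max_value"));
    }
}
```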
