I’m trying to write a groupBy in Spark with Java. In SQL this would look like
SELECT id, count(id) AS count, max(date) AS maxdate FROM table GROUP BY id;
But what is the Spark/Java equivalent of this query? Let’s say the variable table
is a DataFrame, to keep the relation to the SQL query clear. I’m thinking something like:
table = table.select(
        table.col("id"),
        table.col("id").count().as("count"),
        table.col("date").max().as("maxdate")
    ).groupby("id")
Which is obviously incorrect, since you can’t call aggregate functions like .count()
or .max() on columns, only on DataFrames. So how is this done in Spark with Java?
Thank you!
Answer
You could do this with org.apache.spark.sql.functions:
import org.apache.spark.sql.functions;

// Group by id, then compute both aggregates in a single agg() call
table.groupBy("id")
    .agg(
        functions.count("id").as("count"),   // count(id) AS count
        functions.max("date").as("maxdate")  // max(date) AS maxdate
    )
    .show();
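If you’d rather keep the SQL itself, you can also register the DataFrame as a temporary view and run the original query through the SparkSession. A minimal sketch, assuming an existing SparkSession named spark; the view name my_table is just a placeholder:

// Assumes `spark` is an existing SparkSession and `table` is the DataFrame above.
table.createOrReplaceTempView("my_table");

spark.sql(
    "SELECT id, count(id) AS count, max(date) AS maxdate "
        + "FROM my_table GROUP BY id"
).show();

Both versions produce the same grouped result, so it mostly comes down to whether you prefer the DataFrame API or plain SQL.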