How do I apply multiple columns in GroupBy/PartitionBy in the Spark Java API?

If I have a list/Seq of columns in Scala like:

val partitionsColumns = "p1,p2"
val partitionsColumnsList = partitionsColumns.split(",").toList

I can easily use it in partitionBy or groupBy like

val windowFunction = Window.partitionBy(partitionsColumnsList:_*)

But how do I do the same thing with the Spark Java API?

List<String> partitions = new ArrayList<>();

WindowSpec windowSpec  = Window.partitionBy(.....)



partitionBy has two signatures that take a Scala Seq:

partitionBy(Seq<Column> cols)
partitionBy(String colName, Seq<String> colNames)

You can use either one. Assuming partitions is a non-empty List<String>, it would go like this:

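import java.util.List;
import java.util.stream.Collectors;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import org.apache.spark.sql.functions;
import scala.collection.JavaConversions;
import scala.collection.Seq;

// Option 1: turn each column name into a Column (one way to build the list), then convert to a Scala Seq
List<Column> columns = partitions.stream()
    .map(name -> functions.col(name))
    .collect(Collectors.toList());
Seq<Column> columnSeq = JavaConversions.asScalaBuffer(columns).toSeq();
WindowSpec windowSpec = Window.partitionBy(columnSeq);

// Option 2: pass the first name separately and the remaining names as a Seq
Seq<String> columnSeq2 = JavaConversions.asScalaBuffer(partitions).toSeq();
WindowSpec windowSpec2 = Window
    .partitionBy(partitions.get(0), columnSeq2.tail().toSeq());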
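The second option above splits the list into a first name and "the rest". Spark's Java API also exposes a varargs overload, Window.partitionBy(String colName, String... colNames), so the same split can be done with a plain Java array and no Scala collections. The sketch below shows only the list handling: the partitionBy method in it is a hypothetical stand-in that joins the names so the example runs without Spark on the classpath, and the class and helper names (PartitionArgs, head, tail) are made up for illustration. It assumes the list is non-empty.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionArgs {

    // Hypothetical stand-in with the same parameter shape as Spark's
    // Window.partitionBy(String colName, String... colNames).
    // It only joins the names so this sketch runs without Spark.
    public static String partitionBy(String colName, String... colNames) {
        StringBuilder sb = new StringBuilder(colName);
        for (String name : colNames) {
            sb.append(",").append(name);
        }
        return sb.toString();
    }

    // First column name: the required colName argument.
    // Assumes the list is non-empty; get(0) throws otherwise.
    public static String head(List<String> partitions) {
        return partitions.get(0);
    }

    // Remaining column names as an array, which Java accepts for a varargs parameter.
    public static String[] tail(List<String> partitions) {
        return partitions.subList(1, partitions.size()).toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("p1", "p2");
        // With Spark available, the equivalent call would have this shape:
        // WindowSpec windowSpec = Window.partitionBy(head(partitions), tail(partitions));
        System.out.println(partitionBy(head(partitions), tail(partitions)));
    }
}
```

With a single-element list, tail returns an empty array, which is still a valid varargs argument.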