Context
I want to iterate over a Spark Dataset and update a HashMap for each row.
Here is the code I have:
// At this point, I have a my_dataset variable containing 300 000 rows and 10 columns
// - my_dataset.count() == 300 000
// - my_dataset.columns().length == 10

// Declare my HashMap
HashMap<String, Vector<String>> my_map = new HashMap<String, Vector<String>>();

// Initialize the map
for(String col : my_dataset.columns()) {
    my_map.put(col, new Vector<String>());
}

// Iterate over the dataset and update the map
my_dataset.foreach((ForeachFunction<Row>) row -> {
    for(String col : my_map.keySet()) {
        my_map.get(col).add(row.get(row.fieldIndex(col)).toString());
    }
});
Issue
My issue is that the foreach doesn’t iterate at all: the lambda is never executed, and I don’t know why.
I implemented it as indicated here: How to traverse/iterate a Dataset in Spark Java?
At the end, all the inner Vectors remain empty (as they were initialized) even though the Dataset is not (see the first comments in the code sample above).
I know that the foreach never iterates because I did two tests:
- Add an AtomicInteger to count the iterations, and increment it right at the beginning of the lambda with incrementAndGet(). => The counter value remains 0 at the end of the process. (A minimal sketch of this test is shown after this list.)
- Print a debug message right at the beginning of the lambda. => The message is never displayed.
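For reference, here is roughly what the counter test looked like (a minimal sketch reconstructed from the description above; it assumes the same my_dataset variable as in the code sample):

import java.util.concurrent.atomic.AtomicInteger;
import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Row;

// Counter test: increment an AtomicInteger as the very first statement of the lambda
AtomicInteger counter = new AtomicInteger(0);

my_dataset.foreach((ForeachFunction<Row>) row -> {
    counter.incrementAndGet(); // first statement in the lambda
    // ... the per-row work from the code above ...
});

// Checked on the driver after the action completes
System.out.println("Iterations counted: " + counter.get()); // stays at 0, as described above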
I’m not very familiar with Java (even less with Java lambdas), so maybe I missed an important point, but I can’t figure out what.
Answer
I am probably a little old school, but I have never liked lambdas much, as they can get pretty complicated.
Here is a full example of a foreach():
package net.jgp.labs.spark.l240_foreach.l000;

import java.io.Serializable;

import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ForEachBookApp implements Serializable {
  private static final long serialVersionUID = -4250231621481140775L;

  private final class BookPrinter implements ForeachFunction<Row> {
    private static final long serialVersionUID = -3680381094052442862L;

    @Override
    public void call(Row r) throws Exception {
      System.out.println(r.getString(2) + " can be bought at " + r.getString(4));
    }
  }

  public static void main(String[] args) {
    ForEachBookApp app = new ForEachBookApp();
    app.start();
  }

  private void start() {
    SparkSession spark = SparkSession.builder()
        .appName("For Each Book")
        .master("local")
        .getOrCreate();

    String filename = "data/books.csv";
    Dataset<Row> df = spark.read().format("csv")
        .option("inferSchema", "true")
        .option("header", "true")
        .load(filename);
    df.show();

    df.foreach(new BookPrinter());
  }
}
As you can see, this example reads a CSV file and prints a message from the data. It is fairly simple.
The foreach() is passed a new instance of a class, where the work is done:
df.foreach(new BookPrinter());
The work is done in the call() method of the class:
private final class BookPrinter implements ForeachFunction<Row> {

  @Override
  public void call(Row r) throws Exception {
    ...
  }
}
As you are new to Java, make sure you have the right signature (for classes and methods) and the right imports.
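For quick reference, the essential imports and the method signature look like this (a sketch; the class name is just a placeholder):

import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Row;

// ForeachFunction already extends Serializable, so Spark can ship this class to the executors
public class MyRowHandler implements ForeachFunction<Row> {

  @Override
  public void call(Row row) throws Exception {
    // per-row work goes here
  }
}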
You can also clone the example from https://github.com/jgperrin/net.jgp.labs.spark/tree/master/src/main/java/net/jgp/labs/spark/l240_foreach/l000. This should help you with foreach().