
Read values from a Java Map using a Spark Column in Java

I have tried the code below to look up values in a Java Map by key via a Spark column, but I am getting null instead of the value the key maps to.

The Spark Dataset is named dataset1 and contains a single column named KEY.

Values in the dataset:

KEY
1
2 

Java Code –

Map<String, String> map1 = new HashMap<>();
map1.put("1","CUST1");
map1.put("2","CUST2");


dataset1.withColumn("ABCD", functions.lit(map1.get(col("KEY"))));

Current Output is:

ABCD (Column name)
null
null

Expected Output :

ABCD (Column name)
CUST1
CUST2

Please help me get this expected output.


Answer

The reason you get this output is pretty simple. Java's Map.get can take any object as input, and if that object is not a key in the map, the result is null.

The lit function in Spark creates a single-value column (all rows have the same value); e.g. lit(1) creates a column that takes the value 1 for each row.

Here, map1.get(col("KEY")) is executed on the driver and asks map1 for the value corresponding to a Column object (not the value inside the column, but the Java/Scala object representing the column itself). The map does not contain that object as a key, so the result is null; you could just as well write lit(null). That is why every row in your dataset ends up null.
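To see that first point in isolation, here is a plain-Java sketch (no Spark involved): get happily accepts any object and simply returns null when that object is not a key. The anonymous Object below is just a stand-in for Spark's Column instance.

```java
import java.util.HashMap;
import java.util.Map;

public class MapGetDemo {
    public static void main(String[] args) {
        Map<String, String> map1 = new HashMap<>();
        map1.put("1", "CUST1");
        map1.put("2", "CUST2");

        // Lookup with a String key that exists: returns the mapped value.
        System.out.println(map1.get("1"));       // CUST1

        // Lookup with an arbitrary object (a stand-in for a Spark Column):
        // the map has no such key, so get returns null.
        Object notAKey = new Object();
        System.out.println(map1.get(notAKey));   // null
    }
}
```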

To solve your problem, you can wrap the map access in a UDF, for instance:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.*;

// map1 must be effectively final (and serializable, which HashMap is)
// so it can be captured by the UDF and shipped to the executors.
UserDefinedFunction map_udf = udf(new UDF1<String, String>() {
    @Override
    public String call(String x) {
        return map1.get(x);
    }
}, DataTypes.StringType);

spark.udf().register("map_udf", map_udf);
Dataset<Row> result = dataset1.withColumn("ABCD", expr("map_udf(KEY)"));
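As an alternative sketch that avoids the UDF entirely (assuming Spark 2.4+ for element_at), you can turn the Java map into a literal Spark map column and index it with the KEY column; names follow the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.spark.sql.Column;
import static org.apache.spark.sql.functions.*;

// Build a MAP<string, string> column literal from map1:
// functions.map takes alternating key and value columns.
List<Column> pairs = new ArrayList<>();
for (Map.Entry<String, String> e : map1.entrySet()) {
    pairs.add(lit(e.getKey()));
    pairs.add(lit(e.getValue()));
}
Column mapCol = map(pairs.toArray(new Column[0]));

// element_at returns the map value for the given key (null if absent),
// and the lookup happens inside Spark rather than on the driver.
dataset1.withColumn("ABCD", element_at(mapCol, col("KEY")));
```

This keeps the lookup in Spark's native expression engine, which is usually faster than a Java UDF because it avoids serialization of each row value.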
User contributions licensed under: CC BY-SA