I have tried the code below to get Map values via a Spark column in Java, but I am getting null values instead of the value the Map holds for each key.
The Spark Dataset is named dataset1 and contains a single column named KEY.
Values in the dataset:

KEY
1
2
Java Code –

Map<String, String> map1 = new HashMap<>();
map1.put("1", "CUST1");
map1.put("2", "CUST2");
dataset1.withColumn("ABCD", functions.lit(map1.get(col("KEY"))));
Current output (column ABCD):

null
null

Expected output (column ABCD):

CUST1
CUST2
Please help me get the expected output.
Answer
The reason why you get this output is pretty simple. The get function in Java can take any object as input. If that object is not a key in the map, the result is null.
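A minimal plain-Java illustration of this behavior, independent of Spark:

```java
import java.util.HashMap;
import java.util.Map;

public class MapGetDemo {
    public static void main(String[] args) {
        Map<String, String> map1 = new HashMap<>();
        map1.put("1", "CUST1");
        map1.put("2", "CUST2");

        // A key that is in the map returns the mapped value.
        System.out.println(map1.get("1"));      // CUST1

        // Map.get accepts any Object; an object that is not a key
        // simply yields null, with no compile-time error.
        Object notAKey = new Object();
        System.out.println(map1.get(notAKey));  // null
    }
}
```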
The lit function in Spark is used to create a single-value column (all rows have the same value). For example, lit(1) creates a column that takes the value 1 for each row.

Here, map1.get(col("KEY")) (which is executed on the driver) asks map1 for the value corresponding to a Column object — not the value inside the column, but the Java/Scala object representing the column. The map does not contain that object, so the result is null. Therefore, you could just as well have written lit(null). This is why you get a null result inside your dataset.
To solve your problem, you could wrap your map access within a UDF for instance. Something like:
UserDefinedFunction map_udf = udf(
    new UDF1<String, String>() {
        @Override
        public String call(String x) {
            return map1.get(x);
        }
    },
    DataTypes.StringType
);
spark.udf().register("map_udf", map_udf);
dataset1.withColumn("ABCD", expr("map_udf(KEY)"));
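Note that the UDF body is just a per-row map lookup, so any KEY value that is absent from map1 will still produce null in the ABCD column. In isolation (outside Spark), the lookup behaves like the sketch below; the getOrDefault variant and the "UNKNOWN" fallback string are only illustrations of how you could substitute a default instead of null:

```java
import java.util.HashMap;
import java.util.Map;

public class UdfBodyDemo {
    public static void main(String[] args) {
        Map<String, String> map1 = new HashMap<>();
        map1.put("1", "CUST1");
        map1.put("2", "CUST2");

        // The logic the UDF applies to each row's KEY value.
        System.out.println(map1.get("1"));                      // CUST1
        System.out.println(map1.get("3"));                      // null: key absent

        // Illustrative variant: fall back to a default instead of null.
        System.out.println(map1.getOrDefault("3", "UNKNOWN"));  // UNKNOWN
    }
}
```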