Spark SASL not working on EMR with YARN

The only thing I have found that addresses this issue is the question "Spark 1.6.1 SASL". However, even after adding the configuration for Spark and YARN authentication described there, it still does not work. Below is my Spark configuration; the application is launched with spark-submit on a YARN cluster on Amazon EMR:

    SparkConf sparkConf = new SparkConf().setAppName("secure-test");
    sparkConf.set("spark.authenticate.enableSaslEncryption", "true");
    sparkConf.set("", "true");
    sparkConf.set("spark.authenticate", "true");
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator");
    try {
        // The list of classes to register is elided in the original post.
        sparkConf.registerKryoClasses(new Class<?>[]{});
    } catch (Exception e) {}

    sparkContext = new JavaSparkContext(sparkConf);
    sparkContext.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    sparkContext.hadoopConfiguration().set("fs.s3a.enableServerSideEncryption", "true");
    sparkContext.hadoopConfiguration().set("spark.authenticate", "true");

Note that I set spark.authenticate on the sparkContext’s Hadoop configuration in code instead of in core-site.xml (I am assuming this is equivalent, since other settings applied this way work).

Looking here: it seems that both spark.authenticate settings are necessary. When I run this application, I get the following stack trace:

    17/01/03 22:10:23 INFO storage.BlockManager: Registering executor with local external shuffle service.
    17/01/03 22:10:23 ERROR client.TransportClientFactory: Exception while bootstrapping client after 178 ms
    java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message type: -22
        at$Decoder.fromByteBuffer(
        at at at at at at at at
        at io.netty.handler.timeout.IdleStateHandler.channelRead(
        at at
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(
        at at at at at at
        at$
        at at at at
        at io.netty.util.concurrent.SingleThreadEventExecutor$
        at

Spark’s docs say:

For Spark on YARN deployments, configuring spark.authenticate to true will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret.

which seems wrong based on the comments in the YARN file above. Even after troubleshooting, I am still lost as to how to get SASL to work. Am I missing something obvious that is documented somewhere?


I finally figured it out. The previous StackOverflow thread was technically correct: I needed to add spark.authenticate to the YARN configuration. It may be possible to set this from code, but I could not figure out how. At a high level it makes sense that it would not work, presumably because the external shuffle service runs inside YARN and reads its configuration when the cluster starts, not from the application's code. I am posting my configuration below in case anyone else runs into this issue.
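
As a side note, the -22 in "Unknown message type: -22" lines up with this explanation. If I remember the Spark source correctly (treat this as an assumption, I have not re-checked it), SASL messages start with the tag byte 0xEA, which Java prints as -22 when read as a signed byte. That would mean the shuffle service received a SASL message while it was not expecting one. The class and constant names below are my own, just to show the arithmetic:

```java
public class SaslTagByte {

    // Assumption (from memory of the Spark source, not stated in the post):
    // SASL messages are prefixed with the tag byte 0xEA.
    static final int ASSUMED_SASL_TAG = 0xEA;

    // Reinterprets the low 8 bits of a value as a signed Java byte.
    static int asSignedByte(int unsigned) {
        return (byte) unsigned;
    }

    public static void main(String[] args) {
        // Prints the value a decoder would report for an unrecognized 0xEA type byte.
        System.out.println(asSignedByte(ASSUMED_SASL_TAG));
    }
}
```

If the assumption holds, seeing this number in the trace is a hint that authentication is enabled in the application but not on the YARN side.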

First, I used an AWS EMR configurations file (for example, with the AWS CLI: aws emr create-cluster --configurations file://youpathhere.json).

Then I added the following JSON to the file:

    [
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.authenticate": "true",
                "spark.authenticate.enableSaslEncryption": "true",
                "": "true"
            }
        },
        {
            "Classification": "core-site",
            "Properties": {
                "spark.authenticate": "true"
            }
        }
    ]
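
If you would rather generate this file from a deployment script than write it by hand, something like the following might work. It is only a sketch in plain Java with no AWS SDK; the class and method names are hypothetical, it does no JSON escaping, and it leaves out the property whose key is blank in the JSON above:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EmrSaslConfig {

    // Renders one {"Classification": ..., "Properties": {...}} entry.
    // Keys and values are assumed to need no JSON escaping.
    static String classification(String name, Map<String, String> props) {
        StringJoiner body = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, String> e : props.entrySet()) {
            body.add("\"" + e.getKey() + "\": \"" + e.getValue() + "\"");
        }
        return "{\"Classification\": \"" + name + "\", \"Properties\": " + body + "}";
    }

    // Builds the JSON array with the spark-defaults and core-site entries.
    static String render() {
        Map<String, String> sparkDefaults = new LinkedHashMap<>();
        sparkDefaults.put("spark.authenticate", "true");
        sparkDefaults.put("spark.authenticate.enableSaslEncryption", "true");
        // The third property key is blank in the post, so it is not reproduced here.

        Map<String, String> coreSite = new LinkedHashMap<>();
        coreSite.put("spark.authenticate", "true");

        return "[" + classification("spark-defaults", sparkDefaults) + ", "
                + classification("core-site", coreSite) + "]";
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Redirect the output to a file and pass that file to --configurations as shown earlier.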

Source: stackoverflow