According to the release notes, and specifically the ticket Build and Run Spark on Java 17 (SPARK-33772), Spark now supports running on Java 17.
However, using Java 17 (Temurin-17.0.3+7) with Maven (3.8.6) and maven-surefire-plugin (3.0.0-M7), when running a unit test that uses Spark (3.3.0) it fails with:
java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x1e7ba8d9) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x1e7ba8d9
The stack is:
java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x1e7ba8d9) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x1e7ba8d9
  at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
  at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
  at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:114)
  at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:353)
  at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:290)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:339)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
  at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
  [...]
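A test as simple as the following is enough to trigger the error (a minimal sketch, assuming spark-sql and ScalaTest are on the test classpath; the class and test names are illustrative):

import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

// Merely starting a local SparkSession under Surefire on Java 17 hits the
// IllegalAccessError, because SparkEnv initialises StorageUtils on startup.
class SparkJava17SmokeTest extends AnyFunSuite {
  test("start a local SparkSession") {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("java-17-smoke-test")
      .getOrCreate()
    try {
      assert(spark.range(10).count() === 10L)
    } finally {
      spark.stop()
    }
  }
}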
The question Java 17 solution for Spark – java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils was asked only 2 months ago, but it predated the Spark 3.3.0 release, and thus predated official support for Java 17.
Why can’t I run my Spark 3.3.0 test with Java 17, and how can I fix it?
Answer
Even though Spark now supports Java 17, it still references the JDK-internal class sun.nio.ch.DirectBuffer:
// In Java 8, the type of DirectBuffer.cleaner() was sun.misc.Cleaner, and it was possible
// to access the method sun.misc.Cleaner.clean() to invoke it. The type changed to
// jdk.internal.ref.Cleaner in later JDKs, and the .clean() method is not accessible even with
// reflection. However sun.misc.Unsafe added a invokeCleaner() method in JDK 9+ and this is
// still accessible with reflection.
private val bufferCleaner: DirectBuffer => Unit =
[...]
Under the Java module system, access to this class is restricted. The Java 9 migration guide says:
If you must use an internal API that has been made inaccessible by default, then you can break encapsulation using the --add-exports command-line option.
We need to export this package to our code's unnamed module. To do this for tests run by Surefire, we add this configuration to the plugin:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.0.0-M7</version>
  <configuration>
    <argLine>--add-exports java.base/sun.nio.ch=ALL-UNNAMED</argLine>
  </configuration>
</plugin>
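Note that --add-exports only makes the public types in sun.nio.ch readable from the unnamed module, which is enough to get past the error above; the --add-opens options listed below additionally allow deep reflection on non-public members of the named packages.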
Based on a discussion with one of the Spark developers, Spark adds the following in order to execute all of its internal unit tests.
These options are used to pass all Spark UTs, but maybe you don’t need all of them.
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
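If your own tests turn out to need some of these as well, they can be appended to the same Surefire <argLine> shown above, separated by spaces. For example (a sketch showing only a few of the flags; include just the ones your tests actually need):

<configuration>
  <argLine>--add-exports java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED</argLine>
</configuration>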
It was also commented:
However, these options don’t need to be added explicitly when using spark-shell, spark-sql and spark-submit.