I’m trying to use the Apache Spark/Ignite integration in Azure Databricks. I installed the org.apache.ignite:ignite-spark-2.4:2.9.0 Maven library using the Databricks UI, and I get an error while accessing my Ignite caches:
java.lang.NoSuchMethodError: org.springframework.util.ReflectionUtils.clearCache()V
    at org.springframework.context.support.AbstractApplicationContext.resetCommonCaches(AbstractApplicationContext.java:907)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:567)
Here AbstractApplicationContext is compiled against the ReflectionUtils of a different Spring version: the 4.3.x context calls clearCache(), a method the older spring-core on the classpath does not have.
I see that spring-core-4.3.26.RELEASE.jar is installed under /dbfs/FileStore/jars/maven/org/springframework during the org.apache.ignite:ignite-spark-2.4:2.9.0 installation, and there are no jars of any other Spring version under /dbfs/FileStore/jars.
But it seems that Databricks internally uses spring-core 4.1.4:
%sh ls /databricks/jars | grep spring
prints:
spark--maven-trees--spark_2.4--com.clearspring.analytics--stream--com.clearspring.analytics__stream__2.7.0.jar
spark--maven-trees--spark_2.4--org.springframework--spring-core--org.springframework__spring-core__4.1.4.RELEASE.jar
spark--maven-trees--spark_2.4--org.springframework--spring-test--org.springframework__spring-test__4.1.4.RELEASE.jar
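To double-check which jars actually bundle the conflicting class, a small scan like the following can be run in a notebook cell (a sketch; the paths are the ones from the listings above, adjust them if your runtime differs):

import glob
import zipfile

# Scan both jar locations for the class named in the NoSuchMethodError;
# a hit in each directory confirms that two Spring versions are present.
target = "org/springframework/util/ReflectionUtils.class"
jars = (glob.glob("/databricks/jars/*.jar")
        + glob.glob("/dbfs/FileStore/jars/maven/org/springframework/*.jar"))
for jar in jars:
    try:
        if target in zipfile.ZipFile(jar).namelist():
            print(jar)
    except zipfile.BadZipFile:
        pass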
I’m not a Java programmer, so I have no experience resolving this kind of conflict.
Is it possible to adjust the Databricks classpath somehow, or to solve this problem in some other way?
Adjusting the classpath may be very easy, but I don’t know how. The Databricks documentation only mentions that the classpath can be changed in an init script. I can create an init script, I have done that before, but what exactly should I put in it?
I’ve tried different Databricks runtime versions and am using 6.6 at the moment. As far as I can tell, Apache Ignite has no integration with Spark 3.
Answer
Following https://kb.databricks.com/libraries/replace-default-jar-new-jar.html, I created an init script like this:
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
dbutils.fs.put("dbfs:/databricks/scripts/install_spring.sh", """
# Remove the preinstalled jars that conflict with Ignite's dependencies
rm -rf /databricks/jars/spark--maven-trees--spark_2.4--com.h2database--h2--com.h2database__h2__1.3.174.jar
rm -rf /databricks/jars/spark--maven-trees--spark_2.4--org.springframework--spring-core--org.springframework__spring-core__4.1.4.RELEASE.jar
rm -rf /databricks/jars/spark--maven-trees--spark_2.4--org.springframework--spring-test--org.springframework__spring-test__4.1.4.RELEASE.jar
# Replace them with the versions that the Ignite Maven installation put on DBFS
cp /dbfs/FileStore/jars/maven/com/h2database/h2-1.4.197.jar /databricks/jars/
cp /dbfs/FileStore/jars/maven/org/springframework/spring-core-4.3.26.RELEASE.jar /databricks/jars/
cp /dbfs/FileStore/jars/maven/org/springframework/spring-test-4.3.26.RELEASE.jar /databricks/jars/
""", True)
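To confirm the swap took effect after restarting the cluster with this script attached, a quick check like this can be run in a notebook cell (a sketch; the glob patterns are loose and also match unrelated jars such as clearspring’s stream):

import glob

# After a restart, only the 4.3.26 Spring jars and h2 1.4.197
# should remain under /databricks/jars.
for jar in sorted(glob.glob("/databricks/jars/*spring*") + glob.glob("/databricks/jars/*h2*")):
    print(jar)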
After that I registered this init script on the cluster, and the Ignite integration worked for me (org.apache.ignite:ignite-spark-2.4:2.9.0, Ignite 2.9.0, Azure Databricks 6.6).
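For reference, reading an Ignite cache through the Spark data source then looks roughly like this (a minimal sketch: the config path and table name are placeholders, and the option keys are the string values of the IgniteDataFrameSettings constants):

# Minimal sketch of reading an Ignite-backed table via the ignite-spark
# data source. "/dbfs/FileStore/ignite-config.xml" and "PERSON" are
# placeholders for your own Ignite configuration file and cache table.
df = (spark.read
      .format("ignite")                                       # IgniteDataFrameSettings.FORMAT_IGNITE
      .option("config", "/dbfs/FileStore/ignite-config.xml")  # OPTION_CONFIG_FILE
      .option("table", "PERSON")                              # OPTION_TABLE
      .load())
df.show()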
There are about 500 jar files preinstalled under /databricks/jars, and it’s possible I’ve broken some dependencies, but I haven’t noticed any side effects for my task.