Adjust classpath / change Spring version in Azure Databricks



I’m trying to use the Apache Spark/Ignite integration in Azure Databricks. I installed the org.apache.ignite:ignite-spark-2.4:2.9.0 Maven library through the Databricks UI, but I get an error when accessing my Ignite caches:

: java.lang.NoSuchMethodError: org.springframework.util.ReflectionUtils.clearCache()V
        at org.springframework.context.support.AbstractApplicationContext.resetCommonCaches(AbstractApplicationContext.java:907)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:567)

Here AbstractApplicationContext was compiled against a ReflectionUtils from a different Spring version: the clearCache() method it calls does not exist in the older spring-core that actually ends up on the classpath.

I can see that spring-core-4.3.26.RELEASE.jar is installed under /dbfs/FileStore/jars/maven/org/springframework during the installation of org.apache.ignite:ignite-spark-2.4:2.9.0, and there are no jars from any other Spring version under /dbfs/FileStore/jars.

But it seems Databricks internally uses spring-core 4.1.4:

%sh
ls /databricks/jars | grep spring

prints:

spark--maven-trees--spark_2.4--com.clearspring.analytics--stream--com.clearspring.analytics__stream__2.7.0.jar
spark--maven-trees--spark_2.4--org.springframework--spring-core--org.springframework__spring-core__4.1.4.RELEASE.jar
spark--maven-trees--spark_2.4--org.springframework--spring-test--org.springframework__spring-test__4.1.4.RELEASE.jar
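The bundled version sits at the end of these file names, after the last `__` separator. As a quick sanity check, a shell sketch can pull it out (the file name below is the spring-core entry listed above):

```shell
# Extract the bundled Spring version from a Databricks jar file name.
jar="spark--maven-trees--spark_2.4--org.springframework--spring-core--org.springframework__spring-core__4.1.4.RELEASE.jar"
version="${jar##*__}"      # strip everything up to and including the last "__"
version="${version%.jar}"  # drop the .jar extension
echo "$version"            # prints 4.1.4.RELEASE
```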

I’m not a Java programmer, so I have no experience resolving this kind of conflict.

Is it possible to adjust the Databricks classpath somehow, or to solve this problem some other way?

Adjusting the classpath may well be easy, but I don’t know how. The Databricks documentation only mentions in passing that the classpath can be changed in an init script. I can create an init script (I have done that before), but what exactly should I put in it?

I’ve tried different Databricks runtime versions and am currently using 6.6. As far as I know, Apache Ignite has no integration with Spark 3.

Answer

Following https://kb.databricks.com/libraries/replace-default-jar-new-jar.html, I created an init script like this:

dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
dbutils.fs.put("dbfs:/databricks/scripts/install_spring.sh",
"""
# Remove the outdated jars that ship with the Databricks runtime
rm -f /databricks/jars/spark--maven-trees--spark_2.4--com.h2database--h2--com.h2database__h2__1.3.174.jar
rm -f /databricks/jars/spark--maven-trees--spark_2.4--org.springframework--spring-core--org.springframework__spring-core__4.1.4.RELEASE.jar
rm -f /databricks/jars/spark--maven-trees--spark_2.4--org.springframework--spring-test--org.springframework__spring-test__4.1.4.RELEASE.jar
# Copy in the newer versions installed alongside the ignite-spark Maven library
cp /dbfs/FileStore/jars/maven/com/h2database/h2-1.4.197.jar /databricks/jars/
cp /dbfs/FileStore/jars/maven/org/springframework/spring-core-4.3.26.RELEASE.jar /databricks/jars/
cp /dbfs/FileStore/jars/maven/org/springframework/spring-test-4.3.26.RELEASE.jar /databricks/jars/
""", True)

After that I registered this init script on the cluster, and the Ignite integration worked for me (org.apache.ignite:ignite-spark-2.4:2.9.0, Ignite 2.9.0, Azure Databricks runtime 6.6).

There are about 500 jar files preinstalled under /databricks/jars, so it’s possible I’ve broken some dependencies, but I haven’t noticed any side effects for my task.
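For anyone worried about that risk, the same replacement can be written more defensively. The sketch below is my own variant, not part of the original answer: a hypothetical replace_jar helper refuses to delete the old jar unless the replacement actually exists (so a missing upload cannot leave the cluster with no spring-core at all), and it only touches /databricks/jars when that directory is present. Extending it to spring-test and h2 is just two more calls.

```shell
#!/bin/bash
# Safer variant of the init script above: verify the replacement jar
# exists before removing the runtime's copy.
set -euo pipefail

replace_jar() {  # usage: replace_jar <old-jar> <new-jar> <dest-dir>
  local old="$1" new="$2" dest="$3"
  if [ ! -f "$new" ]; then
    echo "replacement missing: $new" >&2
    return 1
  fi
  rm -f "$old"
  cp "$new" "$dest/"
}

# Only act on a real Databricks node; harmless elsewhere.
if [ -d /databricks/jars ]; then
  replace_jar "/databricks/jars/spark--maven-trees--spark_2.4--org.springframework--spring-core--org.springframework__spring-core__4.1.4.RELEASE.jar" \
              "/dbfs/FileStore/jars/maven/org/springframework/spring-core-4.3.26.RELEASE.jar" \
              /databricks/jars
fi
```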



Source: stackoverflow