
ClassNotFoundException: Failed to find data source: bigquery

I’m trying to load data from Google BigQuery into Spark running on Google Dataproc (I’m using Java). I tried to follow the instructions here: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example

I get the error: “ClassNotFoundException: Failed to find data source: bigquery.”

My pom.xml looks like this:

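(The pom.xml snippet is not shown here. Based on the connector coordinates in the answer below, the relevant dependency was presumably along these lines; the version is taken from the answer and may not match the question exactly:)

    <dependency>
      <groupId>com.google.cloud.spark</groupId>
      <artifactId>spark-bigquery_2.11</artifactId>
      <version>0.9.1-beta</version>
    </dependency>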

After adding the dependency to my pom.xml, Maven downloaded a lot of artifacts while building the .jar, so I think I have the correct dependency? However, Eclipse also warns me that “The import com.google.cloud.spark.bigquery is never used”.

This is the part of my code where I get the error:

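(The code snippet is also not shown here. A minimal sketch of the kind of read that triggers this error, using an illustrative public table, would look roughly like this:)

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class BigQueryReadExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("bigquery-read-example")
            .getOrCreate();

        // This load() is what fails with "ClassNotFoundException: Failed to find
        // data source: bigquery" when the connector jar is missing at runtime.
        Dataset<Row> words = spark.read()
            .format("bigquery")
            .option("table", "bigquery-public-data:samples.shakespeare") // illustrative table
            .load();

        words.show();
      }
    }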


Answer

I think you only added the BigQuery connector as a compile-time dependency, but it is missing at runtime. You need to either build an uber jar that includes the connector in your job jar (the doc needs to be updated), or include it when you submit the job:

    gcloud dataproc jobs submit spark --properties spark.jars.packages=com.google.cloud.spark:spark-bigquery_2.11:0.9.1-beta
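A minimal sketch of the first option, assuming a standard Maven build: adding the maven-shade-plugin makes mvn package produce an uber jar that bundles the connector (and the other compile-scope dependencies) into the job jar, so it is on the classpath at runtime.

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>3.2.1</version> <!-- version is only an example -->
          <executions>
            <execution>
              <phase>package</phase>
              <goals>
                <goal>shade</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>

The second option needs no change to the jar: spark.jars.packages tells Spark to resolve the connector from Maven at job startup, which keeps the job jar small but requires the cluster to be able to download the artifact.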

User contributions licensed under: CC BY-SA