I’m trying to migrate a Scala Spark job from a Hadoop cluster to GCP. I have this snippet of code that reads a file and creates an ArrayBuffer[String]. This code runs on the cluster and gives me 3025000 chars; when I run the same code on Dataproc it gives 3175025 chars. I think whitespace is being added to the file contents, or ...
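The asker's snippet was not included above, so the following is only a minimal Scala sketch of the pattern described (read a file line by line into an ArrayBuffer[String] and count characters); the path is a placeholder, not the asker's actual code. The comments point at the usual suspects for a character-count difference between two environments: line terminators, encoding, and whether terminators are included in the count.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.io.Source

object CharCountCheck {
  def main(args: Array[String]): Unit = {
    // Hypothetical path; the original snippet was not shown in the question.
    val path = args(0)

    val buffer = ArrayBuffer[String]()
    val source = Source.fromFile(path, "UTF-8")
    try {
      // getLines() strips line terminators, so the total below excludes \n / \r\n.
      source.getLines().foreach(buffer += _)
    } finally {
      source.close()
    }

    // Total characters across all lines. If the same file yields a different
    // total on Dataproc, compare line counts, line terminators (CRLF vs LF)
    // and the file encoding before assuming Spark itself changed the data.
    val totalChars = buffer.map(_.length).sum
    println(s"lines=${buffer.size} chars=$totalChars")
  }
}
```

Comparing byte counts or checksums of the source file in both locations (HDFS vs GCS) is usually the quickest way to tell whether the file itself differs or the reading code counts differently.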
Tag: google-cloud-dataproc
Issue with Spark Big Query Connector with Java
Getting the below issue with the Spark BigQuery connector in a Dataproc cluster with the following configuration: Image: 1.5.21-debian10, Spark version: 2.4.7, Scala version: 2.12.10. This works fine locally but fails when I deploy it to the Dataproc cluster. Can someone suggest some pointers for this issue? pom.xml: Here is the sample code:
Answer
Can you please replace the Spark BigQuery connector
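The answer is cut off above; the usual suggestion is to use a connector artifact built for the cluster's Scala version (2.12 on this image), for example the com.google.cloud.spark:spark-bigquery-with-dependencies_2.12 coordinates, and to supply it at submit time rather than shading it into the job jar. The original pom.xml and sample code are not shown, and the question is in Java, but the read call has the same shape in any JVM language; here is a minimal Scala sketch with a placeholder table name:

```scala
import org.apache.spark.sql.SparkSession

object BigQueryReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bigquery-read-sketch")
      .getOrCreate()

    // Placeholder table; the question's actual table and code were not shown.
    // The connector jar (built for Scala 2.12 to match this image) is expected
    // to be provided at submit time, e.g. via --packages
    // com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:<version>.
    val df = spark.read
      .format("bigquery")
      .option("table", "my-project.my_dataset.my_table")
      .load()

    df.printSchema()
    df.show(10)

    spark.stop()
  }
}
```

If the connector is instead bundled into the job jar, a mismatch between Scala 2.11 and 2.12 artifacts is a common reason for code that works locally but fails once deployed to the cluster.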
ClassNotFoundException: Failed to find data source: bigquery
I’m trying to load data from Google BigQuery into Spark running on Google Dataproc (I’m using Java). I tried to follow the instructions here: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example but I get the error: “ClassNotFoundException: Failed to find data source: bigquery.” My pom.xml looks like this: After adding the dependency to my pom.xml, it was downloading a lot of artifacts to build the .jar, so I think ...
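The post is truncated, but this error usually means the connector jar is not on the runtime classpath, so Spark cannot resolve the "bigquery" short name. The sketch below (Scala again, though the question uses Java; the calls are equivalent) first checks for the connector's provider class, whose name is assumed from the spark-bigquery-connector, and then reads a public table used purely for illustration:

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.SparkSession

object BigQuerySourceCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bigquery-source-check")
      .getOrCreate()

    // The "bigquery" short name is registered by the connector jar. If the class
    // below is missing, spark.read.format("bigquery") fails with
    // "Failed to find data source: bigquery". (Class name assumed from the
    // spark-bigquery-connector; verify against the connector version in use.)
    Try(Class.forName("com.google.cloud.spark.bigquery.BigQueryRelationProvider")) match {
      case Success(_) => println("BigQuery connector found on the classpath")
      case Failure(e) => println(s"BigQuery connector missing at runtime: $e")
    }

    // Public dataset used only for illustration; the question's own table was not shown.
    val df = spark.read
      .format("bigquery")
      .load("bigquery-public-data.samples.shakespeare")
    println(s"rows=${df.count()}")

    spark.stop()
  }
}
```

When submitting with spark-submit or gcloud dataproc jobs submit spark, passing the connector via --jars or --packages is usually simpler than building a large shaded jar, which also matches the "downloading a lot to build the .jar" symptom described above.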