I’m trying to use the Apache Spark/Ignite integration in Azure Databricks. I installed the org.apache.ignite:ignite-spark-2.4:2.9.0 Maven library using the Databricks UI, and I get an error while accessing my Ignite caches: Here the AbstractApplicationContext is compiled against a ReflectionUtils from a different Spring version. I see that spring-core-4.3.26.RELEASE.jar is installed in /dbfs/FileStore/jars/maven/org/springframework during the org.apache.ignite:ignite-spark-2.4:2.9.0 installation and there are no other spring
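Since the problem boils down to which jar each Spring class is actually loaded from, a quick diagnostic (not part of the original post; the class names are simply the ones mentioned in the error) is to print the code source of both classes from a notebook cell on the cluster:

```scala
// Diagnostic sketch: show which jar AbstractApplicationContext and
// ReflectionUtils are loaded from, to confirm the Spring version mismatch.
import org.springframework.context.support.AbstractApplicationContext
import org.springframework.util.ReflectionUtils

def jarOf(c: Class[_]): String =
  Option(c.getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("unknown (boot classpath)")

println(s"AbstractApplicationContext -> ${jarOf(classOf[AbstractApplicationContext])}")
println(s"ReflectionUtils            -> ${jarOf(classOf[ReflectionUtils])}")
```

If the two locations point at different spring-core / spring-context versions, the mismatch described above is confirmed.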
How to get the corresponding quarter of the previous year in Scala
I have a date string in the format “20202” [“yyyyQ”]. Is there a way to get the corresponding quarter of the previous year? E.g. for 20202, it should be 20192. Answer An alternative to the other answers is to use my library Time4J and its class CalendarQuarter. Example: Two main advantages of this solution are: Calendar quarters
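The Time4J example itself is truncated above. Purely as an illustration of the underlying arithmetic (this is not the Time4J answer), a plain Scala sketch that parses the “yyyyQ” string and decrements the year could look like this:

```scala
// Sketch only: parse a "yyyyQ" string such as "20202" and return the same
// quarter of the previous year ("20192"). Assumes well-formed input.
def previousYearQuarter(yyyyQ: String): String = {
  val (year, quarter) = yyyyQ.splitAt(4) // "2020" / "2"
  s"${year.toInt - 1}$quarter"
}

previousYearQuarter("20202") // "20192"
```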
Read data saved by spark-redis using Java
I am using spark-redis to save a Dataset to Redis. Then I read this data using Spring Data Redis: This is the object I save to Redis: Saving the object with spark-redis: Repository: I can’t read the data that was saved to Redis by spark-redis using Spring Data Redis, because the structure of the data written by spark-redis and by Spring Data Redis is not the same (I checked the value
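The original code blocks are missing from the excerpt, so as a hedged sketch of the spark-redis side only (entity, table, and key names are invented), writing a Dataset with the spark-redis data source typically looks like this, with each row stored as a Redis hash under a key such as person:&lt;id&gt;:

```scala
// Sketch: persist a Dataset with spark-redis. Each row becomes a Redis hash
// under a key like "person:<id>". Class, table and key names are placeholders.
import org.apache.spark.sql.SparkSession

case class Person(id: String, name: String, age: Int)

val spark = SparkSession.builder()
  .appName("spark-redis-write")
  .master("local[*]")
  .config("spark.redis.host", "localhost")
  .config("spark.redis.port", "6379")
  .getOrCreate()

import spark.implicits._

Seq(Person("1", "Alice", 30)).toDS()
  .write
  .format("org.apache.spark.sql.redis")
  .option("table", "person")      // key prefix
  .option("key.column", "id")     // key suffix comes from this column
  .mode("append")
  .save()
```

Spring Data Redis stores @RedisHash entities under its own key layout, so unless both sides agree on the key pattern and hash fields, one cannot read what the other wrote without a custom mapping.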
How to resolve java.lang.ClassNotFoundException: com.mongodb.spark.sql.DefaultSource.DefaultSource in PySpark (I’m using PyCharm)
With PyCharm I’m getting this error: java.lang.ClassNotFoundException: com.mongodb.spark.sql.DefaultSource.DefaultSource. How can I resolve this issue? I tried: I also tried setting the classpath of the jars in .bash_profile: I had many jars in my_jars but still didn’t get it to work; I keep getting the same error. Answer Provide comma-separated jar files instead of a directory path in spark.jars. Alternatively you can
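To illustrate the suggestion (the jar paths and connector versions below are placeholders, not taken from the question): spark.jars expects an explicit comma-separated list of jar files rather than a directory, and the same configuration keys can be set from PySpark in exactly the same way. Shown here in Scala:

```scala
// Sketch: pass individual jar files (comma separated) via spark.jars, or let
// Spark resolve the connector from Maven via spark.jars.packages.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("mongo-example")
  .config("spark.jars",
    "/path/to/mongo-spark-connector_2.11-2.4.1.jar," +
    "/path/to/mongo-java-driver-3.12.10.jar")
  // Alternative: resolve from Maven instead of pointing at local jars.
  // .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.4.1")
  .getOrCreate()
```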
Apache Spark and Scala, error while executing queries
I am working with a dataset whose sample is as follows: I have executed the following commands successfully: I am getting the following error: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Character is not a valid external type for schema of string. I am getting the same error when executing any query against the data. Can you please have a look and provide
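The commands from the question are not shown in the excerpt, but this error usually means a Char/Character value (for example line(0) on a String) ended up in a column whose schema says string. A minimal sketch reproducing it and the usual fix (column names here are invented):

```scala
// Sketch: a java.lang.Character in a StringType column triggers
// "java.lang.Character is not a valid external type for schema of string".
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark  = SparkSession.builder().appName("char-example").master("local[*]").getOrCreate()
val schema = StructType(Seq(StructField("code", StringType)))
val lines  = spark.sparkContext.parallelize(Seq("A,1", "B,2"))

// Fails at runtime: line(0) is a Char, but the schema declares a string column.
// val bad = spark.createDataFrame(lines.map(line => Row(line(0))), schema)

// Works: convert to String (or use split/substring, which already return String).
val good = spark.createDataFrame(lines.map(line => Row(line(0).toString)), schema)
good.show()
```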
CSV file from HDFS to Oracle BLOB using Spark
I’m working on a Java app that uses Spark 2.3.1 to load data from Oracle to HDFS and vice versa. I want to create a CSV file in HDFS and then load it into an Oracle (12.2) BLOB. The code: I’m new to Spark, so any ideas please on how to convert a JavaRDD to a BufferedInputStream, or get rid of the mess above and put the Dataset
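As a rough sketch of one way to do this (this is not the poster’s code; table, column, and connection details are invented, and it is written in Scala rather than Java), the CSV content can be collected on the driver and streamed into a BLOB column over plain JDBC, which sidesteps the JavaRDD-to-BufferedInputStream question for reasonably small files:

```scala
// Sketch: collect the CSV content, then stream it into an Oracle BLOB column
// via JDBC. Only suitable when the file fits comfortably in driver memory.
import java.io.ByteArrayInputStream
import java.sql.DriverManager
import org.apache.spark.sql.Dataset

def writeCsvToBlob(csv: Dataset[String], jdbcUrl: String, user: String, pwd: String): Unit = {
  val bytes = csv.collect().mkString("\n").getBytes("UTF-8")
  val conn  = DriverManager.getConnection(jdbcUrl, user, pwd)
  try {
    val ps = conn.prepareStatement("INSERT INTO csv_files (name, content) VALUES (?, ?)")
    ps.setString(1, "export.csv")
    ps.setBinaryStream(2, new ByteArrayInputStream(bytes), bytes.length)
    ps.executeUpdate()
    ps.close()
  } finally conn.close()
}
```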
spark-submit error: Invalid maximum heap size: -Xmx4g --jars, but enough memory on the system
I am running a Spark job: And the command gives an error: Invalid maximum heap size: -Xmx4g --jars Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. I checked memory: So, it seems to be fine. I checked Java: Then I checked in Chrome whether Spark is running at ai-grisnodedev1:7077 and it does
ClassNotFoundException: Failed to find data source: bigquery
I’m trying to load data from Google BigQuery into Spark running on Google Dataproc (I’m using Java). I tried to follow the instructions here: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example I get the error: “ClassNotFoundException: Failed to find data source: bigquery.” My pom.xml looks like this: After adding the dependency to my pom.xml, it was downloading a lot to build the .jar, so I think
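For context (not taken from the question), this error means the spark-bigquery connector is not on the runtime classpath. Once it is available, for example via the connector jar Google publishes for Dataproc or via the com.google.cloud.spark:spark-bigquery-with-dependencies artifact, reading looks like this sketch (the table is the public sample used in the linked tutorial, shown in Scala):

```scala
// Sketch: read a BigQuery table once the spark-bigquery connector is on the
// classpath (e.g. submitted with --jars gs://spark-lib/bigquery/spark-bigquery-latest.jar).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bigquery-read").getOrCreate()

val df = spark.read
  .format("bigquery")
  .option("table", "bigquery-public-data.samples.shakespeare")
  .load()

df.show(10)
```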
Spark – Transforming Complex Data Types
Goal The goal I want to achieve is to read a CSV file (OK) and encode it to a Dataset&lt;Person&gt;, where the Person object has a nested Address[] object (throws an exception). The Person CSV file In a file called person.csv, there is the following data describing some persons: The first line is the schema and address is a nested structure. Data classes
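The actual person.csv layout and data classes are cut off above, so the following is only a sketch under assumed separators and field names (and in Scala rather than Java): read the CSV as plain string columns, then parse the address column into the nested structure before converting to a typed Dataset:

```scala
// Sketch: turn a flat CSV column into a nested Address sequence by parsing it
// manually, then get a typed Dataset[Person]. Separators and fields are assumed.
import org.apache.spark.sql.SparkSession

case class Address(street: String, city: String)
case class Person(name: String, addresses: Seq[Address])

val spark = SparkSession.builder().appName("nested-csv").master("local[*]").getOrCreate()
import spark.implicits._

// Assumed line format: name;street1,city1|street2,city2
val people = spark.read
  .option("delimiter", ";")
  .option("header", "true")
  .csv("person.csv")
  .map { row =>
    val addresses = row.getString(1).split('|').toSeq.map { a =>
      val Array(street, city) = a.split(',')
      Address(street, city)
    }
    Person(row.getString(0), addresses)
  }

people.show(truncate = false)
```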
version conflict, current version [2] is different than the one provided [1]
I have a Kafka topic and a Spark application. The Spark application gets data from the Kafka topic, pre-aggregates it and stores it in Elasticsearch. Sounds simple, right? Everything works as expected, but the minute I set the “spark.cores” property to something other than 1, I start getting After researching a bit, I think the error is because multiple cores
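The excerpt cuts off, but this version-conflict error typically appears when several tasks update the same Elasticsearch document concurrently once more than one core is in play. A hedged sketch of the usual mitigation (index name, columns, and aggregation are placeholders): pre-aggregate so each document id produces exactly one row per write, and tell the connector which column is the document id:

```scala
// Sketch: pre-aggregate per document id so each id is written once per batch,
// then upsert with es.mapping.id so concurrent partial updates don't collide.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.sum

def writeBatch(batch: DataFrame): Unit = {
  val aggregated = batch
    .groupBy("doc_id")
    .agg(sum("amount").as("amount"))

  aggregated.write
    .format("org.elasticsearch.spark.sql")
    .option("es.mapping.id", "doc_id")       // column used as the document id
    .option("es.write.operation", "upsert")
    .mode("append")
    .save("my-index")
}
```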