
Spark – Transforming Complex Data Types

Goal

My goal is to:

  • read a CSV file (OK)
  • encode it to a Dataset<Person>, where each Person object has a nested Address[] array (throws an exception)

The Person CSV file

A file called person.csv contains the following data describing some people:

JavaScript

The first line is the schema, and address is a nested structure.

Data classes

The data classes are:

JavaScript

and

JavaScript

Reading untyped Data

I first tried reading the data from the CSV into a Dataset<Row>, which works as expected:

JavaScript

Encoding through a UserDefinedFunction

My UDF, which takes a String and returns an Address[]:

JavaScript

The caller:

JavaScript

This leads to the following exception:

Caused by: java.lang.IllegalArgumentException: The value (Address(street=streetA, city=cityA)) of the type (ch.project.data.Address) cannot be converted to struct

Why can't it convert from Address to struct?


Answer

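After trying many different approaches and spending some hours researching on the Internet, I have come to the following conclusions:

UserDefinedFunction works, but it belongs to the older, untyped API. As far as I can tell, a UDF declared to return a struct has to return Row values rather than arbitrary Java objects, which is why the Address instance cannot be converted. When we only need to transform an object from one type to another, a simple map() function can replace the UDF. The simplest way is the following: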

JavaScript
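As a rough illustration of the conversion step such a map() call might perform, here is a minimal sketch in plain Java. The Address fields (street, city) are taken from the exception message above. Everything else is an assumption made for illustration: the class name AddressMapping, the helper parseAddresses, and above all the text format of the address column ("street,city" pairs separated by ";"), which may not match the real file.

```java
import java.io.Serializable;
import java.util.Arrays;

public class AddressMapping {

    // Bean with the two fields visible in the exception message (street, city).
    public static class Address implements Serializable {
        private String street;
        private String city;

        public Address() { }  // no-arg constructor, which bean-style encoders expect

        public Address(String street, String city) {
            this.street = street;
            this.city = city;
        }

        public String getStreet() { return street; }
        public void setStreet(String street) { this.street = street; }
        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }
    }

    // ASSUMPTION: the raw column looks like "streetA,cityA;streetB,cityB".
    // Hypothetical helper; adapt the splitting to the actual CSV content.
    public static Address[] parseAddresses(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return new Address[0];
        }
        return Arrays.stream(raw.split(";"))
                .map(String::trim)
                .filter(entry -> !entry.isEmpty())
                .map(entry -> entry.split(",", 2))        // street, then city
                .map(parts -> new Address(parts[0].trim(),
                        parts.length > 1 ? parts[1].trim() : ""))
                .toArray(Address[]::new);
    }

    // In Spark, a helper like this would be called from a typed map, e.g.
    //   rows.map((MapFunction<Row, Person>) row -> ..., Encoders.bean(Person.class));
    // so the result is built as a Person bean instead of going through a UDF.
}
```

Keeping the parsing in an ordinary static method means it can be unit-tested without a Spark session, and the map() lambda only has to copy values from the Row into a Person.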
User contributions licensed under: CC BY-SA