
Apache Spark: StackOverflowError when trying to index string columns

I have a CSV file with about 5,000 rows and 950 columns. First, I load it into a DataFrame.


After that, I find all the string columns so that I can index them.


To index them, I create an indexer for each string column.


Then I combine the indexers into a pipeline.


But when I try to transform my initial DataFrame with this pipeline, I get a StackOverflowError.


What am I doing wrong? Thanks.


Answer

It seems I found a solution of sorts: use Spark 2.0. Previously I used 1.6.2, which was the latest version when I ran into the issue. I had also tried the 2.0 preview, but the problem reproduced there too.

User contributions licensed under: CC BY-SA