
Read a file from Google Cloud Storage in Dataproc

I’m trying to migrate a Scala Spark job from a Hadoop cluster to GCP. I have this snippet of code that reads a file and creates an ArrayBuffer[String]:

JavaScript

This code runs on the cluster and gives me 3025000 chars. I tried to run the same code in Dataproc:

JavaScript

It gives 3175025 chars. I think whitespace is being added to the file contents, or maybe I must use another interface to read the file from Google Cloud Storage in Dataproc? I also tried other encoding options, but they give the same result. Any help?


Answer

I didn’t find a solution using a buffer, so I tried reading the file char by char, and it works for me:

JavaScript
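For illustration, here is a minimal sketch of what a char-by-char read could look like in Scala. It is not the code from this answer. The object name `CharByCharReader`, the method `readAll`, and the bucket and path in the comments are all hypothetical, and UTF-8 is an assumed default encoding. One possible explanation for the extra chars (an assumption, since the original snippet is not shown) is a fixed-size buffer being appended without checking how many chars `read` actually returned; reading one char at a time avoids that situation.

```scala
import java.io.{InputStream, InputStreamReader}
import java.nio.charset.{Charset, StandardCharsets}

object CharByCharReader {
  // Reads the whole stream one character at a time.
  // Reader.read() returns -1 once the end of the stream is reached.
  def readAll(in: InputStream, charset: Charset = StandardCharsets.UTF_8): String = {
    val reader = new InputStreamReader(in, charset)
    val sb = new StringBuilder
    try {
      var c = reader.read()
      while (c != -1) {
        sb.append(c.toChar)
        c = reader.read()
      }
    } finally reader.close()
    sb.toString
  }

  // On Dataproc the stream would typically come from the Hadoop FileSystem API,
  // for example (hypothetical bucket and path, `spark` being the SparkSession):
  //   val path = new org.apache.hadoop.fs.Path("gs://my-bucket/data/input.txt")
  //   val fs   = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
  //   val text = CharByCharReader.readAll(fs.open(path))
}
```

Comparing `text.length` on both clusters should then show whether the two reads return the same number of chars.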
User contributions licensed under: CC BY-SA