Read a file from Google Cloud Storage in Dataproc

Question

I'm trying to migrate a Scala Spark job from a Hadoop cluster to GCP. I have this snippet of code that reads a file and creates an ArrayBuffer[String]. On the Hadoop cluster this code gives me 3025000 chars, but when I run the same code on Dataproc it gives 3175025 chars. I think whitespace is being added to…
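The original snippet is truncated here, so the following is only an assumption about one common way a buffer-based read can produce extra characters. `java.io.Reader.read(char[])` returns how many chars it actually filled, and that can be fewer than the buffer size even before the end of the stream; how often this happens depends on the underlying stream. Appending the whole buffer regardless of that count repeats stale characters from the previous read. A minimal script-style sketch; `readAllBuffered`, `readAllBuggy`, and `bufSize` are hypothetical names:

```scala
import java.io.{Reader, StringReader}

// Correct buffered read: after each call only the first `n` chars of `buf`
// are valid, because Reader.read(char[]) returns how many chars it filled.
def readAllBuffered(reader: Reader, bufSize: Int = 300): String = {
  val buf = new Array[Char](bufSize)
  val sb = new java.lang.StringBuilder
  var n = reader.read(buf)
  while (n != -1) {
    sb.append(buf, 0, n) // append only the chars read by this call
    n = reader.read(buf)
  }
  sb.toString
}

// Buggy variant, for comparison only: it appends the whole buffer every time,
// so a partial read leaves stale chars from the previous call in the output.
def readAllBuggy(reader: Reader, bufSize: Int = 300): String = {
  val buf = new Array[Char](bufSize)
  val sb = new java.lang.StringBuilder
  while (reader.read(buf) != -1) sb.append(buf)
  sb.toString
}

// Example: "abcde" read 2 chars at a time; the last read fills only 1 char
println(readAllBuffered(new StringReader("abcde"), 2)) // abcde
println(readAllBuggy(new StringReader("abcde"), 2))    // abcded
```

Whether this is the actual cause of the 150025 extra chars cannot be confirmed from the question as shown.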

Accepted Answer

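I didn't find a solution using a buffer, so I tried reading the file char by char, and that works for me:

```scala
// sourceEDR: the input java.io.Reader (read() returns -1 at end of stream)
// outputEDRFile: the ArrayBuffer[String] that collects the 300-char chunks
var i = 0 // total number of chars kept
var r = 0
val response = new StringBuilder
while ({ r = sourceEDR.read(); r } != -1) {
  val ch = r.asInstanceOf[Char]
  if (response.length < 300) {
    response.append(ch)
  } else {
    // chunk is full: replace line breaks with spaces and store it
    val str = response.toString().replaceAll("[\r\n]", " ")
    i += str.length
    outputEDRFile += (str + "\n")
    response.setLength(0)
    response.append(ch)
  }
}
// store whatever is left in the last, partially filled chunk
val str = response.toString().replaceAll("[\r\n]", " ")
i += str.length
outputEDRFile += (str + "\n")
```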
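The same char-by-char approach can be wrapped in a function that takes any `java.io.Reader`, which makes it easy to check the chunking and the char count locally with a `StringReader` before running on Dataproc. A minimal script-style sketch; `readInChunks` and `chunkSize` are hypothetical names, and unlike the loop above it skips the final chunk when it is empty, so an empty input produces no entries:

```scala
import java.io.{Reader, StringReader}
import scala.collection.mutable.ArrayBuffer

// Reads `reader` one char at a time, splits the text into chunks of at most
// `chunkSize` chars, and replaces CR/LF with spaces in each chunk.
// Returns the stored chunks (each followed by "\n") and the number of chars kept.
def readInChunks(reader: Reader, chunkSize: Int = 300): (ArrayBuffer[String], Int) = {
  val out = ArrayBuffer[String]()
  val current = new StringBuilder
  var total = 0

  // store the current chunk and start a new one
  def flush(): Unit = {
    val str = current.toString.replaceAll("[\r\n]", " ")
    total += str.length
    out += (str + "\n")
    current.setLength(0)
  }

  var r = reader.read()
  while (r != -1) {
    if (current.length == chunkSize) flush()
    current.append(r.toChar)
    r = reader.read()
  }
  if (current.nonEmpty) flush() // last, partially filled chunk
  (out, total)
}

// Example: 8 input chars, chunks of 3; the newline is kept as a space
println(readInChunks(new StringReader("abc\ndefg"), 3)._2) // 8
```

Calling `read()` once per char on an unbuffered stream can be slow; wrapping the source in a `java.io.BufferedReader` keeps the per-char logic while reading from the underlying stream in larger blocks.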

