Currently, I’m working on a project that extracts data from a BigQuery table using Scio in Scala. I’m able to extract the data and ingest it into Elasticsearch, but now I’m trying to do the same using an S3 bucket as storage.

I am able to write the data to a txt file with the saveAsTextFile method, and then upload it from my machine to the S3 bucket after adding the correct libraries to sbt. However, I don’t know whether it is possible to use saveAsCustomOutput to write the data directly to S3, instead of going through local storage.
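For context, the current flow looks roughly like this. It is only a sketch, not my exact job; the table spec and output path are placeholders:

```scala
import com.spotify.scio._
import com.spotify.scio.bigquery._

object BigQueryToLocalText {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    // Read rows from BigQuery and dump them as text on local disk,
    // which I then upload to S3 by hand.
    sc.bigQueryTable(Table.Spec("my-project:my_dataset.my_table"))
      .map(_.toString)
      .saveAsTextFile("output/rows")

    sc.run()
  }
}
```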
Answer
It is possible, but instead of using an S3 bucket as the landing zone, I set up a Kinesis Data Stream. By adding a Kinesis event trigger to a Lambda function, it was possible to stream the data into an S3 bucket.
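For reference, a minimal sketch of the Scio side, handing records to Beam's KinesisIO through saveAsCustomOutput. The stream name, partition key, credential arguments, and region are all placeholders, and the exact KinesisIO API can vary with your Beam version:

```scala
import com.spotify.scio._
import com.spotify.scio.bigquery._
import org.apache.beam.sdk.io.kinesis.KinesisIO
import com.amazonaws.regions.Regions
import java.nio.charset.StandardCharsets

object BigQueryToKinesis {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    // Read from BigQuery (table spec is a placeholder)
    sc.bigQueryTable(Table.Spec("my-project:my_dataset.my_table"))
      // KinesisIO.write() consumes byte arrays, so serialize each row
      .map(row => row.toString.getBytes(StandardCharsets.UTF_8))
      // Plug Beam's KinesisIO transform into Scio via saveAsCustomOutput;
      // stream name, partition key, and credentials are placeholders
      .saveAsCustomOutput(
        "WriteToKinesis",
        KinesisIO.write()
          .withStreamName("my-data-stream")
          .withPartitionKey("my-partition-key")
          .withAWSClientsProvider(args("awsKey"), args("awsSecret"), Regions.US_EAST_1)
      )

    sc.run()
  }
}
```

From there, the Lambda function subscribed to the stream writes the incoming records to the S3 bucket.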