
Is it possible to stream data from Beam (Scio) to an S3 bucket?

Currently, I’m working on a project that extracts data from a BigQuery table using Scio in Scala.

I’m able to extract the data and ingest it into Elasticsearch, but now I’m trying to do the same using an S3 bucket as storage.

I can certainly write the data to a txt file using saveAsTextFile and then upload it from my machine to the S3 bucket, after adding the correct libraries to sbt.

However, I don’t know whether it is possible to use a custom output to write the data directly to S3, instead of going through local storage.
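For reference, Beam can treat s3:// paths as a regular FileSystem when the AWS IO module (beam-sdks-java-io-amazon-web-services) is on the classpath, so saveAsTextFile can target a bucket directly without a local intermediate file. The sketch below assumes that dependency is added in sbt and uses a hypothetical bucket name and input; the real job would read from BigQuery instead of a text file.

```scala
import com.spotify.scio.ContextAndArgs

object BigQueryToS3 {
  def main(cmdlineArgs: Array[String]): Unit = {
    // Pass e.g. --awsRegion=us-east-1 so Beam can register the s3:// FileSystem
    // (requires beam-sdks-java-io-amazon-web-services on the classpath).
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))                // stand-in for the BigQuery read in the real job
      .saveAsTextFile("s3://my-bucket/output") // hypothetical bucket; written directly, no local copy

    sc.run().waitUntilFinish()
    ()
  }
}
```

Credentials are picked up through the usual AWS provider chain (environment variables, instance profile, etc.), so nothing secret needs to appear in the pipeline code.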


Answer

It is possible, but instead of using an S3 bucket as the landing zone (LZ), I set up a Kinesis Data Stream. By adding a Kinesis event trigger to a Lambda function, it was possible to stream the data into an S3 bucket.
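The Beam side of that setup can be sketched with Scio’s saveAsCustomOutput escape hatch wrapping Beam’s KinesisIO writer (from beam-sdks-java-io-kinesis); the Lambda-to-S3 hop happens outside the pipeline. This is a sketch under assumptions: the stream name, partition key, region, and credential args are all placeholders, and the text-file source stands in for the BigQuery read.

```scala
import com.amazonaws.regions.Regions
import com.spotify.scio.ContextAndArgs
import org.apache.beam.sdk.io.kinesis.KinesisIO

object BigQueryToKinesis {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))          // stand-in for the BigQuery read
      .map(_.getBytes("UTF-8"))         // KinesisIO writes records as byte arrays
      .saveAsCustomOutput(              // Scio hook for applying a raw Beam PTransform
        "WriteToKinesis",
        KinesisIO
          .write()
          .withStreamName("my-stream")  // hypothetical Kinesis Data Stream name
          .withPartitionKey("pk")       // single partition key; a partitioner can spread load
          .withAWSClientsProvider(
            args("awsAccessKey"),       // placeholder credentials passed as job args
            args("awsSecretKey"),
            Regions.US_EAST_1
          )
      )

    sc.run().waitUntilFinish()
    ()
  }
}
```

Downstream, a Lambda subscribed to the stream (or a Kinesis Data Firehose delivery stream) batches the records and lands them in the S3 bucket.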

User contributions licensed under: CC BY-SA