I’m attempting to upgrade the Apache Beam libraries from v2.19.0 to v2.37.0 (Java 8 & Maven), but I’ve run into an issue with a breaking change that I’d appreciate some support with. Sorry this is quite a long one; I wanted to capture as much context as I could, but please shout if there’s anything you’d like to dig into.
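For reference, the version bump itself is just a Maven property/dependency change; this is a sketch, and the exact artifact list depends on which Beam modules the project actually uses:

```xml
<properties>
  <!-- was 2.19.0 -->
  <beam.version>2.37.0</beam.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>${beam.version}</version>
  </dependency>
  <!-- plus the IO and runner modules the pipeline uses, all at the same version -->
</dependencies>
```

Keeping every Beam artifact on the same `${beam.version}` avoids mixed-version classpath issues when crossing this many releases.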
Tag: apache-beam
Apache Beam Split to Multiple Pipeline Output
I am subscribing to a single topic that contains different event types, each arriving with different attributes. After I read an element, I need to route it to a different place based on its attribute. This is what the sample code looks like: basically I read an event, filter on its attribute, and write the file. The job failed
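The usual way to route one stream to several destinations in Beam is a multi-output `ParDo` with `TupleTag`s. A minimal sketch, in which the attribute name `event_type`, the tag names, and the `PubsubMessage` element type are all assumptions:

```java
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Anonymous subclasses so Beam can infer the element type of each tag.
final TupleTag<PubsubMessage> orderTag = new TupleTag<PubsubMessage>() {};
final TupleTag<PubsubMessage> otherTag = new TupleTag<PubsubMessage>() {};

PCollectionTuple routed = messages.apply("RouteByAttribute",
    ParDo.of(new DoFn<PubsubMessage, PubsubMessage>() {
      @ProcessElement
      public void process(ProcessContext c) {
        String type = c.element().getAttribute("event_type"); // assumed attribute name
        if ("order".equals(type)) {
          c.output(c.element());           // main output (orderTag)
        } else {
          c.output(otherTag, c.element()); // additional output
        }
      }
    }).withOutputTags(orderTag, TupleTagList.of(otherTag)));

PCollection<PubsubMessage> orders = routed.get(orderTag);
PCollection<PubsubMessage> others = routed.get(otherTag);
// Each branch can then be written to its own destination.
```

When the set of branches is fixed and each element belongs to exactly one of them, Beam's `Partition` transform is an alternative to hand-written tags.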
How to extract information from PCollection after a join in apache beam?
I have two example streams of data on which I perform an innerJoin. I would like to extend this piece of example join code and add some logic after the join occurs. I would like to just print the ad name and number of clicks after the join using a DoFn like this: Any ideas on how to extract this info from
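The join-library's `innerJoin` produces `KV<joinKey, KV<leftValue, rightValue>>`, so the joined values come out of the nested `KV`. A minimal sketch, assuming the left stream carries ad names and the right stream carries click counts keyed by the same id:

```java
import org.apache.beam.sdk.extensions.joinlibrary.Join;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Each joined element is KV<joinKey, KV<left, right>>.
PCollection<KV<String, KV<String, Long>>> joined =
    Join.innerJoin(adNamesById, clickCountsById);

joined.apply("PrintJoined", ParDo.of(
    new DoFn<KV<String, KV<String, Long>>, Void>() {
      @ProcessElement
      public void process(@Element KV<String, KV<String, Long>> e) {
        String adName = e.getValue().getKey();    // value from the left input
        Long numClicks = e.getValue().getValue(); // value from the right input
        System.out.println(adName + " -> " + numClicks + " clicks");
      }
    }));
```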
Add document to Firestore from Beam with auto generated ID
I would like to use Apache Beam Java with the recently published Firestore connector to add new documents to a Firestore collection. While I thought this would be a relatively easy task, the need to create com.google.firestore.v1.Document objects seems to make things a bit more difficult. I was using this blog post on Using Firestore and Apache Beam for
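The v1 `BatchWrite` path expects a full document name rather than letting the server pick one, so one workaround is to generate the "auto" ID client-side with a UUID. A sketch under that assumption; the project, database, and collection names are placeholders:

```java
import java.util.UUID;

import com.google.firestore.v1.Document;
import com.google.firestore.v1.Value;
import com.google.firestore.v1.Write;

// BatchWrite needs a full resource name, so mint the document ID ourselves.
String docId = UUID.randomUUID().toString();

Document doc = Document.newBuilder()
    .setName("projects/my-project/databases/(default)/documents/my-collection/" + docId)
    .putFields("title", Value.newBuilder().setStringValue("hello").build())
    .build();

// setUpdate creates the document if it does not exist.
Write write = Write.newBuilder().setUpdate(doc).build();
// The resulting PCollection<Write> is then applied to the connector's batch-write transform.
```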
Apache Beam BigqueryIO.Write getSuccessfulInserts not working
We are creating a simple Apache Beam pipeline which inserts data into a BigQuery table. We are trying to get the tableRows which have been successfully inserted into the table and the tableRows which errored; the code is as shown in the screenshot. According to the following documentation: https://beam.apache.org/releases/javadoc/2.33.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html BigQueryIO.writeTableRows() returns a WriteResult object which has getSuccessfulInserts(), which will
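In my understanding, `getSuccessfulInserts()` is only supported when the write uses the streaming-inserts method, and on some SDK versions successful rows are only propagated when explicitly requested. A sketch of a write configured that way; the table spec is a placeholder, and `withSuccessfulInsertsPropagation` may or may not exist on your SDK version:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

WriteResult result = rows.apply("WriteToBQ",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table")                     // placeholder
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)    // required for getSuccessfulInserts
        .withSuccessfulInsertsPropagation(true)                   // needed on newer SDKs, if available
        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

PCollection<TableRow> ok = result.getSuccessfulInserts();
PCollection<TableRow> failed = result.getFailedInserts();
```

If the write uses file loads (the batch default), `getSuccessfulInserts()` throws, which would explain the behavior described above.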
Best practice to pass large pipeline option in apache beam
We have a use case where we want to pass a hundred lines of JSON spec to our Apache Beam pipeline. One straightforward way is to create a custom pipeline option as mentioned below. Is there any other way we can pass the input as a file? I want to deploy the pipeline on the Google Dataflow engine. Even if I
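One common pattern is to accept a file *path* as the option and read the file into a string at pipeline-construction time, before the pipeline is submitted; the spec then travels with the serialized transforms. A sketch where the `jsonSpec` option name is an assumption:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SpecLoader {
  // Read the whole JSON spec file into a single String at
  // pipeline-construction time, before Pipeline.create(options).
  public static String readSpec(Path path) throws IOException {
    return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    String spec = readSpec(Paths.get(args[0]));
    // options.setJsonSpec(spec);  // hypothetical custom-option setter
    System.out.println(spec.length() + " characters of spec loaded");
  }
}
```

An alternative when the file must be resolved at runtime (e.g. it lives in GCS) is to pass the path through and read it inside a `DoFn`'s `@Setup`.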
How to append new rows or perform union on two PCollections
In the following CSV, I need to append new row values to it.

ID  date        balance
01  31/01/2021  100
01  28/02/2021  200
01  31/03/2021  200
01  30/04/2021  200
01  31/05/2021  500
01  30/06/2021  600

Expected output:

ID  date        balance
01  31/01/2021  100
01  28/02/2021  200
01  31/03/2021  200
01  30/04/2021  200
01  31/05/2021  500
01  30/06/2021  600
01  30/07/2021  999
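Appending one PCollection to another is a union, which Beam spells `Flatten`. A minimal sketch, assuming both collections hold the CSV rows as strings:

```java
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

// Both inputs must share the same element type (and compatible coders).
PCollection<String> merged =
    PCollectionList.of(existingRows).and(newRows)
        .apply("UnionRows", Flatten.pCollections());
```

Note that the result is unordered; if the output must be sorted by date, that ordering has to be imposed when writing, not by the union itself.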
How do I set the coder for a PCollection<List> in Apache Beam?
I’m teaching myself Apache Beam, specifically for use in parsing JSON. I was able to create a simple example that parsed JSON to a POJO and POJO to CSV. It required that I use .setCoder() for my simple POJO class. The problem: now I am trying to skip the POJO step and parse using some custom transforms. My pipeline looks
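For a `PCollection<List<...>>` the coder can be composed from `ListCoder` plus the element coder. A sketch assuming the lists hold strings; `ParseJsonFn` is a hypothetical stand-in for the custom parsing transform:

```java
import java.util.List;

import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

PCollection<List<String>> parsed = input
    .apply("ParseJson", ParDo.of(new ParseJsonFn())) // hypothetical DoFn
    .setCoder(ListCoder.of(StringUtf8Coder.of()));   // coder for List<String>
```

The raw type `PCollection<List>` itself cannot be given a coder, since Beam needs the element type; parameterizing the list is what makes `ListCoder.of(...)` applicable.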
Apache Beam Resampling of columns based on date
I am using Apache Beam to process data and am trying to achieve the following: read the data from a CSV file (completed); group the records by customer ID (completed); resample the data by month and calculate the sum for that particular month. Detailed explanation: I have a CSV file as shown below.

customerId  date        amount
BS:89481    11/14/2012  124
BS:89480
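The "resample by month" step is really key-by-month-then-sum. The date handling can be sketched in plain Java (the `M/d/yyyy` pattern is inferred from the dates shown above); the same keying can then run inside a Beam `MapElements` followed by `Sum.longsPerKey()`:

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MonthlySum {
  private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

  // Each record is {date, amount}; key it by YearMonth and sum per month.
  public static Map<YearMonth, Long> sumByMonth(List<String[]> records) {
    Map<YearMonth, Long> totals = new TreeMap<>();
    for (String[] r : records) {
      YearMonth month = YearMonth.from(LocalDate.parse(r[0], FMT));
      totals.merge(month, Long.parseLong(r[1]), Long::sum);
    }
    return totals;
  }
}
```

In the pipeline itself the composite key would be customer ID plus month, so the per-customer grouping already completed carries through the resampling step.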
Beam PAssert messes up the Row
I am exploring testing with Beam and encountered a weird problem. My driver program works as expected, but its test is failing with an error like this: And here is my PAssert code: On the last step of my pipeline, I log the element in question. This is the expected result. When I debugged the test, the problem boiled down
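Since `Row` equality includes the schema (field names, types, and order), a failing `PAssert` on Rows is often a schema mismatch rather than a value mismatch. A minimal sketch of asserting on Rows, where the field names and values are assumptions; the key point is building the expected Row from the same schema the pipeline output uses:

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.values.Row;

Schema schema = Schema.builder()
    .addStringField("name")
    .addInt64Field("count")
    .build();

// addValues must follow the schema's declared field order exactly.
Row expected = Row.withSchema(schema).addValues("ad1", 42L).build();

PAssert.that(output).containsInAnyOrder(expected);
```

If the pipeline's schema was inferred (e.g. from a POJO), printing `output.getSchema()` and comparing it field by field with the hand-built one is a quick way to find the discrepancy.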