
Tag: google-cloud-dataflow

Apache Beam Split to Multiple Pipeline Output

I am subscribing to one topic that contains different event types, each arriving with different attributes. After I read an element, I need to route it to a different destination based on its attribute. This is what the sample code looks like: basically I read an event, filter on its attribute, and write the file. The job failed
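A minimal sketch of the usual approach to this kind of routing, assuming two hypothetical event types ("order" and "payment") carried in an "eventType" attribute; the subscription, bucket paths, and attribute names are illustrative, not taken from the question:

```java
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import org.joda.time.Duration;

public class SplitByEventType {

  // One tag per destination; the event types are illustrative.
  static final TupleTag<String> ORDERS = new TupleTag<String>() {};
  static final TupleTag<String> PAYMENTS = new TupleTag<String>() {};

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollectionTuple routed =
        p.apply("ReadEvents",
                PubsubIO.readMessagesWithAttributes()
                    .fromSubscription("projects/my-project/subscriptions/events-sub"))
         .apply("RouteByAttribute",
                ParDo.of(new DoFn<PubsubMessage, String>() {
                      @ProcessElement
                      public void process(ProcessContext c) {
                        PubsubMessage msg = c.element();
                        String body = new String(msg.getPayload(), StandardCharsets.UTF_8);
                        // Route once on the attribute instead of filtering each branch separately.
                        if ("payment".equals(msg.getAttribute("eventType"))) {
                          c.output(PAYMENTS, body);
                        } else {
                          c.output(body); // main output = ORDERS
                        }
                      }
                    })
                    .withOutputTags(ORDERS, TupleTagList.of(PAYMENTS)));

    // Each tagged output is its own PCollection; window before writing files from a streaming job.
    routed.get(ORDERS)
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply(TextIO.write().to("gs://my-bucket/orders/part")
            .withWindowedWrites().withNumShards(1));

    routed.get(PAYMENTS)
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply(TextIO.write().to("gs://my-bucket/payments/part")
            .withWindowedWrites().withNumShards(1));

    p.run();
  }
}
```

Emitting to tagged outputs from a single ParDo keeps the split as one pass over the stream, which is usually where per-branch filtering pipelines go wrong.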

Apache Beam BigqueryIO.Write getSuccessfulInserts not working

We are creating a simple Apache Beam pipeline which inserts data into a BigQuery table, and we are trying to get the TableRows which have been successfully inserted into the table and the TableRows which errored; the code is as shown in the screenshot. According to the following documentation: https://beam.apache.org/releases/javadoc/2.33.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html BigQueryIO.writeTableRows() returns a WriteResult object which has getSuccessfulInserts(), which will
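A minimal sketch of the pattern that is usually expected here, with a hypothetical project, dataset, and table; note that getSuccessfulInserts() is only populated when the write uses the STREAMING_INSERTS method, which is the common reason it appears "not working":

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class WriteWithResults {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<TableRow> rows =
        p.apply(Create.of(new TableRow().set("name", "alice").set("age", 30)))
         .setCoder(TableRowJsonCoder.of());

    WriteResult result =
        rows.apply("WriteToBQ",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")            // hypothetical table
                .withMethod(Method.STREAMING_INSERTS)            // required for getSuccessfulInserts()
                .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    // Rows the streaming insert accepted.
    PCollection<TableRow> succeeded = result.getSuccessfulInserts();
    // Rows rejected by BigQuery (e.g. schema mismatch) after the retry policy gave up.
    PCollection<TableRow> failed = result.getFailedInserts();

    p.run();
  }
}
```

With batch loads (the default FILE_LOADS method for bounded pipelines) the WriteResult does not carry per-row success information, so switching the method is the first thing to check against the screenshot.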

How to avoid warning message when read BigQuery data to custom data type: Can’t verify serialized elements of type BoundedSource

I defined a custom data type following the document here: https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java#L127 and read data from BigQuery using the code below: https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java#L375 Warning message: Can't verify serialized elements of type BoundedSource have well defined equals method. This may produce incorrect results on some PipelineRunner. The message occurs at step TriggerIdCreation/Read(CreateSource)/Read(CreateSource)/Read(BoundedToUnboundedSourceAdapter)/StripIds.out0. I tried to add an equals() method to the custom data type
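For reference, a minimal sketch of the read side, assuming a hypothetical WeatherRecord type and a public sample table: the mapping function converts each SchemaAndRecord into the POJO, and the POJO gets an explicit coder plus equals()/hashCode(). The warning itself refers to the internal BoundedSource elements created while the runner splits the read, not to the custom type, so on the DirectRunner it is usually harmless and adding equals() to the POJO will not silence it:

```java
import java.io.Serializable;
import java.util.Objects;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;

public class ReadCustomType {

  // Hypothetical custom data type with well-defined equals()/hashCode().
  public static class WeatherRecord implements Serializable {
    public long year;
    public double meanTemp;

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof WeatherRecord)) return false;
      WeatherRecord other = (WeatherRecord) o;
      return year == other.year && meanTemp == other.meanTemp;
    }

    @Override
    public int hashCode() {
      return Objects.hash(year, meanTemp);
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<WeatherRecord> records =
        p.apply("ReadFromBQ",
            BigQueryIO
                .read((SerializableFunction<SchemaAndRecord, WeatherRecord>) input -> {
                  // Map the Avro GenericRecord returned by the export into the POJO.
                  GenericRecord r = input.getRecord();
                  WeatherRecord w = new WeatherRecord();
                  w.year = (Long) r.get("year");
                  w.meanTemp = (Double) r.get("mean_temp");
                  return w;
                })
                .fromQuery("SELECT year, mean_temp FROM `bigquery-public-data.samples.gsod`")
                .usingStandardSql()
                .withCoder(SerializableCoder.of(WeatherRecord.class)));

    p.run();
  }
}
```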

How to trigger Cloud Dataflow pipeline job from Cloud Function in Java?

I have a requirement to trigger a Cloud Dataflow pipeline from Cloud Functions, but the Cloud Function must be written in Java. The trigger for the Cloud Function is Google Cloud Storage's Finalize/Create event, i.e., when a file is uploaded to a GCS bucket, the Cloud Function must trigger the Cloud Dataflow job. When I create a Dataflow pipeline (batch) and
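A minimal sketch of one way this is commonly wired up, assuming the batch pipeline has already been staged as a classic Dataflow template at a hypothetical GCS path; the Java Cloud Function reacts to the storage finalize event and launches the template through the Dataflow REST client (google-api-services-dataflow). Project, region, bucket, and parameter names are placeholders:

```java
import com.google.api.client.http.HttpRequestInitializer;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.LaunchTemplateParameters;
import com.google.api.services.dataflow.model.RuntimeEnvironment;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.functions.BackgroundFunction;
import com.google.cloud.functions.Context;
import java.util.Map;

public class GcsToDataflowFunction implements BackgroundFunction<GcsToDataflowFunction.GcsEvent> {

  // Minimal shape of the storage.objects.finalize payload this function needs.
  public static class GcsEvent {
    public String bucket;
    public String name;
  }

  private static final String PROJECT = "my-project";   // hypothetical
  private static final String REGION = "us-central1";   // hypothetical
  private static final String TEMPLATE_PATH = "gs://my-bucket/templates/my-batch-template"; // hypothetical

  @Override
  public void accept(GcsEvent event, Context context) throws Exception {
    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
    HttpRequestInitializer initializer = new HttpCredentialsAdapter(credentials);

    Dataflow dataflow =
        new Dataflow.Builder(new NetHttpTransport(), GsonFactory.getDefaultInstance(), initializer)
            .setApplicationName("gcs-triggered-dataflow")
            .build();

    LaunchTemplateParameters launchParams =
        new LaunchTemplateParameters()
            .setJobName("gcs-trigger-" + System.currentTimeMillis())
            // Pass the uploaded object to the pipeline as a template parameter.
            .setParameters(Map.of("inputFile", "gs://" + event.bucket + "/" + event.name))
            .setEnvironment(new RuntimeEnvironment().setTempLocation("gs://my-bucket/temp"));

    dataflow.projects().locations().templates()
        .launch(PROJECT, REGION, launchParams)
        .setGcsPath(TEMPLATE_PATH)
        .execute();
  }
}
```

The function's service account needs permission to launch Dataflow jobs (for example roles/dataflow.admin plus access to the template and temp buckets); constructing and running the Pipeline object inside the function itself is usually avoided because of the function's runtime and memory limits.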

Dataflow: string to pubsub message

I'm trying to do unit testing in Dataflow. For that test, at the beginning, I will start with a simple hardcoded string. The problem is that I need to transform that string into a Pub/Sub message. I have the following code to do that: But I get the following error: How should I create the Pub/Sub message from the
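A minimal sketch of one way to build Beam's PubsubMessage from a hardcoded string in a test, using the SDK's two-argument constructor (payload bytes plus an attributes map) together with TestPipeline; the payload, attribute, and class names are illustrative:

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageWithAttributesCoder;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;

public class StringToPubsubMessageTest implements Serializable {

  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  public void hardcodedStringBecomesPubsubMessage() {
    String payload = "{\"id\": 1}"; // hardcoded test input

    PCollection<PubsubMessage> messages =
        pipeline
            .apply("CreateString", Create.of(payload))
            .apply("ToPubsubMessage",
                MapElements.via(new SimpleFunction<String, PubsubMessage>() {
                  @Override
                  public PubsubMessage apply(String s) {
                    // Payload bytes + attribute map, the same shape PubsubIO produces.
                    return new PubsubMessage(
                        s.getBytes(StandardCharsets.UTF_8),
                        Collections.singletonMap("source", "unit-test"));
                  }
                }))
            // PubsubMessage has no default coder, so set one explicitly.
            .setCoder(PubsubMessageWithAttributesCoder.of());

    PAssert.that(messages).satisfies(iterable -> {
      for (PubsubMessage m : iterable) {
        Assert.assertEquals(payload, new String(m.getPayload(), StandardCharsets.UTF_8));
      }
      return null;
    });

    pipeline.run().waitUntilFinish();
  }
}
```

A frequent source of errors here is mixing up org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage (a plain constructor, used by PubsubIO) with the protobuf com.google.pubsub.v1.PubsubMessage (built via newBuilder()), so checking which one the pipeline actually expects is worth doing first.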

Sending PubSub message manually in Dataflow

How can I send a PubSub message manually (that is to say, without using PubsubIO) in Dataflow? Importing (via Maven) google-cloud-dataflow-java-sdk-all 2.5.0 already pulls in a version of com.google.pubsub.v1 for which I was unable to find an easy way to send messages to a Pub/Sub topic (this version doesn't, for instance, allow manipulating Publisher instances, which is the
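A minimal sketch of publishing directly from a DoFn with the standalone google-cloud-pubsub client library (added to the POM explicitly, since, as described above, the pubsub classes bundled with the Dataflow SDK do not expose Publisher); project, topic, and attribute names are hypothetical:

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import java.util.concurrent.TimeUnit;
import org.apache.beam.sdk.transforms.DoFn;

/** Publishes each incoming string to a Pub/Sub topic without going through PubsubIO. */
public class ManualPublishFn extends DoFn<String, Void> {

  private final String project;
  private final String topic;
  private transient Publisher publisher;

  public ManualPublishFn(String project, String topic) {
    this.project = project;
    this.topic = topic;
  }

  @Setup
  public void setup() throws Exception {
    // One Publisher per DoFn instance, created on the worker rather than at graph construction.
    publisher = Publisher.newBuilder(TopicName.of(project, topic)).build();
  }

  @ProcessElement
  public void process(@Element String element) {
    PubsubMessage message =
        PubsubMessage.newBuilder()
            .setData(ByteString.copyFromUtf8(element))
            .putAttributes("origin", "manual-publish") // illustrative attribute
            .build();
    publisher.publish(message); // returns an ApiFuture<String> with the message id
  }

  @Teardown
  public void teardown() throws Exception {
    if (publisher != null) {
      publisher.shutdown();
      publisher.awaitTermination(1, TimeUnit.MINUTES);
    }
  }
}
```

As written this publishes fire-and-forget; if delivery must be confirmed before a bundle completes, one option is to collect the returned futures and block on them in a @FinishBundle method.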
