Apache Beam Split to Multiple Pipeline Output

I am subscribing to a single topic that carries different event types, and each type arrives with different attributes.

After reading each element, I need to route it to a different destination based on its attributes. The sample code looks like this:

Basically, I read the events, filter them on their attributes, and write them to files. The job failed in Dataflow with the following error: Workflow failed. Causes: The pubsub configuration contains errors: Subscription 'sub-name' is consumed by multiple stages, this will result in undefined behavior.

So what is the appropriate way to split the pipeline within the same job?

I tried creating Pipeline1, Pipeline2, and Pipeline3, but that ends up needing a separate job name for each pipeline, and I am not sure that is the right way to do it.

Answer

The two EventIO transforms on the same subscription are the cause of the error. You need to eliminate one of those transforms in order for this to work. This can be done by consuming the subscription into a single PCollection and then applying two filtering branches to that collection individually. Here is a partial example:
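
As a rough illustration of the single-read, two-branch layout, here is a minimal sketch using the Beam Python SDK. It assumes `ReadFromPubSub` stands in for the EventIO transform from the question, and that events carry a hypothetical `event_type` attribute with hypothetical values `order` and `payment`; none of these names come from the original code.

```python
def has_type(event_type):
    """Build a predicate that keeps messages whose 'event_type' attribute matches.

    'event_type' is a hypothetical attribute name used only for illustration.
    """
    def predicate(message):
        return message.attributes.get("event_type") == event_type
    return predicate


def branch_events(pipeline, subscription):
    """Read the subscription once, then fan out into two filtered branches."""
    # Imported here so has_type() can be exercised without Beam installed.
    import apache_beam as beam
    from apache_beam.io.gcp.pubsub import ReadFromPubSub

    # A single read transform: the subscription has exactly one consumer.
    messages = pipeline | "ReadEvents" >> ReadFromPubSub(
        subscription=subscription, with_attributes=True)

    # Both filters are applied to the same PCollection, so each branch
    # sees every message and keeps only the ones it cares about.
    orders = messages | "KeepOrders" >> beam.Filter(has_type("order"))
    payments = messages | "KeepPayments" >> beam.Filter(has_type("payment"))
    return orders, payments
```

Each returned PCollection can then be windowed and written to its own sink. Because both branches hang off one read, the job should no longer contain two stages consuming the same subscription.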

Another possibility is to use one of the other transforms provided by Apache Beam, which might simplify your solution a little. One such transform is Partition. Partition splits a single PCollection into a fixed number of PCollections based on a partitioning function. A partial example using Partition is:
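
A minimal sketch of the Partition approach with the Beam Python SDK might look like the following. The attribute name `event_type`, the values in `EVENT_TYPES`, and the extra catch-all partition are assumptions made for illustration, not part of the original answer.

```python
# Hypothetical event types; the list position doubles as the partition index.
EVENT_TYPES = ["order", "payment"]


def partition_by_type(message, num_partitions):
    """Return the partition index for a message.

    Known types map to their position in EVENT_TYPES; anything else
    falls into the last partition.
    """
    event_type = message.attributes.get("event_type")
    if event_type in EVENT_TYPES:
        return EVENT_TYPES.index(event_type)
    return num_partitions - 1


def split_events(messages):
    """Split one PCollection of Pub/Sub messages into one PCollection per type."""
    # Imported here so partition_by_type() can be exercised without Beam installed.
    import apache_beam as beam

    # One partition per known type, plus one for unrecognized events.
    orders, payments, other = messages | "SplitByType" >> beam.Partition(
        partition_by_type, len(EVENT_TYPES) + 1)
    return orders, payments, other
```

Unlike the filter branches, Partition sends each element to exactly one output, so the partitioning function has to return an index in the range `0` to `num_partitions - 1` for every element.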

User contributions licensed under: CC BY-SA