
Apache Beam: Split into Multiple Pipeline Outputs

I am subscribing to a single topic that contains different event types, each arriving with different attributes.

After I read an element, I need to route it to a different destination based on its attributes. This is what the sample code looks like:

    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    Pipeline pipeline = Pipeline.create(options);
    pipeline
    .apply(
        "ReadType1",
        EventIO.<T>readJsons()
            .of(T.class)
            .withPubsubTimestampAttributeName(null)
            .withOptions(options))
    .apply(
        Filter.by(
            new SerializableFunction<T, Boolean>() {
              @Override
              public Boolean apply(T input) {
                return input.attributes.get("type").equals("type1");
              }
            }))
    .apply(
        "WindowMetrics",
        Window.into(FixedWindows.of(Duration.standardSeconds(options.getWindowDuration()))))
    .apply("AsJsons", AsJsons.of(T.class))
    .apply(
        "Write File(s)",
        TextIO.write()
            .withWindowedWrites()
            .withNumShards(options.getNumShards())
            .to(
                new WindowedFilenamePolicy(
                    options.getRunOutputDirectory(),
                    options.getUseCurrentDateForOutputDirectory(),
                    options.getOutputFilenamePrefix(),
                    options.getOutputShardTemplate(),
                    options.getOutputFilenameSuffix()))
            .withTempDirectory(
                NestedValueProvider.of(
                    options.getTempDirectory(),
                    (SerializableFunction<String, ResourceId>)
                        input -> FileBasedSink.convertToFileResourceIfPossible(input))));


pipeline.apply("ReadType2",
        EventIO.<T>readJsons().of(T.class)
                .withPubsubTimestampAttributeName(null)
                .withOptions(options))
        .apply(Filter.by(new SerializableFunction<T, Boolean>() {
          @Override
          public Boolean apply(T input) {
            return input.attributes.get("type").equals("type2");
          }
        })).apply( "WindowMetrics",
        Window.into(FixedWindows.of(Duration.standardSeconds(options.getWindowDuration()))))
        .apply("AsJsons", AsJsons.of(T.class))
        .apply(
                "Write File(s)",
                TextIO.write()
                        .withWindowedWrites()
                        .withNumShards(options.getNumShards())
                        .to(
                                new WindowedFilenamePolicy(
                                        options.getBatchOutputDirectory(),
                                        options.getUseCurrentDateForOutputDirectory(),
                                        options.getOutputFilenamePrefix(),
                                        options.getOutputShardTemplate(),
                                        options.getOutputFilenameSuffix()))
                        .withTempDirectory(
                                NestedValueProvider.of(
                                        options.getTempDirectory(),
                                        (SerializableFunction<String, ResourceId>)
                                                input -> FileBasedSink.convertToFileResourceIfPossible(input))));

pipeline.apply("ReadType3",
        EventIO.<T>readJsons().of(T.class)
                .withPubsubTimestampAttributeName(null)
                .withOptions(options))
        .apply(Filter.by(new SerializableFunction<T, Boolean>() {
          @Override
          public Boolean apply(T input) {
            return input.attributes.get("type").equals("type3");
          }
        })).apply( "WindowMetrics",
        Window.into(FixedWindows.of(Duration.standardSeconds(options.getWindowDuration()))))
        .apply("AsJsons", AsJsons.of(T.class))
        .apply(
                "Write File(s)",
                TextIO.write()
                        .withWindowedWrites()
                        .withNumShards(options.getNumShards())
                        .to(
                                new WindowedFilenamePolicy(
                                        options.getCustomIntervalOutputDirectory(),
                                        options.getUseCurrentDateForOutputDirectory(),
                                        options.getOutputFilenamePrefix(),
                                        options.getOutputShardTemplate(),
                                        options.getOutputFilenameSuffix()))
                        .withTempDirectory(
                                NestedValueProvider.of(
                                        options.getTempDirectory(),
                                        (SerializableFunction<String, ResourceId>)
                                                input -> FileBasedSink.convertToFileResourceIfPossible(input))));

pipeline.run();

Basically I read an event, filter it on its attributes, and write it to a file. The job failed in Dataflow with: Workflow failed. Causes: The pubsub configuration contains errors: Subscription 'sub-name' is consumed by multiple stages, this will result in undefined behavior.

So what is the appropriate way to split the pipeline within the same job?

I tried creating Pipeline1, Pipeline2, and Pipeline3, but that ends up requiring a separate job name for each pipeline, and I am not sure that is the right way to do it.


Answer

The multiple EventIO transforms reading from the same subscription are the cause of the error. You need to consolidate them into a single read in order for this to work. This can be done by consuming the subscription into a single PCollection and then applying a separate filtering branch for each event type to that collection. Here is a partial example:

// single PCollection of the events consumed from the subscription
PCollection<T> events = pipeline
  .apply("Read Events",
    EventIO.<T>readJsons()
      .of(T.class)
      .withPubsubTimestampAttributeName(null)
      .withOptions(options));

// PCollection of type1 events
PCollection<T> typeOneEvents = events.apply(
  Filter.by(
    new SerializableFunction<T, Boolean>() {
      @Override
      public Boolean apply(T input) {
        return input.attributes.get("type").equals("type1");
      }}));
// TODO typeOneEvents.apply("WindowMetrics / AsJsons / Write File(s)")

// PCollection of type2 events
PCollection<T> typeTwoEvents = events.apply(
  Filter.by(
    new SerializableFunction<T, Boolean>() {
      @Override
      public Boolean apply(T input) {
        return input.attributes.get("type").equals("type2");
      }}));
// TODO typeTwoEvents.apply("WindowMetrics / AsJsons / Write File(s)")
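To complete a branch, the windowing and write steps from the question apply unchanged. As a sketch, here is the type1 branch finished with the question's own WindowedFilenamePolicy and option getters (assumed to be available exactly as defined in the question's code):

// Sketch: complete the type1 branch with the question's own windowing and
// write steps (WindowedFilenamePolicy and the option getters are assumed
// to be the ones defined in the question's code).
typeOneEvents
  .apply("WindowMetrics",
    Window.into(FixedWindows.of(Duration.standardSeconds(options.getWindowDuration()))))
  .apply("AsJsons", AsJsons.of(T.class))
  .apply("Write File(s)",
    TextIO.write()
      .withWindowedWrites()
      .withNumShards(options.getNumShards())
      .to(new WindowedFilenamePolicy(
          options.getRunOutputDirectory(),
          options.getUseCurrentDateForOutputDirectory(),
          options.getOutputFilenamePrefix(),
          options.getOutputShardTemplate(),
          options.getOutputFilenameSuffix()))
      .withTempDirectory(NestedValueProvider.of(
          options.getTempDirectory(),
          (SerializableFunction<String, ResourceId>)
              input -> FileBasedSink.convertToFileResourceIfPossible(input))));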

Another possibility is to use other transforms provided by Apache Beam, which might simplify your solution a little. One such transform is Partition. Partition splits a single PCollection into a fixed number of PCollections based on a partitioning function. A partial example using Partition:

// list of PCollections, partitioned by the type attribute of each event
PCollectionList<T> eventsByType = pipeline
  .apply("Read Events",
    EventIO.<T>readJsons()
      .of(T.class)
      .withPubsubTimestampAttributeName(null)
      .withOptions(options))
  .apply("Partition By Type",
    Partition.of(2, new PartitionFn<T>() {
      @Override
      public int partitionFor(T event, int numPartitions) {
        return event.attributes.get("type").equals("type1") ? 0 : 1;
      }}));

PCollection<T> typeOneEvents = eventsByType.get(0);
// TODO typeOneEvents.apply("WindowMetrics / AsJsons / Write File(s)")

PCollection<T> typeTwoEvents = eventsByType.get(1);
// TODO typeTwoEvents.apply("WindowMetrics / AsJsons / Write File(s)")
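
Since the question routes three event types, the partition function can return three indexes instead of two. A sketch, assuming `events` is the single PCollection read from the subscription as in the first example:

// Sketch: three-way partition matching the question's type1/type2/type3
// routing ("events" is the single PCollection read from the subscription).
PCollectionList<T> eventsByType = events.apply("Partition By Type",
  Partition.of(3, new PartitionFn<T>() {
    @Override
    public int partitionFor(T event, int numPartitions) {
      String type = event.attributes.get("type");
      if ("type1".equals(type)) {
        return 0;
      }
      return "type2".equals(type) ? 1 : 2; // everything else goes with type3
    }
  }));

PCollection<T> typeOneEvents = eventsByType.get(0);
PCollection<T> typeTwoEvents = eventsByType.get(1);
PCollection<T> typeThreeEvents = eventsByType.get(2);
// TODO apply "WindowMetrics / AsJsons / Write File(s)" to each branch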