
How to extract information from a PCollection after a join in Apache Beam?

I have two example streams of data on which I perform an inner join. I would like to extend this example join code and add some logic that runs after the join:

JavaScript

After the join, I would like to print just the ad name and the number of clicks using a DoFn like this:

JavaScript

Any ideas on how to extract this info from the joined data?

Answer

As you found, the schema Join transform emulates a SQL join: each result row is the concatenation of the matching rows from the joined PCollections. To see which rows went into the inner join, join the PCollections with the CoGroup utility instead. It returns a Row object with a separate iterable for each input PCollection, and each iterable contains the Rows that match the key. Example:

JavaScript
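
As a rough illustration of the result shape described above, the following plain-Java sketch models co-grouping without using Beam itself. The `Ad` and `Click` types, their fields, and the sample values are hypothetical, since the original schemas are not shown. In a Beam pipeline the grouping step would presumably be something like `CoGroup.join(By.fieldNames(...))` applied to a `PCollectionTuple`, with the counting logic placed in a `DoFn`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CoGroupSketch {
    // Hypothetical element types; the real schemas are not given in the question.
    static final class Ad {
        final String adId;
        final String name;
        Ad(String adId, String name) { this.adId = adId; this.name = name; }
    }

    static final class Click {
        final String adId;
        final String userId;
        Click(String adId, String userId) { this.adId = adId; this.userId = userId; }
    }

    // One result per key, with a separate list for each input,
    // mirroring the "individual iterables" shape described in the answer.
    static final class Grouped {
        final String key;
        final List<Ad> ads = new ArrayList<>();
        final List<Click> clicks = new ArrayList<>();
        Grouped(String key) { this.key = key; }
    }

    // Group both inputs by adId while keeping their elements apart.
    static Map<String, Grouped> coGroup(List<Ad> ads, List<Click> clicks) {
        Map<String, Grouped> byKey = new LinkedHashMap<>();
        for (Ad ad : ads) {
            byKey.computeIfAbsent(ad.adId, Grouped::new).ads.add(ad);
        }
        for (Click click : clicks) {
            byKey.computeIfAbsent(click.adId, Grouped::new).clicks.add(click);
        }
        return byKey;
    }

    // Inner-join semantics: skip keys that are missing from either side,
    // then report each ad name with the number of matching clicks.
    static List<String> adNameAndClickCount(Map<String, Grouped> grouped) {
        List<String> out = new ArrayList<>();
        for (Grouped g : grouped.values()) {
            if (g.ads.isEmpty() || g.clicks.isEmpty()) {
                continue;
            }
            for (Ad ad : g.ads) {
                out.add(ad.name + ": " + g.clicks.size());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Ad> ads = List.of(new Ad("a1", "Shoes"), new Ad("a2", "Hats"));
        List<Click> clicks = List.of(
            new Click("a1", "u1"), new Click("a1", "u2"), new Click("a3", "u9"));
        adNameAndClickCount(coGroup(ads, clicks)).forEach(System.out::println);
    }
}
```

Because the two sides stay in separate lists per key, the click count is simply the size of the clicks list. With a flattened join row you would need an extra grouping step to get the same number.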
User contributions licensed under: CC BY-SA