Skip to content

Tag: apache-beam

Reading XML with namespace using Apache Beam XmlIO

I am trying to read an XML file into an Apache Beam pipeline. Some elements have namespaces and the namespace declaration is declared at the root node. I am able to parse the xml outside of Apache Beam using the standard JAXB parser. However, when I use XmlIO.read() function with beam I get the following exce…

Dataflow: string to pubsub message

I’m trying to make unit testing in Dataflow. For that test, at the begging, I will start with a simple hardcoded string. The problem is that I would need to transform that string to a pubsub message. I got the following code to do that: But I get the following error: How I should create the pubsub messa…

Beam – Error while branching PCollections

I have a pipeline that reads data from kafka. It splits the incoming data into processing and rejected outputs. Data from Kafka is read into custom class MyData and output is produced as KV<byte[], byte[]> Define two TupleTags with MyData. InvalidDataDoFn has application logic that splits MyData data in…

Apache Beam framework – sort in descending order

How do you sort in descending order using the Apache Beam framework? I managed to create a word count pipeline which sorts alphabetically the output by word, but did not figure out how to invert the sorting order. Here is the code: This code produces a list of words sorted alphabetically with its respective c…