Flink: rules for using ‘object reuse mode’

The documentation says this mode can cause bugs, but it does not state the rules for using it safely. In what cases will it cause bugs? Let’s say I have a job:

  1. source: kafka (byte[] data),
  2. flat-map: parse the byte[] into a Google Protobuf object ‘foo’, create a Tuple2<>(foo.id, foo), and return this Tuple2
  3. keyBy and process: for each id, put the first foo into ValueState, and update the ValueState if multiple objects arrive with the same id. Emit the first (updated) foo after 10 seconds.

In this case, is it OK to turn on ‘object reuse mode’?

Answer

For the pipeline you have described, yes, object reuse can be safely enabled.

Object reuse only applies where data is forwarded between operator instances within the same task (that is, between chained operators), so in your case between the source and the flatmap. The keyBy forces serialization/deserialization and a network shuffle, so object reuse cannot apply between the flatmap and the process function. It would probably also apply between the process function and the sink (which I assume is present).

With object reuse enabled, it is NOT safe to

  • remember input object references across function calls or
  • modify input objects
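
To make the first rule concrete, here is a minimal plain-Java sketch. It does not use Flink at all. It only simulates an upstream that hands the same mutable instance to the next function for every record, which is the situation object reuse can create between chained operators. The names Foo, RememberingFunction, CopyingFunction and feed are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no Flink dependency): the "upstream" overwrites and
// re-emits one mutable instance, and two "downstream" functions receive it.
class ObjectReuseHazard {

    // Hypothetical mutable record type standing in for the parsed 'foo'.
    static final class Foo {
        long id;
        String payload;

        Foo copy() {
            Foo f = new Foo();
            f.id = id;
            f.payload = payload;
            return f;
        }
    }

    // UNSAFE under object reuse: keeps the input reference across calls.
    static final class RememberingFunction {
        final List<Foo> seen = new ArrayList<>();

        void process(Foo input) {
            seen.add(input);
        }
    }

    // SAFE: keeps a defensive copy instead of the input reference.
    static final class CopyingFunction {
        final List<Foo> seen = new ArrayList<>();

        void process(Foo input) {
            seen.add(input.copy());
        }
    }

    // Simulated upstream: one instance is mutated and passed on for every record.
    static void feed(RememberingFunction unsafe, CopyingFunction safe) {
        Foo reused = new Foo();
        for (long id = 1; id <= 3; id++) {
            reused.id = id;
            reused.payload = "record-" + id;
            unsafe.process(reused);
            safe.process(reused);
        }
    }

    public static void main(String[] args) {
        RememberingFunction unsafe = new RememberingFunction();
        CopyingFunction safe = new CopyingFunction();
        feed(unsafe, safe);
        // All three remembered references point at the same instance,
        // so the first "remembered" record now shows the last id.
        System.out.println("remembered first id: " + unsafe.seen.get(0).id);
        // The copies keep the values they had when they were received.
        System.out.println("copied first id:     " + safe.seen.get(0).id);
    }
}
```

In this simulation the remembering function ends up with three references to one object, while the copying function holds three distinct records. Copying has a cost, so it only makes sense where a reference really must outlive the call.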

If you avoid those two points, you may safely

  • modify an output object and emit it again
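
The permitted pattern can be sketched the same way, again in plain Java without Flink. The parser below owns a single output object, overwrites it, and emits it again for each record. That is only safe because the consumer follows the two rules above: it reads what it needs during the call, keeps only an immutable String, and never modifies its input. IdAndPayload, ReusingParser and the "id:payload" record format are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java sketch (no Flink dependency) of reusing one output object.
class OutputReuseSketch {

    // Hypothetical mutable output pair, standing in for Tuple2<id, foo>.
    static final class IdAndPayload {
        long id;
        String payload;
    }

    // Producer: owns one output instance, overwrites it, and emits it again.
    static final class ReusingParser {
        private final IdAndPayload out = new IdAndPayload();

        // Assumes records shaped like "id:payload" (made up for this sketch).
        void parse(String raw, Consumer<IdAndPayload> collector) {
            int sep = raw.indexOf(':');
            out.id = Long.parseLong(raw.substring(0, sep));
            out.payload = raw.substring(sep + 1);
            collector.accept(out);
        }
    }

    // Consumer side: builds an immutable String during the call and keeps
    // only that, never the input reference, and never mutates the input.
    static List<String> run(List<String> rawRecords) {
        List<String> results = new ArrayList<>();
        ReusingParser parser = new ReusingParser();
        for (String raw : rawRecords) {
            parser.parse(raw, in -> results.add(in.id + "=" + in.payload));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("1:a", "2:b", "3:c")));
    }
}
```

The producer allocates one output object in total instead of one per record, which is the kind of saving object reuse is meant to allow.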

By the way, it would be preferable to implement your deserialization in a DeserializationSchema or KafkaDeserializationSchema, rather than in a flatmap, in which case object reuse would be irrelevant for that part of your pipeline.

User contributions licensed under: CC BY-SA