
Flink error with ‘The implementation of the StreamExecutionEnvironment is not serializable’

I am new to Flink and tried to use it to run LFM, one of the recommendation algorithms, but the following errors appeared when my code ran. I tried to track them down and fix them, without success. Could someone tell me what is causing the problem?

Here is the main exception:

JavaScript

And here is my code:

Note that sourceDataStream comes from my custom source, which extends RichFunction<Tuple3<>>.

JavaScript


Answer

There are a few things about this that aren’t going to work:

In the implementation of any user function (such as a ProcessFunction) you cannot have a DataStream, or a Table, or a StreamExecutionEnvironment, or another ProcessFunction. All you can do is react to an incoming stream record, optionally using state you have built up inside that function based on the previously processed records.
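The error in the title follows from this rule: Flink serializes user functions so it can ship them to the workers, and a function that holds a field pointing at the environment cannot be serialized. The sketch below reproduces that failure with plain Java serialization and no Flink dependency. FakeEnvironment, RecordFunction, HoldsEnvironment and SelfContained are hypothetical stand-ins invented for illustration; they are not Flink classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationSketch {
    // Hypothetical stand-in for a driver-side handle such as an execution
    // environment. It deliberately does not implement Serializable.
    static class FakeEnvironment { }

    // Hypothetical stand-in for a user function that gets shipped to workers.
    interface RecordFunction extends Serializable {
        int apply(int value);
    }

    // Problematic: keeps a reference to the environment as a field.
    static class HoldsEnvironment implements RecordFunction {
        private final FakeEnvironment env;
        HoldsEnvironment(FakeEnvironment env) { this.env = env; }
        public int apply(int value) { return value + 1; }
    }

    // Fine: holds only plain, serializable configuration.
    static class SelfContained implements RecordFunction {
        private final int increment;
        SelfContained(int increment) { this.increment = increment; }
        public int apply(int value) { return value + increment; }
    }

    // Returns true if Java serialization accepts the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints false: the env field drags in a non-serializable object
        System.out.println(canSerialize(new HoldsEnvironment(new FakeEnvironment())));
        // prints true: nothing outside the function is referenced
        System.out.println(canSerialize(new SelfContained(1)));
    }
}
```

In a real job the same problem typically appears when a function class has a field, or an anonymous class captures a variable, of a type such as StreamExecutionEnvironment or DataStream.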

The DataStream and Table APIs are organized around a builder paradigm, with which you are describing a streaming dataflow pipeline. This pipeline must be a directed acyclic graph: it can split and merge but must flow from sources to sinks without any loops. The stages of that pipeline (e.g., a ProcessFunction) must be coded as independent blocks — they cannot reach outside of themselves to access data from other pipeline stages.
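As a rough illustration of a stage written as an independent block, the following plain-Java sketch handles one record at a time and consults only the state it has accumulated itself. RunningAverage and onRecord are hypothetical names, and the HashMap is a simplification: in an actual Flink job this role would normally be played by keyed state managed by Flink.

```java
import java.util.HashMap;
import java.util.Map;

class RunningAverageSketch {
    // Hypothetical per-record handler: it sees one record at a time and
    // keeps only its own state (sum and count per key).
    static class RunningAverage {
        private final Map<String, double[]> state = new HashMap<>();

        // Updates the state for this key and returns the average so far.
        double onRecord(String key, double value) {
            double[] acc = state.computeIfAbsent(key, k -> new double[2]);
            acc[0] += value; // running sum
            acc[1] += 1;     // record count
            return acc[0] / acc[1];
        }
    }

    public static void main(String[] args) {
        RunningAverage avg = new RunningAverage();
        System.out.println(avg.onRecord("user-1", 4.0)); // 4.0
        System.out.println(avg.onRecord("user-1", 2.0)); // 3.0
        System.out.println(avg.onRecord("user-2", 5.0)); // 5.0
    }
}
```

The handler never asks another stage for data; everything it needs arrives as a record or is already in its own state.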

This paradigm isn’t well suited for the purpose of training machine learning models (since training involves iterating/looping). If that’s your objective, maybe take a look at https://github.com/apache/flink-ml.

User contributions licensed under: CC BY-SA