Suppose we have multiple data streams that share some common fields. For example, we have a stream of Teacher and a stream of Student, and they both have an age field. If I want to find the eldest student or teacher in the real-time stream, I can implement an operator as below. To find the eldest Teacher,
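In Flink, the usual pattern is to map both streams onto a common type, `union` them, and keep a running maximum with a `ReduceFunction`. The reduce logic itself is plain Java; the sketch below (the `HasAge` interface and the names `Teacher`, `Student` are my assumptions, not from the question) shows that logic outside Flink.

```java
import java.util.Arrays;
import java.util.List;

public class EldestPerson {
    // Common interface both record types satisfy via their generated accessors.
    public interface HasAge { int age(); String name(); }

    public record Teacher(String name, int age) implements HasAge {}
    public record Student(String name, int age) implements HasAge {}

    // Equivalent of a Flink ReduceFunction<HasAge>: keep the older record.
    public static HasAge eldest(HasAge a, HasAge b) {
        return a.age() >= b.age() ? a : b;
    }

    public static void main(String[] args) {
        // Stand-in for teacherStream.map(...).union(studentStream.map(...)).
        List<HasAge> unioned = Arrays.asList(
                new Teacher("Alice", 45), new Student("Bob", 21),
                new Teacher("Carol", 52), new Student("Dave", 19));
        HasAge result = unioned.stream().reduce(EldestPerson::eldest).get();
        System.out.println(result.name() + " " + result.age()); // Carol 52
    }
}
```

In Flink the same `eldest` function would be passed to `.keyBy(...).reduce(...)` (or used in a windowed reduce) on the unioned stream.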
Tag: flink-streaming
What are the other options to handle skewed data in Flink?
I am studying data-skew handling in Flink and how I can use low-level control of physical partitioning to achieve even processing of tuples. I have created synthetic skewed data sources and I aim to process (aggregate) them over a window. Here is the complete code. According to the Flink dashboard I could not see too
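Besides physical partitioning operators such as `rebalance()`, `rescale()`, and `partitionCustom()`, a common option for skewed aggregations is two-phase ("salted") aggregation: spread the hot key over several sub-keys, pre-aggregate per sub-key, then merge the partials per original key. The sketch below shows that logic in plain Java (the class and bucket count are my own, illustrative choices); in Flink the two phases would be two `keyBy`/aggregate stages.

```java
import java.util.HashMap;
import java.util.Map;

public class SaltedAggregation {
    static final int SALT_BUCKETS = 4;

    // Phase 1 key: spread a hot key over SALT_BUCKETS sub-keys so no single
    // parallel task instance receives all of its tuples.
    public static String salt(String key, int tupleIndex) {
        return key + "#" + (tupleIndex % SALT_BUCKETS);
    }

    // Phase 2: strip the salt and merge the partial counts per original key.
    public static Map<String, Long> merge(Map<String, Long> partials) {
        Map<String, Long> out = new HashMap<>();
        partials.forEach((saltedKey, count) ->
                out.merge(saltedKey.substring(0, saltedKey.indexOf('#')), count, Long::sum));
        return out;
    }

    public static void main(String[] args) {
        // Simulate a skewed stream: 10 tuples, all with the hot key "a".
        Map<String, Long> partials = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            partials.merge(salt("a", i), 1L, Long::sum);  // pre-aggregate
        }
        System.out.println(partials.size());   // 4 salted sub-keys
        System.out.println(merge(partials));   // {a=10}
    }
}
```

This only works for aggregations that can be computed from partials (counts, sums, max), which covers the windowed aggregation in the question.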
TimeCharacteristics & TimerService in Apache Flink
I’m currently working through this tutorial on stream processing in Apache Flink and am a little confused about how the TimeCharacteristics of a StreamEnvironment affect the order of the data values in the stream, and with respect to which time the onTimer function of a ProcessFunction is called. In the tutorial, they set the characteristic to EventTime, since we want
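The key point is that under EventTime, an event-time timer fires when the current watermark passes the registered timestamp, not at any wall-clock time. The toy simulation below (class and method names are mine, loosely modeled on Flink's TimerService, not Flink API) makes that ordering concrete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class EventTimeTimers {
    // Minimal stand-in for Flink's TimerService under EventTime:
    // timers fire when the watermark advances past their timestamp.
    private final TreeMap<Long, String> timers = new TreeMap<>();
    public final List<String> fired = new ArrayList<>();

    public void registerEventTimeTimer(long timestamp, String payload) {
        timers.put(timestamp, payload);
    }

    // Called when a watermark arrives; fires every timer whose timestamp
    // is <= the new watermark, in timestamp order (the "onTimer" calls).
    public void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            fired.add(timers.pollFirstEntry().getValue());
        }
    }

    public static void main(String[] args) {
        EventTimeTimers svc = new EventTimeTimers();
        svc.registerEventTimeTimer(100, "t@100");
        svc.registerEventTimeTimer(50, "t@50");
        svc.advanceWatermark(60);       // only t@50 fires
        System.out.println(svc.fired);  // [t@50]
        svc.advanceWatermark(200);      // now t@100 fires
        System.out.println(svc.fired);  // [t@50, t@100]
    }
}
```

Under ProcessingTime, by contrast, timers fire at wall-clock time and the watermark plays no role; the stream order itself is not changed by the time characteristic, only when timers and windows trigger.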
Does Flink provide a Java API to submit jobs to the JobManager?
I know jobs can be submitted to the JobManager via flink or flink.bat. I want to know whether Flink provides a Java API to submit jobs to the JobManager. Answer: Yes. Depending on the type of cluster you want to connect to, there are several implementations of ClusterClient (https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/client/program/ClusterClient.html). It can run jobs either in a blocking (synchronous) or detached (asynchronous) fashion. One way
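A simpler alternative to using ClusterClient directly is the remote execution environment, which builds the job graph locally and submits it to a running JobManager when `execute()` is called. The sketch below is not runnable standalone: it assumes a live cluster at the hypothetical `jobmanager-host` (port 6123 is the JobManager RPC port in the Flink 1.3 era referenced above; newer versions submit via the REST port) and a job jar at an illustrative path.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        // Host, port, and jar path are placeholders for this sketch.
        StreamExecutionEnvironment env = StreamExecutionEnvironment
                .createRemoteEnvironment("jobmanager-host", 6123, "/path/to/job.jar");
        env.fromElements(1, 2, 3).print();  // build the job graph as usual
        env.execute("remote-job");          // submits to the remote JobManager
    }
}
```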
Apache Flink – custom java options are not recognized inside job
I’ve added the following line to flink-conf.yaml: env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE". When starting the jobmanager (jobmanager.sh start cluster) I see in the logs that the JVM option is indeed recognized, but when I run a Flink job (flink run -d PROG.JAR), System.getProperty("dy.props.path") returns null (and when printing the system properties, I see that it is indeed absent). The question really is: how do
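The likely cause is that `flink run` starts a separate client JVM, and `env.java.opts` is applied to the JobManager/TaskManager processes, not to that client. Recent Flink versions split the option per process; a sketch of the relevant flink-conf.yaml entries (paths are placeholders, and the exact set of keys depends on your Flink version):

```yaml
# Applies to all Flink JVM processes unless overridden below.
env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

# Per-process variants in newer Flink versions:
env.java.opts.jobmanager: "-Ddy.props.path=/PATH/TO/PROPS/FILE"
env.java.opts.taskmanager: "-Ddy.props.path=/PATH/TO/PROPS/FILE"
env.java.opts.client: "-Ddy.props.path=/PATH/TO/PROPS/FILE"
```

On older versions without `env.java.opts.client`, the `flink` shell script picks up extra JVM options from the `JVM_ARGS` environment variable, e.g. `JVM_ARGS="-Ddy.props.path=/PATH/TO/PROPS/FILE" flink run -d PROG.JAR`. Note that properties set this way are visible in the JVM where your code actually runs, which for most operators is the TaskManager, not the client.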
Apache Flink 0.10: how to get the first occurrence of a composite key from an unbounded input DataStream?
I am a newbie with Apache Flink. I have an unbounded data stream in my input (fed into Flink 0.10 via Kafka). I want to get the 1st occurrence of each primary key (the primary key is the contract_num and the event_dt). These “duplicates” occur nearly immediately after each other. The source system cannot filter this for me, so flink
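The usual Flink pattern for this is to `keyBy` the composite key and keep a per-key "seen" flag in keyed state, emitting only the first event. The filtering logic itself is plain Java; the sketch below (class, record, and field names are mine, based on the contract_num/event_dt fields mentioned in the question) uses a HashSet where a Flink job would use ValueState so the flag is scoped per key and fault tolerant.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FirstOccurrenceFilter {
    // Composite primary key: contract_num + event_dt.
    public record Event(String contractNum, String eventDt, String payload) {}

    private final Set<String> seen = new HashSet<>();

    // Returns true only for the first event of each (contract_num, event_dt)
    // pair; Set.add returns false when the key was already present.
    public boolean isFirst(Event e) {
        return seen.add(e.contractNum() + "|" + e.eventDt());
    }

    public static void main(String[] args) {
        FirstOccurrenceFilter filter = new FirstOccurrenceFilter();
        List<Event> input = List.of(
                new Event("C1", "2016-01-01", "v1"),
                new Event("C1", "2016-01-01", "v2"),  // duplicate, dropped
                new Event("C2", "2016-01-01", "v3"));
        List<Event> firsts = new ArrayList<>();
        for (Event e : input) if (filter.isFirst(e)) firsts.add(e);
        System.out.println(firsts.size());  // 2
    }
}
```

Since the duplicates arrive nearly immediately after each other, unbounded keyed state can grow forever; in a real job you would clear or expire the per-key flag after some interval so state stays bounded.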