Suppose we have multiple data streams and they share some common features. For example, we have a stream of Teacher and a stream of Student, and they both have an age field. If I want to find out the eldest student or teacher from the realtime stream, I can implement an operator as below. To find out the eldest Teacher,
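One way to picture the shared-field idea above is the plain-Java sketch below. It does not use the Flink API: the `HasAge` interface, the `Teacher` and `Student` classes, and the `eldest()` helper are all hypothetical names introduced for illustration, and the "stream" is an in-memory list. The same two-argument "keep the older record" function is the kind of logic a reduce-style operator would apply per element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;

public class EldestDemo {
    // Hypothetical shared view of the common "age" field.
    interface HasAge {
        int getAge();
    }

    // Hypothetical record types; only the age field is shared.
    static final class Teacher implements HasAge {
        final String name;
        final int age;
        Teacher(String name, int age) { this.name = name; this.age = age; }
        @Override public int getAge() { return age; }
    }

    static final class Student implements HasAge {
        final String name;
        final int age;
        Student(String name, int age) { this.name = name; this.age = age; }
        @Override public int getAge() { return age; }
    }

    // One generic reducer for every type that exposes an age.
    // On a tie, the record seen first is kept.
    static <T extends HasAge> BinaryOperator<T> eldest() {
        return (a, b) -> a.getAge() >= b.getAge() ? a : b;
    }

    // Applies the reducer to an in-memory list standing in for the stream.
    static <T extends HasAge> Optional<T> eldestOf(List<T> items) {
        return items.stream().reduce(EldestDemo.<T>eldest());
    }

    public static void main(String[] args) {
        List<Teacher> teachers = Arrays.asList(new Teacher("Ann", 41), new Teacher("Bob", 58));
        List<Student> students = Arrays.asList(new Student("Cy", 19), new Student("Di", 23));
        System.out.println(eldestOf(teachers).get().name);
        System.out.println(eldestOf(students).get().name);
    }
}
```

Because the reducer is written once against `HasAge`, the same code serves both record types without a separate operator for each.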
Tag: apache-flink
Flink Avro Serialization shows “not serializable” error when working with GenericRecords
I’m really having a hard time getting Flink to communicate properly with a running Kafka instance, using an Avro schema from the Confluent Schema Registry (for both key and value). After a while of thinking and restructuring my program, I was able to push my implementation this far: Producer Method GenericSerializer.java However, when I execute the Job, it
TimeCharacteristics & TimerService in Apache Flink
I’m currently working through this tutorial on stream processing in Apache Flink and am a little confused about how the TimeCharacteristics of a StreamEnvironment affect the order of the data values in the stream, and with respect to which time the onTimer function of a ProcessFunction is called. In the tutorial, they set the characteristics to EventTime, since we want
Does Flink provide a Java API to submit jobs to the JobManager?
I know jobs can be submitted to the JobManager by flink or flink.bat. I want to know whether Flink provides a Java API to submit jobs to the JobManager. Answer Yes. Depending on the type of cluster you want to connect to, there are several implementations of the ClusterClient (https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/client/program/ClusterClient.html). It can run jobs either in a blocking (synchronous) or detached (asynchronous) fashion. One way
InvalidTypesException for Generic Pattern Transformation in Apache Flink
I have a problem regarding Apache Flink. I want to have an abstract class that consumes a stream. However, the pattern applied to this stream should be interchangeable. TEventType marks the type of the event that is generated from the pattern. Every pattern must implement an interface called IEventPattern. The abstract class has a method called applyPatternSelectToStream(). The Flink compiler always
What’s the usage of the file named “conf/masters” in Flink?
Since we can specify a master by “jobmanager.rpc.address” in “flink-conf.yaml”, what’s the usage of the file named “conf/masters”? Answer It is used for starting a standalone cluster in HA mode. You can check out more here
Apache Flink – custom java options are not recognized inside job
I’ve added the following line to flink-conf.yaml: env.java.opts: “-Ddy.props.path=/PATH/TO/PROPS/FILE”. When starting the jobmanager (jobmanager.sh start cluster), I see in the logs that the JVM option is indeed recognized, but when I run a Flink job (flink run -d PROG.JAR), System.getProperty(“dy.props.path”) returns null (and when printing the system properties, I see that it is indeed absent). The question really is – how do
Non-parallel data source to ParallelDataSource in Flink
I want to transform a non-parallel data source to a parallel data source in Apache Flink. In pseudocode, it would be something like this: I got it done by implementing a no-op map function, but I suppose there are more elegant ways. Thanks Answer You can implement ParallelSourceFunction instead of SourceFunction as the interface of CustomDataSource. See: https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/source/ParallelSourceFunction.html
Apache Flink 0.10: how to get the first occurrence of a composite key from an unbounded input DataStream?
I am a newbie with Apache Flink. I have an unbounded data stream as my input (fed into Flink 0.10 via Kafka). I want to get the first occurrence of each primary key (the primary key is the contract_num and the event_dt). These “duplicates” occur nearly immediately after each other. The source system cannot filter this for me, so Flink
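The "first occurrence per composite key" requirement can be sketched in plain Java as below. This is not Flink code and not the answer from the original thread: the `Event` class and the `firstOccurrences` helper are hypothetical, the input is an in-memory list rather than a Kafka-fed stream, and the set of seen keys is an ordinary `HashSet`. Only the field names `contractNum` and `eventDt` mirror the contract_num and event_dt key described in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FirstOccurrenceDemo {
    // Hypothetical event; contractNum + eventDt form the composite key.
    static final class Event {
        final String contractNum;
        final String eventDt;
        final String payload;
        Event(String contractNum, String eventDt, String payload) {
            this.contractNum = contractNum;
            this.eventDt = eventDt;
            this.payload = payload;
        }
    }

    // Keeps only the first event seen for each (contractNum, eventDt) pair,
    // preserving arrival order.
    static List<Event> firstOccurrences(Iterable<Event> events) {
        Set<List<String>> seen = new HashSet<>();
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            // A two-element list has value-based equals/hashCode,
            // so it works as a composite key.
            List<String> key = Arrays.asList(e.contractNum, e.eventDt);
            if (seen.add(key)) { // add() returns false for a key already present
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> in = Arrays.asList(
            new Event("c1", "2016-01-01", "A"),
            new Event("c1", "2016-01-01", "B"),
            new Event("c2", "2016-01-01", "C"));
        for (Event e : firstOccurrences(in)) {
            System.out.println(e.contractNum + " " + e.eventDt + " " + e.payload);
        }
    }
}
```

On a truly unbounded stream the `seen` set would grow without limit, so a real job would need some way to expire old keys. Since the duplicates are said to arrive almost immediately after each other, a short retention period per key may be enough, but that is an assumption about the data.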