Skip to content
Advertisement

Non-parallel data source to ParallelDataSource in flink

I want to transform a non-parallel data source to a parallel data source in Apache Flink. In pseudocode, it would be something like this:

int partitions = env.getParallelim();

DataSource<String> input = new CustomDataSource<String>();
DataSource<String> parallel = input.setParallelism(partitions).suffle();

I got it done by implementing a noop map function but I suppose there are more elegant ways.

Thanks

Advertisement

Answer

You can use ParallelSourceFunction instead of SourceFunction as interface to be implemented in CustomDataSource.

See: https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/source/ParallelSourceFunction.html

Advertisement