Developing a job for Flink

I am building a simple data pipeline for learning purposes. I have real-time data coming from Kafka, and I would like to do some transformations using Flink.

Unfortunately, I’m not sure I understand the deployment options correctly. In the Flink docs I found a section about Docker Compose and Application Mode. It says that I can only deploy one job to Flink:

A Flink Application cluster is a dedicated cluster which runs a single job. In this case, you deploy the cluster with the job as one step, thus, there is no extra job submission needed.
The job artifacts are included into the class path of Flink's JVM process within the container and consist of:

  • your job jar, which you would normally submit to a Session cluster and
  • all other necessary dependencies or resources, not included into Flink.

To deploy a cluster for a single job with Docker, you need to

  • make job artifacts available locally in all containers under /opt/flink/usrlib,
  • start a JobManager container in the Application cluster mode
  • start the required number of TaskManager containers.
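The steps above can be sketched as a Docker Compose file. This is a minimal sketch based on the pattern in the Flink docs; the Flink version, the main class name, and the local `./target` path are assumptions you would adapt to your own build:

```yaml
version: "3.8"
services:
  jobmanager:
    image: flink:1.17                # assumed Flink version
    # Application Mode: the JobManager starts the job directly.
    # com.example.MyJob is a hypothetical main class.
    command: standalone-job --job-classname com.example.MyJob
    ports:
      - "8081:8081"                  # Flink web UI
    volumes:
      - ./target:/opt/flink/usrlib   # job jar + extra dependencies
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
  taskmanager:
    image: flink:1.17
    command: taskmanager
    depends_on:
      - jobmanager
    volumes:
      - ./target:/opt/flink/usrlib   # must see the same artifacts
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
```

Scaling out is then just `docker compose up --scale taskmanager=3`.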

On the other hand, I found examples on GitHub using the flink-java artifact, without running any Docker image.

What is the difference, and why is the second option not mentioned in the Flink docs?
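For context, the GitHub examples in question typically look something like the sketch below: a plain Java program that depends on flink-java / flink-streaming-java and, when executed without any cluster, spins up an embedded local MiniCluster inside the same JVM. The class name and data are made up for illustration:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordLengthJob {
    public static void main(String[] args) throws Exception {
        // When no cluster is configured, this returns a local environment
        // backed by an embedded MiniCluster inside this JVM process.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("kafka", "flink", "pipeline") // stand-in for a Kafka source
           .map(String::length)
           .print();

        env.execute("word-length-job");
    }
}
```

The same main method works unchanged when the jar is submitted to a real cluster; only the environment it runs in differs.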

And, is it possible to deploy Flink job as a separate docker image?



I suggest you take a look at Demystifying Flink Deployments, which gives a good overview.

If you’re interested in setting up a standalone cluster (without Docker, Kubernetes, or YARN), see

And, is it possible to deploy Flink job as a separate docker image?

I’m not sure how to interpret this question. Are you asking whether the Flink client can run in a separate image from the Flink cluster that runs the job? You could dockerize a session cluster and submit a job into that cluster from outside it. You’ll find an example of that in the Flink operations playground. (That playground is a good resource, btw.)
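In concrete terms, submitting into a dockerized session cluster can look like this. The service name `jobmanager` and the jar path are assumptions based on the typical Compose setup; adjust them to your own files:

```shell
# Bring up a session cluster defined in docker-compose.yml
docker compose up -d jobmanager taskmanager

# Submit a job "from outside" the cluster by exec'ing the Flink
# client that ships inside the jobmanager container:
docker compose exec jobmanager flink run /opt/flink/usrlib/my-job.jar
```

Alternatively, a client on the host (or in another container) can target the cluster's exposed REST port with `flink run -m localhost:8081 …`.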

Another approach is to build a single image that can run as either a job manager or a task manager, with the Flink client and all of its dependencies baked into that image. This approach is described in
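In practice that usually means a Dockerfile that layers your job jar onto the official Flink image, so one image can start in either role. A minimal sketch, where the Flink tag and jar name are assumptions:

```dockerfile
FROM flink:1.17

# Bake the job artifact into the image; in Application Mode, Flink
# picks up jars from /opt/flink/usrlib automatically.
COPY target/my-job.jar /opt/flink/usrlib/my-job.jar
```

You then run containers from this one image with `standalone-job --job-classname …` for the job manager and `taskmanager` for the task managers.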

It’s worth noting that a lot of folks aren’t doing any of this directly, and are instead relying on platforms that manage containerized Flink deployments at a higher level.
