
Understanding the number of StreamProcessor instances created: do stream tasks share the same StreamProcessor instance?

I want to understand in a little more detail the relationship between StreamThread and StreamTask, and how many instances of StreamProcessor are created when we have:

  • a source Kafka topic with multiple partitions, say 6.
  • I am keeping only ONE StreamThread (num.stream.threads=1)

I am keeping a simple processor topology:

source_topic -> Processor1 -> Processor2 -> Processor3 -> sink_topic

Each processor simply forwards to the next processor in the chain. Here is a snippet of one of the processors; I am using the low-level Java API.

JavaScript

Snippet of Main driver application:

JavaScript

With this arrangement, I have the following questions:

  • How many instances of processors (Processor1, Processor2, Processor3) will be created?
  • As per my understanding, there will be SIX stream tasks. Is a new instance of each processor created for each stream task, or do they “share” the same Processor instance?
  • When a Stream Thread is created, does it create a new instance of processor?
  • Are Stream Tasks created as part of Stream Threads creation?

(New question added to original list)

  • In this scenario a single stream thread will have SIX stream tasks. Does a stream thread execute these stream tasks one by one, sort of “in a loop”? Do stream tasks run as separate “threads”? Basically, I am not able to understand how a single stream thread can run multiple stream tasks at the same time/in parallel.

Below is the topology that gets printed:

JavaScript


Answer

How many instances of processors (Processor1, Processor2, Processor3) will be created?

In your example, six each. Each task instantiates a full copy of the Topology. (Cf. https://github.com/apache/kafka/blob/2.4/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L355; note: a Topology is the logical representation of the program and is instantiated as a ProcessorTopology at runtime.)

As per my understanding, there will be SIX stream tasks. Is a new instance of processor created for each Stream task or they “share” the same Processor instance?

Each task has its own Processor instance — they are not shared.
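To make the "six each, not shared" point concrete, here is a minimal toy model in plain Java. It does not use Kafka Streams at all: ToyProcessor, ToyTask and createTasks are hypothetical stand-ins, and the only assumption carried over from the answer is that every task builds its own copy of the three-processor chain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Toy model only: hypothetical stand-ins, not Kafka Streams classes.
final class PerTaskProcessorsSketch {

    /** Stand-in for a user-defined processor. */
    static final class ToyProcessor {
        final String name;
        ToyProcessor(String name) { this.name = name; }
    }

    /** Stand-in for a task: it builds its own copy of the processor chain. */
    static final class ToyTask {
        final List<ToyProcessor> processors = new ArrayList<>();
        ToyTask(List<Supplier<ToyProcessor>> suppliers) {
            for (Supplier<ToyProcessor> supplier : suppliers) {
                processors.add(supplier.get()); // a fresh instance for this task
            }
        }
    }

    /** Creates one task per input partition. */
    static List<ToyTask> createTasks(int partitions) {
        List<Supplier<ToyProcessor>> suppliers = List.of(
            () -> new ToyProcessor("Processor1"),
            () -> new ToyProcessor("Processor2"),
            () -> new ToyProcessor("Processor3"));
        List<ToyTask> tasks = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            tasks.add(new ToyTask(suppliers));
        }
        return tasks;
    }

    /** Counts distinct object identities of the processor with the given name. */
    static int countInstances(List<ToyTask> tasks, String name) {
        Set<ToyProcessor> distinct = Collections.newSetFromMap(new IdentityHashMap<>());
        for (ToyTask task : tasks) {
            for (ToyProcessor processor : task.processors) {
                if (processor.name.equals(name)) {
                    distinct.add(processor);
                }
            }
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        List<ToyTask> tasks = createTasks(6);
        System.out.println("tasks: " + tasks.size());
        System.out.println("Processor1 instances: " + countInstances(tasks, "Processor1"));
    }
}
```

In the real API the same idea is the reason a ProcessorSupplier is expected to return a new Processor from every get() call: returning one shared object would make all tasks mutate the same state.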

When a Stream Thread is created, does it create a new instance of processor?

No. When a task is created, it will create new Processor instances.

Are Stream Tasks created as part of Stream Threads creation?

No. Tasks are created during a rebalance, according to the partition/task assignment. KafkaStreams registers a StreamsRebalanceListener on its internal consumer; the listener calls TaskManager#createTasks().
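The ordering described here (thread first, tasks only once an assignment arrives) can be sketched with a toy model. Everything below is hypothetical plain Java, not the Kafka Streams internals: ToyStreamThread and its two callback methods merely imitate the shape of a rebalance listener, and the task labels are illustrative.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical stand-ins, not Kafka Streams classes.
final class RebalanceTaskCreationSketch {

    static final class ToyStreamThread {
        // partition -> task label; stays empty until the first assignment arrives
        final Map<Integer, String> activeTasks = new TreeMap<>();

        /** Imitates the "partitions assigned" callback: tasks are created here. */
        void onPartitionsAssigned(Collection<Integer> partitions) {
            for (int partition : partitions) {
                activeTasks.putIfAbsent(partition, "task-" + partition);
            }
        }

        /** Imitates the "partitions revoked" callback: tasks are closed here. */
        void onPartitionsRevoked(Collection<Integer> partitions) {
            for (int partition : partitions) {
                activeTasks.remove(partition);
            }
        }
    }

    public static void main(String[] args) {
        ToyStreamThread thread = new ToyStreamThread();
        System.out.println("after thread creation: " + thread.activeTasks.size());

        thread.onPartitionsAssigned(List.of(0, 1, 2, 3, 4, 5));
        System.out.println("after first rebalance: " + thread.activeTasks.size());

        // A second instance joins and takes over half of the partitions.
        thread.onPartitionsRevoked(List.of(3, 4, 5));
        System.out.println("after second rebalance: " + thread.activeTasks.size());
    }
}
```

The point of the sketch is only that the set of tasks is a function of the current assignment, so it can grow or shrink over the lifetime of a thread.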

Update (as question was extended):

In this scenario a single stream thread will have SIX stream tasks. Does a stream thread execute these stream tasks one by one, sort of “in a loop”? Do stream tasks run as separate “threads”? Basically, I am not able to understand how a single stream thread can run multiple stream tasks at the same time/in parallel.

Yes, the StreamThread executes its tasks in a loop. There are no other threads. Hence, tasks that are assigned to the same thread are not executed at the same time or in parallel, but one after another. (Cf. https://github.com/apache/kafka/blob/2.4/streams/src/main/java/org/apache/kafka/streams/processor/internals/AssignedStreamsTasks.java#L472; each StreamThread uses exactly one TaskManager, which uses AssignedStreamsTasks and AssignedStandbyTasks internally.)
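A rough sketch of what "in a loop" means, again as a toy model in plain Java rather than the actual StreamThread code. ToyTask, processOne and runLoop are hypothetical names, and the one-record-per-task-per-pass policy is an assumption made to keep the example small; the real loop also polls, commits and punctuates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: hypothetical stand-ins, not Kafka Streams classes.
final class SingleThreadTaskLoopSketch {

    static final class ToyTask {
        final Deque<String> buffered = new ArrayDeque<>();

        ToyTask(int partition, int records) {
            for (int i = 0; i < records; i++) {
                buffered.add("p" + partition + "-r" + i);
            }
        }

        /** Processes at most one buffered record; returns it, or null if idle. */
        String processOne() {
            return buffered.poll();
        }
    }

    /** Runs all tasks on the calling thread until no task has records left. */
    static List<String> runLoop(List<ToyTask> tasks) {
        List<String> processingOrder = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (ToyTask task : tasks) {        // one task after another
                String record = task.processOne();
                if (record != null) {
                    processingOrder.add(record);
                    progressed = true;
                }
            }
        }
        return processingOrder;
    }

    public static void main(String[] args) {
        List<ToyTask> tasks = new ArrayList<>();
        for (int partition = 0; partition < 6; partition++) {
            tasks.add(new ToyTask(partition, 2));
        }
        // Records from different tasks interleave, but never run concurrently.
        System.out.println(runLoop(tasks));
    }
}
```

With one thread, throughput across the six tasks is therefore bounded by that thread; raising num.stream.threads (or starting more application instances) is what spreads the tasks over more threads.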
