I’m using Pulsar for communication between services and I’m experiencing flakiness in a quite simple test of producers and consumers.
In a JUnit 4 test, I spin up (my own wrappers around) a ZooKeeper server, a BookKeeper bookie, and a
PulsarService; the configurations should be quite standard.
The test can be summarized in the following steps:
- build a producer;
- build a consumer (say, a reader of a Pulsar topic);
- check the message backlog (using precise backlog);
- build a new producer and synchronously send a message onto the topic;
- build a new consumer and read the messages on the topic;
- I expect a backlog of one message, and I actually read one;
- build a new producer and synchronously send four messages;
- fetch the messages again, using the message ID read at step 5 as the start message ID;
- I expect a backlog of four messages here, and most of the time this value is correct, but whenever I run the test about ten times, at least one run reports 2 or 5 instead.
I tried debugging the test, but I cannot figure out where those values come from; did I misunderstand something?
Things you can try if not already done:
- Ask for a precise backlog measurement. By default the backlog is only estimated, because getting the precise measurement is a costlier operation. Use
admin.topics().getStats(topic, true) for this. (See https://github.com/apache/pulsar/blob/724523f3051def9577d6bd27697866c99f4a7b0e/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L862)
- Deactivate batching on the producer side. The number returned in
msgBacklog is the number of entries, so multiple messages batched into a single entry count as 1. See the relevant issue: https://github.com/apache/pulsar/issues/7623. This can explain why you see a value of 2 for
msgBacklog if the 4 messages were grouped into 2 batches. Beware that deactivating batching can have a huge impact on performance.
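
To make the entries-versus-messages point concrete, here is a small toy model in plain Java. It does not use any Pulsar classes: BacklogModel, toEntries, entryBacklog and messageCount are hypothetical names made up for this illustration, and maxBatchSize is a stand-in for whatever actually closes a batch in your setup (real batching also depends on time and size limits). It only shows how the same four messages can be counted differently depending on how they were grouped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: no Pulsar classes are used, all names are hypothetical.
class BacklogModel {

    // Packs messages into entries holding at most maxBatchSize messages each,
    // roughly the way a batching producer groups messages before they are stored.
    static List<List<String>> toEntries(List<String> messages, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be at least 1");
        }
        List<List<String>> entries = new ArrayList<>();
        for (int i = 0; i < messages.size(); i += maxBatchSize) {
            int end = Math.min(i + maxBatchSize, messages.size());
            entries.add(new ArrayList<>(messages.subList(i, end)));
        }
        return entries;
    }

    // What a counter that counts entries would report as backlog.
    static int entryBacklog(List<List<String>> entries) {
        return entries.size();
    }

    // What a consumer actually receives once the batches are unpacked.
    static int messageCount(List<List<String>> entries) {
        int total = 0;
        for (List<String> entry : entries) {
            total += entry.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("m1", "m2", "m3", "m4");
        for (int maxBatchSize : new int[] {1, 2, 4}) {
            List<List<String>> entries = toEntries(messages, maxBatchSize);
            System.out.println("maxBatchSize=" + maxBatchSize
                    + " entries=" + entryBacklog(entries)
                    + " messages=" + messageCount(entries));
        }
    }
}
```

With a batch size of 1 (the equivalent of batching being off) the entry count matches the four messages; with batches of two it drops to 2, and with a single batch of four it drops to 1, while the consumer still reads four messages in every case. In the Java client, batching is switched off with enableBatching(false) on the producer builder.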