Rabbit MQ doesn’t flush acks?

The problem appeared in logs: Consumer failed to start in 60000 milliseconds; does the task executor have enough threads to support the container concurrency?

We try to open handlers for like 50 queues dynamically by SimpleMessageListenerContainer.addQueueNames(), then application is started. It consumes some messages, but the RabbitMQ admin panel shows that they are unacked. After a significant amount of time, messages are stacking up to 6 unacked messages (queue has fairly low amount of messages per minute) per queue, which sums up to 300 messages total, something happens and they all become consumed and acked. While messages are unacked, the container seems to be trying to load another consumer until it bumps into the limit.

We rely on AUTO acknowledgment mode now, when it was MANUAL, it was fine.

There are two questions:

What can be the reason for unacked messages? Is there any flushing mechanism that doesn’t trigger often?
What do I do with “not enough threads” message?

Those two seem to be really related one to another.

Here’s the setup:

@Bean
fun queueMessageListenerContainer(
    connectionFactory: ConnectionFactory,
    retryOperationsInterceptor: RetryOperationsInterceptor,
    vehicleQueueListenerFactory: QueueListenerFactory,
): SimpleMessageListenerContainer {
    return SimpleMessageListenerContainer().also {
        it.connectionFactory = connectionFactory
        it.setConsumerTagStrategy { queueName -> consumerTag(queueName) }
        it.setMessageListener(vehicleQueueListenerFactory.create())
        it.setConcurrentConsumers(2)
        it.setMaxConcurrentConsumers(5)
        it.setListenerId("queue-consumer")
        it.setAdviceChain(retryOperationsInterceptor)
        it.setRecoveryInterval(RABBIT_HEARTH_BEAT.toMillis())
        //had 10-100 threads, didn't help
        it.setTaskExecutor(rabbitConsumersExecutorService)
        // AUTO suppose to set ack for the messages, right?
        it.acknowledgeMode = AcknowledgeMode.AUTO
    }
}

@Bean
fun connectionFactory(rabbitProperties: RabbitProperties): AbstractConnectionFactory {
    val rabbitConnectionFactory = com.rabbitmq.client.ConnectionFactory().also { connectionFactory ->
        connectionFactory.isAutomaticRecoveryEnabled = true
        connectionFactory.isTopologyRecoveryEnabled = true
        connectionFactory.networkRecoveryInterval = RABBIT_HEARTH_BEAT.toMillis()
        connectionFactory.requestedHeartbeat = RABBIT_HEARTH_BEAT.toSeconds().toInt()
        // was up to 100 connections, didn't help
        connectionFactory.setSharedExecutor(rabbitConnectionExecutorService)
        connectionFactory.host = rabbitProperties.host
        connectionFactory.port = rabbitProperties.port ?: connectionFactory.port
    }
    return CachingConnectionFactory(rabbitConnectionFactory)
        .also {
            it.cacheMode = rabbitProperties.cache.connection.mode
            it.connectionCacheSize = rabbitProperties.cache.connection.size
            it.setConnectionNameStrategy { "simulation-gateway:${springProfiles.firstOrNull()}:event-consumer" }
        }
}

class QueueListenerFactory { 
  fun create(){
     return MessageListener {
        try {
            // no ack, rely on AUTO acknowledgement mode
            handle()
        } catch (e: Throwable) {
            ...
        }
     }
  }
}

JavaScript
​x
 
@Beanfun queueMessageListenerContainer(    connectionFactory: ConnectionFactory,    retryOperationsInterceptor: RetryOperationsInterceptor,    vehicleQueueListenerFactory: QueueListenerFactory,): SimpleMessageListenerContainer {    return SimpleMessageListenerContainer().also {        it.connectionFactory = connectionFactory        it.setConsumerTagStrategy { queueName -> consumerTag(queueName) }        it.setMessageListener(vehicleQueueListenerFactory.create())        it.setConcurrentConsumers(2)        it.setMaxConcurrentConsumers(5)        it.setListenerId("queue-consumer")        it.setAdviceChain(retryOperationsInterceptor)        it.setRecoveryInterval(RABBIT_HEARTH_BEAT.toMillis())        //had 10-100 threads, didn't help        it.setTaskExecutor(rabbitConsumersExecutorService)        // AUTO suppose to set ack for the messages, right?        it.acknowledgeMode = AcknowledgeMode.AUTO    }}​@Beanfun connectionFactory(rabbitProperties: RabbitProperties): AbstractConnectionFactory {    val rabbitConnectionFactory = com.rabbitmq.client.ConnectionFactory().also { connectionFactory ->        connectionFactory.isAutomaticRecoveryEnabled = true        connectionFactory.isTopologyRecoveryEnabled = true        connectionFactory.networkRecoveryInterval = RABBIT_HEARTH_BEAT.toMillis()        connectionFactory.requestedHeartbeat = RABBIT_HEARTH_BEAT.toSeconds().toInt()        // was up to 100 connections, didn't help        connectionFactory.setSharedExecutor(rabbitConnectionExecutorService)        connectionFactory.host = rabbitProperties.host        connectionFactory.port = rabbitProperties.port ?: connectionFactory.port    }    return CachingConnectionFactory(rabbitConnectionFactory)        .also {            it.cacheMode = rabbitProperties.cache.connection.mode            it.connectionCacheSize = rabbitProperties.cache.connection.size            it.setConnectionNameStrategy { "simulation-gateway:${springProfiles.firstOrNull()}:event-consumer" }        }}​class QueueListenerFactory {   fun create(){     return MessageListener {        try {            // no ack, rely on AUTO acknowledgement mode            handle()        } catch (e: Throwable) {            ...        }     }  }}​

Answer

Okay, I figured out what the problem was. Basically, it couldn’t start all of the queues consumers in time, since it not only is slow process for too slow for SimpleMessageListenerContainer, but also we tried to addQueueNames one by one.

   userRepository.findAll()
        .map { user -> queueName(user) }
        .onEach { queueName ->
            simpleContainerListener.addQueueNames(queueName)
        }

JavaScript
 
   userRepository.findAll()        .map { user -> queueName(user) }        .onEach { queueName ->            simpleContainerListener.addQueueNames(queueName)        }​

But the following line of documentation for SimpleMessageListenerContainer remained unnoticed:

The existing consumers will be cancelled after they have processed any pre-fetched messages and new consumers will be created

JavaScript
 
The existing consumers will be cancelled after they have processed any pre-fetched messages and new consumers will be created ​

Which means what actually happened is recreation of (1, 2, … N) consumers. What made it even worse is that if the request comes from the API, we did exactly the same simpleContainerListener.addQueueNames(queueName) after handling the request, which recreated all of consumers after that.

Also, recreation of the consumers was the reason why AUTO acknowledgement didn’t work: threads were hanging trying to build enough consumers before the timeout.

I fixed this by adding DirectMessageListenerContainer to handle recently added queues, which is blazing fast, compared to SimpleMessageListenerContainer for the particular case of adding just one new consumer.

DirectMessageListenerContainer(connectionFactory).also {
        it.setConsumerTagStrategy { queueName -> consumerTag(queueName, RECENT_CONSUMER_TAG) }
        it.setMessageListener(ListenerFactory.create())
        it.setListenerId("queue-consumer-recent")
        it.setAdviceChain(retryOperationsInterceptor)
        it.setTaskExecutor(recentQueuesTaskExecutor)
        it.acknowledgeMode = AcknowledgeMode.AUTO
    }

JavaScript
 
DirectMessageListenerContainer(connectionFactory).also {        it.setConsumerTagStrategy { queueName -> consumerTag(queueName, RECENT_CONSUMER_TAG) }        it.setMessageListener(ListenerFactory.create())        it.setListenerId("queue-consumer-recent")        it.setAdviceChain(retryOperationsInterceptor)        it.setTaskExecutor(recentQueuesTaskExecutor)        it.acknowledgeMode = AcknowledgeMode.AUTO    }​

The downside is DirectMessageListenerContainer using 1 thread per queue on every instance. This is exactly why I didn’t want to use it in the first place, but using both DirectMessageListenerContainer for recent and SimpleContainerListener for already existing queues significantly reduces amount of thread required to handle those queues. As far as I understand, an overwhelming usage of DirectMessageListenerContainer will lead to OOM eventually, so the next step can be to transfer queues from direct to simple container listener in batches.

Advertisement

Answer