I am using Debezium 0.7 to read from MySQL, but I am getting flush timeouts and OutOfMemoryError errors during the initial snapshot phase. Judging by the logs below, the connector seems to be trying to write too many messages in one go:
WorkerSourceTask{id=accounts-connector-0} flushing 143706 outstanding messages for offset commit [org.apache.kafka.connect.runtime.WorkerSourceTask]
WorkerSourceTask{id=accounts-connector-0} Committing offsets [org.apache.kafka.connect.runtime.WorkerSourceTask]
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
WorkerSourceTask{id=accounts-connector-0} Failed to flush, timed out while waiting for producer to flush outstanding 143706 messages [org.apache.kafka.connect.runtime.WorkerSourceTask]
I wonder what the correct settings (http://debezium.io/docs/connectors/mysql/#connector-properties) are for sizeable databases (>50 GB). I didn't have this issue with smaller databases. Simply increasing the timeout doesn't seem like a good strategy. I'm currently using the default connector settings.
Update
Changed the settings as suggested below and it fixed the problem:
OFFSET_FLUSH_TIMEOUT_MS: 60000  # default 5000
OFFSET_FLUSH_INTERVAL_MS: 15000  # default 60000
MAX_BATCH_SIZE: 32768  # default 2048
MAX_QUEUE_SIZE: 131072  # default 8192
HEAP_OPTS: '-Xms2g -Xmx2g'  # default '-Xms1g -Xmx1g'
Answer
This is a very complex question. First of all, the default memory settings for the Debezium Docker images are quite low, so if you are using them it may be necessary to increase the heap.
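For example, if you run the debezium/connect image, the heap can be raised through the HEAP_OPTS environment variable shown in the update above. A minimal sketch, assuming that image; the Kafka address, topic names, and version tag are illustrative:

docker run -it --rm --name connect \
  -e HEAP_OPTS='-Xms2g -Xmx2g' \
  -e BOOTSTRAP_SERVERS=kafka:9092 \
  -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=connect_configs \
  -e OFFSET_STORAGE_TOPIC=connect_offsets \
  debezium/connect:0.7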
Next, there are multiple factors at play. I recommend the following steps (a sketch of where each setting lives follows the list).
- Increase max.batch.size and max.queue.size – this reduces the number of offset commits.
- Increase offset.flush.timeout.ms – this gives Connect more time to process the accumulated records.
- Decrease offset.flush.interval.ms – this should reduce the amount of accumulated offsets.
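For illustration, here is roughly where each of these settings lives, assuming a distributed Connect worker; the file name is the conventional one and the JSON fragment is a sketch of the connector's "config" object posted to the Connect REST API:

# Kafka Connect worker configuration, e.g. connect-distributed.properties
offset.flush.timeout.ms=60000
offset.flush.interval.ms=15000

# Debezium MySQL connector configuration (part of the connector JSON)
"max.batch.size": "32768",
"max.queue.size": "131072",

Note that max.queue.size should stay larger than max.batch.size, so the queue can absorb records faster than they are handed out in batches.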
Unfortunately, there is an issue, KAFKA-6551, lurking in the background that can still play havoc here.