Summary
Integration tests intermittently hang when the metadata/bootstrap connection (broker -1) fails with "Receive timeout after 30000ms". The producer then cannot resolve partition leaders for newly created topics, and the affected tests hang until the 5-minute test timeout.
Symptoms
- [Error] Dekaf.Networking.KafkaConnection: Receive timeout after 30000ms on broker -1 - marking connection as failed
- [Information] Dekaf.Networking.ConnectionPool: All connections closed (appears during test setup)
- Tests that create multi-partition topics and produce to them hang until the 5-minute test timeout
- Affected tests: MultiPartition_SeekOnSpecificPartition_WorksCorrectly, MultiPartition_DifferentKeys_DistributeAcrossPartitions, others in the Messaging category
Context
Observed in CI runs on the fix/flaky-tests-round2 branch after the producer batch lifecycle bugs were fixed. The "No response for partition" topic mismatch is now resolved; this is a separate issue.
Possible causes
- Bootstrap connection not being re-established after failure
- Connection pool disposal from a parallel test affecting metadata lookups
- Kafka container under load from parallel tests causing slow metadata responses
- Missing retry/reconnect logic for broker -1 metadata connections
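
To make the last cause concrete, here is a minimal sketch of the kind of retry/reconnect behavior the bootstrap metadata path may be missing. It is written in Python purely as an illustration and is not Dekaf code: fetch_metadata_with_reconnect, MetadataUnavailableError, the fetch callback, and all parameter names and defaults are hypothetical. The assumption is that a metadata request that times out should be retried on a fresh connection (possibly to a different bootstrap server) with bounded backoff, and should fail with an error once the attempt budget is spent.

```python
import time
from typing import Callable, Optional, Sequence


class MetadataUnavailableError(Exception):
    """Raised when no bootstrap server answered within the attempt budget."""


def fetch_metadata_with_reconnect(
    bootstrap_servers: Sequence[str],
    fetch: Callable[[str], dict],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry a metadata request, rotating through the bootstrap servers.

    `fetch` is a hypothetical callback that opens a new connection to the
    given server, sends one metadata request, and returns the response.
    It is expected to raise TimeoutError or ConnectionError on failure.
    """
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is required")

    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        # Rotate servers so one unresponsive broker cannot absorb every attempt.
        server = bootstrap_servers[attempt % len(bootstrap_servers)]
        try:
            return fetch(server)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt + 1 < max_attempts:
                # Exponential backoff, capped so the total wait stays bounded.
                sleep(min(max_delay_s, base_delay_s * 2 ** attempt))

    raise MetadataUnavailableError(
        f"metadata request failed after {max_attempts} attempts"
    ) from last_error
```

The point of the sketch is the failure mode rather than the exact policy: once the budget is exhausted the caller gets an exception, so a producer waiting on partition leaders would fail fast instead of hanging until the test timeout.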
CI references
- Run 23491170091 (commit 720e76d): 1/51 failed (MultiPartition_SeekOnSpecificPartition_WorksCorrectly)
- Run 23490015213 (commit 0030eca): 1/51 failed (MultiPartition_DifferentKeys_DistributeAcrossPartitions)