kafka: decouple max-message-bytes so it can be set by adjusting Kafka configurations #5420
3AceShowHand wants to merge 3 commits into
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (12)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 Walkthrough
Introduces
Changes
- Separate batch and producer message size limits

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning: Tools execution failed with the following error: Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)
/test all
Code Review
This pull request decouples the final encoded message size limit (MaxMessageBytes) from the batch splitting and large-message threshold (MaxBatchMessageBytes) across various TiCDC sinks and codecs (including Kafka, Pulsar, Cloud Storage, Avro, Canal JSON, Open Protocol, and Simple). This allows for more granular control over message batching and limits. The feedback suggests simplifying duplicate validation logic in the simple encoder and adding configuration validation to ensure MaxBatchMessageBytes does not exceed MaxMessageBytes.
if c.MaxBatchMessageBytes < 0 {
	return errors.ErrCodecInvalidConfig.Wrap(
		errors.Errorf("invalid max-batch-message-bytes %d", c.MaxBatchMessageBytes),
	)
}
We should validate that `MaxBatchMessageBytes` is not greater than `MaxMessageBytes`. A batch-splitting threshold larger than the hard limit on a single encoded message is an invalid configuration: the encoder could assemble batches that exceed the hard limit and can never be sent.
if c.MaxBatchMessageBytes < 0 {
	return errors.ErrCodecInvalidConfig.Wrap(
		errors.Errorf("invalid max-batch-message-bytes %d", c.MaxBatchMessageBytes),
	)
}
if c.MaxBatchMessageBytes > c.MaxMessageBytes {
	return errors.ErrCodecInvalidConfig.Wrap(
		errors.Errorf("max-batch-message-bytes %d cannot be greater than max-message-bytes %d", c.MaxBatchMessageBytes, c.MaxMessageBytes),
	)
}
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/sink/codec/simple/encoder_test.go`:
- Around line 1595-1611: The TestDMLLargerThanBatchLimit test sets
MaxBatchMessageBytes to 50 but never verifies that the actual encoded message
payload exceeds this threshold, making the test non-deterministic and unable to
guarantee it exercises the intended code path. After calling enc.Build() to
obtain the messages, add an assertion that explicitly checks the payload size of
messages[0] is greater than the MaxBatchMessageBytes limit to ensure the test
fixture is sufficiently large and the behavior remains deterministic.
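The guard the comment asks for can be sketched independently of the real encoder. Here `message` and `payloadSize` are hypothetical stand-ins for whatever `enc.Build()` returns, and counting key plus value bytes is an assumption about how the payload is measured. The essential point is the strict comparison: the fixture only exercises the "larger than batch limit" path if its payload is strictly greater than `MaxBatchMessageBytes` (50 in `TestDMLLargerThanBatchLimit`).

```go
package main

import "fmt"

// message is a hypothetical stand-in for an encoded message returned by
// enc.Build(); only the fields needed for a size check are modeled.
type message struct {
	Key   []byte
	Value []byte
}

// payloadSize counts key plus value bytes. This is an assumption; the real
// codec may define message length differently.
func (m message) payloadSize() int { return len(m.Key) + len(m.Value) }

// checkExceedsBatchLimit returns an error unless the first message is strictly
// larger than maxBatchMessageBytes, i.e. the fixture really exercises the
// "larger than batch limit" path.
func checkExceedsBatchLimit(messages []message, maxBatchMessageBytes int) error {
	if len(messages) == 0 {
		return fmt.Errorf("expected at least one message")
	}
	if size := messages[0].payloadSize(); size <= maxBatchMessageBytes {
		return fmt.Errorf("fixture too small: payload %d bytes, batch limit %d",
			size, maxBatchMessageBytes)
	}
	return nil
}

func main() {
	const maxBatchMessageBytes = 50 // the value used by TestDMLLargerThanBatchLimit
	msgs := []message{{Key: []byte("k"), Value: make([]byte, 80)}}
	fmt.Println(checkExceedsBatchLimit(msgs, maxBatchMessageBytes))
}
```

Inside a real `testing` test using testify, the same guard would typically be a single `require.Greater` assertion on the payload length against the configured batch limit.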
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5d97615e-8665-4e5f-9988-6942075570a7
📒 Files selected for processing (18)
- `downstreamadapter/sink/cloudstorage/encoder_group_test.go`
- `downstreamadapter/sink/cloudstorage/sink.go`
- `downstreamadapter/sink/helper/helper.go`
- `downstreamadapter/sink/kafka/helper.go`
- `downstreamadapter/sink/kafka/sink_test.go`
- `downstreamadapter/sink/pulsar/helper.go`
- `pkg/sink/codec/avro/encoder.go`
- `pkg/sink/codec/canal/canal_json_encoder.go`
- `pkg/sink/codec/canal/canal_json_encoder_test.go`
- `pkg/sink/codec/canal/canal_json_txn_encoder.go`
- `pkg/sink/codec/common/config.go`
- `pkg/sink/codec/open/encoder.go`
- `pkg/sink/codec/open/encoder_test.go`
- `pkg/sink/codec/simple/encoder.go`
- `pkg/sink/codec/simple/encoder_test.go`
- `pkg/sink/kafka/options.go`
- `pkg/sink/kafka/options_test.go`
- `pkg/sink/kafka/sarama_config.go`
@3AceShowHand: The following test failed, say
What problem does this PR solve?
Issue Number: close #1405
The Kafka sink used `max-message-bytes` for multiple purposes.

Because of this coupling, when a single row event is larger than the changefeed `max-message-bytes`, operators have to pause the changefeed, increase `max-message-bytes`, and also update the downstream Kafka topic or broker message size limit. This is operationally cumbersome. The desired behavior is that once the Kafka topic or broker accepts a larger message, TiCDC can use that larger producer limit without requiring a changefeed configuration update.

What is changed and how it works?
This PR decouples the batch threshold from the final producer message limit.
- … `max-message-bytes` first and keeps that value as the batch splitting threshold.
- … `max.message.bytes` or broker `message.max.bytes`, and caps it below Sarama's request size limit.
- `options.MaxMessageBytes` is adjusted after Kafka metadata is read, so it represents the final producer hard limit after `adjustOptions`.
- `Config` uses `MaxBatchMessageBytes` for batch splitting decisions and `MaxMessageBytes` for the final encoded message hard limit.

Result
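The limit resolution in the list above can be sketched as two small functions. This is a minimal illustration under stated assumptions, not the PR's code: `producerLimit` and `resolveLimits` are hypothetical names, the topic-level `max.message.bytes` is treated as "unset" when it is 0, `saramaMaxRequestSize` mirrors what we understand to be the default of `sarama.MaxRequestSize` (100 MiB), and `requestHeadroom` is an invented margin, because the description only says the limit is capped *below* Sarama's request size limit without saying by how much.

```go
package main

import "fmt"

const (
	// saramaMaxRequestSize mirrors the assumed default of sarama.MaxRequestSize.
	saramaMaxRequestSize = 100 * 1024 * 1024
	// requestHeadroom is a hypothetical margin kept below the request limit.
	requestHeadroom = 1024 * 1024
)

// producerLimit prefers the topic-level max.message.bytes when it is set
// (topicLimit > 0), falls back to the broker-level message.max.bytes, and
// caps the result below Sarama's request size limit.
func producerLimit(topicLimit, brokerLimit int) int {
	limit := brokerLimit
	if topicLimit > 0 {
		limit = topicLimit
	}
	if ceiling := saramaMaxRequestSize - requestHeadroom; limit > ceiling {
		limit = ceiling
	}
	return limit
}

// resolveLimits keeps the changefeed max-message-bytes as the batch threshold
// but clamps it to the producer hard limit, so no unsendable batch is formed.
func resolveLimits(changefeedMaxMessageBytes, topicLimit, brokerLimit int) (batchLimit, hardLimit int) {
	hardLimit = producerLimit(topicLimit, brokerLimit)
	batchLimit = changefeedMaxMessageBytes
	if batchLimit > hardLimit {
		batchLimit = hardLimit
	}
	return batchLimit, hardLimit
}

func main() {
	// Topic raised to 10 MiB while the changefeed stays at 1 MiB:
	// batches still close at 1 MiB, single messages may grow to 10 MiB.
	fmt.Println(resolveLimits(1<<20, 10<<20, 1<<20))
	// Topic limit smaller than the changefeed setting: the batch threshold is clamped.
	fmt.Println(resolveLimits(1<<20, 512<<10, 1<<20))
}
```

The first call shows the operational benefit described in the Result section: raising only the Kafka-side limit lifts the hard limit, while the batching behavior configured on the changefeed stays unchanged.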
After this change, when Kafka rejects a message because the downstream topic or broker limit is too small, operators only need to increase Kafka's message size configuration. TiCDC can then pick up the downstream limit as the producer hard limit, without requiring a manual update to the changefeed `max-message-bytes`.

The original changefeed `max-message-bytes` behavior for batching remains compatible: existing batch splitting behavior is preserved unless Kafka's producer hard limit is smaller, in which case the batch threshold is clamped to the producer limit to avoid forming unsendable batches.

Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No expected performance regression. The change is configuration plumbing and encoder-side limit selection. It preserves the existing `max-message-bytes` semantics for batch splitting, while allowing Kafka's actual producer limit to be derived from the downstream Kafka configuration.

Do you need to update user documentation, design documentation or monitoring documentation?
No monitoring update is needed. User-facing documentation may be updated separately to clarify that Kafka topic/broker message size limits can be increased without changing the changefeed `max-message-bytes` setting.

Release note
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests