KAFKA-18309: Change log.segment.bytes config type from INT to LONG#21975
acgtun wants to merge 1 commit into apache:trunk
Conversation
Change the `log.segment.bytes` (and its topic-level synonym `segment.bytes`) config type from INT to LONG to allow segment sizes larger than ~2GB (Integer.MAX_VALUE). The INT type limited the maximum configurable segment size to 2,147,483,647 bytes (~1.99 GB). This change lifts that restriction by using the LONG type throughout the config definition, storage, and propagation layers.

Changes:
- LogConfig: change `DEFAULT_SEGMENT_BYTES` to `long`, config definitions from INT to LONG, `segmentSize` field/accessor from `int` to `long`
- AbstractKafkaConfig: `logSegmentBytes()` returns `Long` via `getLong()`
- RollParams: `maxSegmentBytes` field from `int` to `long`
- Cleaner: `groupSegmentsBySize()` `maxSize` param from `int` to `long`
- `initFileSize()` safely caps at `Integer.MAX_VALUE` for FileRecords compatibility

Backward/forward compatibility:
- Backward compatible: existing INT values (stored as strings) parse correctly as LONG
- Forward compatible: values <= `Integer.MAX_VALUE` work on older brokers; values > `Integer.MAX_VALUE` require the new version
- No wire protocol impact (broker/topic config only)
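The shape of the type change can be sketched as follows. This is a hypothetical, self-contained illustration of the PR summary (the class and constructor here are not the actual Kafka source); it shows why existing INT values, which are stored as strings, continue to parse unchanged once the accessor is widened to `long`:

```java
// Hypothetical sketch mirroring the PR description, not the real LogConfig.
public class SegmentBytesConfigSketch {
    // Before the change this was: public static final int DEFAULT_SEGMENT_BYTES = 1024 * 1024 * 1024;
    public static final long DEFAULT_SEGMENT_BYTES = 1024L * 1024 * 1024; // 1 GiB

    private final long segmentSize;

    public SegmentBytesConfigSketch(String configuredValue) {
        // Config values are stored as strings, so an old INT value like
        // "1073741824" parses identically as a long (backward compatible).
        this.segmentSize = Long.parseLong(configuredValue);
    }

    // Accessor widened from int to long.
    public long segmentSize() {
        return segmentSize;
    }
}
```

A value above `Integer.MAX_VALUE` such as `"3221225472"` (3 GiB) now parses without overflow, which is the point of the change.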
nileshkumar3 left a comment:
The config type change and initFileSize() cap look good. One concern: LogSegment.size() and FileRecords.sizeInBytes() both return int (backed by AtomicInteger), so values above Integer.MAX_VALUE are accepted by the config but can't actually be enforced by the storage layer.
Specifically, shouldRoll() compares int size against long maxSegmentBytes — when maxSegmentBytes > ~2 GB, the int is promoted to long for comparison but can never exceed it, so size-based rolling silently becomes a no-op. Similarly, the batch-size guard in UnifiedLog (validRecords.sizeInBytes() > config().segmentSize()) becomes unreachable since sizeInBytes() returns int.
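The unreachable-comparison problem can be shown in isolation. This is a minimal illustration of the reviewer's point, not Kafka code: when the threshold is a `long` above `Integer.MAX_VALUE`, an `int` segment size promoted to `long` can never exceed it, so the roll condition is a silent no-op.

```java
// Minimal sketch (not Kafka source) of the int-vs-long comparison issue.
public class RollCheckSketch {
    // Mirrors the shape of a size-based roll check: int size vs long threshold.
    static boolean shouldRollBySize(int segmentSizeBytes, long maxSegmentBytes) {
        // segmentSizeBytes is promoted to long for the comparison, but its
        // value is still bounded by Integer.MAX_VALUE.
        return segmentSizeBytes > maxSegmentBytes;
    }

    public static void main(String[] args) {
        long maxSegmentBytes = 3L * 1024 * 1024 * 1024; // 3 GiB > Integer.MAX_VALUE
        // Even the largest representable int size never triggers a roll:
        System.out.println(shouldRollBySize(Integer.MAX_VALUE, maxSegmentBytes)); // prints "false"
    }
}
```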
Consider adding between(1024 * 1024, Integer.MAX_VALUE) as the validator, so the type is LONG (future-proof) while the range stays honest about what the storage layer supports today. The upper bound can be lifted later when FileRecords and LogSegment are widened to long.
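The semantics of the suggested bounded validator can be sketched self-contained. This mirrors the behavior of Kafka's `ConfigDef.Range.between` for this config but is not the actual `ConfigDef` API usage:

```java
// Self-contained sketch of the suggested validator semantics (not ConfigDef itself).
public class SegmentBytesValidatorSketch {
    static final long MIN_SEGMENT_BYTES = 1024L * 1024;      // 1 MiB lower bound
    static final long MAX_SEGMENT_BYTES = Integer.MAX_VALUE; // honest upper bound today

    // Accepts a long (future-proof type) but rejects values the storage
    // layer cannot enforce yet.
    static void validate(long value) {
        if (value < MIN_SEGMENT_BYTES || value > MAX_SEGMENT_BYTES) {
            throw new IllegalArgumentException("segment.bytes must be in ["
                + MIN_SEGMENT_BYTES + ", " + MAX_SEGMENT_BYTES + "], got " + value);
        }
    }
}
```

With this in place, a value like 3 GiB fails fast at config time instead of silently disabling size-based rolling.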
chia7712 left a comment:
@acgtun Please check and fix the JIRA number in the PR title. Also, this change requires a KIP (Kafka Improvement Proposal). Since the current .index file format uses a 4-byte integer for the physical position, it cannot support segments larger than 2GB. A KIP is needed to discuss how to upgrade the index format.
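The index-format constraint mentioned above can be made concrete with a small sketch. Per the comment, each offset-index entry stores the physical file position as a 4-byte integer (alongside a 4-byte relative offset); this illustrative encoding (not the actual Kafka index implementation) shows why positions past `Integer.MAX_VALUE` simply cannot be represented:

```java
import java.nio.ByteBuffer;

// Illustrative 8-byte index-entry encoding (field widths per the comment above;
// not the actual Kafka OffsetIndex implementation).
public class OffsetIndexEntrySketch {
    static ByteBuffer encode(int relativeOffset, int physicalPosition) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(relativeOffset);   // 4 bytes: offset relative to the segment base
        buf.putInt(physicalPosition); // 4 bytes: caps positions at Integer.MAX_VALUE
        buf.flip();
        return buf;
    }
}
```

Because the position field is an `int`, supporting >2GB segments would require widening the entry layout, which is an on-disk format change and hence KIP territory.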
Thanks @chia7712. If the .index file format only uses 4 bytes for the physical position and the index format has to be upgraded, this could be much more complicated.
Motivation
The `log.segment.bytes` config (and its topic-level synonym `segment.bytes`) is currently defined as INT type, which limits the maximum configurable segment size to `Integer.MAX_VALUE` (~1.99 GB). This is an artificial limitation for users who want larger log segments.
Changes
Change the config type from INT to LONG across the config definition, storage, and propagation layers:
- `LogConfig.java`: `DEFAULT_SEGMENT_BYTES` → `long`; config defs INT → LONG; `segmentSize()` returns `long`; `initFileSize()` safely caps at `Integer.MAX_VALUE`
- `AbstractKafkaConfig.java`: `logSegmentBytes()` → `Long` via `getLong()`
- `RollParams.java`: `maxSegmentBytes` → `long`
- `Cleaner.java`: `groupSegmentsBySize()` `maxSize` param → `long`
Compatibility
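The `initFileSize()` cap mentioned in the list above can be sketched as follows. This is a hedged illustration of the described behavior, not the actual `LogConfig` code: FileRecords preallocation works with an `int` size, so a `long` segment size must be clamped.

```java
// Hedged sketch of the described initFileSize() cap (not the real LogConfig).
public class InitFileSizeSketch {
    // When preallocation is enabled, the file is sized up front; the long
    // segment size is clamped so it fits the int that FileRecords expects.
    static int initFileSize(boolean preallocate, long segmentSize) {
        if (preallocate) {
            return (int) Math.min(segmentSize, Integer.MAX_VALUE);
        }
        return 0; // no preallocation: start from an empty file
    }
}
```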
- Backward compatible: existing INT values (stored/transmitted as strings like `"1073741824"`) parse correctly as LONG.
- Forward compatible: values <= `Integer.MAX_VALUE` work on older brokers. Values > `Integer.MAX_VALUE` require the new version.
Testing
All existing unit and integration tests pass, including `LogConfigTest`, `KafkaConfigTest`, `UnifiedLogTest`, `LocalLogTest`, `LogCleanerManagerTest`, `KafkaRaftLogTest`, `DefaultSupportedConfigCheckerTest`, and `StaticBrokerConfigTest`.