
KAFKA-18309: Change log.segment.bytes config type from INT to LONG #21975

Open
acgtun wants to merge 1 commit into apache:trunk from acgtun:KAFKA-log-segment-bytes-int-to-long

Conversation


@acgtun acgtun commented Apr 6, 2026

Motivation

The log.segment.bytes config (and its topic-level synonym segment.bytes) is currently defined as INT, which limits the maximum configurable segment size to Integer.MAX_VALUE (2,147,483,647 bytes, ~2 GiB). This is an artificial limitation for users who want larger log segments.

Changes

Change the config type from INT to LONG across the config definition, storage, and propagation layers:

| File | Change |
| --- | --- |
| `LogConfig.java` | `DEFAULT_SEGMENT_BYTES` → `long`; config defs INT → LONG; `segmentSize()` returns `long`; `initFileSize()` safely caps at `Integer.MAX_VALUE` |
| `AbstractKafkaConfig.java` | `logSegmentBytes()` returns `Long` via `getLong()` |
| `RollParams.java` | `maxSegmentBytes` → `long` |
| `Cleaner.java` | `groupSegmentsBySize()` `maxSize` param → `long` |
| Test utils (Scala + Java) | parameter types updated to `Long`/`long` |
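The shape of these changes can be sketched as follows. This is a simplified stand-in, not the actual Kafka classes — the field, constructor, and `preallocate` flag here are illustrative:

```java
// Hypothetical sketch (not the actual Kafka code) of the INT -> LONG shape:
// segmentSize becomes long, while initFileSize() caps at Integer.MAX_VALUE
// because FileRecords preallocation still works with int sizes.
public class LogConfigSketch {
    // Was: public static final int DEFAULT_SEGMENT_BYTES = 1024 * 1024 * 1024;
    public static final long DEFAULT_SEGMENT_BYTES = 1024L * 1024 * 1024;

    private final long segmentSize;
    private final boolean preallocate;

    public LogConfigSketch(long segmentSize, boolean preallocate) {
        this.segmentSize = segmentSize;
        this.preallocate = preallocate;
    }

    public long segmentSize() {          // was: int segmentSize()
        return segmentSize;
    }

    // FileRecords still takes an int size, so cap the preallocation size.
    public int initFileSize() {
        return preallocate ? (int) Math.min(segmentSize, Integer.MAX_VALUE) : 0;
    }
}
```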

Compatibility

  • Backward compatible: Existing INT values (stored/transmitted as strings like "1073741824") parse correctly as LONG.
  • Forward compatible: Values ≤ Integer.MAX_VALUE work on older brokers. Values > Integer.MAX_VALUE require the new version.
  • No wire protocol impact — this is a broker/topic-level config only, not part of any Kafka RPC schema.
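The backward-compatibility claim rests on config values travelling as strings: anything an old broker accepted under INT parses to the same number under LONG, while only LONG admits the larger values. A plain-Java illustration (helper names are mine, not Kafka's):

```java
// Sketch of the compatibility argument: a value string that was valid under
// the old INT type parses identically under LONG, and values above
// Integer.MAX_VALUE are accepted only under the new LONG type.
public class CompatSketch {
    // True when a value string would be accepted under the old INT type.
    static boolean parsesAsInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True when a value string is accepted under the new LONG type.
    static boolean parsesAsLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```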

Testing

All existing unit and integration tests pass, including LogConfigTest, KafkaConfigTest, UnifiedLogTest, LocalLogTest, LogCleanerManagerTest, KafkaRaftLogTest, DefaultSupportedConfigCheckerTest, and StaticBrokerConfigTest.

Change the log.segment.bytes (and its topic-level synonym segment.bytes)
config type from INT to LONG to allow segment sizes larger than ~2GB
(Integer.MAX_VALUE).

The INT type limited the maximum configurable segment size to
2,147,483,647 bytes (~2 GiB). This change lifts that restriction by
using LONG type throughout the config definition, storage, and
propagation layers.

Changes:
- LogConfig: Change DEFAULT_SEGMENT_BYTES to long, config definitions
  from INT to LONG, segmentSize field/accessor from int to long
- AbstractKafkaConfig: logSegmentBytes() returns Long via getLong()
- RollParams: maxSegmentBytes field from int to long
- Cleaner: groupSegmentsBySize() maxSize param from int to long
- initFileSize() safely caps at Integer.MAX_VALUE for FileRecords
  compatibility

Backward/forward compatibility:
- Backward compatible: existing INT values (stored as strings) parse
  correctly as LONG
- Forward compatible: values <= Integer.MAX_VALUE work on older brokers;
  values > Integer.MAX_VALUE require the new version
- No wire protocol impact (broker/topic config only)
@github-actions github-actions bot added triage PRs from the community core Kafka Broker storage Pull requests that target the storage module small Small PRs labels Apr 6, 2026
Contributor

@nileshkumar3 nileshkumar3 left a comment


The config type change and initFileSize() cap look good. One concern: LogSegment.size() and FileRecords.sizeInBytes() both return int (backed by AtomicInteger), so values above Integer.MAX_VALUE are accepted by the config but can't actually be enforced by the storage layer.

Specifically, shouldRoll() compares int size against long maxSegmentBytes — when maxSegmentBytes > ~2 GB, the int is promoted to long for comparison but can never exceed it, so size-based rolling silently becomes a no-op. Similarly, the batch-size guard in UnifiedLog (validRecords.sizeInBytes() > config().segmentSize()) becomes unreachable since sizeInBytes() returns int.

Consider adding between(1024 * 1024, Integer.MAX_VALUE) as the validator so the type is LONG (future-proof), but the range is honest about what the storage layer supports today. The upper bound can be lifted later when FileRecords and LogSegment are widened to long.
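The no-op rolling described above can be demonstrated with a minimal stand-in for the size comparison (the method name and shape are illustrative, not the actual `shouldRoll()` code):

```java
// Stand-in for the size check inside shouldRoll(): the segment size is an
// int (LogSegment.size() is backed by an AtomicInteger), so when
// maxSegmentBytes exceeds Integer.MAX_VALUE the check can never fire.
public class RollCheckSketch {
    static boolean sizeExceeded(int segmentSizeBytes, long maxSegmentBytes) {
        // int is promoted to long for the comparison, but its maximum value
        // (Integer.MAX_VALUE) can never exceed a limit above 2^31 - 1.
        return segmentSizeBytes > maxSegmentBytes;
    }
}
```

Even the largest reportable segment size, `Integer.MAX_VALUE`, never exceeds a 3 GiB limit, so size-based rolling silently stops triggering.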

Member

chia7712 commented Apr 7, 2026

@acgtun Please check and fix the JIRA number in the PR title. Also, this change requires a KIP (Kafka Improvement Proposal). Since the current .index file format uses a 4-byte integer for the physical position, it cannot support segments larger than 2GB. A KIP is needed to discuss how to upgrade the index format.
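The 4-byte position limitation can be illustrated with a simplified index-entry layout (the real `OffsetIndex` code is more involved; this only shows the int truncation):

```java
import java.nio.ByteBuffer;

// Illustration of the .index limitation: each entry stores the physical file
// position in a 4-byte slot, so a position past Integer.MAX_VALUE (i.e. inside
// a >2GB segment) wraps negative when written. Layout simplified.
public class IndexEntrySketch {
    // Writes a physical position into the 4-byte position slot and reads it back.
    static int roundTripPosition(long physicalPosition) {
        ByteBuffer entry = ByteBuffer.allocate(8); // 4-byte relative offset + 4-byte position
        entry.putInt(0, 12345);                    // relative offset (fits fine)
        entry.putInt(4, (int) physicalPosition);   // truncated to 32 bits
        return entry.getInt(4);
    }
}
```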

Author

acgtun commented Apr 7, 2026

> @acgtun Please check and fix the JIRA number in the PR title. Also, this change requires a KIP (Kafka Improvement Proposal). Since the current .index file format uses a 4-byte integer for the physical position, it cannot support segments larger than 2GB. A KIP is needed to discuss how to upgrade the index format.

Thanks @chia7712. If the .index file format uses only a 4-byte integer for the physical position and the index format needs to be upgraded, this change could be much more complicated.

@github-actions github-actions bot removed the triage PRs from the community label Apr 8, 2026