
[server] Gate vtp protocol-schema header behind VtpHeaderEmissionMode#2798

Open
sushantmane wants to merge 2 commits into linkedin:main from sushantmane:sumane/skip-vtp-on-heartbeat-sos

@sushantmane
Contributor

Summary

VeniceWriter attaches the vtp protocol-schema header (the Avro JSON for
KafkaMessageEnvelope, ~16 KB) to the first message of the first segment
(segmentNumber == 0 && messageSequenceNumber == 0). Heartbeat control messages
are constructed as START_OF_SEGMENT with both numbers pinned to zero, so under
the pre-existing rule every heartbeat picks up the ~16 KB header even though
heartbeat consumers never use it for schema bootstrap.
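A minimal Java sketch of the pre-existing gate, to make concrete why every heartbeat matches it while later data segment-starts do not. `PreExistingVtpGate`, `SegmentCoordinates`, and `isFirstMessageOfFirstSegment` are illustrative names (the last mirrors the rule name used under Backward Compatibility); this is not VeniceWriter's actual code. Requires Java 16+ for records.

```java
/**
 * Hypothetical sketch of the pre-existing vtp gate. Names are illustrative,
 * not taken from VeniceWriter.
 */
public class PreExistingVtpGate {

  // Simplified stand-in for the segment coordinates carried by each record.
  record SegmentCoordinates(int segmentNumber, int messageSequenceNumber) {}

  // Pre-existing rule: attach vtp only when both coordinates are zero.
  static boolean isFirstMessageOfFirstSegment(SegmentCoordinates c) {
    return c.segmentNumber() == 0 && c.messageSequenceNumber() == 0;
  }

  public static void main(String[] args) {
    // Data path: only the very first segment-start of a partition matches.
    SegmentCoordinates firstDataSos = new SegmentCoordinates(0, 0);
    SegmentCoordinates laterDataSos = new SegmentCoordinates(3, 0);
    // Heartbeats are built as START_OF_SEGMENT pinned to (0, 0).
    SegmentCoordinates heartbeat = new SegmentCoordinates(0, 0);

    System.out.println(isFirstMessageOfFirstSegment(firstDataSos)); // true
    System.out.println(isFirstMessageOfFirstSegment(laterDataSos)); // false
    System.out.println(isFirstMessageOfFirstSegment(heartbeat));    // true
  }
}
```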

On busy ingestion paths with many partitions and frequent heartbeats this
dominates the consumer-side per-record memory footprint: a small data record
can share an in-flight queue with thousands of ~16 KB heartbeat envelopes, and
the queue size becomes a function of vtp retention rather than payload size.
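A back-of-the-envelope sketch of that footprint. Assumptions: "~16 KB" is modeled as 16 * 1024 bytes, the 155 B data record size is the one cited in the commit message, the 2,000 heartbeat count is a hypothetical stand-in for "thousands", and only header bytes are counted.

```java
/**
 * Rough footprint arithmetic for vtp headers on heartbeat envelopes.
 * All sizes and counts are assumptions for illustration.
 */
public class VtpQueueFootprint {

  static final long VTP_HEADER_BYTES = 16L * 1024; // assumed size of one vtp header
  static final long DATA_RECORD_BYTES = 155;       // small data record cited in the commit message

  // Bytes held by vtp headers alone across the given number of in-flight heartbeat envelopes.
  static long vtpBytesForHeartbeats(long heartbeatCount) {
    return heartbeatCount * VTP_HEADER_BYTES;
  }

  public static void main(String[] args) {
    long heartbeats = 2_000; // hypothetical "thousands" of heartbeats sharing one queue
    long vtpBytes = vtpBytesForHeartbeats(heartbeats);
    System.out.printf("vtp headers alone: %d bytes (%.2f MiB)%n", vtpBytes, vtpBytes / (1024.0 * 1024.0));
    System.out.printf("one heartbeat header vs one data record: ~%dx%n", VTP_HEADER_BYTES / DATA_RECORD_BYTES);
  }
}
```

Under these assumptions the headers alone account for roughly 31 MiB, and each heartbeat header is about 105 times the size of the data record it may be queued next to.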

This PR introduces a writer-scoped enum, VtpHeaderEmissionMode, surfaced via
the new config key venice.writer.vtp.header.emission.mode:

| Mode | Behavior |
|---|---|
| SOS_AND_HB (default) | Emit on every segment-start that matches the gate, including heartbeats. Preserves pre-existing behavior; nothing changes on upgrade. |
| SOS_ONLY | Emit on regular data segment-start records only; skip heartbeat SOS. Consumers must resolve the KafkaMessageEnvelope schema by other means (an earlier data SOS or an out-of-band schema cache). |
| NONE | Never emit. Use only when all consumers can resolve the schema without the per-segment hint. |
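The mode table can be read as a small decision function. The sketch below is hypothetical: `shouldAttachVtp` and its boolean parameters are illustrative assumptions, not the actual signature of VeniceWriter#getHeaders, and the local enum only mirrors the three documented modes. It applies the pre-existing gates (segment coordinates and protocolSchemaHeader != null) before consulting the mode. Requires Java 14+ for the switch expression.

```java
/**
 * Hypothetical decision helper for vtp header emission. Not VeniceWriter code.
 */
public class VtpEmissionDecision {

  // Mirrors the three documented modes of VtpHeaderEmissionMode.
  enum VtpHeaderEmissionMode { SOS_AND_HB, SOS_ONLY, NONE }

  /**
   * @param matchesGate           segmentNumber == 0 && messageSequenceNumber == 0
   * @param isHeartbeat           true for heartbeat SOS, false for data records
   * @param schemaHeaderAvailable stands in for protocolSchemaHeader != null
   */
  static boolean shouldAttachVtp(
      VtpHeaderEmissionMode mode, boolean matchesGate, boolean isHeartbeat, boolean schemaHeaderAvailable) {
    if (!schemaHeaderAvailable || !matchesGate) {
      return false; // pre-existing gates still apply in every mode
    }
    return switch (mode) {
      case SOS_AND_HB -> true;        // data SOS and heartbeat SOS
      case SOS_ONLY -> !isHeartbeat;  // data SOS only
      case NONE -> false;             // never emit
    };
  }

  public static void main(String[] args) {
    for (VtpHeaderEmissionMode mode : VtpHeaderEmissionMode.values()) {
      System.out.printf("%-10s dataSOS=%b heartbeat=%b%n", mode,
          shouldAttachVtp(mode, true, false, true),
          shouldAttachVtp(mode, true, true, true));
    }
  }
}
```

Run as-is, it reports dataSOS=true heartbeat=true for SOS_AND_HB, dataSOS=true heartbeat=false for SOS_ONLY, and false for both under NONE, which lines up with the expectations described under Testing Done.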

Changes

  • New enum com.linkedin.venice.writer.VtpHeaderEmissionMode documenting the
    three modes.
  • New config key VENICE_WRITER_VTP_HEADER_EMISSION_MODE
    (venice.writer.vtp.header.emission.mode) in ConfigKeys.
  • VeniceWriter parses the property in its constructor; unrecognized values
    fall back to SOS_AND_HB with a warning log, matching the rest of the
    writer's tolerant config parsing.
  • VeniceWriter#getHeaders gains a boolean isHeartbeat parameter so the
    call site can distinguish heartbeat SOS from data SOS. The three existing
    invocations (one in the regular send path, two in sendHeartbeat) are
    updated in place.
  • The pre-existing gate on protocolSchemaHeader != null is unchanged.
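A sketch of the warn-and-default parsing described in the third bullet. It uses java.util.Properties and java.util.logging as stand-ins for Venice's own property and logging types, `parseMode` is a hypothetical name, and the case-insensitive match is an assumption (the PR text does not say how values are normalized).

```java
import java.util.Locale;
import java.util.Properties;
import java.util.logging.Logger;

/**
 * Hypothetical tolerant parser for venice.writer.vtp.header.emission.mode.
 */
public class VtpModeConfigParsing {

  enum VtpHeaderEmissionMode { SOS_AND_HB, SOS_ONLY, NONE }

  static final String CONFIG_KEY = "venice.writer.vtp.header.emission.mode";
  private static final Logger LOGGER = Logger.getLogger(VtpModeConfigParsing.class.getName());

  // Missing or blank value -> default; unrecognized value -> warn and default.
  static VtpHeaderEmissionMode parseMode(Properties props) {
    String raw = props.getProperty(CONFIG_KEY);
    if (raw == null || raw.isBlank()) {
      return VtpHeaderEmissionMode.SOS_AND_HB;
    }
    try {
      // Assumption: values are matched case-insensitively.
      return VtpHeaderEmissionMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      LOGGER.warning("Unrecognized value '" + raw + "' for " + CONFIG_KEY + "; falling back to SOS_AND_HB");
      return VtpHeaderEmissionMode.SOS_AND_HB;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(parseMode(props)); // SOS_AND_HB (key unset)
    props.setProperty(CONFIG_KEY, "sos_only");
    System.out.println(parseMode(props)); // SOS_ONLY
    props.setProperty(CONFIG_KEY, "bogus");
    System.out.println(parseMode(props)); // SOS_AND_HB, after logging a warning
  }
}
```

Falling back rather than throwing keeps a typo in this key from preventing the writer from starting, at the cost of silently keeping the old (larger) heartbeat envelopes until someone notices the warning.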

Testing Done

  • testHeartbeatVtpEmissionMode (TestNG DataProvider over all three modes)
    verifies the vtp header is present on the heartbeat under SOS_AND_HB and
    absent under SOS_ONLY / NONE.
  • testDataSosWithVtpEmissionModeNone verifies that NONE also drops the
    vtp header on regular data segment-start records, not just heartbeats,
    so consumers can rely on the documented semantics.
  • Full VeniceWriterUnitTest class passes (regression check for the new
    isHeartbeat parameter threading).
```
> Task :internal:venice-common:test
com.linkedin.venice.writer.VeniceWriterUnitTest > testDataSosWithVtpEmissionModeNone PASSED (523 ms)
com.linkedin.venice.writer.VeniceWriterUnitTest > testHeartbeatVtpEmissionMode[0](SOS_AND_HB, true) PASSED (122 ms)
com.linkedin.venice.writer.VeniceWriterUnitTest > testHeartbeatVtpEmissionMode[1](SOS_ONLY, false) PASSED (3 ms)
com.linkedin.venice.writer.VeniceWriterUnitTest > testHeartbeatVtpEmissionMode[2](NONE, false) PASSED (3 ms)
BUILD SUCCESSFUL in 9s
```

Backward Compatibility

Default is SOS_AND_HB, which preserves the pre-existing rule
(isFirstMessageOfFirstSegment). Existing deployments see no behavior change
unless they explicitly opt into SOS_ONLY or NONE.

…ssionMode

VeniceWriter attaches the vtp (KafkaMessageEnvelope schema) header whenever
the message is the first record of the first segment
(segmentNumber=0 && messageSequenceNumber=0). Heartbeat control messages are
constructed as START_OF_SEGMENT with the same coordinates, so every heartbeat
picks up the ~16 KB schema blob even though heartbeat consumers do not need it
for schema bootstrap.

On write paths with many partitions and frequent heartbeats this dominates the
consumer-side per-record memory footprint: a single ~155 B data record can
share a queue with thousands of ~16 KB heartbeat envelopes, and the in-flight
queue size becomes a function of vtp retention rather than payload size.

This change introduces a writer-scoped enum, VtpHeaderEmissionMode, surfaced
via the new config key venice.writer.vtp.header.emission.mode:

  - SOS_AND_HB (default): emit on every segment-start including heartbeats.
    Preserves pre-existing behavior; nothing changes on upgrade.
  - SOS_ONLY: emit on regular data SOS records only; skip heartbeat SOS.
    Consumers must obtain the KafkaMessageEnvelope schema from an earlier
    data SOS or an out-of-band schema cache.
  - NONE: never emit. Use only when all consumers can resolve the schema
    without the per-segment hint.

Implementation notes:

  - VeniceWriter#getHeaders gains a boolean isHeartbeat parameter so the call
    site can distinguish heartbeat SOS from data SOS; the existing three
    invocations are updated in place.
  - Unrecognized config values fall back to SOS_AND_HB with a warning log,
    matching the rest of the writer's tolerant config parsing.
  - The vtp header is also gated on protocolSchemaHeader != null, unchanged
    from pre-existing behavior.

Testing done:

  - testHeartbeatVtpEmissionMode (DataProvider over all three modes):
    confirms the vtp header is present on the heartbeat under SOS_AND_HB and
    absent under SOS_ONLY / NONE.
  - testDataSosWithVtpEmissionModeNone: confirms NONE drops the vtp header
    on regular data segment-start records as well, not just heartbeats.
  - Full VeniceWriterUnitTest class passes (regression check for the new
    isHeartbeat parameter threading).
Copilot AI review requested due to automatic review settings May 14, 2026 20:44

Copilot AI left a comment


Pull request overview

This PR adds a writer-scoped configuration to control whether VeniceWriter emits the vtp (transport protocol schema) header on segment-start messages, allowing operators to avoid attaching the ~16 KB KafkaMessageEnvelope schema blob to heartbeat SOS records.

Changes:

  • Introduces VtpHeaderEmissionMode (SOS_AND_HB default, SOS_ONLY, NONE) and wires it into VeniceWriter header construction (including a new isHeartbeat parameter).
  • Adds config key venice.writer.vtp.header.emission.mode to control the emission mode (tolerant parsing with warn-and-default behavior).
  • Adds unit tests verifying heartbeat behavior across modes and that NONE suppresses vtp on data SOS as well.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| internal/venice-common/src/test/java/com/linkedin/venice/writer/VeniceWriterUnitTest.java | Adds tests validating vtp header emission on heartbeats and data SOS under different modes. |
| internal/venice-common/src/main/java/com/linkedin/venice/writer/VtpHeaderEmissionMode.java | New enum defining emission modes and documenting intended semantics. |
| internal/venice-common/src/main/java/com/linkedin/venice/writer/VeniceWriter.java | Parses the new config and gates vtp header attachment based on mode + heartbeat vs data SOS. |
| internal/venice-common/src/main/java/com/linkedin/venice/ConfigKeys.java | Adds the new config key and Javadoc describing how to use it. |


Comment thread internal/venice-common/src/main/java/com/linkedin/venice/writer/VeniceWriter.java Outdated
Comment thread internal/venice-common/src/main/java/com/linkedin/venice/ConfigKeys.java Outdated
@sushantmane sushantmane changed the title [writer] Gate vtp protocol-schema header behind VtpHeaderEmissionMode (SOS_AND_HB | SOS_ONLY | NONE) [server] Gate vtp protocol-schema header behind VtpHeaderEmissionMode (SOS_AND_HB | SOS_ONLY | NONE) May 14, 2026
Address Copilot review nits on linkedin#2798: the docs said "first message of every
segment", but the actual gate is segmentNumber == 0 && messageSequenceNumber == 0,
which on the data path only matches the very first segment-start per partition
(segment 0, sequence 0). Heartbeats pin both coordinates to 0 in getHeartbeatKME,
so every heartbeat matches the gate. Reworded all three locations
(VtpHeaderEmissionMode Javadoc, the VeniceWriter constructor comment, and the
ConfigKeys Javadoc on VENICE_WRITER_VTP_HEADER_EMISSION_MODE) to describe the
gate precisely and call out the data-path vs heartbeat asymmetry.

No behavior change.
@sushantmane sushantmane changed the title [server] Gate vtp protocol-schema header behind VtpHeaderEmissionMode (SOS_AND_HB | SOS_ONLY | NONE) [server] Gate vtp protocol-schema header behind VtpHeaderEmissionMode May 14, 2026
@sushantmane sushantmane added the needs-reviewer Looking for a reviewer to pick this up label May 15, 2026