
feat(sources): add gRPC max connection age #25660

Open
fpytloun wants to merge 1 commit into vectordotdev:master from fpytloun:fpytloun/grpc-source-max-connection-age

Conversation

@fpytloun fpytloun commented Jun 22, 2026

Contributor

Summary

Adds optional gRPC server keepalive configuration for source-side gRPC servers:

  • keepalive.max_connection_age_secs
  • keepalive.max_connection_age_grace_secs

Both settings are wired into the Vector source and the OpenTelemetry gRPC source through their shared gRPC server helper. The defaults preserve current behavior.
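
As a rough illustration of how two optional settings can leave existing behavior untouched, here is a minimal std-only sketch. The struct name `KeepaliveConfig`, the `hard_deadline` helper, and the choice to treat a missing grace period as zero are assumptions made for this example, not the PR's actual types or defaults.

```rust
use std::time::Duration;

/// Hypothetical mirror of the two new options; the PR's real struct may differ.
#[derive(Debug, Default, Clone, PartialEq)]
struct KeepaliveConfig {
    max_connection_age_secs: Option<u64>,
    max_connection_age_grace_secs: Option<u64>,
}

impl KeepaliveConfig {
    /// `None` means no max age is configured, so connections are never aged out.
    fn max_connection_age(&self) -> Option<Duration> {
        self.max_connection_age_secs.map(Duration::from_secs)
    }

    /// Age plus grace. The grace period is only meaningful when a max age is set;
    /// a missing grace value is assumed to be zero in this sketch.
    fn hard_deadline(&self) -> Option<Duration> {
        let age = self.max_connection_age()?;
        let grace = Duration::from_secs(self.max_connection_age_grace_secs.unwrap_or(0));
        Some(age + grace)
    }
}
```

With both fields left unset, `hard_deadline()` returns `None`, which is how a default configuration would skip the lifetime logic entirely in this sketch.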

Closes #19457.

Design note

Tonic 0.11 does not expose native server max_connection_age / grace APIs. I tested a tonic 0.14 upgrade path, but it cascaded into broader hyper, generated-code, and Vector sink transport compatibility work. To keep this PR focused and reviewable, this implementation enforces the source-side gRPC connection lifetime at the accepted IO layer without changing sink behavior or upgrading tonic.

A future tonic upgrade can replace this with native server-side max-age support once the wider tonic/hyper migration is handled separately.

Tests

  • make generate-component-docs
  • cargo fmt --all -- --check
  • cargo test --no-default-features --features sources-vector,sources-opentelemetry --lib config_keepalive
  • cargo test --no-default-features --features sources-vector,sources-opentelemetry --lib config_grpc_keepalive
  • cargo test --no-default-features --features sources-vector --lib sources::vector::test::max_connection_age_closes_idle_connection
  • cargo test --no-default-features --features sources-vector,sinks-vector --lib sources::vector::tests::receive
  • cargo clippy --no-default-features --features sources-vector,sources-opentelemetry --lib -- -D warnings -A clippy::manual_option_zip
  • Docker Compose E2E using a docker-buildx-built PR image with sources-vector,sinks-vector,sources-demo_logs,sinks-console: Vector source configured with keepalive.max_connection_age_secs = 3 received 76 forwarded events over 16 seconds with no client/server errors, confirming delivery continues past repeated server-side connection expiry.

The `-A clippy::manual_option_zip` allow suppresses an existing Rust 1.96 lint in unrelated config-loading code.

Follow-up validation

After review feedback, the connection-age fallback was tightened so the local lifetime wrapper only closes an expired connection once that connection is idle. The active-request tracking is per accepted connection and is held until the response body is dropped, avoiding a global drain across unrelated connections and avoiding mid-response socket closure.
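
The per-connection tracking described above could be sketched with a drop guard, roughly as follows. All names here (`ConnectionState`, `ActiveGuard`, `begin_request`, `should_close`) are hypothetical, and the real implementation is asynchronous and attaches the guard to the response body; this std-only version only shows the counting idea.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Instant;

/// Hypothetical state created once per accepted connection (not global).
#[derive(Clone)]
struct ConnectionState {
    active: Arc<AtomicUsize>,
    expires_at: Instant,
}

/// Guard that would be stored inside the response body, so the request
/// counts as active until that body is dropped.
struct ActiveGuard {
    active: Arc<AtomicUsize>,
}

impl ConnectionState {
    fn new(expires_at: Instant) -> Self {
        Self { active: Arc::new(AtomicUsize::new(0)), expires_at }
    }

    /// Marks one request as in flight on this connection only.
    fn begin_request(&self) -> ActiveGuard {
        self.active.fetch_add(1, Ordering::SeqCst);
        ActiveGuard { active: Arc::clone(&self.active) }
    }

    /// An expired connection is closed only once it is idle.
    fn should_close(&self, now: Instant) -> bool {
        now >= self.expires_at && self.active.load(Ordering::SeqCst) == 0
    }
}

impl Drop for ActiveGuard {
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Because each connection owns its own counter, a busy connection does not delay the closing of an unrelated idle one, and an in-flight response keeps its own connection open until the guard is dropped.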

Additional local validation after the follow-up change:

  • cargo fmt --all -- --check
  • cargo test --no-default-features --features sources-vector --lib sources::util::grpc::tests::max_connection_age_service_tracks
  • cargo test --no-default-features --features sources-vector --lib sources::vector::test::max_connection_age_closes_idle_connection
  • cargo test --no-default-features --features sources-vector,sinks-vector --lib sources::vector::tests::receive
  • cargo test --no-default-features --features sources-vector,sources-opentelemetry --lib config_keepalive
  • cargo test --no-default-features --features sources-vector,sources-opentelemetry --lib config_grpc_keepalive
  • cargo clippy --no-default-features --features sources-vector,sources-opentelemetry --lib -- -D warnings -A clippy::manual_option_zip
  • Docker Compose E2E using a docker-buildx-built PR image with max_connection_age_secs = 3 and max_connection_age_grace_secs = 1:
    • server received 86 forwarded events over 18s
    • reconnects were directly visible in debug logs after idle expiry
    • no Vector ERROR/panic lines were observed

@fpytloun fpytloun requested review from a team as code owners June 22, 2026 10:29
@github-actions github-actions Bot added the labels "domain: sources", "domain: external docs", and "docs review on hold" on Jun 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9ac53ed0a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/sources/util/grpc/mod.rs Outdated
@fpytloun fpytloun changed the title from "Add gRPC source max connection age config" to "feat(sources): add gRPC max connection age" on Jun 22, 2026
@fpytloun fpytloun force-pushed the fpytloun/grpc-source-max-connection-age branch from e9ac53e to 04865ba Compare June 22, 2026 12:04
@fpytloun

Contributor Author

Updated the PR after review feedback.

Changes pushed in 04865ba3c9:

  • tightened the connection-age fallback so expired connections are closed only once the accepted connection is idle
  • made active request tracking per accepted connection instead of global
  • held the active guard through response body lifetime
  • added regression coverage for response-body lifetime and per-connection isolation
  • fixed the PR title to Conventional Commit format

Validation rerun locally: fmt, targeted config/vector tests, targeted clippy, and Docker Compose E2E. The E2E now directly shows reconnects in debug logs after idle expiry while forwarded events continue and no Vector ERROR/panic lines appear.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 04865ba3c9


Comment thread src/sources/util/grpc/mod.rs Outdated
@fpytloun fpytloun force-pushed the fpytloun/grpc-source-max-connection-age branch from 04865ba to 9297bd8 Compare June 22, 2026 13:11
@fpytloun

Contributor Author

Follow-up update for the remaining review thread:

  • stopped accepting further socket reads after max_connection_age + grace elapses, so new HTTP/2 streams cannot keep an aged connection alive indefinitely
  • still allow writes/flushes while existing response bodies are active, so in-flight responses can drain before the connection fully closes
  • kept active request tracking per accepted connection
  • added regression coverage for:
    • read expiry before write expiry
    • response-body lifetime tracking
    • per-connection active counters
    • vector-source reconnect using observed accepted-connection count growth after deadline
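
The split between read expiry and write expiry listed above can be illustrated with a synchronous wrapper. This is a simplified sketch under stated assumptions: the PR wraps async IO, while this example uses `std::io::Read`/`Write`; the type `AgedIo`, its fields, and the choice to report read expiry as end-of-stream and write expiry as a `BrokenPipe` error are hypothetical.

```rust
use std::io::{self, Read, Write};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Instant;

/// Hypothetical synchronous stand-in for the accepted-connection wrapper.
struct AgedIo<T> {
    inner: T,
    /// Instant at which max_connection_age + grace has elapsed.
    hard_deadline: Instant,
    /// Number of response bodies still alive on this connection.
    active: Arc<AtomicUsize>,
}

impl<T> AgedIo<T> {
    fn expired(&self) -> bool {
        Instant::now() >= self.hard_deadline
    }
}

impl<T: Read> Read for AgedIo<T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.expired() {
            // Report end-of-stream so no new requests are read from an aged connection.
            return Ok(0);
        }
        self.inner.read(buf)
    }
}

impl<T: Write> Write for AgedIo<T> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Writes stay allowed after expiry while responses are still in flight.
        if self.expired() && self.active.load(Ordering::SeqCst) == 0 {
            return Err(io::Error::new(io::ErrorKind::BrokenPipe, "connection age exceeded"));
        }
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        // Flushes are always forwarded so buffered response bytes can drain.
        self.inner.flush()
    }
}
```

In this sketch reads expire first, which prevents new streams from extending the connection, while writes only start failing once the last active response has finished.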

Validation rerun on 9297bd86e8:

  • cargo fmt --all -- --check
  • cargo test --no-default-features --features sources-vector --lib sources::util::grpc::tests::max_connection_age
  • cargo test --no-default-features --features sources-vector --lib sources::vector::test::max_connection_age_closes_idle_connection
  • cargo test --no-default-features --features sources-vector,sinks-vector --lib sources::vector::tests::receive
  • cargo test --no-default-features --features sources-vector,sinks-vector --lib sources::vector::tests::max_connection_age_allows_client_reconnect
  • cargo test --no-default-features --features sources-vector,sources-opentelemetry --lib config_keepalive
  • cargo test --no-default-features --features sources-vector,sources-opentelemetry --lib config_grpc_keepalive
  • cargo clippy --no-default-features --features sources-vector,sources-opentelemetry --lib -- -D warnings -A clippy::manual_option_zip
  • Docker Compose E2E from the PR image: server received 86 forwarded events, client showed 3 client connection bound logs, server showed 4 DEBUG broken-pipe close logs, and there were no Vector ERROR/panic lines.

A delegated code review pass was also clean after the final reconnect coverage was added.

@rtrieu rtrieu self-requested a review June 22, 2026 18:25
@rtrieu rtrieu self-requested a review June 22, 2026 18:26

Labels

  • docs review on hold: The documentation team reviews PRs only after a PR is approved by the COSE team.
  • domain: external docs: Anything related to Vector's external, public documentation
  • domain: sources: Anything related to the Vector's sources

Development

Successfully merging this pull request may close these issues.

Add max connection age settings to gRPC servers for improved load balancing

2 participants