Skip to content

Add configurable KVP client API and harden concurrent telemetry writes#286

Open
peytonr18 wants to merge 3 commits intoAzure:mainfrom
peytonr18:probertson-concurrency-kvp-tests
Open

Add configurable KVP client API and harden concurrent telemetry writes#286
peytonr18 wants to merge 3 commits intoAzure:mainfrom
peytonr18:probertson-concurrency-kvp-tests

Conversation

@peytonr18
Copy link
Contributor

@peytonr18 peytonr18 commented Feb 19, 2026

Summary

Expands KVP telemetry beyond the original event-prefix change by introducing a public, external-caller-friendly KVP client API, while strengthening concurrency behavior for startup truncation and documenting the end-to-end design.

This enables multiple emitters/libraries to coexist more safely and gives callers explicit control over KVP emission lifecycle and identity.

Changes

New public KVP client API (kvp.rs, re-exported via logging.rs)

  • Added KvpOptions with builder-style configuration:
    • vm_id
    • event_prefix
    • file_path
    • truncate_on_start
  • Added Kvp::new() (default options path).
  • Added Kvp::with_options(...) for explicit caller configuration.
  • Added direct emission APIs:
    • emit(message, level)
    • emit_health_report(report)
  • Added close().await for graceful shutdown and queue drain.

Internal wiring updates (logging.rs)

  • setup_layers continues to use tracing-layer emission internally.
  • Internal construction now goes through Kvp::new_internal(...) with explicit options-derived inputs.
  • Existing azure-init tracing behavior remains intact.

Concurrency hardening for startup truncation (kvp.rs)

  • Truncation path now uses file locking to avoid check-then-truncate races.
  • Lock contention during truncation is treated as non-fatal (expected in multi-client environments), allowing initialization to continue safely.

Breaking Changes

This PR intentionally changes the public KVP API shape:

  • Removed old subscriber-wrapper usage pattern (Kvp<S>::new(...), .layer(), .halt()).
  • Replaced with direct client API (Kvp::new/with_options, emit, emit_health_report, close).

Goal

  • Make KVP emission straightforward for external callers without requiring tracing_subscriber plumbing.
  • Support concurrent/multi-emitter scenarios more safely.
  • Keep internal azure-init tracing integration while exposing a cleaner public surface.

Resolves #239

Copy link
Contributor

@cadejacobson cadejacobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome to me! Good work

Increase thread/process write volume to better stress lock safety, and route KVP file opening through Kvp initialization with a private helper to keep open behavior consistent and constrained.
// that two concurrent callers cannot both decide the file is stale and
// truncate data the other has already written.
FileExt::lock_exclusive(&file).map_err(|e| {
anyhow::Error::from(e).context("Failed to lock KVP file for truncation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is expected (that only one client is expected to do the truncation), this should not be an error. We can easily conclude an error situation by seeing all clients failing, but no successful truncation (from the log a few lines below)

@peytonr18 peytonr18 changed the title Support concurrent KVP telemetry with configurable event prefix Add configurable KVP client API and harden concurrent telemetry writes Mar 3, 2026
@peytonr18 peytonr18 force-pushed the probertson-concurrency-kvp-tests branch 2 times, most recently from 5528da4 to d537d8a Compare March 3, 2026 21:40
@peytonr18 peytonr18 force-pushed the probertson-concurrency-kvp-tests branch from d537d8a to c28024a Compare March 3, 2026 21:43
.file_path("/var/lib/hyperv/.kvp_pool_1")
.truncate_on_start(true),
)?;
custom.emit("Custom event payload", None)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to document that KVPs emitted after health report (emit_health_report) are not collected by Azure platform today.

It might be helpful to show a bit more complicated example like emit an in-progress provisioning, call a function foo, then emit a report failure on foo's failure; otherwise emit a success

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFE] Support Concurrent KVP Telemetry

4 participants