
feat(experiments): Denormalize rollup metrics onto Experiment + debounced refresh #424

Open
shanaiabuggy wants to merge 4 commits into main from sbuggy/ase-319

shanaiabuggy (Contributor) commented Jun 23, 2026

See the Linear issue description for details: https://linear.app/nvidia/issue/ASE-319/denormalize-rollup-metrics-onto-experiment-debounced-refresh

Summary by CodeRabbit

  • New Features

    • Experiments now expose system-managed cached metrics to improve list/detail responsiveness.
    • Added a background worker that refreshes these cached metrics after experiment ingestion.
    • Introduced a configurable refresh interval to control how quickly cached metrics become up to date.
  • Performance Improvements

    • Experiment listing now conditionally uses cached metrics and falls back to hydration only when cached data can’t be decoded.
  • Tests

    • Added integration and unit coverage for cache decoding, refresh triggering, and refresh worker drain/error scenarios.

…nced refresh

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
@shanaiabuggy shanaiabuggy requested review from a team as code owners June 23, 2026 21:31
@github-actions github-actions Bot added the feat label Jun 23, 2026
coderabbitai Bot commented Jun 23, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 561637e7-a8da-4cbf-b942-d3ff015d2f5b

📥 Commits

Reviewing files that changed from the base of the PR and between 6535de6 and f2b6faa.

📒 Files selected for processing (1)
  • services/intake/tests/integration/spans/test_experiment_metrics_refresh.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/intake/tests/integration/spans/test_experiment_metrics_refresh.py

📝 Walkthrough

Adds a background ExperimentRollupRefresher worker that maintains a coalescing dirty set of (workspace, experiment_id) pairs. Ingest endpoints mark an experiment dirty after each span write that carries an experiment context. The worker periodically drains the set, fetches rollups from ClickHouse, and serializes them into a metrics dict written to the Experiment entity. list_experiments now serves cached entity.metrics when the blob decodes cleanly and falls back to a live ClickHouse hydration only for cache misses, instead of querying ClickHouse on every read.
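
The coalescing behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `DirtySet` class and its `drain_by_workspace` method are hypothetical names, and only `mark_dirty` and the (workspace, experiment_id) pairing come from the walkthrough.

```python
from collections import defaultdict


class DirtySet:
    """Hypothetical stand-in for the refresher's pending set (not the PR's class)."""

    def __init__(self) -> None:
        self._dirty: set[tuple[str, str]] = set()

    def mark_dirty(self, workspace: str, experiment_id: str) -> None:
        # A set coalesces repeated writes to the same experiment into one entry.
        self._dirty.add((workspace, experiment_id))

    def drain_by_workspace(self) -> dict[str, list[str]]:
        # Detach the current batch so marks arriving mid-flush land in a fresh set.
        batch, self._dirty = self._dirty, set()
        grouped: dict[str, list[str]] = defaultdict(list)
        for workspace, experiment_id in sorted(batch):
            grouped[workspace].append(experiment_id)
        return dict(grouped)


pending = DirtySet()
for _ in range(3):
    pending.mark_dirty("ws-a", "exp-1")  # three span writes, one pending entry
pending.mark_dirty("ws-a", "exp-2")
pending.mark_dirty("ws-b", "exp-9")

grouped = pending.drain_by_workspace()
```

Grouping by workspace is what would allow one rollup query per workspace per tick, regardless of how many spans arrived in between.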

Changes

Experiment Rollup Background Refresh

| Layer | File(s) | Summary |
|---|---|---|
| Configuration and Experiment entity model | services/intake/src/nmp/intake/config.py, services/intake/src/nmp/intake/entities/experiments.py | Adds a rollup_refresh_interval_seconds: float config field with default 10.0 and a gt=0 constraint; adds an optional metrics: dict[str, Any] \| None field to Experiment, marked as system-managed and excluded from create/update bodies. |
| Rollup serialization: helpers and versioning | services/intake/src/nmp/intake/spans/experiment_rollup_repository.py | Adds METRICS_VERSION = 1; introduces _score_to_dict/_score_from_dict for a JSON-safe quantile/count round trip; adds rollup_to_metrics to serialize an ExperimentRollup with version and refreshed_at; adds metrics_to_rollup to deserialize it back. |
| ExperimentRollupRefresher background worker | services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py | New async background worker: coalesces dirty (workspace, experiment_id) pairs via mark_dirty; periodically flushes by draining the set and grouping per workspace; calls get_rollups once per workspace; serializes via rollup_to_metrics; writes experiment.metrics back; re-queues on EntityConflictError or rollup query failure; skips on EntityNotFoundError; performs a final flush on stop without mid-flush cancellation. |
| Service lifecycle and FastAPI dependency wiring | services/intake/src/nmp/intake/service.py, services/intake/src/nmp/intake/spans/api/dependencies.py | IntakeService creates and starts ExperimentRollupRefresher on startup (conditional on entity client availability) and stops it first during shutdown; get_rollup_refresher reads rollup_refresher from app state, and RollupRefresherDep exposes it as a FastAPI dependency. |
| Ingest endpoints mark dirty on span write | services/intake/src/nmp/intake/spans/ingest/atif.py, services/intake/src/nmp/intake/spans/ingest/chat_completions.py | Both endpoints accept RollupRefresherDep, compute resolved_evaluation_context() once, pass it to validate_experiment_context, and conditionally call refresher.mark_dirty(workspace, experiment_id) after ingest when a refresher exists and evaluation_id is present. |
| list_experiments cache-first enrichment | services/intake/src/nmp/intake/api/v2/experiments/endpoints.py | Adds a _cached_rollup helper that validates METRICS_VERSION and safely decodes entity.metrics; modifies list_experiments to apply cached rollups via _cached_rollup, forwarding only cache-miss experiments to _hydrate_rollups. |
| Unit and integration tests | services/intake/tests/test_experiment_rollup_refresher.py, services/intake/tests/test_experiment_metrics_cache_gate.py, services/intake/tests/integration/spans/test_experiment_metrics_refresh.py | Unit tests: round-trip metrics conversion, flush batching per workspace, mark_dirty dedup, no-op flush, EntityNotFoundError skip, EntityConflictError re-queue, graceful drain on stop, rollup query failure re-queue. Cache-gate tests: _cached_rollup returns None for absent, version-mismatched, and malformed blobs. Integration tests: ATIF ingest marks dirty; no-context ingest leaves the pending set empty. |
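
To make the versioned blob and the cache gate concrete, here is a small sketch under stated assumptions. The `Rollup` dataclass and its two fields are hypothetical simplifications of ExperimentRollup, and `cached_rollup` only approximates what the summary says `_cached_rollup` does; `METRICS_VERSION = 1`, `version`, and `refreshed_at` are taken from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional

METRICS_VERSION = 1  # value stated in the PR summary


@dataclass
class Rollup:
    """Hypothetical, simplified stand-in for ExperimentRollup."""

    span_count: int
    p50_latency_ms: float


def rollup_to_metrics(rollup: Rollup, refreshed_at: datetime) -> dict[str, Any]:
    # Stamp the blob with a version so readers can reject layouts they do not know.
    return {
        "version": METRICS_VERSION,
        "refreshed_at": refreshed_at.isoformat(),
        "span_count": rollup.span_count,
        "p50_latency_ms": rollup.p50_latency_ms,
    }


def cached_rollup(metrics: Optional[dict[str, Any]]) -> Optional[Rollup]:
    """Return the decoded rollup, or None to signal a cache miss."""
    if not metrics or metrics.get("version") != METRICS_VERSION:
        return None
    try:
        return Rollup(
            span_count=int(metrics["span_count"]),
            p50_latency_ms=float(metrics["p50_latency_ms"]),
        )
    except (KeyError, TypeError, ValueError):
        # A malformed blob is treated as a miss rather than failing the whole list call.
        return None


blob = rollup_to_metrics(Rollup(3, 12.5), datetime(2026, 6, 23, tzinfo=timezone.utc))
hit = cached_rollup(blob)
stale = cached_rollup({**blob, "version": METRICS_VERSION + 1})
broken = cached_rollup({"version": METRICS_VERSION, "span_count": "abc"})
```

Returning None for every failure mode keeps the caller's logic to a single branch: anything that is not a clean hit goes to live hydration.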

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant IngestEndpoint as ingest_atif /<br/> ingest_chat_completion
  participant ExperimentRollupRefresher
  participant ExperimentRollupRepository as ClickHouse
  participant EntityClient as Entity Store

  Client->>IngestEndpoint: POST /ingest (with experiment_context)
  IngestEndpoint->>EntityClient: store span batch
  IngestEndpoint->>ExperimentRollupRefresher: mark_dirty(workspace, experiment_id)

  Note over ExperimentRollupRefresher: background _run loop fires after interval_seconds
  ExperimentRollupRefresher->>ExperimentRollupRefresher: flush() drains _dirty, groups by workspace
  ExperimentRollupRefresher->>ExperimentRollupRepository: get_rollups(workspace, [experiment_ids])
  ExperimentRollupRepository-->>ExperimentRollupRefresher: dict[str, ExperimentRollup]
  loop per rollup
    ExperimentRollupRefresher->>EntityClient: get(Experiment)
    ExperimentRollupRefresher->>ExperimentRollupRefresher: rollup_to_metrics(rollup, refreshed_at)
    ExperimentRollupRefresher->>EntityClient: update(experiment.metrics = metrics_dict)
  end

  Client->>IngestEndpoint: GET /experiments
  IngestEndpoint->>EntityClient: list Experiments
  alt entity.metrics present
    IngestEndpoint->>IngestEndpoint: _cached_rollup(entity.metrics) — no ClickHouse call
  else entity.metrics absent
    IngestEndpoint->>ExperimentRollupRepository: _hydrate_rollups
  end
  IngestEndpoint-->>Client: enriched experiment list
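
The flush half of the diagram, including the re-queue and skip rules listed in the change summary, might look roughly like this. It is a sketch only: the two exception classes are local stand-ins for the entity client's errors, and the `fetch_rollups` and `write_metrics` callables are hypothetical injection points, not the PR's constructor signature.

```python
import asyncio
from collections import defaultdict


class EntityNotFoundError(Exception):
    """Local stand-in for the entity client's not-found error."""


class EntityConflictError(Exception):
    """Local stand-in for the entity client's conflict error."""


class Refresher:
    def __init__(self, fetch_rollups, write_metrics) -> None:
        self._dirty: set[tuple[str, str]] = set()
        self._fetch_rollups = fetch_rollups  # async (workspace, ids) -> {id: rollup}
        self._write_metrics = write_metrics  # async (workspace, id, rollup) -> None

    def mark_dirty(self, workspace: str, experiment_id: str) -> None:
        self._dirty.add((workspace, experiment_id))

    async def flush(self) -> None:
        batch, self._dirty = self._dirty, set()
        by_workspace: dict[str, list[str]] = defaultdict(list)
        for workspace, experiment_id in batch:
            by_workspace[workspace].append(experiment_id)

        for workspace, ids in by_workspace.items():
            try:
                rollups = await self._fetch_rollups(workspace, ids)
            except Exception:
                # Rollup query failed: put this workspace's IDs back for the next tick.
                self._dirty.update((workspace, i) for i in ids)
                continue
            for experiment_id, rollup in rollups.items():
                try:
                    await self._write_metrics(workspace, experiment_id, rollup)
                except EntityNotFoundError:
                    continue  # experiment no longer exists; nothing to refresh
                except EntityConflictError:
                    self._dirty.add((workspace, experiment_id))  # retry next tick


async def demo() -> tuple[list[str], set[tuple[str, str]]]:
    written: list[str] = []

    async def fetch(workspace, ids):
        return {i: {"span_count": 1} for i in ids}

    async def write(workspace, experiment_id, rollup):
        if experiment_id == "gone":
            raise EntityNotFoundError
        if experiment_id == "busy":
            raise EntityConflictError
        written.append(experiment_id)

    refresher = Refresher(fetch, write)
    for experiment_id in ("ok", "gone", "busy"):
        refresher.mark_dirty("ws", experiment_id)
    await refresher.flush()
    return written, refresher._dirty


written, still_pending = asyncio.run(demo())
```

Only the conflicting experiment stays pending after the flush; the deleted one is dropped so it cannot be retried forever.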

Possibly related PRs

  • NVIDIA-NeMo/nemo-platform#124: Introduces the v2 experiments API schemas and CRUD endpoints; this PR extends those same experiment entities with cached rollup decoding and background refresh.
  • NVIDIA-NeMo/nemo-platform#154: Introduces ClickHouse-backed _hydrate_rollups path in api/v2/experiments/endpoints.py that this PR now bypasses when entity.metrics is cached.
  • NVIDIA-NeMo/nemo-platform#282: Refactors ExperimentRollupRepository.get_rollups, the method this PR's refresher calls to populate experiment.metrics.

Suggested reviewers

  • BrianNewsom
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title clearly describes the main changes: denormalizing rollup metrics onto Experiment entities and implementing a debounced refresh mechanism, which aligns with the substantial feature additions across the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py (1)

106-106: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use a concrete type for metrics in _write_metrics.

metrics: dict is too broad; use dict[str, Any] for concrete typing.
As per coding guidelines, "**/*.py: Always prefer concrete type hints over string-based ones in Python code; do not import types under TYPE_CHECKING, instead import types as regular imports when possible."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py` at line
106, In the `_write_metrics` method signature, replace the `metrics: dict`
parameter type annotation with `dict[str, Any]` to provide a concrete type hint.
Import `Any` from the `typing` module as a regular import at the top of the file
(not under TYPE_CHECKING) to support this type annotation.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/intake/src/nmp/intake/api/v2/experiments/endpoints.py`:
- Around line 374-378: The call to metrics_to_rollup and _apply_rollup on line
375 can throw an exception if entity.metrics is malformed or incompatible, which
causes the entire list_experiments response to fail for that single bad row.
Wrap the _apply_rollup call in a try-except block, and when an exception is
caught during decoding, treat it as a cache miss by appending the response to
the needs_live_hydrate list just like the else branch does, rather than letting
the exception propagate and fail the entire operation.

In `@services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py`:
- Around line 57-67: The stop method can lose experiment updates if the
cancellation occurs while flush is executing. After batch is detached from
self._dirty at lines 83-84, a cancellation would leave those IDs unwritten and
un-requeued. Instead of immediately cancelling the task after setting the
stopping flag, allow the task loop to naturally exit by checking the
self._stopping flag (which is already set), or modify the flush method to be
cancellation-safe by catching asyncio.CancelledError and re-queuing the detached
batch back to self._dirty before the exception propagates. Either approach
ensures no queued updates are lost during shutdown.

In `@services/intake/src/nmp/intake/spans/experiment_rollup_repository.py`:
- Around line 72-75: The metrics_to_rollup function (lines 92-106) reads stored
metrics without validating the version field that rollup_to_metrics writes (line
80), creating forward/backward compatibility issues. Add a version check at the
beginning of metrics_to_rollup that compares the "version" field in the metrics
payload against METRICS_VERSION; if they do not match, route to a fallback
re-hydration/recompute path instead of proceeding with the blind parsing that
could cause misreads or int() conversion errors.

---

Nitpick comments:
In `@services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py`:
- Line 106: In the `_write_metrics` method signature, replace the `metrics:
dict` parameter type annotation with `dict[str, Any]` to provide a concrete type
hint. Import `Any` from the `typing` module as a regular import at the top of
the file (not under TYPE_CHECKING) to support this type annotation.
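
For the shutdown concern raised against experiment_rollup_refresher.py, one way to avoid losing a detached batch is to let the loop exit on a stop signal instead of cancelling the task. The sketch below illustrates that option only; the `Worker` class, its `asyncio.Event` wake-up, and the `flushed` list are assumptions made for the example, and the PR's actual fix is described in the summary only as a final flush on stop without mid-flush cancellation.

```python
import asyncio
from typing import Optional


class Worker:
    """Hypothetical worker showing a stop() that never cancels an in-progress flush."""

    def __init__(self, interval_seconds: float) -> None:
        self._interval = interval_seconds
        self._dirty: set[str] = set()
        self._task: Optional[asyncio.Task] = None
        self.flushed: list[str] = []

    def mark_dirty(self, key: str) -> None:
        self._dirty.add(key)

    async def flush(self) -> None:
        batch, self._dirty = self._dirty, set()
        await asyncio.sleep(0)  # stands in for the rollup query and entity writes
        self.flushed.extend(sorted(batch))

    async def _run(self) -> None:
        while not self._stop.is_set():
            try:
                # Wake early when stop() sets the event; otherwise tick on the interval.
                await asyncio.wait_for(self._stop.wait(), timeout=self._interval)
            except asyncio.TimeoutError:
                pass
            await self.flush()

    def start(self) -> None:
        # Created here so the event belongs to the running loop on older Pythons.
        self._stop = asyncio.Event()
        self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        self._stop.set()
        if self._task is not None:
            await self._task  # no task.cancel(): a running flush finishes first
        await self.flush()  # final drain for anything marked after the last tick


async def demo() -> list[str]:
    worker = Worker(interval_seconds=60.0)  # long interval: only stop() drains
    worker.start()
    worker.mark_dirty("exp-1")
    worker.mark_dirty("exp-2")
    await worker.stop()
    return worker.flushed


flushed = asyncio.run(demo())
```

The alternative named in the review comment, catching asyncio.CancelledError inside flush and re-queuing the batch, would also work but leaves the pending IDs unwritten at shutdown.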

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d4b0c327-18aa-4532-b112-bb8fde55fa11

📥 Commits

Reviewing files that changed from the base of the PR and between 89214a3 and 6c8ca76.

⛔ Files ignored due to path filters (1)
  • web/packages/sdk/generated/agents/schema/DeploymentLogsResponse.ts is excluded by !**/generated/**
📒 Files selected for processing (11)
  • services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
  • services/intake/src/nmp/intake/config.py
  • services/intake/src/nmp/intake/entities/experiments.py
  • services/intake/src/nmp/intake/service.py
  • services/intake/src/nmp/intake/spans/api/dependencies.py
  • services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py
  • services/intake/src/nmp/intake/spans/experiment_rollup_repository.py
  • services/intake/src/nmp/intake/spans/ingest/atif.py
  • services/intake/src/nmp/intake/spans/ingest/chat_completions.py
  • services/intake/tests/integration/spans/test_experiment_metrics_refresh.py
  • services/intake/tests/test_experiment_rollup_refresher.py

Comment thread services/intake/src/nmp/intake/api/v2/experiments/endpoints.py Outdated
Comment thread services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py
github-actions Bot (Contributor) commented Jun 23, 2026

| Suite | Lines Covered | Line Rate | Branch Rate |
|---|---|---|---|
| Unit Tests | 20917/27478 | 76.1% | 61.2% |
| Integration Tests | 12109/26247 | 46.1% | 19.5% |

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>