feat(experiments): Denormalize rollup metrics onto Experiment + debounced refresh by shanaiabuggy · Pull Request #424 · NVIDIA-NeMo/nemo-platform

shanaiabuggy · 2026-06-23T21:31:16Z

See linear issue description for details: https://linear.app/nvidia/issue/ASE-319/denormalize-rollup-metrics-onto-experiment-debounced-refresh

Summary by CodeRabbit

New Features
- Experiments now expose system-managed cached metrics to improve list/detail responsiveness.
- Added a background worker that refreshes these cached metrics after experiment ingestion.
- Introduced a configurable refresh interval to control how quickly cached metrics become up to date.

Performance Improvements
- Experiment listing now conditionally uses cached metrics and falls back to hydration only when cached data can’t be decoded.

Tests
- Added integration and unit coverage for cache decoding, refresh triggering, and refresh worker drain/error scenarios.

coderabbitai · 2026-06-23T21:37:38Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 561637e7-a8da-4cbf-b942-d3ff015d2f5b

📥 Commits

Reviewing files that changed from the base of the PR and between 6535de6 and f2b6faa.

📒 Files selected for processing (1)

services/intake/tests/integration/spans/test_experiment_metrics_refresh.py

🚧 Files skipped from review as they are similar to previous changes (1)

services/intake/tests/integration/spans/test_experiment_metrics_refresh.py

📝 Walkthrough

Adds a background ExperimentRollupRefresher worker that maintains a coalescing dirty set of (workspace, experiment_id) pairs. Ingest endpoints mark an experiment dirty after each span write that carries an experiment context. The worker periodically drains the set, fetches rollups from ClickHouse, serializes them into a metrics dict, and writes that dict onto the Experiment entity. list_experiments now serves the cached entity.metrics instead of querying ClickHouse on every read, and falls back to live hydration only on a cache miss.
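The coalescing behaviour described above can be illustrated with a small sketch. This is not the PR's code: `DirtySet` and `drain_by_workspace` are hypothetical names, and the only assumption carried over from the walkthrough is that pending work is a set of `(workspace, experiment_id)` pairs that is drained and grouped per workspace.

```python
from collections import defaultdict


class DirtySet:
    """Hypothetical stand-in for the refresher's pending set (not the PR's code)."""

    def __init__(self) -> None:
        self._dirty: set[tuple[str, str]] = set()

    def mark_dirty(self, workspace: str, experiment_id: str) -> None:
        # A set coalesces repeats: many span writes for one experiment
        # leave a single pending entry.
        self._dirty.add((workspace, experiment_id))

    def drain_by_workspace(self) -> dict[str, list[str]]:
        # Detach the current batch so writes that arrive during a flush
        # land in a fresh set and are picked up on the next tick.
        batch, self._dirty = self._dirty, set()
        grouped: dict[str, list[str]] = defaultdict(list)
        for workspace, experiment_id in sorted(batch):
            grouped[workspace].append(experiment_id)
        return dict(grouped)


pending = DirtySet()
for _ in range(3):
    pending.mark_dirty("ws-a", "exp-1")  # three writes, one pending entry
pending.mark_dirty("ws-a", "exp-2")
pending.mark_dirty("ws-b", "exp-9")
grouped = pending.drain_by_workspace()
```

Grouping per workspace is what lets the worker issue one rollup query per workspace rather than one per experiment.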

Changes

Experiment Rollup Background Refresh

| Layer | File(s) | Summary |
|---|---|---|
| Configuration and Experiment entity model | `services/intake/src/nmp/intake/config.py`, `services/intake/src/nmp/intake/entities/experiments.py` | Adds `rollup_refresh_interval_seconds: float` config with default 10.0 and gt=0 constraint; adds optional `metrics: dict[str, Any] \| None` to `Experiment` marked as system-managed and excluded from create/update bodies. |
| Rollup serialization: helpers and versioning | `services/intake/src/nmp/intake/spans/experiment_rollup_repository.py` | Adds `METRICS_VERSION = 1`; introduces `_score_to_dict`/`_score_from_dict` for JSON-safe quantile/count round-trip; adds `rollup_to_metrics` to serialize `ExperimentRollup` with version and refreshed_at; adds `metrics_to_rollup` to deserialize back. |
| ExperimentRollupRefresher background worker | `services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py` | New async background worker: coalesces dirty `(workspace, experiment_id)` pairs via `mark_dirty`, periodically `flush` by draining and grouping per workspace, calls `get_rollups` once per workspace, serializes via `rollup_to_metrics`, writes `experiment.metrics` back, re-queues on `EntityConflictError`/rollup query failure, skips `EntityNotFoundError`, performs final flush on `stop` without mid-flush cancellation. |
| Service lifecycle and FastAPI dependency wiring | `services/intake/src/nmp/intake/service.py`, `services/intake/src/nmp/intake/spans/api/dependencies.py` | `IntakeService` creates and starts `ExperimentRollupRefresher` on startup (conditional on entity client availability), stops it first during shutdown; `get_rollup_refresher` reads `rollup_refresher` from app state, `RollupRefresherDep` exposes as FastAPI dependency. |
| Ingest endpoints mark dirty on span write | `services/intake/src/nmp/intake/spans/ingest/atif.py`, `services/intake/src/nmp/intake/spans/ingest/chat_completions.py` | Both endpoints accept `RollupRefresherDep`, compute `resolved_evaluation_context()` once, pass to `validate_experiment_context`, conditionally call `refresher.mark_dirty(workspace, experiment_id)` after ingest when refresher exists and evaluation_id is present. |
| list_experiments cache-first enrichment | `services/intake/src/nmp/intake/api/v2/experiments/endpoints.py` | Adds `_cached_rollup` helper validating `METRICS_VERSION` and safely decoding `entity.metrics`; modifies `list_experiments` to apply cached rollups via `_cached_rollup`, forwarding only cache-miss experiments to `_hydrate_rollups`. |
| Unit and integration tests | `services/intake/tests/test_experiment_rollup_refresher.py`, `services/intake/tests/test_experiment_metrics_cache_gate.py`, `services/intake/tests/integration/spans/test_experiment_metrics_refresh.py` | Unit tests: round-trip metrics conversion, flush batching per workspace, mark_dirty dedup, no-op flush, EntityNotFoundError skip, EntityConflictError re-queue, stop graceful drain, rollup query failure re-queue. Cache-gate tests: _cached_rollup returns None for absent/version-mismatch/malformed blobs. Integration tests: ATIF ingest marks dirty, no-context ingest leaves pending empty. |
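The re-queue and skip rules listed for the refresher can be sketched as follows. The exception class names come from the summary above, but their definitions here are stand-ins, and `write_batch`, `fake_write`, and the key shapes are hypothetical, chosen only to show which failures are retried and which are dropped.

```python
import asyncio


class EntityConflictError(Exception):
    """Stand-in for the entity store's write-conflict error."""


class EntityNotFoundError(Exception):
    """Stand-in for the entity store's missing-entity error."""


async def write_batch(batch, write_one):
    """Write each pending key; return the keys that should be retried."""
    requeue = set()
    for key in sorted(batch):
        try:
            await write_one(key)
        except EntityConflictError:
            # Another writer updated the entity first; retry on the next tick.
            requeue.add(key)
        except EntityNotFoundError:
            # The experiment no longer exists; there is nothing to refresh.
            continue
    return requeue


async def fake_write(key):
    # Hypothetical writer that fails in the two ways the summary mentions.
    if key == ("ws", "conflict"):
        raise EntityConflictError(key)
    if key == ("ws", "deleted"):
        raise EntityNotFoundError(key)


batch = {("ws", "ok"), ("ws", "conflict"), ("ws", "deleted")}
retry = asyncio.run(write_batch(batch, fake_write))
```

Only the conflicting key is returned for retry; the deleted experiment is dropped so it cannot loop forever in the pending set.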

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant IngestEndpoint as ingest_atif /<br/> ingest_chat_completion
  participant ExperimentRollupRefresher
  participant ExperimentRollupRepository as ClickHouse
  participant EntityClient as Entity Store

  Client->>IngestEndpoint: POST /ingest (with experiment_context)
  IngestEndpoint->>EntityClient: store span batch
  IngestEndpoint->>ExperimentRollupRefresher: mark_dirty(workspace, experiment_id)

  Note over ExperimentRollupRefresher: background _run loop fires after interval_seconds
  ExperimentRollupRefresher->>ExperimentRollupRefresher: flush() drains _dirty, groups by workspace
  ExperimentRollupRefresher->>ExperimentRollupRepository: get_rollups(workspace, [experiment_ids])
  ExperimentRollupRepository-->>ExperimentRollupRefresher: dict[str, ExperimentRollup]
  loop per rollup
    ExperimentRollupRefresher->>EntityClient: get(Experiment)
    ExperimentRollupRefresher->>ExperimentRollupRefresher: rollup_to_metrics(rollup, refreshed_at)
    ExperimentRollupRefresher->>EntityClient: update(experiment.metrics = metrics_dict)
  end

  Client->>IngestEndpoint: GET /experiments
  IngestEndpoint->>EntityClient: list Experiments
  alt entity.metrics present
    IngestEndpoint->>IngestEndpoint: _cached_rollup(entity.metrics) — no ClickHouse call
  else entity.metrics absent
    IngestEndpoint->>ExperimentRollupRepository: _hydrate_rollups
  end
  IngestEndpoint-->>Client: enriched experiment list

Possibly related PRs

NVIDIA-NeMo/nemo-platform#124: Introduces the v2 experiments API schemas and CRUD endpoints; this PR extends those same experiment entities with cached rollup decoding and background refresh.
NVIDIA-NeMo/nemo-platform#154: Introduces ClickHouse-backed _hydrate_rollups path in api/v2/experiments/endpoints.py that this PR now bypasses when entity.metrics is cached.
NVIDIA-NeMo/nemo-platform#282: Refactors ExperimentRollupRepository.get_rollups, the method this PR's refresher calls to populate experiment.metrics.

Suggested reviewers

BrianNewsom

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title clearly describes the main changes: denormalizing rollup metrics onto Experiment entities and implementing a debounced refresh mechanism, which aligns with the substantial feature additions across the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py (1)
106-106: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use a concrete type for metrics in _write_metrics.

metrics: dict is too broad; use dict[str, Any] for concrete typing.
As per coding guidelines, "**/*.py: Always prefer concrete type hints over string-based ones in Python code; do not import types under TYPE_CHECKING, instead import types as regular imports when possible."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py` at line
106, In the `_write_metrics` method signature, replace the `metrics: dict`
parameter type annotation with `dict[str, Any]` to provide a concrete type hint.
Import `Any` from the `typing` module as a regular import at the top of the file
(not under TYPE_CHECKING) to support this type annotation.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/intake/src/nmp/intake/api/v2/experiments/endpoints.py`:
- Around line 374-378: The call to metrics_to_rollup and _apply_rollup on line
375 can throw an exception if entity.metrics is malformed or incompatible, which
causes the entire list_experiments response to fail for that single bad row.
Wrap the _apply_rollup call in a try-except block, and when an exception is
caught during decoding, treat it as a cache miss by appending the response to
the needs_live_hydrate list just like the else branch does, rather than letting
the exception propagate and fail the entire operation.
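A minimal sketch of the suggested fallback, assuming a strict decoder that raises on malformed input. `decode`, `partition`, and the `span_count` field are hypothetical and stand in for `metrics_to_rollup`, the loop in `list_experiments`, and whatever the real metrics blob contains.

```python
def decode(metrics):
    # Hypothetical strict decoder: raises on missing or non-numeric fields.
    return {"span_count": int(metrics["span_count"])}


def partition(rows):
    """Split rows into decoded cache hits and names needing live hydration."""
    cached, needs_live_hydrate = {}, []
    for name, metrics in rows:
        if metrics is None:
            needs_live_hydrate.append(name)
            continue
        try:
            cached[name] = decode(metrics)
        except (KeyError, TypeError, ValueError):
            # One bad blob becomes a cache miss instead of failing the list.
            needs_live_hydrate.append(name)
    return cached, needs_live_hydrate


rows = [
    ("a", {"span_count": "7"}),
    ("b", None),                    # never refreshed
    ("c", {"span_count": "oops"}),  # malformed value
    ("d", {}),                      # missing field
]
cached, misses = partition(rows)
```

Here only `a` is served from cache; the other three fall through to the live path.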

In `@services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py`:
- Around line 57-67: The stop method can lose experiment updates if the
cancellation occurs while flush is executing. After batch is detached from
self._dirty at lines 83-84, a cancellation would leave those IDs unwritten and
un-requeued. Instead of immediately cancelling the task after setting the
stopping flag, allow the task loop to naturally exit by checking the
self._stopping flag (which is already set), or modify the flush method to be
cancellation-safe by catching asyncio.CancelledError and re-queuing the detached
batch back to self._dirty before the exception propagates. Either approach
ensures no queued updates are lost during shutdown.
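The second option (a cancellation-safe flush) might look roughly like this. `Refresher` here is a hypothetical miniature with string keys, not the PR's class; it only demonstrates re-queuing the detached batch when `asyncio.CancelledError` arrives mid-flush.

```python
import asyncio


class Refresher:
    """Hypothetical miniature of the worker; only the re-queue logic is shown."""

    def __init__(self) -> None:
        self._dirty: set[str] = set()

    async def flush(self, write) -> None:
        # Detach the batch, as the review comment describes.
        batch, self._dirty = self._dirty, set()
        try:
            for key in sorted(batch):
                await write(key)
                batch.discard(key)  # written, so no longer pending
        except asyncio.CancelledError:
            # Put back whatever was not written, then let cancellation propagate.
            self._dirty |= batch
            raise


async def demo():
    refresher = Refresher()
    refresher._dirty = {"a", "b", "c"}
    written = []
    blocked = asyncio.Event()

    async def slow_write(key):
        if key == "b":
            blocked.set()
            await asyncio.sleep(3600)  # stays here until cancelled
        written.append(key)

    task = asyncio.create_task(refresher.flush(slow_write))
    await blocked.wait()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return written, refresher._dirty


written, remaining = asyncio.run(demo())
```

The write for `a` completes, the cancellation lands while `b` is in flight, and both `b` and `c` end up back in the pending set.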

In `@services/intake/src/nmp/intake/spans/experiment_rollup_repository.py`:
- Around line 72-75: The metrics_to_rollup function (lines 92-106) reads stored
metrics without validating the version field that rollup_to_metrics writes (line
80), creating forward/backward compatibility issues. Add a version check at the
beginning of metrics_to_rollup that compares the "version" field in the metrics
payload against METRICS_VERSION; if they do not match, route to a fallback
re-hydration/recompute path instead of proceeding with the blind parsing that
could cause misreads or int() conversion errors.
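One way to read this suggestion is a version gate at the top of the decoder. The function names and `METRICS_VERSION = 1` come from the PR summary, but the bodies below are simplified stand-ins: the rollup is a plain dict with a hypothetical `span_count` field, and returning `None` is just one possible signal for routing to re-hydration.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any

METRICS_VERSION = 1  # value taken from the PR summary

_ENVELOPE_KEYS = ("version", "refreshed_at")


def rollup_to_metrics(rollup: dict[str, int], refreshed_at: datetime) -> dict[str, Any]:
    # Stamp the blob with the schema version and the refresh time.
    return {
        "version": METRICS_VERSION,
        "refreshed_at": refreshed_at.isoformat(),
        **rollup,
    }


def metrics_to_rollup(metrics: dict[str, Any]) -> dict[str, int] | None:
    # Refuse to parse blobs written under a different schema version;
    # the caller treats None as a cache miss and recomputes.
    if metrics.get("version") != METRICS_VERSION:
        return None
    return {k: int(v) for k, v in metrics.items() if k not in _ENVELOPE_KEYS}


stored = rollup_to_metrics(
    {"span_count": 3}, datetime(2026, 6, 23, tzinfo=timezone.utc)
)
decoded = metrics_to_rollup(stored)
stale = metrics_to_rollup({**stored, "version": 2})
```

Checking the version before touching any other field keeps an older reader from calling `int()` on values whose meaning may have changed.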

---

Nitpick comments:
In `@services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py`:
- Line 106: In the `_write_metrics` method signature, replace the `metrics:
dict` parameter type annotation with `dict[str, Any]` to provide a concrete type
hint. Import `Any` from the `typing` module as a regular import at the top of
the file (not under TYPE_CHECKING) to support this type annotation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d4b0c327-18aa-4532-b112-bb8fde55fa11

📥 Commits

Reviewing files that changed from the base of the PR and between 89214a3 and 6c8ca76.

⛔ Files ignored due to path filters (1)

web/packages/sdk/generated/agents/schema/DeploymentLogsResponse.ts is excluded by !**/generated/**

📒 Files selected for processing (11)

services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
services/intake/src/nmp/intake/config.py
services/intake/src/nmp/intake/entities/experiments.py
services/intake/src/nmp/intake/service.py
services/intake/src/nmp/intake/spans/api/dependencies.py
services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py
services/intake/src/nmp/intake/spans/experiment_rollup_repository.py
services/intake/src/nmp/intake/spans/ingest/atif.py
services/intake/src/nmp/intake/spans/ingest/chat_completions.py
services/intake/tests/integration/spans/test_experiment_metrics_refresh.py
services/intake/tests/test_experiment_rollup_refresher.py

github-actions · 2026-06-23T21:40:50Z

| Suite | Lines Covered | Line Rate | Branch Rate |
|---|---|---|---|
| Unit Tests | 20917/27478 | 76.1% | 61.2% |
| Integration Tests | 12109/26247 | 46.1% | 19.5% |

shanaiabuggy added 2 commits June 23, 2026 15:22

feat(experiments): Denormalize rollup metrics onto Experiment + debou…

0f41dbe

…nced refresh Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

add version

6c8ca76

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

shanaiabuggy requested review from a team as code owners June 23, 2026 21:31

github-actions Bot added the feat label Jun 23, 2026

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread services/intake/src/nmp/intake/api/v2/experiments/endpoints.py Outdated

Comment thread services/intake/src/nmp/intake/spans/experiment_rollup_refresher.py

Comment thread services/intake/src/nmp/intake/spans/experiment_rollup_repository.py

shanaiabuggy added 2 commits June 23, 2026 16:24

bunny

6535de6

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

test fix

f2b6faa

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

shanaiabuggy wants to merge 4 commits into main from sbuggy/ase-319
