feat(guardrails): Update benchmark job; add latency analysis step by JashG · Pull Request #359 · NVIDIA-NeMo/nemo-platform

JashG · 2026-06-16T21:09:20Z

Summary

guardrails-benchmark job

Updates the guardrails-benchmark CI step:

Runs two variants of the benchmark: one without Guardrails and one with Guardrails (content safety input and output checks). This makes it possible to measure the overhead added by the nemo-guardrails plugin.

guardrails-benchmark-analyze job

Updates the guardrails-benchmark-analyze CI step:

Outputs three tables:
- Measured Latencies: Total end-to-end latency for both variants.
- Platform Overhead: Latency added by NMP, i.e., measured latency excluding the time spent in mock LLM calls.
- Guardrails Overhead vs. Baseline: Latency added by Guardrails compared with predefined baselines. Each baseline is the average of several CI runs, and a "reasonable" +/- tolerance absorbs normal run-to-run latency variation.

The job fails if the Guardrails overhead is outside the reasonable tolerance relative to the baseline. For now, this job does not block the pipeline.
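A minimal sketch of the arithmetic behind these tables, for illustration only (it is not the code in `analyze.py`). It assumes the mock LLM calls for a single request run one after another, so their configured latencies can be summed and subtracted from the measured p50. The latency values are made-up numbers chosen to show the calculation, not CI results.

```python
def platform_overhead_ms(measured_p50_ms: float, mock_call_latencies_ms: list[float]) -> float:
    """Latency attributed to the platform: measured p50 minus time spent in mock LLM calls.

    Assumes the mock calls for one request run sequentially, so their latencies add up.
    """
    return measured_p50_ms - sum(mock_call_latencies_ms)


def guardrails_delta_ms(with_guardrails_p50_ms: float, without_guardrails_p50_ms: float) -> float:
    """Extra end-to-end latency caused by enabling Guardrails (the "delta p50")."""
    return with_guardrails_p50_ms - without_guardrails_p50_ms


# Illustrative numbers only (ms), not measured results.
without_p50 = 120.0  # one app LLM call
with_p50 = 380.0     # input safety check + app LLM call + output safety check
print(platform_overhead_ms(without_p50, [100.0]))           # 20.0
print(platform_overhead_ms(with_p50, [50.0, 100.0, 50.0]))  # 180.0
print(guardrails_delta_ms(with_p50, without_p50))           # 260.0
```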

Benchmark harness

- `run.py` now accepts `--variant {with-guardrails, without-guardrails, all}`. `all` runs both variants sequentially against the same NMP instance and writes each one to `aiperf_results/<variant>/` within a single run directory.
- `seeding.py` creates a second, control VirtualModel (`no-guardrails-vm`) alongside the existing guarded one.
- `aiperf_runner.py` accepts a `model_ref` so each variant's sweep targets the correct VM.
- New `analyze.py` (stdlib-only) reads both variants from a run directory and prints three tables: measured latencies, platform overhead (mock LLM time subtracted), and the baseline check.
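The `--variant` selection could be wired up roughly as below. This is a hypothetical sketch, not the real `run.py`: the `--run-dir` flag, the `"all"` default, the guarded VM name `guardrails-vm`, and placing `aiperf_results/` directly inside the run directory are all assumptions. Only the variant names and `no-guardrails-vm` come from the description above.

```python
import argparse
from pathlib import Path

VARIANTS = ("with-guardrails", "without-guardrails")

# "no-guardrails-vm" comes from the PR description; "guardrails-vm" is a placeholder.
MODEL_REFS = {
    "with-guardrails": "guardrails-vm",
    "without-guardrails": "no-guardrails-vm",
}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Guardrails benchmark harness (sketch)")
    # The "all" default is an assumption, not taken from run.py.
    parser.add_argument("--variant", choices=[*VARIANTS, "all"], default="all")
    # Hypothetical flag for where results are written.
    parser.add_argument("--run-dir", type=Path, default=Path("benchmark-run"))
    return parser.parse_args(argv)


def plan_variants(variant: str, run_dir: Path) -> list[tuple[str, str, Path]]:
    """Return (variant, model_ref, output_dir) for each variant to run, in order."""
    selected = VARIANTS if variant == "all" else (variant,)
    return [(v, MODEL_REFS[v], run_dir / "aiperf_results" / v) for v in selected]


args = parse_args(["--variant", "all", "--run-dir", "runs/example"])
for name, model_ref, out_dir in plan_variants(args.variant, args.run_dir):
    print(name, model_ref, out_dir)
```

Keeping the variant-to-model-ref mapping in one table means `all` is just an ordered loop over the same entries, which matches the sequential behavior described for `run.py`.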

github-actions · 2026-06-16T21:20:16Z

Suite	Lines Covered	Line Rate	Branch Rate
Unit Tests	21176/27762	76.3%	61.2%
Integration Tests	12216/26531	46.0%	19.5%

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

coderabbitai · 2026-06-23T14:20:29Z

📝 Walkthrough


This PR adds a guardrails-specific CI trigger, converts the NeMo Guardrails benchmark into with/without-guardrails variants, adds a latency comparison analyzer with baseline checks, and updates benchmark configs, tests, and documentation to match the new run and analysis flow.

Changes

Guardrails benchmark workflow

Layer / File(s)	Summary
Change detection and benchmark job trigger `.github/actions/changes/action.yaml`, `.github/workflows/ci.yaml`	Adds a `guardrails-benchmark` change flag and uses it to conditionally start a matrix benchmark job.
Benchmark variant constants and control virtual model `plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/{constants.py,seeding.py}`, `plugins/nemo-guardrails/tests/unit/benchmarks/test_seeding.py`	New variant and control VM constants; seeding creates a no-middleware control VM alongside the main guardrails VM; tests updated to assert both VMs are created.
Paths and mock LLM environment configs `plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/{paths.py}`, `plugins/nemo-guardrails/benchmarks/configs/mock_llm/*`, `plugins/nemo-guardrails/benchmarks/configs/mock_llm/README.md`	RunPaths gains in-repo mock LLM env fields and per-variant AIPerf helpers; mock env files define model, safety, and deterministic latency parameters; README documents versioning rationale.
Benchmark runtime with variant execution `plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/{run.py,aiperf_runner.py}`	Harness validates in-repo configs, selects per-variant VM references, prepares variant-specific runtime configs, executes variants sequentially, collects outcomes, and optionally prints analysis.
Latency analysis and baseline validation `plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/analyze.py`	New analyzer loads per-concurrency latency from both variant outputs, computes measured vs platform overhead tables, validates delta p50 against baseline constants with per-concurrency tolerance overrides, and produces pass/fail reports.
CI workflow matrix and artifact aggregation `.github/workflows/ci.yaml`	Replaces single benchmark job with matrix variants; new analyze job downloads both artifacts, runs strict analysis, and uploads merged results.
Documentation updates `plugins/nemo-guardrails/benchmarks/README.md`	Documents updated directory layout, two-job CI flow, baseline delta p50 validation with tolerance overrides, and analyzer local execution.
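The baseline check described for `analyze.py` (delta p50 compared with baseline constants, with per-concurrency tolerance overrides) might look like the following sketch. The baselines and the default tolerance are invented placeholders; only the 200 ms tolerance at c=16 is taken from the review discussion. The two-sided `abs()` comparison is an assumption based on the "+/-" wording in the PR description.

```python
# All numbers below are hypothetical except the c=16 tolerance of 200 ms,
# which the review thread reports as the value used in analyze.py.
BASELINE_DELTA_P50_MS = {1: 250.0, 8: 300.0, 16: 400.0}
DEFAULT_TOLERANCE_MS = 100.0
TOLERANCE_OVERRIDES_MS = {16: 200.0}


def tolerance_for(concurrency: int) -> float:
    """Per-concurrency override if present, else the default tolerance."""
    return TOLERANCE_OVERRIDES_MS.get(concurrency, DEFAULT_TOLERANCE_MS)


def check_delta_p50(concurrency: int, delta_p50_ms: float) -> bool:
    """Pass if delta p50 is within +/- tolerance of the baseline (two-sided band assumed)."""
    baseline = BASELINE_DELTA_P50_MS[concurrency]
    return abs(delta_p50_ms - baseline) <= tolerance_for(concurrency)


def report(deltas: dict[int, float]) -> bool:
    """Print one line per concurrency level and return True only if every level passes."""
    all_ok = True
    for concurrency in sorted(deltas):
        ok = check_delta_p50(concurrency, deltas[concurrency])
        all_ok = all_ok and ok
        print(
            f"c={concurrency}: delta_p50={deltas[concurrency]:.0f} ms "
            f"(baseline {BASELINE_DELTA_P50_MS[concurrency]:.0f} +/- {tolerance_for(concurrency):.0f}) "
            f"{'PASS' if ok else 'FAIL'}"
        )
    return all_ok
```

If the real check should fail only when overhead grows, dropping the `abs()` makes it one-sided.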

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Changes as changes job
  participant Bench as guardrails-benchmark
  participant Store as artifact storage
  participant Analyze as guardrails-benchmark-analyze

  Changes->>Bench: guardrails-benchmark == true
  Bench->>Bench: run with-guardrails variant
  Bench->>Store: upload variant artifact
  Bench->>Bench: run without-guardrails variant
  Bench->>Store: upload variant artifact
  Analyze->>Store: download both artifacts
  Analyze->>Analyze: run analyze.py --strict
  Analyze->>Store: upload merged benchmark results
```
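Because the analyze job may start with only one variant's artifact downloaded, a guard like the following could decide whether to fail. This is a hypothetical sketch: treating `--strict` as "exit non-zero when a variant's results are missing" is an assumption, and the `aiperf_results/<variant>/` layout follows the PR description.

```python
import sys
from pathlib import Path

VARIANTS = ("with-guardrails", "without-guardrails")


def missing_variants(run_dir: Path) -> list[str]:
    """Variants whose results directory is absent or empty under run_dir/aiperf_results/."""
    missing = []
    for variant in VARIANTS:
        variant_dir = run_dir / "aiperf_results" / variant
        if not variant_dir.is_dir() or not any(variant_dir.iterdir()):
            missing.append(variant)
    return missing


def check_inputs(run_dir: Path, strict: bool) -> int:
    """Return a process exit code: 1 only when strict and at least one variant is missing."""
    missing = missing_variants(run_dir)
    if not missing:
        return 0
    print(f"missing results for: {', '.join(missing)}", file=sys.stderr)
    return 1 if strict else 0
```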

Possibly related PRs

NVIDIA-NeMo/nemo-platform#80: Introduced the same guardrails benchmark modules and CI path that this PR extends with dual variants and result analysis.

Suggested reviewers

gabwow
albcui

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title accurately captures the main changes: introducing dual-variant benchmark execution and a new latency analysis step in the guardrails CI pipeline.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.github/workflows/ci.yaml (1)
1306-1334: 🩺 Stability & Availability | 🟠 Major

Add guardrails-benchmark and guardrails-benchmark-analyze to ci-status dependencies.

The ci-status job (lines 1306-1334) omits both benchmark jobs. If ci-status is the required gate and benchmark jobs fail (when triggered by changes or dispatch), those failures are missed.
Suggested patch
```diff
  ci-status:
    name: CI status
    needs:
      - changes
+     - guardrails-benchmark
+     - guardrails-benchmark-analyze
      - actionlint
      - docker-bake-graph
      - build-cpu-smoke-images
```
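To catch this class of omission mechanically, a small helper can list jobs that the gate job does not depend on. This is an illustrative sketch that operates on a workflow already parsed into a dict (for example with PyYAML's `yaml.safe_load`). The workflow excerpt is trimmed and hypothetical, and some jobs may be excluded from the gate on purpose, so the output is a list to review rather than an error.

```python
def jobs_missing_from_gate(workflow: dict, gate: str = "ci-status") -> list[str]:
    """List jobs that the gate job does not reference in its `needs`."""
    jobs = workflow.get("jobs", {})
    needs = jobs.get(gate, {}).get("needs", [])
    if isinstance(needs, str):  # `needs` may be a single job name
        needs = [needs]
    return sorted(name for name in jobs if name != gate and name not in needs)


# Trimmed, hypothetical excerpt of a parsed workflow.
workflow = {
    "jobs": {
        "changes": {},
        "guardrails-benchmark": {"needs": "changes"},
        "guardrails-benchmark-analyze": {"needs": ["guardrails-benchmark"]},
        "ci-status": {"needs": ["changes"]},
    }
}
print(jobs_missing_from_gate(workflow))  # ['guardrails-benchmark', 'guardrails-benchmark-analyze']
```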
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yaml around lines 1306 - 1334, The ci-status job is
missing guardrails-benchmark and guardrails-benchmark-analyze from its needs
dependencies list. Add both of these job names to the needs array under the
ci-status job (after the other existing job names like web-studio-e2e and
opa-policy-test) to ensure benchmark job failures are reflected in the overall
CI status gate.

🧹 Nitpick comments (1)

plugins/nemo-guardrails/benchmarks/configs/mock_llm/README.md (1)
1-22: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Restructure this page to one Diataxis type with required sections.

This page is currently descriptive but missing required doc scaffolding. Convert it into a single quadrant (likely REFERENCE), add a top prerequisites block, add concrete CLI/Python examples in a tab-set, and end with a “Next Steps” section linking to related benchmark docs.
As per coding guidelines: each documentation page should fit one Diataxis quadrant, always list prerequisites at the top, provide both Python SDK and CLI examples in tab-sets, and include a 'Next Steps' section at the end.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/nemo-guardrails/benchmarks/configs/mock_llm/README.md` around lines 1 - 22,
Restructure the Mock LLM configurations README to follow Diataxis
REFERENCE format by adding a prerequisites section at the top that lists
required dependencies and setup steps, reorganizing the existing descriptive
content into a properly structured REFERENCE page, adding a new section with
concrete CLI and Python SDK examples in tab-set format demonstrating how to use
and configure the app-llm.env and content-safety-llm.env files, and ending with
a "Next Steps" section that links to related benchmark documentation and guides
for running benchmarks with these configurations.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yaml:
- Around line 1142-1147: The analyze job's if condition prevents it from running
when the guardrails-benchmark job fails, even though the job handles missing
artifacts gracefully with continue-on-error. Modify the if condition that
currently checks for !cancelled() and the workflow_dispatch or changes.outputs
conditions to use always() instead, so the analyze job runs regardless of the
guardrails-benchmark job's status and can process partial artifacts when
available.
- Around line 1151-1154: The checkout action for nemo-platform is missing the
persist-credentials security setting. Add `persist-credentials: false` to the
`with:` section of the "Checkout nemo-platform" step to prevent the GitHub
Actions authentication token from being stored in the Git configuration, since
this job only needs to download and upload artifacts without performing Git
operations.

In `@plugins/nemo-guardrails/benchmarks/README.md`:
- Around line 227-229: The worked example in the benchmark README for c=16
tolerance incorrectly states the tolerance as 300 ms when the actual value used
in analyze.py is 200 ms. Update lines 227-229 to replace the 300 ms tolerance
value with the correct 200 ms value from analyze.py line 46, and recalculate the
pass/fail example thresholds accordingly so that the example accurately
demonstrates how the 200 ms tolerance determines whether a performance run
passes or fails based on the delta_p50 values.

In `@plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/seeding.py`:
- Around line 181-194: The control VM creation in the seeding flow can reuse an
existing virtual model unchanged when create(..., exist_ok=True) returns it, so
the empty request_middleware and response_middleware settings may be ignored.
Update the no_guardrails_vm handling in the seeding logic by either calling
client.inference.virtual_models.update() after creation to explicitly clear
middleware, or deleting any existing NO_GUARDRAILS_VM_NAME before recreating it,
so the control VM always stays middleware-free.
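The suggested `seeding.py` fix (create with `exist_ok=True`, then explicitly clear middleware) is sketched below against an in-memory stand-in. `FakeVirtualModels` and its `create`/`update` signatures are hypothetical and only mimic the behavior described in the comment; they are not the NeMo Platform client API.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualModel:
    name: str
    request_middleware: list = field(default_factory=list)
    response_middleware: list = field(default_factory=list)


class FakeVirtualModels:
    """In-memory stand-in for a virtual-model API (hypothetical, not the real client)."""

    def __init__(self) -> None:
        self._store: dict = {}

    def create(self, name, request_middleware, response_middleware, exist_ok=False):
        if name in self._store:
            if not exist_ok:
                raise ValueError(f"virtual model {name!r} already exists")
            return self._store[name]  # existing VM is returned unchanged
        vm = VirtualModel(name, list(request_middleware), list(response_middleware))
        self._store[name] = vm
        return vm

    def update(self, name, **fields):
        vm = self._store[name]
        for key, value in fields.items():
            setattr(vm, key, value)
        return vm


def ensure_control_vm(vms: FakeVirtualModels, name: str) -> VirtualModel:
    """Create the control VM if needed, then force its middleware lists to be empty."""
    vms.create(name, request_middleware=[], response_middleware=[], exist_ok=True)
    return vms.update(name, request_middleware=[], response_middleware=[])


# A stale VM left over from an earlier run still has (hypothetical) middleware attached.
vms = FakeVirtualModels()
vms.create("no-guardrails-vm", request_middleware=["content-safety"], response_middleware=[])
control = ensure_control_vm(vms, "no-guardrails-vm")
print(control.request_middleware, control.response_middleware)  # [] []
```

Calling `update()` unconditionally keeps the control VM middleware-free even when `create(..., exist_ok=True)` returns an older VM with the same name. The alternative mentioned in the comment, delete-then-recreate, reaches the same end state but leaves a brief window in which the VM does not exist.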



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1cd5064a-464d-418f-a424-914014ecf570

📥 Commits

Reviewing files that changed from the base of the PR and between c48f52c and 98f5d45.

📒 Files selected for processing (13)

.github/actions/changes/action.yaml
.github/workflows/ci.yaml
plugins/nemo-guardrails/benchmarks/README.md
plugins/nemo-guardrails/benchmarks/configs/mock_llm/README.md
plugins/nemo-guardrails/benchmarks/configs/mock_llm/app-llm.env
plugins/nemo-guardrails/benchmarks/configs/mock_llm/content-safety-llm.env
plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/aiperf_runner.py
plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/analyze.py
plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/constants.py
plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/paths.py
plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/run.py
plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/seeding.py
plugins/nemo-guardrails/tests/unit/benchmarks/test_seeding.py

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

github-actions Bot added the feat label Jun 16, 2026

JashG added 2 commits June 22, 2026 10:12

feat(guardrails): Benchmark analysis in-progress

9addf61

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

Fix CI

f78ffa8

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

JashG force-pushed the jgulabrai/guardrails-benchmark-analysis branch from 6115c89 to f78ffa8 on June 22, 2026 14:12

JashG added 4 commits June 22, 2026 11:23

always run benchmark

78b7730

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

Cleanup

2acf55b

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

Updates

f0fef0a

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

Clean up CI config and baselines

98f5d45

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

JashG marked this pull request as ready for review June 23, 2026 14:10

JashG requested review from a team as code owners June 23, 2026 14:10

Minor ReadME cleanup

8a55a3a

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

coderabbitai Bot reviewed Jun 23, 2026


Comment thread .github/workflows/ci.yaml

Comment thread .github/workflows/ci.yaml

Comment thread plugins/nemo-guardrails/benchmarks/README.md Outdated

Comment thread plugins/nemo-guardrails/src/nemo_guardrails_plugin/benchmarks/seeding.py

Address CodeRabbit

dcaf12f

Signed-off-by: Jash Gulabrai <jgulabrai@nvidia.com>

JashG force-pushed the jgulabrai/guardrails-benchmark-analysis branch from ad12028 to dcaf12f on June 23, 2026 14:36

JashG requested review from albcui and gabwow June 23, 2026 14:37

JashG changed the title from "feat(guardrails): Benchmark analysis" to "feat(guardrails): Update benchmark job; add latency analysis step" on Jun 23, 2026

JashG wants to merge 8 commits into main from jgulabrai/guardrails-benchmark-analysis
