feat(guardrails): Update benchmark job; add latency analysis step #359
Open
JashG wants to merge 8 commits into main from jgulabrai/guardrails-benchmark-analysis
8 commits
9addf61 feat(guardrails): Benchmark analysis in-progress (JashG)
f78ffa8 Fix CI (JashG)
78b7730 always run benchmark (JashG)
2acf55b Cleanup (JashG)
f0fef0a Updates (JashG)
98f5d45 Clean up CI config and baselines (JashG)
8a55a3a Minor ReadME cleanup (JashG)
dcaf12f Address CodeRabbit (JashG)
15 changes: 15 additions & 0 deletions
plugins/nemo-guardrails/benchmarks/configs/mock_llm/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| # Mock LLM configurations | ||
| | ||
| These `.env` files configure the behavior of the mock LLMs, used by the upstream | ||
| `nemo-guardrails` library's `benchmark.mock_llm_server.run_server`. | ||
| | ||
| The library stores these files, but we keep our own copies so that: | ||
| | ||
| - We can change mock latency without touching the upstream repo. | ||
| - The exact mock behavior we benchmarked against is versioned alongside the | ||
| results, so historical numbers stay reproducible even if upstream changes | ||
| its defaults. | ||
| | ||
| Mapping to upstream files: | ||
| - `app-llm.env` ← upstream `meta-llama-3.3-70b-instruct.env` | ||
| - `content-safety-llm.env` ← upstream `nvidia-llama-3.1-nemoguard-8b-content-safety.env` | ||
19 changes: 19 additions & 0 deletions
plugins/nemo-guardrails/benchmarks/configs/mock_llm/app-llm.env
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| MODEL="meta/llama-3.3-70b-instruct" | ||
| UNSAFE_PROBABILITY=0.0 | ||
| UNSAFE_TEXT="I can't help with that. Is there anything else I can assist you with?" | ||
| SAFE_TEXT="I can provide information and help with a wide range of topics, from science and history to entertainment and culture. I can also help with language-related tasks, such as translation and text summarization. However, I can't assist with requests that involve harm or illegal activities." | ||
| # End-to-end latency | ||
| E2E_LATENCY_MIN_SECONDS=4.0 | ||
| E2E_LATENCY_MAX_SECONDS=4.0 | ||
| E2E_LATENCY_MEAN_SECONDS=4.0 | ||
| E2E_LATENCY_STD_SECONDS=0.0 | ||
| # Streaming latency: Time to First Token (TTFT) | ||
| TTFT_MIN_SECONDS=0.3 | ||
| TTFT_MAX_SECONDS=0.3 | ||
| TTFT_MEAN_SECONDS=0.3 | ||
| TTFT_STD_SECONDS=0.0 | ||
| # Streaming latency: Chunk Latency (ITL) | ||
| CHUNK_LATENCY_MIN_SECONDS=0.015 | ||
| CHUNK_LATENCY_MAX_SECONDS=0.015 | ||
| CHUNK_LATENCY_MEAN_SECONDS=0.015 | ||
| CHUNK_LATENCY_STD_SECONDS=0.0 | ||
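
Each latency group in this file has the same four settings: MIN, MAX, MEAN and STD. The diff does not show how the upstream mock server consumes them, so the following is only a rough Python sketch under one plausible reading: draw from a normal distribution and clamp the draw to [MIN, MAX]. The helpers `parse_env` and `sample_latency` are hypothetical names for illustration, not upstream APIs. With the values above (MIN = MAX = MEAN and STD = 0) this reading always gives exactly 4.0 seconds.

```python
import random

# End-to-end latency values copied from app-llm.env above.
ENV_TEXT = """
# End-to-end latency
E2E_LATENCY_MIN_SECONDS=4.0
E2E_LATENCY_MAX_SECONDS=4.0
E2E_LATENCY_MEAN_SECONDS=4.0
E2E_LATENCY_STD_SECONDS=0.0
"""


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper: it does not handle quoting or escapes.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def sample_latency(env, prefix, rng):
    """Assumed sampling rule: a normal draw clamped to [MIN, MAX]."""
    mean = float(env[f"{prefix}_MEAN_SECONDS"])
    std = float(env[f"{prefix}_STD_SECONDS"])
    low = float(env[f"{prefix}_MIN_SECONDS"])
    high = float(env[f"{prefix}_MAX_SECONDS"])
    return min(max(rng.gauss(mean, std), low), high)


env = parse_env(ENV_TEXT)
latency = sample_latency(env, "E2E_LATENCY", random.Random(0))
print(latency)  # 4.0, because MIN == MAX == MEAN and STD == 0
```

Pinning all four values to one number removes sampling noise from the mock, which fits the README's stated aim of keeping historical numbers reproducible.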
19 changes: 19 additions & 0 deletions
plugins/nemo-guardrails/benchmarks/configs/mock_llm/content-safety-llm.env
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| MODEL="nvidia/llama-3.1-nemoguard-8b-content-safety" | ||
| UNSAFE_PROBABILITY=0.0 | ||
| UNSAFE_TEXT="{\"User Safety\": \"unsafe\", \"Response Safety\": \"unsafe\", \"Safety Categories\": \"Violence, Criminal Planning/Confessions\"}" | ||
| SAFE_TEXT="{\"User Safety\": \"safe\", \"Response Safety\": \"safe\"}" | ||
| # End-to-end latency | ||
| E2E_LATENCY_MIN_SECONDS=0.5 | ||
| E2E_LATENCY_MAX_SECONDS=0.5 | ||
| E2E_LATENCY_MEAN_SECONDS=0.5 | ||
| E2E_LATENCY_STD_SECONDS=0.0 | ||
| # Streaming latency: Time to First Token (TTFT) | ||
| TTFT_MIN_SECONDS=0.2 | ||
| TTFT_MAX_SECONDS=0.2 | ||
| TTFT_MEAN_SECONDS=0.2 | ||
| TTFT_STD_SECONDS=0.0 | ||
| # Streaming latency: Chunk Latency (ITL) | ||
| CHUNK_LATENCY_MIN_SECONDS=0.015 | ||
| CHUNK_LATENCY_MAX_SECONDS=0.015 | ||
| CHUNK_LATENCY_MEAN_SECONDS=0.015 | ||
| CHUNK_LATENCY_STD_SECONDS=0.0 | ||
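
Once the `.env` quoting is unescaped, `SAFE_TEXT` and `UNSAFE_TEXT` in this file are JSON objects. A minimal Python sketch of how a caller might read such a verdict follows. The function `is_blocked` and its rule (block when either safety field is present and not "safe") are assumptions for illustration, not the logic `nemo-guardrails` actually uses.

```python
import json

# Unescaped values of SAFE_TEXT and UNSAFE_TEXT from content-safety-llm.env.
SAFE_TEXT = '{"User Safety": "safe", "Response Safety": "safe"}'
UNSAFE_TEXT = (
    '{"User Safety": "unsafe", "Response Safety": "unsafe", '
    '"Safety Categories": "Violence, Criminal Planning/Confessions"}'
)


def is_blocked(verdict_text):
    """Hypothetical rule: block if any present safety field is not "safe"."""
    verdict = json.loads(verdict_text)
    fields = ("User Safety", "Response Safety")
    # A missing field is treated as safe in this sketch.
    return any(verdict.get(field, "safe") != "safe" for field in fields)


print(is_blocked(SAFE_TEXT))    # False
print(is_blocked(UNSAFE_TEXT))  # True
```

`UNSAFE_PROBABILITY=0.0` is set in this file, which by its name suggests the mock never returns the unsafe verdict in this configuration. If so, a benchmark run would only ever see the `SAFE_TEXT` path.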