adding readme for skills-eval by jperez999 · Pull Request #2055 · NVIDIA/NeMo-Retriever

jperez999 · 2026-05-18T20:02:17Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

greptile-apps · 2026-05-18T20:04:15Z

Greptile Summary

This PR adds a 299-line README.md for the retriever skill-eval benchmarking command, documenting how to configure and run ViDoRe v3 recall/judge evaluations under three agent conditions (c1_base, c2_retriever, c3_retriever_skill).

Setup guide: covers prerequisites (Claude CLI, NVIDIA_API_KEY), the agent-eval manifest schema, and the skill_eval.yaml config format binding domain keys to PDF directories.
CLI reference and output layout: describes all skill-eval run options, the timestamped artifact directory structure, and how to interpret the session_summary.md recall and judge metrics.
Troubleshooting section: enumerates common failure modes including missing binaries, pdf_dirs key mismatches, and judge-disabled behaviour.

Confidence Score: 4/5

Safe to merge after addressing the tilde-expansion defect in the example YAML config.

The example config labels ~/datasets/... as an 'Absolute path' and the troubleshooting section already hints that ~ expansion can silently fail; a user following the README verbatim will hit a confusing path-not-found error. No code is changed — only documentation — so the blast radius is limited, but the mismatch between the example and the troubleshooting note is concrete enough to warrant a fix before the README ships.

nemo_retriever/src/nemo_retriever/skill_eval/README.md — the example YAML config and its inline comment about 'absolute path' need to be reconciled with the tilde-expansion caveat in the troubleshooting section.

Important Files Changed

Filename	Overview
nemo_retriever/src/nemo_retriever/skill_eval/README.md	New README documenting the skill-eval benchmarking harness; contains a documentation defect where the example YAML config uses `~/` paths that PyYAML won't expand, contradicting the inline comment that calls them "absolute paths."

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as retriever skill-eval run
    participant Runner as runner.py
    participant Claude as claude subprocess
    participant Judge as LLM Judge (NVIDIA NIM)
    participant Artifacts as artifacts/skilleval_ts/

    User->>CLI: skill-eval run --config skill_eval.yaml
    CLI->>Runner: load config + manifest
    Runner->>Runner: validate pdf_dirs keys vs manifest domains

    loop for each (condition, domain)
        Runner->>Runner: build scratch workdir in /tmp/skill_eval/
        Runner->>Runner: symlink PDFs into workdir/pdfs/
        Runner->>Claude: setup turn (--permission-mode bypassPermissions)
        Claude-->>Runner: session-id + token usage
        loop for each manifest entry (query turn)
            Runner->>Claude: --resume session-id + paraphrased prompt
            Claude-->>Runner: final_answer + ranked_retrieved
            opt NVIDIA_API_KEY set
                Runner->>Judge: score final_answer vs ground-truth
                Judge-->>Runner: 0-5 score
            end
            Runner->>Artifacts: write TrialResult JSON
        end
        Runner->>Runner: "compute recall@1/5/10"
        Runner->>Runner: delete /tmp/skill_eval/cond/domain/
    end

    Runner->>Artifacts: write session_summary.json + session_summary.md
    Runner-->>User: print per-(condition,domain) recall + judge table

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/src/nemo_retriever/skill_eval/README.md:136-137
**`~/` paths in YAML example won't expand**

The comment on line 136 says "Absolute path to your agent-eval manifest," but `~/datasets/...` is not an absolute path — `~` is a shell shorthand that standard YAML loaders (PyYAML's `safe_load`) pass through verbatim. Unless `runner.py` explicitly calls `os.path.expanduser()` on every path value after loading, the runner will look for a literal directory named `~` and fail. The troubleshooting section (line 291) already hints at this: "the value under `pdf_dirs.<domain>` was unset (`~` expansion failed …)" — but users who hit this error after following the example config will find it confusing. The example should use a real absolute path (e.g. `/home/user/datasets/vidore_v3/agent_eval_manifest.json`) or add a note that `~` must be pre-expanded before writing it into the YAML.

_{Reviews (2): Last reviewed commit: "Merge pull request #1 from NVIDIA/dev/wa..." | Re-trigger Greptile}

greptile-apps · 2026-05-18T20:04:19Z

+
+## 1. Make the PDF tree reachable
+
+ViDoRe v3 is split per-domain. The seven domains the harness recognises are:


The list immediately below this sentence contains eight domain entries, but the prose says "seven." This mismatch will confuse readers trying to reconcile the domain count. The same section later (line 204) correctly states "three conditions × eight domains = 24 sessions," confirming eight is the right number.

Suggested change

ViDoRe v3 is split per-domain. The seven domains the harness recognises are:

ViDoRe v3 is split per-domain. The eight domains the harness recognises are:

Prompt To Fix With AI

This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/skill_eval/README.md Line: 56 Comment: The list immediately below this sentence contains eight domain entries, but the prose says "seven." This mismatch will confuse readers trying to reconcile the domain count. The same section later (line 204) correctly states "three conditions × eight domains = 24 sessions," confirming eight is the right number. ```suggestion ViDoRe v3 is split per-domain. The eight domains the harness recognises are: ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-05-18T20:04:20Z

+
+**`config 'pdf_dirs' is missing an entry for domain '<X>'`** — your manifest contains a `domain` value that has no key in `pdf_dirs`. Either add the key, or use `--domains` to skip that subset.
+
+**`PDF directory '…' for domain '…' does not exist or is not a directory`** — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "$PATH"` and update the config.


$PATH is the shell variable for executable search directories, not for the PDF directory path. Running ls "$PATH" would expand to something like ls "/usr/local/bin:/usr/bin:/bin" and error or list unrelated directories. The intent is to inspect the configured PDF directory from the YAML config.

Suggested change

**`PDF directory '…' for domain '…' does not exist or is not a directory`** — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "$PATH"` and update the config.

**`PDF directory '…' for domain '…' does not exist or is not a directory`** — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "/your/configured/pdf_dirs/path"` and update the config.

Prompt To Fix With AI

This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/skill_eval/README.md Line: 290 Comment: `$PATH` is the shell variable for executable search directories, not for the PDF directory path. Running `ls "$PATH"` would expand to something like `ls "/usr/local/bin:/usr/bin:/bin"` and error or list unrelated directories. The intent is to inspect the configured PDF directory from the YAML config. ```suggestion **`PDF directory '…' for domain '…' does not exist or is not a directory`** — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "/your/configured/pdf_dirs/path"` and update the config. ``` How can I resolve this? If you propose a fix, please make it concise.

…utorun-prereq Update skill-eval README prerequisites

greptile-apps · 2026-05-19T00:02:39Z

+# Absolute path to your agent-eval manifest (JSON list).
+eval_manifest_path: ~/datasets/vidore_v3/agent_eval_manifest.json


~/ paths in YAML example won't expand

The comment on line 136 says "Absolute path to your agent-eval manifest," but ~/datasets/... is not an absolute path — ~ is a shell shorthand that standard YAML loaders (PyYAML's safe_load) pass through verbatim. Unless runner.py explicitly calls os.path.expanduser() on every path value after loading, the runner will look for a literal directory named ~ and fail. The troubleshooting section (line 291) already hints at this: "the value under pdf_dirs.<domain> was unset (~ expansion failed …)" — but users who hit this error after following the example config will find it confusing. The example should use a real absolute path (e.g. /home/user/datasets/vidore_v3/agent_eval_manifest.json) or add a note that ~ must be pre-expanded before writing it into the YAML.

Prompt To Fix With AI

This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/skill_eval/README.md Line: 136-137 Comment: **`~/` paths in YAML example won't expand** The comment on line 136 says "Absolute path to your agent-eval manifest," but `~/datasets/...` is not an absolute path — `~` is a shell shorthand that standard YAML loaders (PyYAML's `safe_load`) pass through verbatim. Unless `runner.py` explicitly calls `os.path.expanduser()` on every path value after loading, the runner will look for a literal directory named `~` and fail. The troubleshooting section (line 291) already hints at this: "the value under `pdf_dirs.<domain>` was unset (`~` expansion failed …)" — but users who hit this error after following the example config will find it confusing. The example should use a real absolute path (e.g. `/home/user/datasets/vidore_v3/agent_eval_manifest.json`) or add a note that `~` must be pre-expanded before writing it into the YAML. How can I resolve this? If you propose a fix, please make it concise.

adding readme for skills-eval

c845724

jperez999 self-assigned this May 18, 2026

jperez999 requested review from a team as code owners May 18, 2026 20:02

jperez999 requested a review from ChrisJar May 18, 2026 20:02

greptile-apps Bot reviewed May 18, 2026

View reviewed changes

mahikaw and others added 2 commits May 18, 2026 15:42

Update skill-eval README prerequisites

8da9f84

Merge pull request #1 from NVIDIA/dev/wasonmahika/skill-eval-readme-a…

7017b8c

…utorun-prereq Update skill-eval README prerequisites

greptile-apps Bot reviewed May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adding readme for skills-eval#2055

adding readme for skills-eval#2055
jperez999 wants to merge 3 commits into
NVIDIA:mainfrom
jperez999:readme-skills-add

jperez999 commented May 18, 2026

Uh oh!

greptile-apps Bot commented May 18, 2026 •

edited

Loading

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 18, 2026

Uh oh!

greptile-apps Bot May 18, 2026

Uh oh!

greptile-apps Bot May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		## 1. Make the PDF tree reachable

		ViDoRe v3 is split per-domain. The seven domains the harness recognises are:

	ViDoRe v3 is split per-domain. The seven domains the harness recognises are:
	ViDoRe v3 is split per-domain. The eight domains the harness recognises are:


		`config 'pdf_dirs' is missing an entry for domain '<X>'` — your manifest contains a `domain` value that has no key in `pdf_dirs`. Either add the key, or use `--domains` to skip that subset.

		`PDF directory '…' for domain '…' does not exist or is not a directory` — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "$PATH"` and update the config.

		# Absolute path to your agent-eval manifest (JSON list).
		eval_manifest_path: ~/datasets/vidore_v3/agent_eval_manifest.json

Conversation

jperez999 commented May 18, 2026

Description

Checklist

Uh oh!

greptile-apps Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented May 18, 2026 •

edited

Loading