Skip to content

adding readme for skills-eval#2055

Open
jperez999 wants to merge 3 commits into
NVIDIA:mainfrom
jperez999:readme-skills-add
Open

adding readme for skills-eval#2055
jperez999 wants to merge 3 commits into
NVIDIA:mainfrom
jperez999:readme-skills-add

Conversation

@jperez999
Copy link
Copy Markdown
Collaborator

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

@jperez999 jperez999 self-assigned this May 18, 2026
@jperez999 jperez999 requested review from a team as code owners May 18, 2026 20:02
@jperez999 jperez999 requested a review from ChrisJar May 18, 2026 20:02
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR adds a 299-line README.md for the retriever skill-eval benchmarking command, documenting how to configure and run ViDoRe v3 recall/judge evaluations under three agent conditions (c1_base, c2_retriever, c3_retriever_skill).

  • Setup guide: covers prerequisites (Claude CLI, NVIDIA_API_KEY), the agent-eval manifest schema, and the skill_eval.yaml config format binding domain keys to PDF directories.
  • CLI reference and output layout: describes all skill-eval run options, the timestamped artifact directory structure, and how to interpret the session_summary.md recall and judge metrics.
  • Troubleshooting section: enumerates common failure modes including missing binaries, pdf_dirs key mismatches, and judge-disabled behaviour.

Confidence Score: 4/5

Safe to merge after addressing the tilde-expansion defect in the example YAML config.

The example config labels ~/datasets/... as an 'Absolute path' and the troubleshooting section already hints that ~ expansion can silently fail; a user following the README verbatim will hit a confusing path-not-found error. No code is changed — only documentation — so the blast radius is limited, but the mismatch between the example and the troubleshooting note is concrete enough to warrant a fix before the README ships.

nemo_retriever/src/nemo_retriever/skill_eval/README.md — the example YAML config and its inline comment about 'absolute path' need to be reconciled with the tilde-expansion caveat in the troubleshooting section.

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/skill_eval/README.md New README documenting the skill-eval benchmarking harness; contains a documentation defect where the example YAML config uses ~/ paths that PyYAML won't expand, contradicting the inline comment that calls them "absolute paths."

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as retriever skill-eval run
    participant Runner as runner.py
    participant Claude as claude subprocess
    participant Judge as LLM Judge (NVIDIA NIM)
    participant Artifacts as artifacts/skilleval_ts/

    User->>CLI: skill-eval run --config skill_eval.yaml
    CLI->>Runner: load config + manifest
    Runner->>Runner: validate pdf_dirs keys vs manifest domains

    loop for each (condition, domain)
        Runner->>Runner: build scratch workdir in /tmp/skill_eval/
        Runner->>Runner: symlink PDFs into workdir/pdfs/
        Runner->>Claude: setup turn (--permission-mode bypassPermissions)
        Claude-->>Runner: session-id + token usage
        loop for each manifest entry (query turn)
            Runner->>Claude: --resume session-id + paraphrased prompt
            Claude-->>Runner: final_answer + ranked_retrieved
            opt NVIDIA_API_KEY set
                Runner->>Judge: score final_answer vs ground-truth
                Judge-->>Runner: 0-5 score
            end
            Runner->>Artifacts: write TrialResult JSON
        end
        Runner->>Runner: "compute recall@1/5/10"
        Runner->>Runner: delete /tmp/skill_eval/cond/domain/
    end

    Runner->>Artifacts: write session_summary.json + session_summary.md
    Runner-->>User: print per-(condition,domain) recall + judge table
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/src/nemo_retriever/skill_eval/README.md:136-137
**`~/` paths in YAML example won't expand**

The comment on line 136 says "Absolute path to your agent-eval manifest," but `~/datasets/...` is not an absolute path — `~` is a shell shorthand that standard YAML loaders (PyYAML's `safe_load`) pass through verbatim. Unless `runner.py` explicitly calls `os.path.expanduser()` on every path value after loading, the runner will look for a literal directory named `~` and fail. The troubleshooting section (line 291) already hints at this: "the value under `pdf_dirs.<domain>` was unset (`~` expansion failed …)" — but users who hit this error after following the example config will find it confusing. The example should use a real absolute path (e.g. `/home/user/datasets/vidore_v3/agent_eval_manifest.json`) or add a note that `~` must be pre-expanded before writing it into the YAML.

Reviews (2): Last reviewed commit: "Merge pull request #1 from NVIDIA/dev/wa..." | Re-trigger Greptile


## 1. Make the PDF tree reachable

ViDoRe v3 is split per-domain. The seven domains the harness recognises are:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The list immediately below this sentence contains eight domain entries, but the prose says "seven." This mismatch will confuse readers trying to reconcile the domain count. The same section later (line 204) correctly states "three conditions × eight domains = 24 sessions," confirming eight is the right number.

Suggested change
ViDoRe v3 is split per-domain. The seven domains the harness recognises are:
ViDoRe v3 is split per-domain. The eight domains the harness recognises are:
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/skill_eval/README.md
Line: 56

Comment:
The list immediately below this sentence contains eight domain entries, but the prose says "seven." This mismatch will confuse readers trying to reconcile the domain count. The same section later (line 204) correctly states "three conditions × eight domains = 24 sessions," confirming eight is the right number.

```suggestion
ViDoRe v3 is split per-domain. The eight domains the harness recognises are:
```

How can I resolve this? If you propose a fix, please make it concise.


**`config 'pdf_dirs' is missing an entry for domain '<X>'`** — your manifest contains a `domain` value that has no key in `pdf_dirs`. Either add the key, or use `--domains` to skip that subset.

**`PDF directory '…' for domain '…' does not exist or is not a directory`** — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "$PATH"` and update the config.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 $PATH is the shell variable for executable search directories, not for the PDF directory path. Running ls "$PATH" would expand to something like ls "/usr/local/bin:/usr/bin:/bin" and error or list unrelated directories. The intent is to inspect the configured PDF directory from the YAML config.

Suggested change
**`PDF directory '…' for domain '…' does not exist or is not a directory`** — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "$PATH"` and update the config.
**`PDF directory '…' for domain '…' does not exist or is not a directory`** — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "/your/configured/pdf_dirs/path"` and update the config.
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/skill_eval/README.md
Line: 290

Comment:
`$PATH` is the shell variable for executable search directories, not for the PDF directory path. Running `ls "$PATH"` would expand to something like `ls "/usr/local/bin:/usr/bin:/bin"` and error or list unrelated directories. The intent is to inspect the configured PDF directory from the YAML config.

```suggestion
**`PDF directory '…' for domain '…' does not exist or is not a directory`** — the value under `pdf_dirs.<domain>` was unset (`~` expansion failed, typo, etc.). Resolve the path manually with `ls "/your/configured/pdf_dirs/path"` and update the config.
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +136 to +137
# Absolute path to your agent-eval manifest (JSON list).
eval_manifest_path: ~/datasets/vidore_v3/agent_eval_manifest.json
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 ~/ paths in YAML example won't expand

The comment on line 136 says "Absolute path to your agent-eval manifest," but ~/datasets/... is not an absolute path — ~ is a shell shorthand that standard YAML loaders (PyYAML's safe_load) pass through verbatim. Unless runner.py explicitly calls os.path.expanduser() on every path value after loading, the runner will look for a literal directory named ~ and fail. The troubleshooting section (line 291) already hints at this: "the value under pdf_dirs.<domain> was unset (~ expansion failed …)" — but users who hit this error after following the example config will find it confusing. The example should use a real absolute path (e.g. /home/user/datasets/vidore_v3/agent_eval_manifest.json) or add a note that ~ must be pre-expanded before writing it into the YAML.

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/skill_eval/README.md
Line: 136-137

Comment:
**`~/` paths in YAML example won't expand**

The comment on line 136 says "Absolute path to your agent-eval manifest," but `~/datasets/...` is not an absolute path — `~` is a shell shorthand that standard YAML loaders (PyYAML's `safe_load`) pass through verbatim. Unless `runner.py` explicitly calls `os.path.expanduser()` on every path value after loading, the runner will look for a literal directory named `~` and fail. The troubleshooting section (line 291) already hints at this: "the value under `pdf_dirs.<domain>` was unset (`~` expansion failed …)" — but users who hit this error after following the example config will find it confusing. The example should use a real absolute path (e.g. `/home/user/datasets/vidore_v3/agent_eval_manifest.json`) or add a note that `~` must be pre-expanded before writing it into the YAML.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants