From d051177e0e607ec6b210d2630181792bced0601e Mon Sep 17 00:00:00 2001
From: sosahi
Date: Fri, 29 May 2026 09:25:49 -0700
Subject: [PATCH 01/10] removed duplicate

---
 skills/nemo-retriever/SKILL.md | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md
index 63a5aa8a4..0dc6e7056 100644
--- a/skills/nemo-retriever/SKILL.md
+++ b/skills/nemo-retriever/SKILL.md
@@ -5,14 +5,6 @@ license: Apache-2.0
 allowed-tools: Bash Write Read
 ---
 
-| name | nemo-retriever |
-| :------------ | :-- |
-| description | Use when the user wants to search, index, or answer questions over a folder of PDFs (or other documents) — including building a RAG / search index over PDFs, looking up information across many PDFs, or running the `retriever` CLI (ingest, query, pipeline, recall, eval, etc.). |
-| license | Apache-2.0 |
-| compatibility | Designed for Claude Code, OpenCode, Codex, and Agent Skills-compatible tools. Requires Git, network access to GitHub |
-| metadata | |
-| allowed-tools | Bash Write Read |
-
 # nemo-retriever
 
 The `retriever` CLI indexes a folder of PDFs into LanceDB (`retriever ingest`) and serves vector search over it (`retriever query`). For any task about searching/answering questions across a folder of PDFs, use this CLI — do not write a custom RAG.
From 499943ab1f8f2bf8e705262fdbe7f706e5e26406 Mon Sep 17 00:00:00 2001 From: sosahi Date: Fri, 29 May 2026 09:51:08 -0700 Subject: [PATCH 02/10] incorporate Janisha's feedback --- skills/nemo-retriever/SKILL.md | 4 ++-- skills/nemo-retriever/references/query.md | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md index 0dc6e7056..4f1932bbd 100644 --- a/skills/nemo-retriever/SKILL.md +++ b/skills/nemo-retriever/SKILL.md @@ -18,7 +18,7 @@ If `command -v retriever` returns nothing, follow `references/install.md` to ins | Turn type | Read this once | Then execute | | :--- | :--- | :--- | | **Setup turn** (first turn — `./lancedb/nv-ingest.lance` doesn't exist) | `references/setup.md` | Build the index | -| **Query turn** (every subsequent turn — user asks a question) | `references/query.md` | One `retriever query` call, then `Write` `./output.json` | +| **Query turn** (every subsequent turn — user asks a question) | `references/query.md` | One `retriever query` call, then `Write` `./output.json` **only if** the prompt mentions a judge, benchmark, output schema, or `output.json`; otherwise answer in chat | | Anything errored or returned empty | `references/pitfalls.md` | Apply the named recovery; do not improvise | For the full `retriever ingest` / `retriever query` CLI specs, see `references/cli/ingest.md` and `references/cli/query.md`. You do not need these for routine turns — `/bin/retriever --help` is faster. @@ -26,7 +26,7 @@ For the full `retriever ingest` / `retriever query` CLI specs, see `references/c ## Hard limits (apply to every turn) - **Setup turn**: build the index in one shell command (see `references/setup.md`). STOP after the index lands. -- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. Then `Write` `./output.json` and STOP. 
+- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. Then `Write` `./output.json` and STOP. **General-use exception:** if the prompt doesn't mention a judge, benchmark, output schema, or `output.json`, skip the `Write` step and answer in chat — all other discipline (2-Bash-call budget, no-narration, chart/image hedging) still applies. - **No narration between tool calls.** Tokens you emit between calls become input + cached input for every later turn — quadratic cost. Go straight from reading the summary to writing the JSON file. - **Banned**: `TodoWrite`, Glob, Grep, `Read` of whole PDFs, re-running setup, spawning subagents, speculative "confirmation" calls. diff --git a/skills/nemo-retriever/references/query.md b/skills/nemo-retriever/references/query.md index fc98e08c5..990bbae00 100644 --- a/skills/nemo-retriever/references/query.md +++ b/skills/nemo-retriever/references/query.md @@ -1,5 +1,7 @@ # Query turn — the WHOLE workflow +**General-use callout:** if the prompt doesn't mention a judge, benchmark, output schema, or `output.json`, skip the `Write` `./output.json` step and answer in chat. All other discipline (2-Bash-call budget, no-narration, chart/image hedging) still applies. + ## Filename fast path — try BEFORE `retriever query` If the user's question literally contains a PDF basename from `./pdfs/` (stem ≥6 chars, with or without `.pdf`, case-insensitive), skip semantic search. Direct pdfium extraction on the named file is faster and avoids semantic-search misses — the right doc is given, and pages rank by query-token overlap. 
From 2a054b93f805d10ab08a8b51f1ac583e763dd256 Mon Sep 17 00:00:00 2001 From: sosahi Date: Fri, 29 May 2026 10:09:33 -0700 Subject: [PATCH 03/10] update description --- skills/nemo-retriever/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md index 4f1932bbd..735867600 100644 --- a/skills/nemo-retriever/SKILL.md +++ b/skills/nemo-retriever/SKILL.md @@ -1,6 +1,6 @@ --- name: nemo-retriever -description: Use when the user wants to search, index, or answer questions over a folder of PDFs (or other documents) — including building a RAG / search index over PDFs, looking up information across many PDFs, or running the `retriever` CLI (ingest, query, pipeline, recall, eval, etc.). +description: Use when the user wants to search, query, extract, transcribe, describe, quote, filter, or aggregate across documents — PDFs, scanned forms / images (`.jpg` `.png` `.tiff`), Office (`.docx` `.pptx`), text (`.html` `.txt`), audio (`.mp3` `.wav` `.m4a`), or video (`.mp4` `.mov`). Prefer this over native Read / Grep for multi-file or non-PDF corpora. Not for: editing files, web browsing, single-file plain-text lookups, fine-tuning. license: Apache-2.0 allowed-tools: Bash Write Read --- From fb2379e7e9480dc092fafa6f1fc7e51c5bd8d8b7 Mon Sep 17 00:00:00 2001 From: sosahi Date: Fri, 29 May 2026 10:38:54 -0700 Subject: [PATCH 04/10] updated per Janisha's request --- skills/nemo-retriever/SKILL.md | 2 +- skills/nemo-retriever/references/query.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md index 410fe4801..2f465483d 100644 --- a/skills/nemo-retriever/SKILL.md +++ b/skills/nemo-retriever/SKILL.md @@ -30,7 +30,7 @@ Before ingesting a mixed folder, inventory extensions (`find -name '*.*' | ## Hard limits (apply to every turn) - **Setup turn**: build the index in one shell command (see `references/setup.md`). 
STOP after the index lands. -- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. Then `Write` `./output.json` and STOP. **General-use exception:** if the prompt doesn't mention a judge, benchmark, output schema, or `output.json`, skip the `Write` step and answer in chat — all other discipline (2-Bash-call budget, no-narration, chart/image hedging) still applies. +- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. Then `Write` `./output.json` (eval-harness only) or answer in chat (general use) and STOP. - **No narration between tool calls.** Tokens you emit between calls become input + cached input for every later turn — quadratic cost. Go straight from reading the summary to writing the JSON file. - **Banned**: `TodoWrite`, Glob, Grep, `Read` of whole PDFs, re-running setup, spawning subagents, speculative "confirmation" calls. diff --git a/skills/nemo-retriever/references/query.md b/skills/nemo-retriever/references/query.md index e7eaf30ef..728b371b2 100644 --- a/skills/nemo-retriever/references/query.md +++ b/skills/nemo-retriever/references/query.md @@ -1,6 +1,6 @@ # Query turn — the WHOLE workflow -**General-use callout:** if the prompt doesn't mention a judge, benchmark, output schema, or `output.json`, skip the `Write` `./output.json` step and answer in chat. All other discipline (2-Bash-call budget, no-narration, chart/image hedging) still applies. +**General-use vs eval harness callout:** if the user prompt doesn't mention a judge, benchmark, output schema, or `output.json`, then just run the `retriever query` call and answer in chat. 
## Filename fast path — try BEFORE `retriever query` From 6b1cb2e4599b431694289db88aaf49923ccd0b8d Mon Sep 17 00:00:00 2001 From: sosahi Date: Fri, 29 May 2026 11:49:52 -0700 Subject: [PATCH 05/10] more updates from janisha --- skills/nemo-retriever/SKILL.md | 4 ++-- skills/nemo-retriever/references/query.md | 8 +------- 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md index 2f465483d..dcbf0d9ac 100644 --- a/skills/nemo-retriever/SKILL.md +++ b/skills/nemo-retriever/SKILL.md @@ -20,7 +20,7 @@ If `command -v retriever` returns nothing, follow `references/install.md` to ins | Turn type | Read this once | Then execute | | :--- | :--- | :--- | | **Setup turn** (first turn — `./lancedb/nv-ingest.lance` doesn't exist) | `references/setup.md` | Build the index | -| **Query turn** (every subsequent turn — user asks a question) | `references/query.md` | One `retriever query` call, then `Write` `./output.json` **only if** the prompt mentions a judge, benchmark, output schema, or `output.json`; otherwise answer in chat | +| **Query turn** (every subsequent turn — user asks a question) | `references/query.md` | One `retriever query` call | | Anything errored or returned empty | `references/pitfalls.md` | Apply the named recovery; do not improvise | For the full `retriever ingest` / `retriever query` CLI specs, see `references/cli/ingest.md` and `references/cli/query.md`. You do not need these for routine turns — `/bin/retriever --help` is faster. @@ -30,7 +30,7 @@ Before ingesting a mixed folder, inventory extensions (`find -name '*.*' | ## Hard limits (apply to every turn) - **Setup turn**: build the index in one shell command (see `references/setup.md`). STOP after the index lands. -- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. Then `Write` `./output.json` (eval-harness only) or answer in chat (general use) and STOP. 
+- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. - **No narration between tool calls.** Tokens you emit between calls become input + cached input for every later turn — quadratic cost. Go straight from reading the summary to writing the JSON file. - **Banned**: `TodoWrite`, Glob, Grep, `Read` of whole PDFs, re-running setup, spawning subagents, speculative "confirmation" calls. diff --git a/skills/nemo-retriever/references/query.md b/skills/nemo-retriever/references/query.md index 728b371b2..4339e9f20 100644 --- a/skills/nemo-retriever/references/query.md +++ b/skills/nemo-retriever/references/query.md @@ -1,9 +1,5 @@ # Query turn — the WHOLE workflow -**General-use vs eval harness callout:** if the user prompt doesn't mention a judge, benchmark, output schema, or `output.json`, then just run the `retriever query` call and answer in chat. - -## Filename fast path — try BEFORE `retriever query` - ```bash /bin/retriever query "" --top-k 10 --embed-model-name nvidia/llama-nemotron-embed-1b-v2 --rerank \ @@ -15,7 +11,7 @@ Run that **exactly** as a single pipeline — do not split it into `HITS=$(...)` That's your FIRST tool call on every query turn. Do not Read, Glob, Grep, or list PDFs before this — those duplicate what `retriever query` already did. -**No narration between tool calls.** Do not write "Let me search…", "I'll now analyze…", "The retriever returned…", or any other commentary. Every assistant token you emit between the `retriever query` Bash call and the `Write` of `./output.json` becomes input tokens (and cached input tokens) for every subsequent turn in this session — quadratic cost. Go straight from reading the summary to writing the JSON file. The only assistant text in a query turn should be the tool calls themselves. +**No narration between tool calls.** Do not write "Let me search…", "I'll now analyze…", "The retriever returned…", or any other commentary. 
Every assistant token you emit with the `retriever query` Bash call becomes input tokens (and cached input tokens) for every subsequent turn in this session — quadratic cost. Go straight from reading the summary to writing the JSON file. The only assistant text in a query turn should be the tool calls themselves. Each hit has: `text`, `pdf_basename`, `page_number` (int, **1-indexed**: the first page of a PDF is page `1`), `pdf_page` (string composite key `"_"` — not a number, don't use it as one), `_distance`, and `metadata` (JSON with `type` ∈ `text|table|chart|image`). @@ -53,8 +49,6 @@ If a question asks for an exact percentage or a directional claim **and the evid When both a chart hit and a text hit cover the same fact, always prefer the text hit's number. -After writing `./output.json`, STOP. No print, no summary, no further tool calls. - ## Non-semantic operations (use these, don't fall back to native tools) **Page filter** — "what's on page N of doc.pdf" → filter LanceDB directly, no `Read`: From b43a935f64e88915bac0c66861f0a1766fea5e8e Mon Sep 17 00:00:00 2001 From: sosahi Date: Fri, 29 May 2026 12:34:26 -0700 Subject: [PATCH 06/10] all the remaining updates from Janisha --- skills/nemo-retriever/SKILL.md | 2 +- skills/nemo-retriever/references/query.md | 3 ++- skills/nemo-retriever/references/troubleshooting.md | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md index dcbf0d9ac..d4b25e792 100644 --- a/skills/nemo-retriever/SKILL.md +++ b/skills/nemo-retriever/SKILL.md @@ -30,7 +30,7 @@ Before ingesting a mixed folder, inventory extensions (`find -name '*.*' | ## Hard limits (apply to every turn) - **Setup turn**: build the index in one shell command (see `references/setup.md`). STOP after the index lands. -- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. 
+- **Query turn**: at most **2 Bash calls** — 1 `retriever query`, +1 optional targeted text-extract per `references/query.md`. Reply and then STOP. - **No narration between tool calls.** Tokens you emit between calls become input + cached input for every later turn — quadratic cost. Go straight from reading the summary to writing the JSON file. - **Banned**: `TodoWrite`, Glob, Grep, `Read` of whole PDFs, re-running setup, spawning subagents, speculative "confirmation" calls. diff --git a/skills/nemo-retriever/references/query.md b/skills/nemo-retriever/references/query.md index 4339e9f20..b42cadae5 100644 --- a/skills/nemo-retriever/references/query.md +++ b/skills/nemo-retriever/references/query.md @@ -27,7 +27,7 @@ It scans the LanceDB table the retriever already built — no PDF re-extraction. Don't reach for `pdftotext`, `pdftohtml`, or `pdfgrep` — they're system tools that aren't guaranteed installed on the user's machine. The retriever venv bundles pdfium and `lancedb`; `grep_corpus.py` and `retriever pdf stage page-elements --method pdfium` cover the same use cases without that dependency. -## Write `./output.json` directly from the hits +## Compose your reply from the hits - `final_answer`: synthesize from the top hits' `text`. Include the exact number / name / date / row / column the question asks for, plus the source PDF and 0-indexed page. One paragraph. No restating the question, no hedging caveats. If the chunks talk *around* the fact but don't state it, run ONE `/bin/retriever pdf stage page-elements ./pdfs --method pdfium --json-output-dir /tmp/pdf_text --compact-json` and `Read` `/tmp/pdf_text/.pdf.pdf_extraction.json` for the rank-1 page (or rank-2 if rank-1 is metadata) — that almost always surfaces the exact figure. Then synthesize. **If after both calls the asked-for fact still isn't in the evidence, write `final_answer` that says so explicitly** — e.g. "The retrieved pages do not state [X] for [entity]; the closest content is [Y]." 
Do NOT invent, extrapolate, or generate plausible-sounding content from adjacent material. A confidently-wrong answer scores worse than an honest "not in the retrieved pages". - `ranked_retrieved`: one entry per hit in the order `retriever query` returned: `{"doc_id": "", "page_number": , "rank": }`. Up to 10. Duplicate `(doc, page)` is fine. **Indexing:** the retriever's `page_number` is 1-indexed. If the task's output schema says 0-indexed (e.g. "first page is page 0"), emit `hit.page_number - 1`; if the task says 1-indexed or doesn't specify, emit `hit.page_number` as-is. @@ -48,6 +48,7 @@ If a question asks for an exact percentage or a directional claim **and the evid 3. If prose doesn't mention it, **quote the chart transcription verbatim with an explicit hedge in `final_answer`**: "The chart on page N indicates [verbatim phrase] (chart-derived, not verified against prose)." Do NOT restate the chart's number as a confident fact. When both a chart hit and a text hit cover the same fact, always prefer the text hit's number. +After your reply, STOP. No print, no summary, no further tool calls. ## Non-semantic operations (use these, don't fall back to native tools) diff --git a/skills/nemo-retriever/references/troubleshooting.md b/skills/nemo-retriever/references/troubleshooting.md index 1079a9c28..cdb399bff 100644 --- a/skills/nemo-retriever/references/troubleshooting.md +++ b/skills/nemo-retriever/references/troubleshooting.md @@ -36,7 +36,7 @@ If unsupported extensions appear, name them in your reply and ask the user wheth ## You ran more than 2 Bash calls on a query turn -Budget violation. Stop, write `final_answer` from what you have, write `./output.json`, end the turn. Long turns cost ~5× a disciplined turn and usually still produce the wrong answer. +Budget violation. Stop, write `final_answer` from what you have, end the turn. Long turns cost ~5× a disciplined turn and usually still produce the wrong answer. 
## Query-turn cost discipline (recap) From 2c22e39afcfc971e92b74322e80f9a77212f8483 Mon Sep 17 00:00:00 2001 From: sosahi Date: Fri, 29 May 2026 12:36:14 -0700 Subject: [PATCH 07/10] fix pitfall to troubleshoot --- skills/nemo-retriever/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md index d4b25e792..0947801f7 100644 --- a/skills/nemo-retriever/SKILL.md +++ b/skills/nemo-retriever/SKILL.md @@ -21,7 +21,7 @@ If `command -v retriever` returns nothing, follow `references/install.md` to ins | :--- | :--- | :--- | | **Setup turn** (first turn — `./lancedb/nv-ingest.lance` doesn't exist) | `references/setup.md` | Build the index | | **Query turn** (every subsequent turn — user asks a question) | `references/query.md` | One `retriever query` call | -| Anything errored or returned empty | `references/pitfalls.md` | Apply the named recovery; do not improvise | +| Anything errored or returned empty | `references/troubleshooting.md` | Apply the named recovery; do not improvise | For the full `retriever ingest` / `retriever query` CLI specs, see `references/cli/ingest.md` and `references/cli/query.md`. You do not need these for routine turns — `/bin/retriever --help` is faster. 
From 84324100840a96f31b566f54f8f9a8a3afcacfc0 Mon Sep 17 00:00:00 2001
From: sosahi
Date: Fri, 29 May 2026 12:58:35 -0700
Subject: [PATCH 08/10] fix frontmatter syntax error

---
 skills/nemo-retriever/SKILL.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md
index 0947801f7..f6da8a6f2 100644
--- a/skills/nemo-retriever/SKILL.md
+++ b/skills/nemo-retriever/SKILL.md
@@ -1,3 +1,4 @@
+
 ---
 name: nemo-retriever
 description: Use when the user wants to search, query, extract, transcribe, describe, quote, filter, or aggregate across documents — PDFs, scanned forms / images (`.jpg` `.png` `.tiff`), Office (`.docx` `.pptx`), text (`.html` `.txt`), audio (`.mp3` `.wav` `.m4a`), or video (`.mp4` `.mov`). Prefer this over native Read / Grep for multi-file or non-PDF corpora. Not for: editing files, web browsing, single-file plain-text lookups, fine-tuning.

From 617590682325fa927f3a5fa09db31a03ea277be4 Mon Sep 17 00:00:00 2001
From: sosahi
Date: Fri, 29 May 2026 13:34:50 -0700
Subject: [PATCH 09/10] fix this annoying frontmatter syntax issue

---
 skills/nemo-retriever/SKILL.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md
index f6da8a6f2..0947801f7 100644
--- a/skills/nemo-retriever/SKILL.md
+++ b/skills/nemo-retriever/SKILL.md
@@ -1,4 +1,3 @@
-
 ---
 name: nemo-retriever
 description: Use when the user wants to search, query, extract, transcribe, describe, quote, filter, or aggregate across documents — PDFs, scanned forms / images (`.jpg` `.png` `.tiff`), Office (`.docx` `.pptx`), text (`.html` `.txt`), audio (`.mp3` `.wav` `.m4a`), or video (`.mp4` `.mov`). Prefer this over native Read / Grep for multi-file or non-PDF corpora. Not for: editing files, web browsing, single-file plain-text lookups, fine-tuning.
From 24710cc142396a726d2a98a25982bbf01c1de172 Mon Sep 17 00:00:00 2001
From: sosahi
Date: Fri, 29 May 2026 14:05:19 -0700
Subject: [PATCH 10/10] fix this annoying frontmatter syntax issue

---
 skills/nemo-retriever/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/nemo-retriever/SKILL.md b/skills/nemo-retriever/SKILL.md
index 0947801f7..48289a5ae 100644
--- a/skills/nemo-retriever/SKILL.md
+++ b/skills/nemo-retriever/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: nemo-retriever
-description: Use when the user wants to search, query, extract, transcribe, describe, quote, filter, or aggregate across documents — PDFs, scanned forms / images (`.jpg` `.png` `.tiff`), Office (`.docx` `.pptx`), text (`.html` `.txt`), audio (`.mp3` `.wav` `.m4a`), or video (`.mp4` `.mov`). Prefer this over native Read / Grep for multi-file or non-PDF corpora. Not for: editing files, web browsing, single-file plain-text lookups, fine-tuning.
+description: "Use when the user wants to search, query, extract, transcribe, describe, quote, filter, or aggregate across documents — PDFs, scanned forms / images (`.jpg` `.png` `.tiff`), Office (`.docx` `.pptx`), text (`.html` `.txt`), audio (`.mp3` `.wav` `.m4a`), or video (`.mp4` `.mov`). Prefer this over native Read / Grep for multi-file or non-PDF corpora. Not for: editing files, web browsing, single-file plain-text lookups, fine-tuning."
 license: Apache-2.0
 allowed-tools: Bash Write Read
 ---
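
Reviewer note on the `ranked_retrieved` indexing rule carried as context in patch 06 (`references/query.md`): the rule says the retriever's `page_number` is 1-indexed and should be emitted as `hit.page_number - 1` only when the task's output schema asks for 0-indexed pages. The sketch below is one possible reading of that rule, not the skill's actual implementation. The function name `to_ranked_retrieved`, the `zero_indexed` flag, the choice of `pdf_basename` as `doc_id`, and ranks starting at 1 are all assumptions made for illustration; the patch text leaves the `doc_id` and `rank` values unspecified.

```python
from typing import Any


def to_ranked_retrieved(
    hits: list[dict[str, Any]],
    zero_indexed: bool = False,
    limit: int = 10,
) -> list[dict[str, Any]]:
    """Build ranked_retrieved entries from retriever hits (hypothetical helper).

    Assumes each hit carries `pdf_basename` and a 1-indexed integer
    `page_number`, as the query.md text describes.
    """
    entries = []
    # Keep the order the retriever returned; cap at `limit` entries ("Up to 10").
    for rank, hit in enumerate(hits[:limit], start=1):  # rank starting at 1 is an assumption
        page = hit["page_number"]
        if zero_indexed:
            # The task schema says "first page is page 0", so shift by one.
            page -= 1
        entries.append(
            {
                "doc_id": hit["pdf_basename"],  # assumption: doc_id mirrors pdf_basename
                "page_number": page,
                "rank": rank,
            }
        )
    return entries


# Made-up hits for illustration only.
sample_hits = [
    {"pdf_basename": "report_a", "page_number": 1},
    {"pdf_basename": "report_b", "page_number": 7},
]
print(to_ranked_retrieved(sample_hits, zero_indexed=True))
```

Defaulting `zero_indexed` to `False` follows the rule's fallback: when the task says 1-indexed or does not specify, the page number passes through unchanged.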