cherry-picked skills from main#2177
/nvskills-ci
Greptile Summary
This PR cherry-picks the
| Filename | Overview |
|---|---|
| skills/nemo-retriever/references/cli/ingest.md | Renamed and updated CLI reference; the Key Flags `--table-name` default (`nemo-retriever`) was not updated to match the new `nv-ingest` default stated in the narrative and in all other skill docs. |
| skills/nemo-retriever/scripts/grep_corpus.py | New corpus-grep utility; previously flagged issues (SPDX header, wrong default table name `nemo-retriever`, full `to_pandas()` memory load) remain open. |
| skills/nemo-retriever/scripts/filename_fast_path.py | New query-turn fast-path script; previously flagged issues (SPDX header, unclosed file handle, no unit tests) remain open. |
| skills/nemo-retriever/SKILL.md | New top-level skill entrypoint; workflow table, hard limits, and reference pointers look consistent and well-structured. |
| skills/nemo-retriever/references/query.md | New query workflow reference; canonical pipeline, `grep_corpus`, chart/image caution, and non-semantic operations all consistently reference `nv-ingest`. |
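The open `to_pandas()` memory concern on `grep_corpus.py` could be avoided by scanning rows in bounded batches and stopping as soon as `--max-hits` matches are found. A minimal sketch, assuming each row is a dict with a hypothetical `text` field (plus an optional `source`) and that batches arrive from some iterator, which in the real script would be LanceDB; the function name `grep_batches` is hypothetical, and only `max_hits`/`snippet_pad` mirror the script's `--max-hits`/`--snippet-pad` flags.

```python
import re
from typing import Iterable, Iterator


def grep_batches(
    batches: Iterable[list[dict]],
    pattern: str,
    max_hits: int = 50,
    snippet_pad: int = 60,
) -> Iterator[dict]:
    """Yield regex hits with context while holding only one batch at a time.

    Hypothetical sketch: assumes each row is a dict with a "text" field and an
    optional "source" field; the real LanceDB table schema may differ.
    """
    if max_hits <= 0:
        return
    rx = re.compile(pattern)
    hits = 0
    for batch in batches:
        for row in batch:
            text = row.get("text") or ""
            for m in rx.finditer(text):
                # Clamp the snippet window to the bounds of the row's text.
                start = max(0, m.start() - snippet_pad)
                end = min(len(text), m.end() + snippet_pad)
                yield {
                    "source": row.get("source"),
                    "match": m.group(0),
                    "snippet": text[start:end],
                }
                hits += 1
                if hits >= max_hits:
                    return  # later batches are never pulled from the iterator
```

Passing a generator as `batches` is what keeps memory bounded: rows are materialized only as the loop asks for them, and the early `return` means batches after the last needed hit are never fetched.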
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([Agent receives query]) --> B{nv-ingest.lance exists?}
B -- No --> C[Read references/setup.md]
C --> D{TOTAL_PAGES le 800?}
D -- Yes --> E[retriever ingest ./pdfs/]
D -- No --> F["retriever pipeline run --quiet"]
E & F --> G([STOP - index built])
B -- Yes --> H[Read references/query.md]
H --> I["retriever query --top-k 10 --rerank"]
I --> J{Hits sufficient?}
J -- Yes --> K([Synthesize final_answer - STOP])
J -- chart/image hit --> L["retriever pdf stage page-elements"]
J -- exact text needed --> M["grep_corpus.py regex"]
L & M --> K
I -- Empty --> N[Read references/troubleshooting.md]
N --> O[Pick best PDF from ./pdfs/]
O --> P["retriever pdf stage page-elements single PDF"]
P --> K
```
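The flowchart's top-level routing can be read as a small decision function. A minimal sketch, assuming the index lives at `<lancedb_dir>/nv-ingest.lance` (the `lancedb/nv-ingest.lance` path cited in Issue 1) and that the page count is already known; `next_step` and `PAGE_LIMIT` are hypothetical names, and the returned strings are copied from the flowchart nodes rather than produced by any real API.

```python
from pathlib import Path

# Threshold taken from the flowchart node "TOTAL_PAGES le 800?".
PAGE_LIMIT = 800


def next_step(lancedb_dir: str, total_pages: int) -> str:
    """Return the command the agent should run next, following the flowchart.

    Hypothetical helper: the skill describes these branches in prose
    (SKILL.md, references/setup.md, references/query.md), not in code.
    """
    index = Path(lancedb_dir) / "nv-ingest.lance"
    if index.exists():
        # Index already built: go straight to the query workflow.
        return "retriever query --top-k 10 --rerank"
    if total_pages <= PAGE_LIMIT:
        # Small corpus: the one-shot ingest command is enough.
        return "retriever ingest ./pdfs/"
    # Large corpus: use the pipeline runner instead.
    return "retriever pipeline run --quiet"
```

The fallback branches after a query (chart/image hits, exact-text grep, empty results) are left out here because they depend on inspecting the hits, not on filesystem state.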
Prompt To Fix All With AI
Fix the following code review issue (1 of 1), proposing a concise fix.
---
### Issue 1 of 1
skills/nemo-retriever/references/cli/ingest.md:95
**`--table-name` default contradicts the rest of the skill docs**
The Key Flags table still documents the default as `nemo-retriever`, but this same file's canonical invocation header was updated to `lancedb/nv-ingest.lance`, and every other reference in the skill (`SKILL.md`, `references/query.md` page-filter and aggregate one-liners, and `setup.md`) consistently uses `nv-ingest`. An agent reading the Key Flags section to understand what table `retriever ingest` writes into will use the wrong table name in any subsequent `retriever query`, `grep_corpus.py`, or direct LanceDB call — causing all corpus searches to silently return nothing.
Reviews (2): Last reviewed commit: "fixed syntax"
```
@@ -0,0 +1,161 @@
"""Query-turn filename fast path for the nemo-retriever skill.
```
Both new Python files (`filename_fast_path.py` and `grep_corpus.py`) are missing the required SPDX license header. Per the repository rule, every Python file added in a PR must begin with:
```
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```
This applies to both `scripts/filename_fast_path.py` and `scripts/grep_corpus.py`.
**Rule Used:** Python files added in this PR must include the SPD... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: skills/nemo-retriever/scripts/filename_fast_path.py
Line: 1
Comment:
**Missing SPDX license header**
Both new Python files (`filename_fast_path.py` and `grep_corpus.py`) are missing the required SPDX license header. Per the repository rule, every Python file added in a PR must begin with:
```
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```
This applies to both `scripts/filename_fast_path.py` and `scripts/grep_corpus.py`.
**Rule Used:** Python files added in this PR must include the SPD... ([source](https://app.greptile.com/review/custom-context?memory=spdx-license-header))
How can I resolve this? If you propose a fix, please make it concise.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
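To catch a missing header before review, a check along these lines could run in CI or a pre-commit hook. A minimal sketch, assuming the header must be the first three lines of the file, verbatim as quoted in the comment; `missing_spdx` and `REQUIRED_HEADER` are hypothetical names, not an existing repository tool.

```python
# Header text copied from the review comment; the year is part of the required text.
REQUIRED_HEADER = [
    "# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.",
    "# All rights reserved.",
    "# SPDX-License-Identifier: Apache-2.0",
]


def missing_spdx(paths: list[str]) -> list[str]:
    """Return the paths whose first lines do not match REQUIRED_HEADER exactly."""
    bad = []
    for p in paths:
        with open(p, encoding="utf-8") as fh:
            # readline() returns "" at EOF, so short files simply fail the comparison.
            head = [fh.readline().rstrip("\n") for _ in REQUIRED_HEADER]
        if head != REQUIRED_HEADER:
            bad.append(p)
    return bad
```

Run against `scripts/filename_fast_path.py` and `scripts/grep_corpus.py` as they stand in this PR, both would be reported, since each starts with its module docstring instead of the header.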
```python
ap.add_argument("--max-hits", type=int, default=50)
ap.add_argument("--snippet-pad", type=int, default=60)
ap.add_argument("--lancedb-uri", default="./lancedb")
ap.add_argument("--table-name", default="nemo-retriever")
```
Change the default table name to match the one `retriever ingest` actually writes into.
```diff
- ap.add_argument("--table-name", default="nemo-retriever")
+ ap.add_argument("--table-name", default="nv-ingest")
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: skills/nemo-retriever/scripts/grep_corpus.py
Line: 37
Comment:
Change the default table name to match the one `retriever ingest` actually writes into.
```suggestion
ap.add_argument("--table-name", default="nv-ingest")
```
How can I resolve this? If you propose a fix, please make it concise.
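One way to keep the script default from drifting again is to define the table name once and reuse it. A minimal sketch of the corrected flag set, assuming a hypothetical shared constant `DEFAULT_TABLE_NAME` and a hypothetical `build_parser` wrapper; the four flags and their other defaults are the ones shown in the diff context above.

```python
import argparse

# Hypothetical shared constant: keeping the table name in one place means the
# script default cannot drift from the table `retriever ingest` writes into.
DEFAULT_TABLE_NAME = "nv-ingest"


def build_parser() -> argparse.ArgumentParser:
    ap = argparse.ArgumentParser(description="Regex search over the ingested corpus")
    ap.add_argument("--max-hits", type=int, default=50)
    ap.add_argument("--snippet-pad", type=int, default=60)
    ap.add_argument("--lancedb-uri", default="./lancedb")
    ap.add_argument("--table-name", default=DEFAULT_TABLE_NAME)
    return ap
```

With these defaults, a local LanceDB connection to `./lancedb` opening table `nv-ingest` reads `./lancedb/nv-ingest.lance`, the same path the ingest reference names.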
```python
def page_records(sidecar: str) -> list[dict]:
    data = json.load(open(sidecar))
```
The file opened by `json.load(open(sidecar))` is never explicitly closed. Use a context manager instead.
```diff
 def page_records(sidecar: str) -> list[dict]:
-    data = json.load(open(sidecar))
+    with open(sidecar) as fh:
+        data = json.load(fh)
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: skills/nemo-retriever/scripts/filename_fast_path.py
Line: 94-95
Comment:
The file opened by `json.load(open(sidecar))` is never explicitly closed. Use a context manager instead.
```suggestion
def page_records(sidecar: str) -> list[dict]:
    with open(sidecar) as fh:
        data = json.load(fh)
```
How can I resolve this? If you propose a fix, please make it concise.
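For reference, the suggested fix in a complete, runnable function. A minimal sketch, assuming the sidecar is a JSON object with a hypothetical `pages` list of per-page dicts; the real sidecar layout in `filename_fast_path.py` is not shown in this review, so only the `with open(...)` pattern should be taken from it.

```python
import json


def page_records(sidecar: str) -> list[dict]:
    # The context manager closes the file even if json.load raises.
    with open(sidecar, encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumed layout: {"pages": [{...}, ...]}; adjust to the real sidecar schema.
    return list(data.get("pages", []))
```

Passing `encoding="utf-8"` explicitly also avoids depending on the platform's default encoding when the sidecar contains non-ASCII filenames or text.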
/nvskills-ci