-
Notifications
You must be signed in to change notification settings - Fork 322
26.05 into main folding pr #2175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
a452eb6
6c2fd4e
542e995
6e47222
96c11fc
103c271
0dea097
9190fe5
722d553
4d698ad
881af18
6c5bb78
a14b5b3
19af4bc
4bfcb40
dfd6b38
b3029da
2b9880d
63dfa90
d25eb24
40e9992
a99479e
794fe97
fc53bed
cafad31
6309bdf
ca3d676
2005314
e63cf74
891f1f6
cb75050
ac41c62
52281d8
a4ab4e3
2edab62
951e0b5
f4e50c4
2df85c5
2271755
db2c87d
984140b
8856a87
e254e08
08de78e
9da3fb2
47715ea
72a6b61
2853421
2be38bd
0a6cb70
7fe6593
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -52,7 +52,7 @@ dependencies = [ | |||||
| # HTTP clients | ||||||
| "httpx>=0.27.0", | ||||||
| "requests>=2.32.5", | ||||||
| "urllib3>=2.7.0", | ||||||
| "urllib3==2.7.0", | ||||||
| # Utilities | ||||||
| "pydantic>=2.8.0", | ||||||
| "rich>=13.7.0", | ||||||
|
|
@@ -65,9 +65,9 @@ dependencies = [ | |||||
| # Document parsing and NIM client libs | ||||||
| "pypdfium2==4.30.0", | ||||||
| "pillow==12.2.0", | ||||||
| "nltk>=3.9.4", | ||||||
| "nltk==3.9.4", | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Rule Used: Dependencies must be declared in the appropriate p... (source) Prompt To Fix With AIThis is a comment left during a code review.
Path: nemo_retriever/pyproject.toml
Line: 68
Comment:
Same concern as `urllib3`: `nltk` is not a critical (CUDA/PyTorch) dependency, so an exact pin may create resolution conflicts in environments where another package requires a different `nltk` patch version.
```suggestion
"nltk>=3.9.4",
```
**Rule Used:** Dependencies must be declared in the appropriate p... ([source](https://app.greptile.com/review/custom-context?memory=dependency-management-uv))
How can I resolve this? If you propose a fix, please make it concise. |
||||||
| "markitdown", | ||||||
| "langchain-nvidia-ai-endpoints>=0.3.0", | ||||||
| "langchain-nvidia-ai-endpoints>=1.4.0", | ||||||
| # Default VDB solution | ||||||
| "lancedb", | ||||||
| # gRPC client for Parakeet/Riva ASR. Required for ASRCPUActor when it | ||||||
|
|
@@ -123,11 +123,10 @@ local = [ | |||||
| "scikit-learn>=1.6.0", | ||||||
| "timm==1.0.22", | ||||||
| "albumentations==2.0.8", | ||||||
| "nemotron-page-elements-v3>=0.dev0", | ||||||
| "nemotron-graphic-elements-v1>=0.dev0", | ||||||
| "nemotron-table-structure-v1>=0.dev0", | ||||||
| # Accept the 2.0.0 stable release and newer OCR dev/final trains. | ||||||
| "nemotron-ocr>=2.0.0.dev0; sys_platform == 'linux' and (platform_machine == 'x86_64' or platform_machine == 'aarch64')", | ||||||
| "nemotron-page-elements-v3==3.0.1", | ||||||
| "nemotron-graphic-elements-v1==1.0.0", | ||||||
| "nemotron-table-structure-v1==1.0.0", | ||||||
| "nemotron-ocr>=2.0.0,<3; sys_platform == 'linux' and (platform_machine == 'x86_64' or platform_machine == 'aarch64')", | ||||||
| "nvidia-ml-py", | ||||||
| "apscheduler>=3.10", | ||||||
| "psutil>=5.9.0", | ||||||
|
|
@@ -165,7 +164,7 @@ tabular = [ | |||||
| "duckdb>=1.2.0", | ||||||
| "duckdb-engine>=0.13.0", | ||||||
| "neo4j>=5.0", | ||||||
| "langgraph>=1.1.0a2", | ||||||
| "langgraph>=1.2.0", | ||||||
| ] | ||||||
|
|
||||||
| # BEIR benchmarking and evaluation tools (not needed for production use). | ||||||
|
|
@@ -181,7 +180,7 @@ benchmarks = [ | |||||
| # or construct an ``LLMJudge`` / ``LiteLLMClient`` directly. Powers both the | ||||||
| # live-RAG SDK and the batch evaluation framework. | ||||||
| llm = [ | ||||||
| "litellm>=1.86.0rc1", | ||||||
| "litellm>=1.86.0,<2", | ||||||
| ] | ||||||
|
|
||||||
| dev = [ | ||||||
|
|
@@ -202,10 +201,6 @@ retriever-harness = "nemo_retriever.harness:main" | |||||
| version = {attr = "nemo_retriever.version.get_build_version"} | ||||||
|
|
||||||
| [tool.uv.sources] | ||||||
| nemotron-page-elements-v3 = { index = "test-pypi" } | ||||||
| nemotron-graphic-elements-v1 = { index = "test-pypi" } | ||||||
| nemotron-table-structure-v1 = { index = "test-pypi" } | ||||||
| nemotron-ocr = { index = "test-pypi" } | ||||||
| # On Linux, resolve torch/torchvision from the CUDA wheel index. | ||||||
| # On Mac, fall through to PyPI to get CPU wheels. | ||||||
| torch = [ | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,7 +16,7 @@ | |
| from nemo_retriever.params import ExtractParams, EmbedParams | ||
|
|
||
| result_ds = ( | ||
| GraphIngestor(run_mode="batch") | ||
| GraphIngestor(run_mode="inprocess") | ||
| .files(["/data/*.pdf"]) | ||
| .extract(ExtractParams(method="pdfium")) | ||
| .embed(EmbedParams(model_name="nvidia/llama-nemotron-embed-1b-v2")) | ||
|
|
@@ -387,8 +387,8 @@ class GraphIngestor(ingestor): | |
| Parameters | ||
| ---------- | ||
| run_mode | ||
| ``"batch"`` (Ray Data, default) or ``"inprocess"`` (single-process | ||
| pandas). | ||
| ``"inprocess"`` (single-process pandas, default) or ``"batch"`` (Ray | ||
| Data). | ||
| ray_address | ||
| Ray cluster address. ``None`` starts a local cluster. | ||
| batch_size | ||
|
|
@@ -415,7 +415,7 @@ class GraphIngestor(ingestor): | |
| def __init__( | ||
| self, | ||
| *, | ||
| run_mode: str = "batch", | ||
| run_mode: str = "inprocess", | ||
|
Comment on lines
415
to
+418
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Rule Used: Changes to public API surfaces (FastAPI endpoints,... (source) Prompt To Fix With AIThis is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph_ingestor.py
Line: 415-418
Comment:
**Default `run_mode` changed from `"batch"` to `"inprocess"` without a deprecation cycle.** Any caller that relied on `GraphIngestor()` (or `ingest_documents()` / `resolve_ingest_plan()`) defaulting to Ray Data scale-out will now silently execute single-process pandas instead. This includes the `retriever pipeline run` CLI, the harness, and every programmatic caller that omitted `run_mode`. The `api-backward-compatibility` rule requires a deprecation warning or migration note for changes that alter observable default behavior. A comment in the docstring noting the version where the default changed would satisfy the intent of that rule.
**Rule Used:** Changes to public API surfaces (FastAPI endpoints,... ([source](https://app.greptile.com/review/custom-context?memory=api-backward-compatibility))
How can I resolve this? If you propose a fix, please make it concise. |
||
| documents: Optional[List[str]] = None, | ||
| ray_address: Optional[str] = None, | ||
| ray_log_to_driver: bool = True, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urllib3andnltkare overly restrictive for non-critical dependencies. The project'sdependency-management-uvrule specifies that only critical packages (PyTorch, CUDA-related) should be pinned exactly; others should use>=x.y.zrange specifiers. Exact pins can create unsolvable resolution conflicts when these packages are installed alongside other libraries that also depend on them.Rule Used: Dependencies must be declared in the appropriate p... (source)
Prompt To Fix With AI