Skip to content

feat(files): migrate filesets package to NemoClient typed HTTP client#429

Open
matthewgrossman wants to merge 6 commits into
mainfrom
mgrossman/aircore-827-migrate-first-plugin-to-nemoclient-typed-http-client-files
Open

feat(files): migrate filesets package to NemoClient typed HTTP client#429
matthewgrossman wants to merge 6 commits into
mainfrom
mgrossman/aircore-827-migrate-first-plugin-to-nemoclient-typed-http-client-files

Conversation

@matthewgrossman

@matthewgrossman matthewgrossman commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Migrates packages/filesets/ (the Files/Filesets client layer) from the Stainless-generated NeMoPlatform SDK to the new NemoClient typed HTTP client introduced in #397. This is tracked by AIRCORE-827.

  • Shared typesFilesetPurpose, CacheStatus, request/response DTOs (FilesetOutput, FilesetFileOutput, CreateFilesetRequest, etc.) now live in nemo_platform_plugin.files.types as the single source of truth. The files service imports from here instead of defining its own parallel copies.
  • Typed endpoints — 9 endpoint functions (create_fileset, get_fileset, upload_file, download_file, etc.) defined using the @get/@post/@put/@patch/@delete + @abstractmethod pattern from feat(sdk): ParamSpec-based typed HTTP endpoints with method() convenience layer #397.
  • FilesetFileSystem migration — All self._sdk.files._upload_file()/_download_file()/_list_files()/_delete_file() calls replaced with self._client.send(endpoints.X(...)). Constructor takes client: AsyncNemoClient with sdk= backward compat for the vendored SDK path.
  • FilesResource migration — No longer inherits from Stainless BaseFilesResource. Standalone class backed by NemoClient. Adds FilesetsSubResource for CRUD operations (sdk.files.filesets.create/retrieve/update/list/delete).
  • Binary response ergonomicsNemoBinaryResponse.read() now does a plain httpx.request() (no streaming overhead). resp.stream() returns a context manager for chunk-by-chunk iteration. No more mandatory context manager for simple reads.

What stays the same

  • Public API of FilesResource (.download(), .upload(), .upload_content(), .download_content(), .list(), .delete(), .fsspec)
  • The vendored SDK copy still works via backward compat (FilesResource(NeMoPlatform) auto-converts)
  • All existing docstrings and examples preserved

Backward compatibility

The filesets package accepts both NemoClient (new) and NeMoPlatform (legacy) — auto-detected in constructors. The vendored SDK continues to pass NeMoPlatform via the existing tool.vendor-package.replaces hack; no changes needed to _client.py.

Test plan

  • 12 new endpoint unit tests (test_endpoints.py)
  • 746 nemo_platform_plugin tests pass
  • 560 files service tests pass (unit + integration)
  • 214 files e2e tests pass (test_files_sdk, test_fileset_filesystem, test_files_basic)
  • make lint-fix passes (excluding pre-existing web-sdk env issue)
  • make update-sdk / make vendor output is stable

🤖 Generated with Claude Code

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
…o-nemoclient-typed-http-client-files

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>
@matthewgrossman matthewgrossman requested review from a team as code owners June 24, 2026 01:28
@github-actions github-actions Bot added the feat label Jun 24, 2026
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Replaces the nemo_platform SDK dependency across the Files service with typed HTTP endpoint contracts and shared DTOs in nemo_platform_plugin. Streaming response wrappers are refactored to own their HTTP stream lifecycle. Server-side schemas, cache warming, filesystem, and resource classes are all migrated to the plugin client.

Changes

Files Service Plugin Migration

Layer / File(s) Summary
Shared types and endpoint contracts
packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py, packages/nemo_platform_plugin/src/nemo_platform_plugin/files/endpoints.py, packages/nemo_platform_plugin/tests/files/test_endpoints.py
Defines FilesetPurpose, CacheStatus, Pydantic DTOs, query TypedDicts, and typed HTTP endpoint contracts for fileset CRUD and file operations; tests validate each endpoint's PreparedRequest shape.
Streaming response wrapper refactor
packages/nemo_platform_plugin/src/nemo_platform_plugin/client/response.py, packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py
NemoBinaryResponse, NemoStreamResponse, and async counterparts are rewritten to accept httpx.Client/AsyncClient plus request_kwargs and expose a stream() context manager; NemoClient.send/AsyncNemoClient.send build kwargs dicts and pass them to the new constructors.
Server schema consolidation
services/core/files/src/nmp/core/files/api/v2/filesets/schemas.py, services/core/files/src/nmp/core/files/api/v2/filesets/endpoints.py, services/core/files/src/nmp/core/files/entities.py
Removes locally-defined response/request models; imports them from nemo_platform_plugin.files.types; replaces FilesetOutput.from_entity classmethod with standalone converter functions; server endpoints updated to call converters. FilesetPurpose sourced from plugin.
Cache warming concurrency limiter
services/core/files/src/nmp/core/files/app/cache.py
Introduces a process-wide anyio.CapacityLimiter for background cache warming; CacheStatus now imported from plugin; cache_file_directly and warm_fileset_cache acquire the limiter.
FilesetFileSystem plugin client migration
packages/filesets/src/filesets/filesystem/filesystem.py
__init__ gains client/sdk parameters with _client_from_sdk conversion; _workspace property replaces SDK workspace access; all backend calls switch from _sdk.files._* to self._client.send(endpoints.*(...)).
FilesResource/AsyncFilesResource migration
packages/filesets/src/filesets/resources.py
Both resource classes rewritten as standalone classes; FilesetsSubResource/AsyncFilesetsSubResource wrappers perform fileset CRUD via typed endpoints; upload/upload_content return FilesetOutput; list calls endpoints.list_files directly with client-side glob filtering; ListFilesResponse.data typed as list[FilesetFileOutput].
OpenAPI schema description
openapi/openapi.yaml, openapi/ga/openapi.yaml, openapi/ga/individual/platform.openapi.yaml
Adds description field to FilesetFileOutput schema in all three spec files.

Possibly related PRs

  • NVIDIA-NeMo/nemo-platform#370: Introduced the typed NemoClient/endpoint/response infrastructure that this PR extends and migrates the Files service onto.

Suggested reviewers

  • maxdubrinsky
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.10% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title directly matches the primary change: migrating filesets package from old SDK to NemoClient typed HTTP client. Specific, concise, and clearly summarizes main objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mgrossman/aircore-827-migrate-first-plugin-to-nemoclient-typed-http-client-files

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py (1)

146-151: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Docstring example no longer valid.

The new wrappers don't implement __enter__/__iter__; with client.send(...) as resp: for chunk in resp: will fail. Update to the resp.stream() API.

Proposed doc fix
-        For binary and streaming endpoints, the caller should use the
-        response as a context manager to ensure the connection is closed::
-
-            with client.send(endpoints.download(name="file.csv")) as resp:
-                for chunk in resp:
-                    f.write(chunk)
+        For binary and streaming endpoints, use ``stream()`` to ensure the
+        connection is closed::
+
+            with client.send(endpoints.download(name="file.csv")).stream() as resp:
+                for chunk in resp.iter_bytes():
+                    f.write(chunk)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`
around lines 146 - 151, The docstring example for the send method (or similar
client method) showing binary and streaming endpoint usage is outdated and no
longer matches the actual API. The example demonstrates using with
client.send(...) as resp: for chunk in resp: which relies on __enter__ and
__iter__ methods that the new wrappers no longer implement. Update the docstring
example to use the new resp.stream() API instead of the context manager and
direct iteration pattern, ensuring the example accurately reflects how the
current implementation should be used.
services/core/files/src/nmp/core/files/app/cache.py (1)

154-156: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Bound task creation, not just cache work.

The limiter is acquired inside cache_file_directly, so this still creates one parked task per file. Large filesets can exhaust memory before throttling helps.

Proposed fix
-    async with anyio.create_task_group() as tg:
-        for file_info in files:
-            tg.start_soon(cache_with_logging, file_info.path)
+    batch_size = max(1, int(_get_background_cache_limiter().total_tokens))
+    for start in range(0, len(files), batch_size):
+        async with anyio.create_task_group() as tg:
+            for file_info in files[start : start + batch_size]:
+                tg.start_soon(cache_with_logging, file_info.path)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/core/files/src/nmp/core/files/app/cache.py` around lines 154 - 156,
The task creation loop in the anyio task group creates one task for each file
upfront with tg.start_soon, and the limiter is only acquired inside the
cache_file_directly function, causing all tasks to be created and parked in
memory. To fix this, acquire the limiter before creating each task by using a
semaphore in the task creation loop itself. Wrap the cache_with_logging call in
a helper function that acquires the limiter first, then releases it after the
caching operation completes, so that only a bounded number of tasks are created
at any given time rather than creating all tasks upfront.
🧹 Nitpick comments (3)
packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py (1)

159-179: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Identical kwargs dict built in both branches. Hoist it once before the _is_binary check (same applies to the async send at lines 236-256).

Proposed refactor
-        if self._is_binary(request):
-            kwargs = {
-                "method": request.method,
-                "url": url,
-                "content": request.content,
-                "headers": req_headers,
-                "params": params,
-            }
-            return NemoBinaryResponse(self._http, kwargs, request)
-
-        if self._is_stream(request):
-            assert request.response_type is not None
-            kwargs = {
-                "method": request.method,
-                "url": url,
-                "content": request.content,
-                "headers": req_headers,
-                "params": params,
-            }
-            model_type = _get_stream_model_type(request.response_type)
-            return NemoStreamResponse(self._http, kwargs, model_type, request)
+        kwargs = {
+            "method": request.method,
+            "url": url,
+            "content": request.content,
+            "headers": req_headers,
+            "params": params,
+        }
+        if self._is_binary(request):
+            return NemoBinaryResponse(self._http, kwargs, request)
+
+        if self._is_stream(request):
+            assert request.response_type is not None
+            model_type = _get_stream_model_type(request.response_type)
+            return NemoStreamResponse(self._http, kwargs, model_type, request)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`
around lines 159 - 179, The kwargs dictionary containing method, url, content,
headers, and params is being constructed identically in both the _is_binary and
_is_stream conditional branches, causing unnecessary code duplication. Extract
this dictionary construction and assign it to kwargs before the _is_binary
conditional check, then reuse the same kwargs variable in both the
NemoBinaryResponse and NemoStreamResponse instantiations. Apply the same
refactoring pattern to the async send method where the identical duplication
occurs with _is_binary and _is_stream checks as well.
packages/nemo_platform_plugin/tests/files/test_endpoints.py (1)

98-115: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add endpoint tests for streaming upload variants and partial download headers.

Current coverage checks bytes upload and full download only. Add cases for Iterable[bytes]/AsyncIterable[bytes] upload content and, if supported by request plumbing, Range header preparation for download.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/tests/files/test_endpoints.py` around lines 98
- 115, The test file currently covers only bytes upload content and full file
download. Add additional test cases after the existing test_upload_file() and
test_download_file() functions to cover streaming upload variants and partial
download with Range headers. Create new test functions to verify that
endpoints.upload_file() correctly handles Iterable[bytes] and
AsyncIterable[bytes] content types (validating the method, path_params, content
type handling), and add a test for endpoints.download_file() that verifies Range
header preparation for partial downloads if the request plumbing supports it.
packages/filesets/src/filesets/resources.py (1)

604-617: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Use the static glob prefix for server-side filtering.

data/*.jsonl currently sends no path, forcing a full-fileset scan before client filtering; with include_cache_status=True, cache status is computed for unrelated files too. Derive the non-magic directory prefix and pass it as path.

♻️ Suggested shape
-        api_path = None if has_magic(path) else (path or None)
+        api_path = _glob_static_prefix(path) if has_magic(path) else (path or None)
def _glob_static_prefix(path: str) -> str | None:
    if not has_magic(path):
        return path or None
    prefix = path.rsplit("/", 1)[0] if "/" in path else ""
    while prefix and has_magic(prefix):
        prefix = prefix.rsplit("/", 1)[0] if "/" in prefix else ""
    return f"{prefix.rstrip('/')}/" if prefix else None

Also applies to: 1097-1110

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/filesets/src/filesets/resources.py` around lines 604 - 617, The
current implementation in the code block around the list_files call sends no
path parameter for glob patterns, forcing full-fileset scans. Implement a helper
function to extract the static (non-magic) directory prefix from glob patterns,
then use this prefix as the api_path for server-side filtering. Replace the
simple has_magic check on the path variable with a call to a new function that
returns the static prefix by repeatedly removing path segments until a non-magic
prefix is found, then pass this prefix to the query_params instead of the full
path. Apply this same pattern to the other similar code block mentioned at lines
1097-1110 to ensure consistent server-side filtering across all list_files
calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/filesets/src/filesets/filesystem/filesystem.py`:
- Around line 361-399: The `AsyncNemoClient` class stores the `http_client`
passed to it in the `_client_from_sdk` method but has no cleanup mechanism to
close the connection, causing resource leaks when instances are destroyed. Add
async cleanup methods such as `__aexit__` or `aclose` to the `AsyncNemoClient`
class to properly close the `http_client` when the instance is destroyed, or
alternatively, have `FilesetFileSystem` track and manage the lifecycle of the
`AsyncNemoClient` instances it creates by calling close on them when they are no
longer needed.
- Around line 612-626: The `send()` method does not raise exceptions on non-2xx
HTTP responses; only calling `.data()` on the response does. The `_rm_file`
method and `_pipe_file` method are discarding the response from `send()` without
calling `.data()`, which silently swallows errors like 403, 408, 507, or 500. To
fix this, capture the result of each `await self._client.send()` call and
immediately call `.data()` on it to ensure HTTP errors are surfaced before the
parent cache is invalidated. Apply this same fix pattern to the `_pipe_stream`
and `_put_file` methods as mentioned in the comment.

In `@packages/filesets/src/filesets/resources.py`:
- Around line 145-156: The fsspec property is creating an AsyncNemoClient but
only passing base_url and workspace parameters, which discards important HTTP
configuration from the original sync client self._client. To fix this, extract
the HTTP configuration (including auth headers, custom headers for service
principals/identity markers, and timeout settings) from self._client and pass
them to AsyncNemoClient either by extracting headers from
self._client._http.headers and passing them as the default_headers parameter, or
by following the existing pattern used in FilesetFileSystem._client_from_sdk to
properly construct an http_client parameter that preserves the full
configuration. This ensures the async client has the same authentication and
header setup as the original sync client.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py`:
- Around line 79-88: The NAME_PATTERN regex uses \w which matches Unicode word
characters, but the CreateFilesetRequest name field documentation explicitly
limits allowed characters to ASCII letters (a-z, A-Z), digits (0-9),
underscores, hyphens, and dots. Update the NAME_PATTERN regex definition to use
character classes that match only the documented ASCII character set instead of
the broader \w pattern, ensuring the regex restriction aligns with the field
description.

In `@services/core/files/src/nmp/core/files/api/v2/filesets/schemas.py`:
- Line 62: The file_url construction is directly interpolating file_info.path
without URL encoding, which breaks URLs containing special characters like
spaces, question marks, hashes, or percent signs. In the file_url assignment
where file_info.path is included in the f-string, apply URL encoding to the path
component using urllib.parse.quote() to ensure special characters are properly
escaped before being inserted into the URL path.

---

Outside diff comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`:
- Around line 146-151: The docstring example for the send method (or similar
client method) showing binary and streaming endpoint usage is outdated and no
longer matches the actual API. The example demonstrates using with
client.send(...) as resp: for chunk in resp: which relies on __enter__ and
__iter__ methods that the new wrappers no longer implement. Update the docstring
example to use the new resp.stream() API instead of the context manager and
direct iteration pattern, ensuring the example accurately reflects how the
current implementation should be used.

In `@services/core/files/src/nmp/core/files/app/cache.py`:
- Around line 154-156: The task creation loop in the anyio task group creates
one task for each file upfront with tg.start_soon, and the limiter is only
acquired inside the cache_file_directly function, causing all tasks to be
created and parked in memory. To fix this, acquire the limiter before creating
each task by using a semaphore in the task creation loop itself. Wrap the
cache_with_logging call in a helper function that acquires the limiter first,
then releases it after the caching operation completes, so that only a bounded
number of tasks are created at any given time rather than creating all tasks
upfront.

---

Nitpick comments:
In `@packages/filesets/src/filesets/resources.py`:
- Around line 604-617: The current implementation in the code block around the
list_files call sends no path parameter for glob patterns, forcing full-fileset
scans. Implement a helper function to extract the static (non-magic) directory
prefix from glob patterns, then use this prefix as the api_path for server-side
filtering. Replace the simple has_magic check on the path variable with a call
to a new function that returns the static prefix by repeatedly removing path
segments until a non-magic prefix is found, then pass this prefix to the
query_params instead of the full path. Apply this same pattern to the other
similar code block mentioned at lines 1097-1110 to ensure consistent server-side
filtering across all list_files calls.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`:
- Around line 159-179: The kwargs dictionary containing method, url, content,
headers, and params is being constructed identically in both the _is_binary and
_is_stream conditional branches, causing unnecessary code duplication. Extract
this dictionary construction and assign it to kwargs before the _is_binary
conditional check, then reuse the same kwargs variable in both the
NemoBinaryResponse and NemoStreamResponse instantiations. Apply the same
refactoring pattern to the async send method where the identical duplication
occurs with _is_binary and _is_stream checks as well.

In `@packages/nemo_platform_plugin/tests/files/test_endpoints.py`:
- Around line 98-115: The test file currently covers only bytes upload content
and full file download. Add additional test cases after the existing
test_upload_file() and test_download_file() functions to cover streaming upload
variants and partial download with Range headers. Create new test functions to
verify that endpoints.upload_file() correctly handles Iterable[bytes] and
AsyncIterable[bytes] content types (validating the method, path_params, content
type handling), and add a test for endpoints.download_file() that verifies Range
header preparation for partial downloads if the request plumbing supports it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8694569f-46e1-4fc9-98c1-41c895bbff96

📥 Commits

Reviewing files that changed from the base of the PR and between d8ea9e2 and 7e65bf1.

⛔ Files ignored due to path filters (4)
  • sdk/python/nemo-platform/.nmpcontext/openapi.yaml is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/filesets/filesystem/filesystem.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/filesets/resources.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/types/files/fileset_file.py is excluded by !sdk/**
📒 Files selected for processing (14)
  • openapi/ga/individual/platform.openapi.yaml
  • openapi/ga/openapi.yaml
  • openapi/openapi.yaml
  • packages/filesets/src/filesets/filesystem/filesystem.py
  • packages/filesets/src/filesets/resources.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/client/response.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/files/endpoints.py
  • packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py
  • packages/nemo_platform_plugin/tests/files/test_endpoints.py
  • services/core/files/src/nmp/core/files/api/v2/filesets/endpoints.py
  • services/core/files/src/nmp/core/files/api/v2/filesets/schemas.py
  • services/core/files/src/nmp/core/files/app/cache.py
  • services/core/files/src/nmp/core/files/entities.py

Comment on lines +361 to 399
@staticmethod
def _client_from_sdk(sdk: Any) -> AsyncNemoClient:
"""Convert a NeMoPlatform SDK instance to an AsyncNemoClient.

Handles both sync and async SDK instances. For sync SDKs, creates
a new AsyncNemoClient with a fresh httpx.AsyncClient.
"""
import httpx
from nemo_platform import AsyncNeMoPlatform, NeMoPlatform

def _get_sdk(
self,
sdk: NeMoPlatform | AsyncNeMoPlatform,
) -> AsyncNeMoPlatform:
# If already an async SDK, use it as-is to preserve custom transports (e.g., test clients)
if isinstance(sdk, AsyncNeMoPlatform):
return sdk
return AsyncNemoClient(
base_url=str(sdk.base_url).rstrip("/"),
workspace=sdk.workspace,
http_client=sdk._client,
)

# Convert sync SDK to async SDK
if not isinstance(sdk, NeMoPlatform):
raise TypeError(f"Expected NeMoPlatform or AsyncNeMoPlatform, got {type(sdk).__name__}")

# Convert sync SDK to async client with a fresh httpx.AsyncClient
transport: httpx.AsyncBaseTransport | None = None
if TestClient is not None and isinstance(sdk._client, TestClient):
# If using a synchronous test client, we should use the ASGITransport
transport = httpx.ASGITransport(app=sdk._client.app)
try:
from starlette.testclient import TestClient

return AsyncNeMoPlatform(
if isinstance(sdk._client, TestClient):
transport = httpx.ASGITransport(app=sdk._client.app)
except ImportError:
pass

return AsyncNemoClient(
base_url=str(sdk.base_url).rstrip("/"),
workspace=sdk.workspace,
base_url=sdk.base_url,
timeout=sdk.timeout,
max_retries=sdk.max_retries,
default_headers=sdk._custom_headers,
default_query=sdk.default_query,
http_client=httpx.AsyncClient(
transport=transport,
base_url=sdk.base_url,
base_url=str(sdk.base_url).rstrip("/"),
headers=sdk._custom_headers,
),
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP -C4 '(http_client|aclose|__aexit__|async with|def close)' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 1176


🏁 Script executed:

ast-grep outline packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py --view expanded

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 1383


🏁 Script executed:

rg -nP 'class AsyncNemoClient' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py -A 150 | head -200

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 3388


🏁 Script executed:

rg -nP 'class AsyncNemoClient' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py -A 500 | tail -300

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 3388


🏁 Script executed:

rg -nP '(__aexit__|aclose|__del__)' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 163


🏁 Script executed:

rg -nP 'AsyncNemoClient' packages/filesets/src/filesets/filesystem/filesystem.py -B 2 -A 5

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 2433


🏁 Script executed:

wc -l packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 239


AsyncNemoClient doesn't close the http_client it receives or creates.

The freshly created httpx.AsyncClient in the sync-SDK branch (lines 394–398) is stored in AsyncNemoClient._http but never closed. AsyncNemoClient has no cleanup methods (__aexit__, aclose, or __del__), so the connection pool leaks when the instance is dropped. Either add a cleanup mechanism to AsyncNemoClient or track and close the client in FilesetFileSystem.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/filesets/src/filesets/filesystem/filesystem.py` around lines 361 -
399, The `AsyncNemoClient` class stores the `http_client` passed to it in the
`_client_from_sdk` method but has no cleanup mechanism to close the connection,
causing resource leaks when instances are destroyed. Add async cleanup methods
such as `__aexit__` or `aclose` to the `AsyncNemoClient` class to properly close
the `http_client` when the instance is destroyed, or alternatively, have
`FilesetFileSystem` track and manage the lifecycle of the `AsyncNemoClient`
instances it creates by calling close on them when they are no longer needed.

Comment on lines 612 to +626
async def _rm_file(self, path: str, **kwargs) -> None:
"""Delete a single file."""
workspace, fileset, file_path = parse_fileset_ref(path, workspace_fallback=self._sdk.workspace)
workspace, fileset, file_path = parse_fileset_ref(path, workspace_fallback=self._workspace)
if not file_path:
raise ValueError("Cannot delete fileset root via rm")
await self._sdk.files._delete_file(file_path, workspace=workspace, name=fileset)
await self._client.send(endpoints.delete_file(workspace=workspace, name=fileset, path=file_path))
# Invalidate parent directory's cache since file info is stored there
self.invalidate_cache(self._parent(build_fileset_ref(path)))

async def _pipe_file(self, path: str, value: bytes, **kwargs) -> None:
"""Write bytes to a file."""
workspace, fileset, file_path = parse_fileset_ref(path, workspace_fallback=self._sdk.workspace)
workspace, fileset, file_path = parse_fileset_ref(path, workspace_fallback=self._workspace)
if not file_path:
raise ValueError("File path required for upload")
await self._sdk.files._upload_file(file_path, body=value, workspace=workspace, name=fileset)
await self._client.send(endpoints.upload_file(workspace=workspace, name=fileset, path=file_path, content=value))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
ast-grep outline packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py
sed -n '120,270p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py
rg -nP -C4 'def (read|aread|data)\b' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/response.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 7374


🏁 Script executed:

# Get the full context around lines 612-626 and check for _pipe_stream and _put_file
sed -n '600,710p' packages/filesets/src/filesets/filesystem/filesystem.py

# Also check _ls and _find to verify they call .data()
rg -n '\.data\(\)' packages/filesets/src/filesets/filesystem/filesystem.py | head -20

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 5437


🏁 Script executed:

# Check _cat_file to see what method it uses for reading
sed -n '540,580p' packages/filesets/src/filesets/filesystem/filesystem.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 1968


Discarded send() results swallow upload/delete failures.

send() does not raise on non-2xx; only .data() does. _rm_file (line 617), _pipe_file (line 626), _pipe_stream (line 660), and _put_file (line 698) discard the response and never call .data(), so a 403 (immutable fileset), 408, 507, or 500 from the upload/delete endpoints passes silently — the parent cache is then invalidated as if the write succeeded.

Force error surfacing on write/delete
-    await self._client.send(endpoints.delete_file(workspace=workspace, name=fileset, path=file_path))
+    (await self._client.send(endpoints.delete_file(workspace=workspace, name=fileset, path=file_path))).data()
-    await self._client.send(endpoints.upload_file(workspace=workspace, name=fileset, path=file_path, content=value))
+    (await self._client.send(endpoints.upload_file(workspace=workspace, name=fileset, path=file_path, content=value))).data()

Apply equivalently in _pipe_stream (line 660) and _put_file (line 698).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/filesets/src/filesets/filesystem/filesystem.py` around lines 612 -
626, The `send()` method does not raise exceptions on non-2xx HTTP responses;
only calling `.data()` on the response does. The `_rm_file` method and
`_pipe_file` method are discarding the response from `send()` without calling
`.data()`, which silently swallows errors like 403, 408, 507, or 500. To fix
this, capture the result of each `await self._client.send()` call and
immediately call `.data()` on it to ensure HTTP errors are surfaced before the
parent cache is invalidated. Apply this same fix pattern to the `_pipe_stream`
and `_put_file` methods as mentioned in the comment.

Comment thread packages/filesets/src/filesets/resources.py Outdated
Comment on lines +79 to +88
NAME_PATTERN = r"^[\w\-.]+$"
MAX_LENGTH = 255


class CreateFilesetRequest(BaseModel):
name: str = Field(
description="The name of the fileset. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots.",
max_length=MAX_LENGTH,
pattern=NAME_PATTERN,
examples=["training-data-v1", "llama-checkpoint"],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

NAME_PATTERN is broader than the documented character set.

Line 79 uses \w, which accepts Unicode word characters; the field description says ASCII letters/digits/underscore/hyphen/dot only.

Proposed fix
-NAME_PATTERN = r"^[\w\-.]+$"
+NAME_PATTERN = r"^[A-Za-z0-9_.-]+$"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py` around
lines 79 - 88, The NAME_PATTERN regex uses \w which matches Unicode word
characters, but the CreateFilesetRequest name field documentation explicitly
limits allowed characters to ASCII letters (a-z, A-Z), digits (0-9),
underscores, hyphens, and dots. Update the NAME_PATTERN regex definition to use
character classes that match only the documented ASCII character set instead of
the broader \w pattern, ensuring the regex restriction aligns with the field
description.

) -> FilesetFileOutput:
"""Convert a FileInfo to a FilesetFileOutput response DTO."""
return FilesetFileOutput(
file_url=f"/apis/files/v2/workspaces/{workspace}/filesets/{name}/-/{file_info.path}",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Encode file_url path components.

Raw file_info.path breaks returned URLs when paths contain ?, #, spaces, or %.

Proposed fix
 from typing import Annotated, Optional
+from urllib.parse import quote
 def fileset_file_output_from_info(
     workspace: str,
     name: str,
     file_info: FileInfo,
     cache_status: CacheStatus | None = None,
 ) -> FilesetFileOutput:
     """Convert a FileInfo to a FilesetFileOutput response DTO."""
+    encoded_workspace = quote(workspace, safe="")
+    encoded_name = quote(name, safe="")
+    encoded_path = quote(file_info.path, safe="/")
     return FilesetFileOutput(
-        file_url=f"/apis/files/v2/workspaces/{workspace}/filesets/{name}/-/{file_info.path}",
+        file_url=f"/apis/files/v2/workspaces/{encoded_workspace}/filesets/{encoded_name}/-/{encoded_path}",
         file_ref=f"{workspace}/{name}#{file_info.path}",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/core/files/src/nmp/core/files/api/v2/filesets/schemas.py` at line
62, The file_url construction is directly interpolating file_info.path without
URL encoding, which breaks URLs containing special characters like spaces,
question marks, hashes, or percent signs. In the file_url assignment where
file_info.path is included in the f-string, apply URL encoding to the path
component using urllib.parse.quote() to ensure special characters are properly
escaped before being inserted into the URL path.

@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor
Suite Lines Covered Line Rate Branch Rate
Unit Tests 20970/27524 76.2% 61.2%
Integration Tests 12223/26293 46.5% 19.8%

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/filesets/src/filesets/resources.py`:
- Around line 234-239: The fsspec property in FilesResource class is using
sdk=self._raw_client to instantiate FilesetFileSystem, which fails when
FilesResource is initialized with NemoClient because _client_from_sdk only
accepts NeMoPlatform or AsyncNeMoPlatform. Fix this by changing the
FilesetFileSystem instantiation in the fsspec cached_property method to use
client=self._client instead of sdk=self._raw_client, matching the pattern
already used in the async variant at line 762.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: eb0652c3-d9d5-4b66-a12f-e59bd051336d

📥 Commits

Reviewing files that changed from the base of the PR and between 7e65bf1 and a31d202.

⛔ Files ignored due to path filters (2)
  • sdk/python/nemo-platform/src/nemo_platform/filesets/filesystem/filesystem.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/filesets/resources.py is excluded by !sdk/**
📒 Files selected for processing (2)
  • packages/filesets/src/filesets/filesystem/filesystem.py
  • packages/filesets/src/filesets/resources.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/filesets/src/filesets/filesystem/filesystem.py

Comment on lines 234 to +239
@cached_property
def fsspec(self) -> FilesetFileSystem:
"""Access the underlying fsspec filesystem."""
return FilesetFileSystem(sdk=self._client)
# FilesetFileSystem._client_from_sdk handles NeMoPlatform → AsyncNemoClient
# conversion with proper transport detection (e.g. TestClient → ASGITransport).
return FilesetFileSystem(sdk=self._raw_client, asynchronous=False)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP '\bFilesResource\s*\(' packages services -C3
rg -nP 'def _client_from_sdk' -A20 packages/filesets/src/filesets/filesystem/filesystem.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 1125


🏁 Script executed:

# First, check the FilesResource.__init__ signature and what it accepts
ast-grep outline packages/filesets/src/filesets/resources.py --match FilesResource -A 10

# Read the specific lines mentioned in the review
sed -n '220,240p' packages/filesets/src/filesets/resources.py

# Check line 762 to see the async variant mentioned
sed -n '755,770p' packages/filesets/src/filesets/resources.py

# Search for FilesResource instantiation patterns
rg -n 'FilesResource\s*\(' packages services --context 5

# Check what types _client_from_sdk actually accepts
cat packages/filesets/src/filesets/filesystem/filesystem.py | sed -n '368,400p'

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 3144


🏁 Script executed:

# Find NemoClient definition
rg -n 'class NemoClient' packages services

# Check FilesetFileSystem __init__ signature
ast-grep outline packages/filesets/src/filesets/filesystem/filesystem.py

# Look for actual FilesResource instantiation patterns in tests or code
rg -n 'FilesResource\(' packages services -B2 -A2

# Check what imports are used
sed -n '1,50p' packages/filesets/src/filesets/resources.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 2929


🏁 Script executed:

# Get FilesetFileSystem.__init__ signature and implementation
sed -n '294,330p' packages/filesets/src/filesets/filesystem/filesystem.py

# Look for actual usage of FilesResource with different client types
rg -n 'FilesResource\s*\(' packages services --no-heading -A3

# Check if there's test code instantiating FilesResource with NemoClient
fd -e '*.py' packages -x grep -l 'FilesResource' {} \; | head -10

# Look at the actual error case - when would NemoClient be passed
rg -n 'sdk\s*=' packages/filesets/src/filesets/resources.py -B5 -A5

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 2161


🏁 Script executed:

# Find the __init__ method of FilesetFileSystem
sed -n '294,370p' packages/filesets/src/filesets/filesystem/filesystem.py

# Also check if there's a base class AsyncFileSystem that defines __init__
rg -n 'def __init__' packages/filesets/src/filesets/filesystem/filesystem.py -A10

# Search for all places where FilesetFileSystem is instantiated in the codebase
rg -n 'FilesetFileSystem\(' packages services -B2 -A2

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 12127


Sync fsspec property breaks when FilesResource is created with NemoClient.

__init__ accepts NemoClient (line 222) and stores it as self._raw_client. The fsspec property then passes sdk=self._raw_client to FilesetFileSystem, but _client_from_sdk raises TypeError for anything other than NeMoPlatform/AsyncNeMoPlatform. Since NemoClient is neither, accessing .fsspec on a FilesResource(nemo_client) instance throws.

The async variant avoids this by using client=self._client (line 762), which is the adapter-converted client. The sync path needs the same fix: use client=self._client instead of sdk=self._raw_client.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/filesets/src/filesets/resources.py` around lines 234 - 239, The
fsspec property in FilesResource class is using sdk=self._raw_client to
instantiate FilesetFileSystem, which fails when FilesResource is initialized
with NemoClient because _client_from_sdk only accepts NeMoPlatform or
AsyncNeMoPlatform. Fix this by changing the FilesetFileSystem instantiation in
the fsspec cached_property method to use client=self._client instead of
sdk=self._raw_client, matching the pattern already used in the async variant at
line 762.

@matthewgrossman matthewgrossman changed the title feat(sdk): Migrate Files to use the new NemoClient SDK feat(files): migrate filesets package to NemoClient typed HTTP client Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant