Skip to content

feat: Model Streamer initial support#136

Open
nicolasnoble wants to merge 14 commits intomainfrom
nnoble/model-streamer-provider
Open

feat: Model Streamer initial support#136
nicolasnoble wants to merge 14 commits intomainfrom
nnoble/model-streamer-provider

Conversation

@nicolasnoble
Copy link
Contributor

@nicolasnoble nicolasnoble commented Jan 28, 2026

We provide a method to download from S3 using Model Streamer. Given Model Streamer's API, this ends up as a Python sidecar process for which we add a Rust interface so this becomes completely transparent for the end user, where we only have a new provider that they can use.

This also adds a small test in the form of a minio storage.

I'm not 100% certain of what I'm doing in Python here, so adding @ajcasagrande for reviewing this part too.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Model Streamer provider for downloading models from S3, GCS, or S3-compatible storage
    • Introduced Python sidecar service for cloud-based model management
    • Added configuration support for cloud storage credentials (S3, GCS, MinIO)
  • Chores

    • Enhanced Docker deployment with multi-service orchestration (server, sidecar, MinIO)
    • Added GitHub Actions pipeline for integration testing with cloud storage

@nicolasnoble
Copy link
Contributor Author

Also mentioning @noa-neria here for the Model Streamer portion.

@nicolasnoble nicolasnoble requested a review from nv-hwoo January 28, 2026 22:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a Python sidecar service that integrates with Run:ai Model Streamer to download models from S3/GCS (including MinIO), wires a new ModelStreamer provider into the Rust backend, and updates container/CI infrastructure to run both the main server and the sidecar with MinIO-based tests.

Changes:

  • Add a FastAPI-based sidecar (modelexpress_sidecar) with download/list/delete APIs backed by S3/GCS via Model Streamer (with a boto3 fallback) plus Python unit/integration tests using MinIO.
  • Introduce a new ModelProvider::ModelStreamer in the Rust common crate, implement a ModelStreamerProvider that talks to the sidecar over HTTP, and map it through gRPC/database layers.
  • Update Dockerfile, docker-compose, entrypoint, workspace dependencies, and CI to build and run the sidecar alongside the gRPC server and to execute the new Python tests in CI.

Reviewed changes

Copilot reviewed 32 out of 34 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
sidecar/tests/test_s3_integration.py Adds async integration tests hitting the sidecar’s /api/v1/download and /api/v1/models/... endpoints against a MinIO-backed S3-compatible storage.
sidecar/tests/test_api.py Adds basic API tests for the sidecar (/health, model GET/DELETE 404 behavior, invalid download URI behavior).
sidecar/tests/init.py Marks the sidecar tests package and adds SPDX/license header.
sidecar/src/modelexpress_sidecar/services/downloader.py Implements ModelDownloader with URI parsing, environment-based S3/GCS credential setup, and S3 downloads via Model Streamer (with boto3 fallback), plus cache lookup and deletion.
sidecar/src/modelexpress_sidecar/services/init.py Declares the services package for the sidecar.
sidecar/src/modelexpress_sidecar/main.py Defines the FastAPI application, logging setup, lifespan hooks, /health endpoint, and CLI entrypoint for running the sidecar with uvicorn.
sidecar/src/modelexpress_sidecar/api/schemas.py Introduces Pydantic models for S3/GCS credentials and download/get/delete responses for the sidecar API.
sidecar/src/modelexpress_sidecar/api/routes.py Wires HTTP routes (/download, /models/{model_id} GET/DELETE) to the ModelDownloader`, including error handling and HTTP status mapping.
sidecar/src/modelexpress_sidecar/api/init.py Declares the API subpackage for the sidecar.
sidecar/src/modelexpress_sidecar/init.py Declares the top-level sidecar package and version.
sidecar/src/modelexpress_sidecar.egg-info/top_level.txt Records the top-level Python package name for distribution metadata.
sidecar/src/modelexpress_sidecar.egg-info/requires.txt Lists Python runtime and dev dependencies (FastAPI, Pydantic, Model Streamer, boto3, pytest, httpx) for the sidecar package.
sidecar/src/modelexpress_sidecar.egg-info/dependency_links.txt Placeholder for extra dependency links in the sidecar package metadata.
sidecar/src/modelexpress_sidecar.egg-info/SOURCES.txt Enumerates packaged Python source and test files for the sidecar distribution.
sidecar/src/modelexpress_sidecar.egg-info/PKG-INFO Provides the generated package metadata (name, version, requirements, classifiers) for the sidecar.
sidecar/requirements.txt Defines Python runtime and dev dependencies for the sidecar (including httpx used by the entrypoint health check).
sidecar/pyproject.toml Declares the modelexpress-sidecar Python project configuration, dependencies, extras, and pytest settings.
modelexpress_server/src/database.rs Extends serialization/deserialization of the ModelProvider field to understand and persist the new ModelStreamer provider.
modelexpress_common/src/providers/model_streamer.rs Implements ModelStreamerProvider using reqwest to call the sidecar’s download/get/delete APIs, including health check helpers and unit tests.
modelexpress_common/src/providers.rs Registers the new model_streamer module and re-exports ModelStreamerProvider in the providers module.
modelexpress_common/src/models.rs Extends the ModelProvider enum with a ModelStreamer variant and adds tests for serialization.
modelexpress_common/src/lib.rs Maps the extended ModelProvider enum to and from the new gRPC MODEL_STREAMER provider value.
modelexpress_common/src/download.rs Updates provider factory to construct ModelStreamerProvider from environment and adds tests to verify the new provider wiring.
modelexpress_common/src/config.rs Adds SidecarConfig, S3Credentials, and GcsCredentials structs plus env-loaders and tests for the Model Streamer integration.
modelexpress_common/proto/model.proto Extends the ModelProvider protobuf enum with MODEL_STREAMER.
modelexpress_common/Cargo.toml Adds reqwest and urlencoding as workspace dependencies used by the Model Streamer provider.
entrypoint.sh Introduces an entrypoint that starts the Python sidecar in a venv, waits (via httpx) for its /health endpoint, and then starts the Rust ModelExpress server, ensuring cleanup on exit.
docker-compose.yml Updates service ports and environment variables, adds MinIO and a MinIO init sidecar, and mounts a shared model cache volume for the server container.
Dockerfile Installs Python and a venv for the sidecar, installs sidecar dependencies, copies sidecar sources, exposes ports for the gRPC server and sidecar, and switches to using the new entrypoint script.
Cargo.toml Adds workspace-level reqwest (with JSON + rustls-tls) and urlencoding dependencies for common code.
Cargo.lock Updates lockfile to include reqwest and urlencoding (and associated transitive dependencies).
.gitignore Ignores Python virtual environments (e.g., .venv/) used during testing or local development.
.github/workflows/ci.yml Adds a sidecar-test job that installs the sidecar with dev extras, runs unit tests, starts a MinIO container, seeds test data, and runs the full sidecar test suite against MinIO.
.dockerignore Expands Docker build context ignore rules to exclude Rust target artifacts and Python virtual envs/bytecode, reducing build context size.

@nicolasnoble
Copy link
Contributor Author

A couple of items from the review feedback that we're deferring to a follow-up PR:

  1. Use model streamer pull_files instead of raw boto3 (from @noa-neria) - this would give us GCS and Azure support for free instead of reimplementing with boto3. Agreed this is the right direction, just want to keep this PR focused.

  2. Replace success=False + HTTP 200 with proper HTTP status codes (from @ajcasagrande) - this also covers the "delete route returns 2 different response types" comment. The plan is to use appropriate 4xx/5xx codes with a JSON body for error details, rather than returning 200 with success=False. This touches the response schemas, route handlers, downloader, and the Rust client, so it's better as its own PR.

@coderabbitai
Copy link

coderabbitai bot commented Feb 18, 2026

Walkthrough

This PR integrates a Python-based Model Streamer sidecar service into ModelExpress to support downloading models from S3, GCS, and MinIO storage. It adds a new FastAPI sidecar service, updates Docker configurations to run both services, extends the Rust codebase with ModelStreamer provider support and related configuration structures, and introduces CI workflows for sidecar testing with MinIO.

Changes

Cohort / File(s) Summary
Project Ignores
.dockerignore, .gitignore
Updated to ignore Python artifacts (\_\pycache_\/, *.pyc, .venv/) and Docker build files; Dockerfile re-added to .dockerignore.
Docker Infrastructure
Dockerfile, docker-compose.yml, entrypoint.sh
Extended runtime image to include Python 3 environment, sidecar package installation, and dual-service orchestration via entrypoint script. Added MinIO and minio-init services to docker-compose with model-cache volume and proper service dependencies.
Rust Workspace Dependencies
Cargo.toml, modelexpress_common/Cargo.toml
Added reqwest (v0.12 with json and rustls-tls features) and urlencoding (v2.1) as workspace dependencies; referenced in modelexpress_common.
Rust Common Library - Core Models & Config
modelexpress_common/proto/model.proto, modelexpress_common/src/models.rs, modelexpress_common/src/config.rs, modelexpress_common/src/lib.rs
Added ModelStreamer provider enum variant, new SidecarConfig struct with environment loading and builder methods, S3Credentials and GcsCredentials for credential management, and updated ModelProvider conversions to handle ModelStreamer round-trip serialization.
Rust Common Library - Provider & Download Logic
modelexpress_common/src/providers.rs, modelexpress_common/src/providers/model_streamer.rs, modelexpress_common/src/download.rs
Introduced ModelStreamerProvider with HTTP client, credential building, S3/GCS URI encoding, and core download/delete/query operations. Extended get_provider to instantiate ModelStreamer via from_env with error handling. Added comprehensive unit tests for provider initialization, encoding, and credential builders.
Rust Server Database
modelexpress_server/src/database.rs
Extended CRUD and query operations to map ModelStreamer provider string in get_model_record, set_status, get_models_by_last_used, and try_claim_for_download flows.
Python Sidecar Package Structure
sidecar/pyproject.toml, sidecar/src/modelexpress_sidecar/__init__.py, sidecar/src/modelexpress_sidecar/api/__init__.py, sidecar/src/modelexpress_sidecar/services/__init__.py, sidecar/tests/__init__.py
Created new Python package with pyproject.toml declaring FastAPI, Uvicorn, Pydantic v2, RunAI model streamer, and boto3 dependencies; initialized module structure with version metadata and license headers.
Python Sidecar API Schemas & Routes
sidecar/src/modelexpress_sidecar/api/schemas.py, sidecar/src/modelexpress_sidecar/api/routes.py
Defined Pydantic models for S3/GCS credentials and request/response payloads (DownloadRequest, DownloadResponse, GetModelResponse, DeleteResponse). Implemented FastAPI router with POST /download, GET /models/{model_id}, and DELETE /models/{model_id} endpoints using centralized ModelDownloader instance.
Python Sidecar Main Server
sidecar/src/modelexpress_sidecar/main.py
Created FastAPI app with startup/shutdown lifecycle context manager, /health endpoint, and /api/v1 router inclusion. Configured uvicorn to run on MODEL_EXPRESS_SIDECAR_PORT (default 8002) with logging from MODEL_EXPRESS_LOG_LEVEL.
Python Sidecar Services
sidecar/src/modelexpress_sidecar/services/downloader.py
Implemented ModelDownloader class orchestrating downloads from S3/GCS/MinIO with credential setup/restoration, URI parsing, cache path resolution, Model Streamer integration fallback to boto3, file filtering, and model inspection/deletion. Handles error cases gracefully with structured responses.
Python Sidecar Tests
sidecar/tests/test_api.py, sidecar/tests/test_s3_integration.py
Added async unit tests for health, 404 scenarios, and invalid URIs; integration tests for MinIO downloads (conditionally skipped if unavailable), including credential flow, model retrieval, and error handling.
CI Workflow
.github/workflows/ci.yml
Added new "Sidecar Python Tests" job that sets up Python 3.12, installs sidecar in dev mode, runs unit tests, provisions MinIO with bucket, executes integration tests with S3 environment variables, and cleans up MinIO container.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Poem

🐰 Hop, hop—a sidecar springs to life!
Python whispers where Rust once strode,
S3 treasures, MinIO delights,
ModelStreamer blazes the download road. ✨📦

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: Model Streamer initial support' directly and clearly summarizes the main change: initial support for Model Streamer provider for downloading models from S3/GCS storage.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🧹 Nitpick comments (12)
.gitignore (1)

53-58: LGTM! Consider adding a few more commonly generated Python paths.

The added entries are all correct. As an optional follow-up, the following patterns are commonly generated by Python toolchains and may appear as the sidecar matures:

 # Python
 .venv/
 __pycache__/
 *.pyc
 *.pyo
 *.egg-info/
+dist/
+build/
+.pytest_cache/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 53 - 58, The .gitignore currently covers .venv/,
__pycache__/, *.pyc, *.pyo, and *.egg-info/ but should also include other common
Python artifacts; update the ignore entries to add patterns such as venv/, env/,
.env, .Python, .pytest_cache/, .mypy_cache/, build/, dist/, pip-wheel-metadata/
and *.egg to ensure virtual environments, caches, build artifacts, and wheel
metadata (the same area as the existing .venv/ and *.egg-info/ entries) are
excluded.
sidecar/src/modelexpress_sidecar/api/schemas.py (1)

11-19: Consider whether region should be Optional.

region defaults to "us-east-1" but is typed Optional[str], which means a caller can explicitly pass null to set it to None, overriding the default. If None is a valid value (e.g., for MinIO/S3-compatible endpoints that don't use regions), this is fine. Otherwise, changing the type to str would prevent accidental None values.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sidecar/src/modelexpress_sidecar/api/schemas.py` around lines 11 - 19, The
S3Credentials.schema defines region as Optional[str] with a default "us-east-1",
allowing callers to explicitly pass None which overrides the default; decide
whether None is meaningful and then make the type match: if region must always
have a value, change the type on the region field in S3Credentials from
Optional[str] to str so callers cannot pass null and the default is enforced,
otherwise keep Optional[str] but add a comment or validation in S3Credentials
(e.g., in a validator) clarifying that None is acceptable for S3-compatible
endpoints.
modelexpress_common/src/config.rs (1)

383-401: Default endpoint and timeout values are duplicated between Default and from_env().

The literal "http://127.0.0.1:8002" appears in both Default::default() (line 386) and from_env() (line 396), and 300 appears in both (lines 387 and 401). Extract these into constants to keep them in sync.

Proposed fix
+const DEFAULT_SIDECAR_ENDPOINT: &str = "http://127.0.0.1:8002";
+const DEFAULT_SIDECAR_TIMEOUT_SECS: u64 = 300;
+
 impl Default for SidecarConfig {
     fn default() -> Self {
         Self {
-            endpoint: "http://127.0.0.1:8002".to_string(),
-            timeout_secs: 300,
+            endpoint: DEFAULT_SIDECAR_ENDPOINT.to_string(),
+            timeout_secs: DEFAULT_SIDECAR_TIMEOUT_SECS,
         }
     }
 }
 
 impl SidecarConfig {
     pub fn from_env() -> Self {
         let endpoint = std::env::var("MODEL_EXPRESS_SIDECAR_ENDPOINT")
-            .unwrap_or_else(|_| "http://127.0.0.1:8002".to_string());
+            .unwrap_or_else(|_| DEFAULT_SIDECAR_ENDPOINT.to_string());
 
         let timeout_secs = std::env::var("MODEL_EXPRESS_SIDECAR_TIMEOUT")
             .ok()
             .and_then(|s| s.parse().ok())
-            .unwrap_or(300);
+            .unwrap_or(DEFAULT_SIDECAR_TIMEOUT_SECS);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_common/src/config.rs` around lines 383 - 401, Extract the
duplicated default values into constants and use them from both the Default impl
and from_env(): define e.g. const DEFAULT_SIDECAR_ENDPOINT: &str =
"http://127.0.0.1:8002"; and const DEFAULT_SIDECAR_TIMEOUT: u64 = 300, then
replace the literal in impl Default for SidecarConfig::default() (endpoint and
timeout_secs) and the fallbacks in SidecarConfig::from_env() (the unwrap_or_else
endpoint and unwrap_or timeout) to reference DEFAULT_SIDECAR_ENDPOINT and
DEFAULT_SIDECAR_TIMEOUT so the values remain single-sourced.
entrypoint.sh (1)

29-36: Health check spawns a full Python process on every retry.

Each iteration invokes /app/venv/bin/python -c "import httpx; ...", which has nontrivial startup cost. If curl or wget is available in the container image, that would be significantly lighter. If not, consider a single Python script that loops internally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@entrypoint.sh` around lines 29 - 36, The health-check loop currently spawns a
full Python interpreter each retry by calling /app/venv/bin/python -c "import
httpx; httpx.get(...)" which is slow; replace this by either (a) using a
lightweight shell tool if available (curl or wget) to hit
http://127.0.0.1:${SIDECAR_PORT}/health inside the loop, or (b) create a small
persistent Python helper that performs the retry loop internally (so only one
interpreter starts) and call that instead; update references to RETRY_COUNT and
MAX_RETRIES to match the new approach and ensure the same success/failure
behavior and exit conditions are preserved.
.github/workflows/ci.yml (1)

239-254: Pin the MinIO image tag for CI reproducibility.

Using minio/minio:latest means CI behavior can change without any code change. Consider pinning to a specific release (e.g., minio/minio:RELEASE.2025-01-20T14-49-07Z) and bumping it intentionally. The same applies to minio/mc.

Also, sleep 5 for readiness is fragile on slow CI runners. A health-check loop (e.g., polling http://localhost:9000/minio/health/live) would be more reliable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 239 - 254, Update the "Start MinIO for
integration tests" step to pin the MinIO and mc images instead of using
minio/minio:latest and minio/mc:latest (choose a specific release tag like
RELEASE.x) and replace the fragile sleep 5 wait with a readiness loop that polls
MinIO's health endpoint (http://localhost:9000/minio/health/live) until healthy
before creating buckets and uploading test data; keep the existing docker run
and mc commands but reference the pinned image tags and wait for the health
check to succeed before running the mc container.
sidecar/tests/test_s3_integration.py (1)

56-67: Extract duplicated S3 credential setup into a fixture or helper.

The credential dict (s3_credentials with access_key_id, secret_access_key, region, endpoint) and the download request payload are duplicated across test_download_from_minio and test_get_downloaded_model. Consider a shared fixture or helper to reduce duplication.

Also applies to: 85-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sidecar/tests/test_s3_integration.py` around lines 56 - 67, Extract the
duplicated s3_credentials dict and the repeated download request payload into a
shared fixture or helper used by test_download_from_minio and
test_get_downloaded_model: create a pytest fixture (e.g., s3_credentials or
make_s3_payload) that returns the dict with access_key_id, secret_access_key,
region, and endpoint (pulling from os.environ fallbacks) and optionally a
function that builds the POST JSON including model_path and cache_dir, then
replace the inline dicts in test_download_from_minio and
test_get_downloaded_model with calls to that fixture/helper to remove
duplication and centralize credential setup.
Dockerfile (1)

43-46: Test binaries included in runtime image.

test_client and fallback_test are copied into the production runtime image. Consider using a separate test stage or removing these from the runtime image to reduce its size and attack surface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 43 - 46, The Dockerfile is copying test artifacts
into the production runtime image (COPY --from=builder
/app/target/release/test_client and /fallback_test); remove those two COPY lines
or move them into a separate multi-stage "test" stage so only production
binaries (modelexpress-server and modelexpress-cli) are included in the final
runtime image; ensure the final stage only copies
/app/target/release/modelexpress-server and modelexpress-cli and, if needed, add
a dedicated test stage that copies test_client and fallback_test for CI/test
images instead of the production image.
modelexpress_server/src/database.rs (1)

108-112: Consider extracting provider string conversion to a shared helper.

The provider string-to-enum and enum-to-string mappings are duplicated across get_model_record, get_models_by_last_used, set_status, try_claim_for_download, and get_status. As more providers are added, each new variant must be patched in multiple places. Centralizing this (e.g., impl Display/FromStr on ModelProvider) would reduce maintenance burden.

Also applies to: 240-243

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_server/src/database.rs` around lines 108 - 112, The provider
string/enum conversion logic is duplicated across get_model_record,
get_models_by_last_used, set_status, try_claim_for_download and get_status;
extract this into centralized conversions by implementing std::str::FromStr for
ModelProvider (to parse provider_str → ModelProvider) and std::fmt::Display (or
From<ModelProvider> for &str / ToString) for enum→string, update those functions
to use provider_str.parse::<ModelProvider>() and provider.to_string() (or
.to_string()) instead of repeating match arms, and remove the duplicated match
blocks so adding new variants only requires updating ModelProvider's impls.
sidecar/src/modelexpress_sidecar/services/downloader.py (2)

40-54: Inconsistent case handling in is_ignored_file vs is_weight_file/is_image_file.

is_weight_file and is_image_file lowercase the filename before comparison, but is_ignored_file does an exact (case-sensitive) match against IGNORED_FILES. A file named readme.md or README.MD would not be filtered. The Rust side has the same inconsistency, so this is at least consistent across the codebase, but worth noting.

Optional fix
 def is_ignored_file(filename: str) -> bool:
     """Check if a file should be ignored."""
-    return filename in IGNORED_FILES
+    return filename.lower() in {f.lower() for f in IGNORED_FILES}

Or convert IGNORED_FILES to a set of lowercase names and compare filename.lower().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sidecar/src/modelexpress_sidecar/services/downloader.py` around lines 40 -
54, The is_ignored_file function currently does a case-sensitive comparison
against IGNORED_FILES while is_weight_file and is_image_file lowercase the
filename; update is_ignored_file to perform a case-insensitive check by
lowercasing the input filename (filename.lower()) before membership testing or
by ensuring IGNORED_FILES is a set of lowercase names and comparing against
that; modify the is_ignored_file function (and adjust IGNORED_FILES
initialization if you choose the latter) so filename case is normalized
consistently with is_weight_file and is_image_file.

218-255: _download_with_model_streamer is a no-op wrapper — both branches do the same thing.

The SafetensorsStreamer import on line 234 is unused; the try and except blocks both delegate to _download_with_boto3. The method name is misleading since Model Streamer is never actually invoked. Consider either:

  • Removing the method entirely and calling _download_with_boto3 directly from download(), with a TODO for phase 2, or
  • At minimum, adding a clear log message in the try branch indicating Model Streamer was found but is not yet used.

This avoids confusion for future maintainers who may assume the streamer path is functional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sidecar/src/modelexpress_sidecar/services/downloader.py` around lines 218 -
255, The _download_with_model_streamer function currently never uses the
imported SafetensorsStreamer and simply calls _download_with_boto3 in both try
and except; either remove this wrapper and call _download_with_boto3 directly
from download(), leaving a TODO for phase 2, or keep the function but make its
intent explicit by, within the try branch after importing SafetensorsStreamer,
logging that the streamer was detected but not yet implemented (e.g.,
"SafetensorsStreamer available but streamer-based download not implemented;
falling back to boto3") before calling _download_with_boto3; update any callers
(e.g., download()) accordingly to reflect the chosen approach.
modelexpress_common/src/providers/model_streamer.rs (2)

149-165: encode_model_id — straightforward replace may mangle URIs with overlapping scheme substrings.

The replace("s3://", "s3/").replace("gs://", "gs/") approach on line 162-164 applies globally. While extremely unlikely in practice, a URI like gs://bucket/s3://path would be double-mangled. The s3+http/https branch correctly uses strip_prefix — consider using the same approach for the simple schemes for consistency.

Optional: use strip_prefix for all schemes
-        model_path
-            .replace("s3://", "s3/")
-            .replace("gs://", "gs/")
+        if let Some(rest) = model_path.strip_prefix("s3://") {
+            format!("s3/{rest}")
+        } else if let Some(rest) = model_path.strip_prefix("gs://") {
+            format!("gs/{rest}")
+        } else {
+            model_path.to_string()
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_common/src/providers/model_streamer.rs` around lines 149 - 165,
The current global replace can mangle URIs containing scheme substrings; update
encode_model_id to use strip_prefix for the simple schemes too: check
model_path.strip_prefix("s3://") and if Some(rest) return format!("s3/{}", rest)
(return "s3/" if rest is empty), then check strip_prefix("gs://") and return
format!("gs/{}", rest) similarly, falling back to the original model_path only
if none of the prefixes match; keep the existing s3+http(s) branch untouched.

167-183: Gate is_weight_file and is_ignored with #[cfg(test)].

These helpers are unused in production code—they're only called in test_is_weight_file and test_is_ignored. Since they duplicate filtering logic already handled by the Python sidecar and serve no purpose outside tests, wrap their definitions with #[cfg(test)] to remove dead code from release builds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_common/src/providers/model_streamer.rs` around lines 167 - 183,
The is_weight_file and is_ignored helpers (and their associated constants
WEIGHT_EXTENSIONS and IGNORED_FILES) are only used by tests test_is_weight_file
and test_is_ignored, so annotate them with #[cfg(test)] to prevent them and the
constants from being compiled into release builds; locate the const declarations
WEIGHT_EXTENSIONS and IGNORED_FILES and the fn definitions is_weight_file and
is_ignored and add #[cfg(test)] above each item (or group them under a
#[cfg(test)] mod) so they are only available during testing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 216-218: The release job currently lists needs = [test,
integration-test, security-audit, docker-build] but omits the sidecar-test job;
update the release job's needs array to include "sidecar-test" (i.e., needs =
[test, integration-test, security-audit, docker-build, sidecar-test]) so the
release will be gated by the Sidecar Python Tests when the sidecar is part of
the release artifact; if the sidecar is optional, instead add a conditional
dependency or gating check in the release job to only require "sidecar-test"
when the sidecar is included.

In `@docker-compose.yml`:
- Around line 51-63: The minio-init service hardcodes credentials in the mc
alias set command and uses sleep 5; update minio-init to read MINIO_ROOT_USER
and MINIO_ROOT_PASSWORD environment variables (same names used by the minio
service) when running mc alias set instead of literal "minioadmin minioadmin",
and change the depends_on to use the health condition so minio-init waits for
the minio service healthcheck rather than a fixed sleep; locate the minio-init
service and the mc alias set invocation to implement these changes and ensure
the env var names match the minio service healthcheck usage.

In `@Dockerfile`:
- Around line 39-40: The Dockerfile sets PYTHONPATH=/app/sidecar/src while also
installing the sidecar package into the virtualenv (/app/venv) and using
/app/venv/bin/python in entrypoint.sh; remove the redundant PYTHONPATH
assignment (or add a concise comment) so the container relies on the venv's
site-packages instead—update the PYTHONPATH usage near the ENV or entrypoint
lines or document why PYTHONPATH=/app/sidecar/src is required (e.g., for
unpackaged development-only modules), and ensure entrypoint.sh continues to
invoke /app/venv/bin/python and the installed package name so behavior remains
identical.

In `@entrypoint.sh`:
- Around line 42-53: The current use of exec when launching
/app/modelexpress-server replaces the shell so the cleanup() trap never runs and
the sidecar (SIDECAR_PID) can be orphaned; change to not use exec: launch
/app/modelexpress-server without exec, capture its PID (e.g., SERVER_PID), and
use the existing trap/cleanup to kill both SIDECAR_PID and SERVER_PID on
EXIT/INT/TERM, and also forward signals to the server (kill -SIGTERM
"$SERVER_PID") so the shell can wait (wait "$SERVER_PID") and still perform
cleanup; update references to the launch line, the cleanup() function, trap,
SIDECAR_PID, and the server start logic to implement this behavior.

In `@modelexpress_common/src/config.rs`:
- Around line 456-464: Remove the auto-derived Debug on GcsCredentials and add a
custom fmt::Debug impl that redacts sensitive content in the credentials_json
field (e.g., display None or Some("<redacted>") instead of the actual JSON),
while still showing non-sensitive fields like credentials_file; mirror the
approach used for S3Credentials (keep #[derive(Clone, Serialize, Deserialize,
Default)] but replace #[derive(Debug)] with a manual impl Debug for
GcsCredentials that prints credentials_file and a redacted placeholder for
credentials_json.
- Around line 421-434: S3Credentials currently derives Debug and will expose
sensitive fields; replace the derived Debug with a manual impl for the
S3Credentials struct that redacts access_key_id and secret_access_key (e.g.,
show "<redacted>" or masked value) while still showing non-sensitive fields like
region and endpoint, and remove Debug from the derive list; implement fmt::Debug
for S3Credentials so any logging (tracing::debug!, dbg!(), etc.) uses the
redacted representation.

In `@modelexpress_common/src/download.rs`:
- Around line 119-123: The test test_get_provider_model_streamer relies on
ModelStreamerProvider::from_env which reads environment variables; to isolate
it, clear or set the relevant env vars before calling
get_provider(ModelProvider::ModelStreamer) so external env won't affect the
result — e.g., remove or set vars consumed by SidecarConfig::from_env,
S3Credentials::from_env and GcsCredentials::from_env (or set them to
known-default values) and restore them after the test; alternatively wrap the
test with a helper that sets a clean env for ModelStreamerProvider::from_env to
ensure deterministic behavior.
- Around line 17-31: Change get_provider to return a Result<Box<dyn
ModelProviderTrait>, E> instead of panicking: have get_provider accept
ModelProvider and return Ok(Box::new(HuggingFaceProvider)) for
ModelProvider::HuggingFace and, for ModelProvider::ModelStreamer, call
ModelStreamerProvider::from_env() and propagate its Err (wrap or convert into
your chosen error type) instead of tracing::error! + panic!. Update all callers
(e.g., download_model which already returns Result, and stream_model_files /
list_model_files in services.rs) to use ? to propagate the error or map it to a
tonic::Status (e.g., .map_err(|e| Status::internal(...))) so initialization
failures are handled as errors rather than crashing the process.

In `@modelexpress_common/src/providers/model_streamer.rs`:
- Around line 22-30: The DownloadRequest currently derives Debug and is being
debug-logged, which will expose S3/GCS secrets; remove the automatic Debug
output and either (A) implement a custom Debug for DownloadRequest that redacts
s3_credentials and gcs_credentials (or implements Debug on
S3CredentialsPayload/GcsCredentialsPayload to print "<redacted>"), or (B) stop
logging the whole struct and change the debug call to only log non-sensitive
fields (model_path, cache_dir, ignore_weights) and omit or replace credential
fields with "<redacted>"; update the Debug implementations for
S3CredentialsPayload/GcsCredentialsPayload if you choose centralized redaction.

In `@sidecar/pyproject.toml`:
- Around line 40-49: Update the pytest-asyncio dependency floor to at least
0.24.0 so the pytest.ini option asyncio_default_fixture_loop_scope is supported;
specifically change the pytest-asyncio entry in the dependency list (the string
"pytest-asyncio>=0.21") to "pytest-asyncio>=0.24.0" in the pyproject.toml
dependencies section to avoid configuration warnings/errors.

In `@sidecar/src/modelexpress_sidecar/api/routes.py`:
- Around line 58-68: The except blocks in the get_model and delete_model
endpoints are re-raising HTTPException without chaining the original exception,
which loses traceback context; update the two places that call raise
HTTPException(...) (the handler around downloader.get_model in the get_model
route and the corresponding except in delete_model) to use "raise
HTTPException(...) from e" (i.e., re-raise the HTTPException with the original
exception variable) so the original exception chain is preserved for debugging.

In `@sidecar/src/modelexpress_sidecar/services/downloader.py`:
- Around line 257-321: The _download_with_boto3 method can leave partial files
if s3_client.download_file fails; update the loop that calls
s3_client.download_file to wrap each per-file download in try/except: call
s3_client.download_file inside a try, and on Exception (catch broad Exception)
delete the local_file if it exists (to remove partial/incomplete files), log the
failure with the key/rel_path and exception details, then re-raise the exception
so the existing caller behavior in download() remains unchanged; reference the
symbols _download_with_boto3, s3_client.download_file, local_file and download()
when making this change.
- Around line 109-146: The current approach mutates process-global os.environ
via _setup_credentials and _restore_credentials (and relies on an asyncio.Lock)
which is fragile, serializes downloads, and mishandles empty-string env values;
instead, stop touching os.environ: remove or stop calling _setup_credentials and
_restore_credentials, and construct the storage clients with explicit
credentials (for S3 pass aws_access_key_id, aws_secret_access_key, region_name
and endpoint_url to boto3.client; for GCS supply the credentials file or
credentials object to the GCS client constructor). Also remove the
credential-locking around downloads so downloads can run concurrently; if you
must keep restore code, fix the restore logic in _restore_credentials to
distinguish None from an empty-string (i.e., store presence and original value
explicitly rather than using if value) but prefer the direct-client-credentials
approach to eliminate env mutation and the lock entirely.
- Around line 96-107: The _resolve_model_id implementation allows path-traversal
because it naively concatenates user-controlled model_id parts into a Path;
change it to first reject absolute paths and any path segments containing ".."
or null bytes, then build the candidate path using cache_base / CACHE_SUBDIR /
scheme / bucket / path and call .resolve() on both cache_base/CACHE_SUBDIR and
the candidate, finally verify the resolved candidate is a subpath of the
resolved cache root (raise an exception if not). Apply the same containment
check before performing destructive operations in delete_model and also before
reading in get_model; reference the functions _resolve_model_id, delete_model,
get_model, default_cache_dir, and CACHE_SUBDIR when locating and updating the
code.

In `@sidecar/tests/test_s3_integration.py`:
- Around line 148-153: The test test_download_nonexistent_bucket currently only
checks that the "success" key exists, which is too weak; update the assertion to
validate the expected outcome (e.g., assert data["success"] is False if the
intended behavior is to report failure) or, if behavior can vary, assert that
either data["success"] is False or an "error" key is present (e.g., assert not
data.get("success") or "error" in data) and add a short comment documenting the
chosen expected S3 behavior so future readers know why this check is used.

---

Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 239-254: Update the "Start MinIO for integration tests" step to
pin the MinIO and mc images instead of using minio/minio:latest and
minio/mc:latest (choose a specific release tag like RELEASE.x) and replace the
fragile sleep 5 wait with a readiness loop that polls MinIO's health endpoint
(http://localhost:9000/minio/health/live) until healthy before creating buckets
and uploading test data; keep the existing docker run and mc commands but
reference the pinned image tags and wait for the health check to succeed before
running the mc container.

In @.gitignore:
- Around line 53-58: The .gitignore currently covers .venv/, __pycache__/,
*.pyc, *.pyo, and *.egg-info/ but should also include other common Python
artifacts; update the ignore entries to add patterns such as venv/, env/, .env,
.Python, .pytest_cache/, .mypy_cache/, build/, dist/, pip-wheel-metadata/ and
*.egg to ensure virtual environments, caches, build artifacts, and wheel
metadata (the same area as the existing .venv/ and *.egg-info/ entries) are
excluded.

In `@Dockerfile`:
- Around line 43-46: The Dockerfile is copying test artifacts into the
production runtime image (COPY --from=builder /app/target/release/test_client
and /fallback_test); remove those two COPY lines or move them into a separate
multi-stage "test" stage so only production binaries (modelexpress-server and
modelexpress-cli) are included in the final runtime image; ensure the final
stage only copies /app/target/release/modelexpress-server and modelexpress-cli
and, if needed, add a dedicated test stage that copies test_client and
fallback_test for CI/test images instead of the production image.

In `@entrypoint.sh`:
- Around line 29-36: The health-check loop currently spawns a full Python
interpreter each retry by calling /app/venv/bin/python -c "import httpx;
httpx.get(...)" which is slow; replace this by either (a) using a lightweight
shell tool if available (curl or wget) to hit
http://127.0.0.1:${SIDECAR_PORT}/health inside the loop, or (b) create a small
persistent Python helper that performs the retry loop internally (so only one
interpreter starts) and call that instead; update references to RETRY_COUNT and
MAX_RETRIES to match the new approach and ensure the same success/failure
behavior and exit conditions are preserved.

In `@modelexpress_common/src/config.rs`:
- Around line 383-401: Extract the duplicated default values into constants and
use them from both the Default impl and from_env(): define e.g. const
DEFAULT_SIDECAR_ENDPOINT: &str = "http://127.0.0.1:8002"; and const
DEFAULT_SIDECAR_TIMEOUT: u64 = 300, then replace the literal in impl Default for
SidecarConfig::default() (endpoint and timeout_secs) and the fallbacks in
SidecarConfig::from_env() (the unwrap_or_else endpoint and unwrap_or timeout) to
reference DEFAULT_SIDECAR_ENDPOINT and DEFAULT_SIDECAR_TIMEOUT so the values
remain single-sourced.

In `@modelexpress_common/src/providers/model_streamer.rs`:
- Around line 149-165: The current global replace can mangle URIs containing
scheme substrings; update encode_model_id to use strip_prefix for the simple
schemes too: check model_path.strip_prefix("s3://") and if Some(rest) return
format!("s3/{}", rest) (return "s3/" if rest is empty), then check
strip_prefix("gs://") and return format!("gs/{}", rest) similarly, falling back
to the original model_path only if none of the prefixes match; keep the existing
s3+http(s) branch untouched.
- Around line 167-183: The is_weight_file and is_ignored helpers (and their
associated constants WEIGHT_EXTENSIONS and IGNORED_FILES) are only used by tests
test_is_weight_file and test_is_ignored, so annotate them with #[cfg(test)] to
prevent them and the constants from being compiled into release builds; locate
the const declarations WEIGHT_EXTENSIONS and IGNORED_FILES and the fn
definitions is_weight_file and is_ignored and add #[cfg(test)] above each item
(or group them under a #[cfg(test)] mod) so they are only available during
testing.

In `@modelexpress_server/src/database.rs`:
- Around line 108-112: The provider string/enum conversion logic is duplicated
across get_model_record, get_models_by_last_used, set_status,
try_claim_for_download and get_status; extract this into centralized conversions
by implementing std::str::FromStr for ModelProvider (to parse provider_str →
ModelProvider) and std::fmt::Display (or From<ModelProvider> for &str /
ToString) for enum→string, update those functions to use
provider_str.parse::<ModelProvider>() and provider.to_string() (or .to_string())
instead of repeating match arms, and remove the duplicated match blocks so
adding new variants only requires updating ModelProvider's impls.

In `@sidecar/src/modelexpress_sidecar/api/schemas.py`:
- Around line 11-19: The S3Credentials.schema defines region as Optional[str]
with a default "us-east-1", allowing callers to explicitly pass None which
overrides the default; decide whether None is meaningful and then make the type
match: if region must always have a value, change the type on the region field
in S3Credentials from Optional[str] to str so callers cannot pass null and the
default is enforced, otherwise keep Optional[str] but add a comment or
validation in S3Credentials (e.g., in a validator) clarifying that None is
acceptable for S3-compatible endpoints.

In `@sidecar/src/modelexpress_sidecar/services/downloader.py`:
- Around line 40-54: The is_ignored_file function currently does a
case-sensitive comparison against IGNORED_FILES while is_weight_file and
is_image_file lowercase the filename; update is_ignored_file to perform a
case-insensitive check by lowercasing the input filename (filename.lower())
before membership testing or by ensuring IGNORED_FILES is a set of lowercase
names and comparing against that; modify the is_ignored_file function (and
adjust IGNORED_FILES initialization if you choose the latter) so filename case
is normalized consistently with is_weight_file and is_image_file.
- Around line 218-255: The _download_with_model_streamer function currently
never uses the imported SafetensorsStreamer and simply calls
_download_with_boto3 in both try and except; either remove this wrapper and call
_download_with_boto3 directly from download(), leaving a TODO for phase 2, or
keep the function but make its intent explicit by, within the try branch after
importing SafetensorsStreamer, logging that the streamer was detected but not
yet implemented (e.g., "SafetensorsStreamer available but streamer-based
download not implemented; falling back to boto3") before calling
_download_with_boto3; update any callers (e.g., download()) accordingly to
reflect the chosen approach.

In `@sidecar/tests/test_s3_integration.py`:
- Around line 56-67: Extract the duplicated s3_credentials dict and the repeated
download request payload into a shared fixture or helper used by
test_download_from_minio and test_get_downloaded_model: create a pytest fixture
(e.g., s3_credentials or make_s3_payload) that returns the dict with
access_key_id, secret_access_key, region, and endpoint (pulling from os.environ
fallbacks) and optionally a function that builds the POST JSON including
model_path and cache_dir, then replace the inline dicts in
test_download_from_minio and test_get_downloaded_model with calls to that
fixture/helper to remove duplication and centralize credential setup.

@nicolasnoble nicolasnoble force-pushed the nnoble/model-streamer-provider branch 2 times, most recently from a772438 to b230fe1 Compare February 18, 2026 23:51
@nicolasnoble
Copy link
Contributor Author

Replying to a few threads below and resolving the ones that are fine to defer.

nicolasnoble added a commit that referenced this pull request Feb 19, 2026
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and
  DownloadRequest so credentials are never leaked in logs
- Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar
- Chain exceptions in routes.py (raise from err) to preserve tracebacks
- Guard against path traversal in _resolve_model_id
- Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope
  was introduced in 0.24)
- Strengthen test_download_nonexistent_bucket assertion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nicolasnoble added a commit that referenced this pull request Feb 19, 2026
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and
  DownloadRequest so credentials are never leaked in logs
- Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar
- Chain exceptions in routes.py (raise from err) to preserve tracebacks
- Guard against path traversal in _resolve_model_id
- Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope
  was introduced in 0.24)
- Strengthen test_download_nonexistent_bucket assertion

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
@nicolasnoble nicolasnoble force-pushed the nnoble/model-streamer-provider branch from cb50a5c to b06121a Compare February 19, 2026 03:19
nicolasnoble added a commit that referenced this pull request Feb 19, 2026
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and
  DownloadRequest so credentials are never leaked in logs
- Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar
- Chain exceptions in routes.py (raise from err) to preserve tracebacks
- Guard against path traversal in _resolve_model_id
- Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope
  was introduced in 0.24)
- Strengthen test_download_nonexistent_bucket assertion

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
@nicolasnoble nicolasnoble force-pushed the nnoble/model-streamer-provider branch from b06121a to dd930bf Compare February 19, 2026 03:23
nicolasnoble added a commit that referenced this pull request Feb 19, 2026
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and
  DownloadRequest so credentials are never leaked in logs
- Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar
- Chain exceptions in routes.py (raise from err) to preserve tracebacks
- Guard against path traversal in _resolve_model_id
- Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope
  was introduced in 0.24)
- Strengthen test_download_nonexistent_bucket assertion

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
@nicolasnoble nicolasnoble force-pushed the nnoble/model-streamer-provider branch from dd930bf to bc33c3f Compare February 19, 2026 03:24
nicolasnoble added a commit that referenced this pull request Feb 19, 2026
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and
  DownloadRequest so credentials are never leaked in logs
- Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar
- Chain exceptions in routes.py (raise from err) to preserve tracebacks
- Guard against path traversal in _resolve_model_id
- Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope
  was introduced in 0.24)
- Strengthen test_download_nonexistent_bucket assertion

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
@nicolasnoble nicolasnoble force-pushed the nnoble/model-streamer-provider branch from bc33c3f to f022142 Compare February 19, 2026 03:51
nicolasnoble added a commit that referenced this pull request Feb 19, 2026
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and
  DownloadRequest so credentials are never leaked in logs
- Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar
- Chain exceptions in routes.py (raise from err) to preserve tracebacks
- Guard against path traversal in _resolve_model_id
- Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope
  was introduced in 0.24)
- Strengthen test_download_nonexistent_bucket assertion

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
@nicolasnoble nicolasnoble force-pushed the nnoble/model-streamer-provider branch from f022142 to 6bd848b Compare February 19, 2026 03:56
nicolasnoble and others added 13 commits February 19, 2026 15:29
We provide a method to download from S3 using Model Streamer.
Given Model Streamer's API, this ends up as a Python sidecar process for which we add a Rust interface so this becomes completely transparent for the end user, where we only have a new provider that they can use.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
…ect.toml

All dependencies were already declared in pyproject.toml. Updated the
Dockerfile to pip install the sidecar package directly instead of using
the separate requirements.txt.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
The Pydantic model S3Credentials defines the field as 'endpoint', but
the tests were sending 'endpoint_url' which was silently ignored.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
The Python sidecar normalizes s3+http/s3+https URIs to scheme "s3" and
strips the endpoint, storing cached models under s3/bucket/path. The
Rust encode_model_id was producing s3-http/endpoint:port/bucket/path
instead, so get_model_path and delete_model would never find the cached
files. Now both sides agree on the same path layout.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
Added the missing associated functions so the existing tests compile.
The extension/file lists match the Python sidecar definitions.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
Concurrent requests could overwrite each other's AWS/GCS environment
variables. Added a lock around the setup/download/restore cycle so
only one request mutates os.environ at a time.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
_download_with_boto3 uses synchronous boto3 calls (paginator,
download_file) which would block the async event loop. Changed it to a
regular sync method and wrapped the calls with asyncio.to_thread so
the FastAPI server stays responsive during downloads.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
is_weight_file was missing the case-insensitive check entirely, and
is_image_file was calling .lower() on every iteration. Both now
lowercase the filename once upfront.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
delete_model was hardcoding self.default_cache_dir while get_model
accepted a custom cache_dir. Now both behave consistently - the route
and downloader accept an optional cache_dir, falling back to the
default when not provided.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
- Extract "model-streamer" to a CACHE_SUBDIR constant
- Simplify MinIO URI parsing with str.partition
- Use PurePosixPath for S3 key filename extraction
- Consolidate duplicate model_id -> cache path logic into
  _resolve_model_id, used by both get_model and delete_model

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
The exists() check was only applied during size calculation but the
unfiltered list was returned in the response. Now we filter once and
use the same list for both the size sum and the response payload.

Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and
  DownloadRequest so credentials are never leaked in logs
- Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar
- Chain exceptions in routes.py (raise from err) to preserve tracebacks
- Guard against path traversal in _resolve_model_id
- Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope
  was introduced in 0.24)
- Strengthen test_download_nonexistent_bucket assertion

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants