Conversation
|
Also mentioning @noa-neria here for the Model Streamer portion. |
There was a problem hiding this comment.
Pull request overview
This PR introduces a Python sidecar service that integrates with Run:ai Model Streamer to download models from S3/GCS (including MinIO), wires a new ModelStreamer provider into the Rust backend, and updates container/CI infrastructure to run both the main server and the sidecar with MinIO-based tests.
Changes:
- Add a FastAPI-based sidecar (
modelexpress_sidecar) with download/list/delete APIs backed by S3/GCS via Model Streamer (with a boto3 fallback) plus Python unit/integration tests using MinIO. - Introduce a new
ModelProvider::ModelStreamerin the Rust common crate, implement aModelStreamerProviderthat talks to the sidecar over HTTP, and map it through gRPC/database layers. - Update Dockerfile, docker-compose, entrypoint, workspace dependencies, and CI to build and run the sidecar alongside the gRPC server and to execute the new Python tests in CI.
Reviewed changes
Copilot reviewed 32 out of 34 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| sidecar/tests/test_s3_integration.py | Adds async integration tests hitting the sidecar’s /api/v1/download and /api/v1/models/... endpoints against a MinIO-backed S3-compatible storage. |
| sidecar/tests/test_api.py | Adds basic API tests for the sidecar (/health, model GET/DELETE 404 behavior, invalid download URI behavior). |
| sidecar/tests/init.py | Marks the sidecar tests package and adds SPDX/license header. |
| sidecar/src/modelexpress_sidecar/services/downloader.py | Implements ModelDownloader with URI parsing, environment-based S3/GCS credential setup, and S3 downloads via Model Streamer (with boto3 fallback), plus cache lookup and deletion. |
| sidecar/src/modelexpress_sidecar/services/init.py | Declares the services package for the sidecar. |
| sidecar/src/modelexpress_sidecar/main.py | Defines the FastAPI application, logging setup, lifespan hooks, /health endpoint, and CLI entrypoint for running the sidecar with uvicorn. |
| sidecar/src/modelexpress_sidecar/api/schemas.py | Introduces Pydantic models for S3/GCS credentials and download/get/delete responses for the sidecar API. |
| sidecar/src/modelexpress_sidecar/api/routes.py | Wires HTTP routes (/download, /models/{model_id} GET/DELETE) to the ModelDownloader`, including error handling and HTTP status mapping. |
| sidecar/src/modelexpress_sidecar/api/init.py | Declares the API subpackage for the sidecar. |
| sidecar/src/modelexpress_sidecar/init.py | Declares the top-level sidecar package and version. |
| sidecar/src/modelexpress_sidecar.egg-info/top_level.txt | Records the top-level Python package name for distribution metadata. |
| sidecar/src/modelexpress_sidecar.egg-info/requires.txt | Lists Python runtime and dev dependencies (FastAPI, Pydantic, Model Streamer, boto3, pytest, httpx) for the sidecar package. |
| sidecar/src/modelexpress_sidecar.egg-info/dependency_links.txt | Placeholder for extra dependency links in the sidecar package metadata. |
| sidecar/src/modelexpress_sidecar.egg-info/SOURCES.txt | Enumerates packaged Python source and test files for the sidecar distribution. |
| sidecar/src/modelexpress_sidecar.egg-info/PKG-INFO | Provides the generated package metadata (name, version, requirements, classifiers) for the sidecar. |
| sidecar/requirements.txt | Defines Python runtime and dev dependencies for the sidecar (including httpx used by the entrypoint health check). |
| sidecar/pyproject.toml | Declares the modelexpress-sidecar Python project configuration, dependencies, extras, and pytest settings. |
| modelexpress_server/src/database.rs | Extends serialization/deserialization of the ModelProvider field to understand and persist the new ModelStreamer provider. |
| modelexpress_common/src/providers/model_streamer.rs | Implements ModelStreamerProvider using reqwest to call the sidecar’s download/get/delete APIs, including health check helpers and unit tests. |
| modelexpress_common/src/providers.rs | Registers the new model_streamer module and re-exports ModelStreamerProvider in the providers module. |
| modelexpress_common/src/models.rs | Extends the ModelProvider enum with a ModelStreamer variant and adds tests for serialization. |
| modelexpress_common/src/lib.rs | Maps the extended ModelProvider enum to and from the new gRPC MODEL_STREAMER provider value. |
| modelexpress_common/src/download.rs | Updates provider factory to construct ModelStreamerProvider from environment and adds tests to verify the new provider wiring. |
| modelexpress_common/src/config.rs | Adds SidecarConfig, S3Credentials, and GcsCredentials structs plus env-loaders and tests for the Model Streamer integration. |
| modelexpress_common/proto/model.proto | Extends the ModelProvider protobuf enum with MODEL_STREAMER. |
| modelexpress_common/Cargo.toml | Adds reqwest and urlencoding as workspace dependencies used by the Model Streamer provider. |
| entrypoint.sh | Introduces an entrypoint that starts the Python sidecar in a venv, waits (via httpx) for its /health endpoint, and then starts the Rust ModelExpress server, ensuring cleanup on exit. |
| docker-compose.yml | Updates service ports and environment variables, adds MinIO and a MinIO init sidecar, and mounts a shared model cache volume for the server container. |
| Dockerfile | Installs Python and a venv for the sidecar, installs sidecar dependencies, copies sidecar sources, exposes ports for the gRPC server and sidecar, and switches to using the new entrypoint script. |
| Cargo.toml | Adds workspace-level reqwest (with JSON + rustls-tls) and urlencoding dependencies for common code. |
| Cargo.lock | Updates lockfile to include reqwest and urlencoding (and associated transitive dependencies). |
| .gitignore | Ignores Python virtual environments (e.g., .venv/) used during testing or local development. |
| .github/workflows/ci.yml | Adds a sidecar-test job that installs the sidecar with dev extras, runs unit tests, starts a MinIO container, seeds test data, and runs the full sidecar test suite against MinIO. |
| .dockerignore | Expands Docker build context ignore rules to exclude Rust target artifacts and Python virtual envs/bytecode, reducing build context size. |
|
A couple of items from the review feedback that we're deferring to a follow-up PR:
|
WalkthroughThis PR integrates a Python-based Model Streamer sidecar service into ModelExpress to support downloading models from S3, GCS, and MinIO storage. It adds a new FastAPI sidecar service, updates Docker configurations to run both services, extends the Rust codebase with ModelStreamer provider support and related configuration structures, and introduces CI workflows for sidecar testing with MinIO. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (12)
.gitignore (1)
53-58: LGTM! Consider adding a few more commonly generated Python paths.The added entries are all correct. As an optional follow-up, the following patterns are commonly generated by Python toolchains and may appear as the sidecar matures:
# Python .venv/ __pycache__/ *.pyc *.pyo *.egg-info/ +dist/ +build/ +.pytest_cache/🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gitignore around lines 53 - 58, The .gitignore currently covers .venv/, __pycache__/, *.pyc, *.pyo, and *.egg-info/ but should also include other common Python artifacts; update the ignore entries to add patterns such as venv/, env/, .env, .Python, .pytest_cache/, .mypy_cache/, build/, dist/, pip-wheel-metadata/ and *.egg to ensure virtual environments, caches, build artifacts, and wheel metadata (the same area as the existing .venv/ and *.egg-info/ entries) are excluded.sidecar/src/modelexpress_sidecar/api/schemas.py (1)
11-19: Consider whetherregionshould beOptional.
regiondefaults to"us-east-1"but is typedOptional[str], which means a caller can explicitly passnullto set it toNone, overriding the default. IfNoneis a valid value (e.g., for MinIO/S3-compatible endpoints that don't use regions), this is fine. Otherwise, changing the type tostrwould prevent accidentalNonevalues.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sidecar/src/modelexpress_sidecar/api/schemas.py` around lines 11 - 19, The S3Credentials.schema defines region as Optional[str] with a default "us-east-1", allowing callers to explicitly pass None which overrides the default; decide whether None is meaningful and then make the type match: if region must always have a value, change the type on the region field in S3Credentials from Optional[str] to str so callers cannot pass null and the default is enforced, otherwise keep Optional[str] but add a comment or validation in S3Credentials (e.g., in a validator) clarifying that None is acceptable for S3-compatible endpoints.modelexpress_common/src/config.rs (1)
383-401: Default endpoint and timeout values are duplicated betweenDefaultandfrom_env().The literal
"http://127.0.0.1:8002"appears in bothDefault::default()(line 386) andfrom_env()(line 396), and300appears in both (lines 387 and 401). Extract these into constants to keep them in sync.Proposed fix
+const DEFAULT_SIDECAR_ENDPOINT: &str = "http://127.0.0.1:8002"; +const DEFAULT_SIDECAR_TIMEOUT_SECS: u64 = 300; + impl Default for SidecarConfig { fn default() -> Self { Self { - endpoint: "http://127.0.0.1:8002".to_string(), - timeout_secs: 300, + endpoint: DEFAULT_SIDECAR_ENDPOINT.to_string(), + timeout_secs: DEFAULT_SIDECAR_TIMEOUT_SECS, } } } impl SidecarConfig { pub fn from_env() -> Self { let endpoint = std::env::var("MODEL_EXPRESS_SIDECAR_ENDPOINT") - .unwrap_or_else(|_| "http://127.0.0.1:8002".to_string()); + .unwrap_or_else(|_| DEFAULT_SIDECAR_ENDPOINT.to_string()); let timeout_secs = std::env::var("MODEL_EXPRESS_SIDECAR_TIMEOUT") .ok() .and_then(|s| s.parse().ok()) - .unwrap_or(300); + .unwrap_or(DEFAULT_SIDECAR_TIMEOUT_SECS);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelexpress_common/src/config.rs` around lines 383 - 401, Extract the duplicated default values into constants and use them from both the Default impl and from_env(): define e.g. const DEFAULT_SIDECAR_ENDPOINT: &str = "http://127.0.0.1:8002"; and const DEFAULT_SIDECAR_TIMEOUT: u64 = 300, then replace the literal in impl Default for SidecarConfig::default() (endpoint and timeout_secs) and the fallbacks in SidecarConfig::from_env() (the unwrap_or_else endpoint and unwrap_or timeout) to reference DEFAULT_SIDECAR_ENDPOINT and DEFAULT_SIDECAR_TIMEOUT so the values remain single-sourced.entrypoint.sh (1)
29-36: Health check spawns a full Python process on every retry.Each iteration invokes
/app/venv/bin/python -c "import httpx; ...", which has nontrivial startup cost. Ifcurlorwgetis available in the container image, that would be significantly lighter. If not, consider a single Python script that loops internally.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@entrypoint.sh` around lines 29 - 36, The health-check loop currently spawns a full Python interpreter each retry by calling /app/venv/bin/python -c "import httpx; httpx.get(...)" which is slow; replace this by either (a) using a lightweight shell tool if available (curl or wget) to hit http://127.0.0.1:${SIDECAR_PORT}/health inside the loop, or (b) create a small persistent Python helper that performs the retry loop internally (so only one interpreter starts) and call that instead; update references to RETRY_COUNT and MAX_RETRIES to match the new approach and ensure the same success/failure behavior and exit conditions are preserved..github/workflows/ci.yml (1)
239-254: Pin the MinIO image tag for CI reproducibility.Using
minio/minio:latestmeans CI behavior can change without any code change. Consider pinning to a specific release (e.g.,minio/minio:RELEASE.2025-01-20T14-49-07Z) and bumping it intentionally. The same applies tominio/mc.Also,
sleep 5for readiness is fragile on slow CI runners. A health-check loop (e.g., pollinghttp://localhost:9000/minio/health/live) would be more reliable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml around lines 239 - 254, Update the "Start MinIO for integration tests" step to pin the MinIO and mc images instead of using minio/minio:latest and minio/mc:latest (choose a specific release tag like RELEASE.x) and replace the fragile sleep 5 wait with a readiness loop that polls MinIO's health endpoint (http://localhost:9000/minio/health/live) until healthy before creating buckets and uploading test data; keep the existing docker run and mc commands but reference the pinned image tags and wait for the health check to succeed before running the mc container.sidecar/tests/test_s3_integration.py (1)
56-67: Extract duplicated S3 credential setup into a fixture or helper.The credential dict (
s3_credentialswithaccess_key_id,secret_access_key,region,endpoint) and the download request payload are duplicated acrosstest_download_from_minioandtest_get_downloaded_model. Consider a shared fixture or helper to reduce duplication.Also applies to: 85-97
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sidecar/tests/test_s3_integration.py` around lines 56 - 67, Extract the duplicated s3_credentials dict and the repeated download request payload into a shared fixture or helper used by test_download_from_minio and test_get_downloaded_model: create a pytest fixture (e.g., s3_credentials or make_s3_payload) that returns the dict with access_key_id, secret_access_key, region, and endpoint (pulling from os.environ fallbacks) and optionally a function that builds the POST JSON including model_path and cache_dir, then replace the inline dicts in test_download_from_minio and test_get_downloaded_model with calls to that fixture/helper to remove duplication and centralize credential setup.Dockerfile (1)
43-46: Test binaries included in runtime image.
test_clientandfallback_testare copied into the production runtime image. Consider using a separate test stage or removing these from the runtime image to reduce its size and attack surface.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 43 - 46, The Dockerfile is copying test artifacts into the production runtime image (COPY --from=builder /app/target/release/test_client and /fallback_test); remove those two COPY lines or move them into a separate multi-stage "test" stage so only production binaries (modelexpress-server and modelexpress-cli) are included in the final runtime image; ensure the final stage only copies /app/target/release/modelexpress-server and modelexpress-cli and, if needed, add a dedicated test stage that copies test_client and fallback_test for CI/test images instead of the production image.modelexpress_server/src/database.rs (1)
108-112: Consider extracting provider string conversion to a shared helper.The provider string-to-enum and enum-to-string mappings are duplicated across
get_model_record,get_models_by_last_used,set_status,try_claim_for_download, andget_status. As more providers are added, each new variant must be patched in multiple places. Centralizing this (e.g.,impl Display/FromStronModelProvider) would reduce maintenance burden.Also applies to: 240-243
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelexpress_server/src/database.rs` around lines 108 - 112, The provider string/enum conversion logic is duplicated across get_model_record, get_models_by_last_used, set_status, try_claim_for_download and get_status; extract this into centralized conversions by implementing std::str::FromStr for ModelProvider (to parse provider_str → ModelProvider) and std::fmt::Display (or From<ModelProvider> for &str / ToString) for enum→string, update those functions to use provider_str.parse::<ModelProvider>() and provider.to_string() (or .to_string()) instead of repeating match arms, and remove the duplicated match blocks so adding new variants only requires updating ModelProvider's impls.sidecar/src/modelexpress_sidecar/services/downloader.py (2)
40-54: Inconsistent case handling inis_ignored_filevsis_weight_file/is_image_file.
is_weight_fileandis_image_filelowercase the filename before comparison, butis_ignored_filedoes an exact (case-sensitive) match againstIGNORED_FILES. A file namedreadme.mdorREADME.MDwould not be filtered. The Rust side has the same inconsistency, so this is at least consistent across the codebase, but worth noting.Optional fix
def is_ignored_file(filename: str) -> bool: """Check if a file should be ignored.""" - return filename in IGNORED_FILES + return filename.lower() in {f.lower() for f in IGNORED_FILES}Or convert
IGNORED_FILESto a set of lowercase names and comparefilename.lower().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sidecar/src/modelexpress_sidecar/services/downloader.py` around lines 40 - 54, The is_ignored_file function currently does a case-sensitive comparison against IGNORED_FILES while is_weight_file and is_image_file lowercase the filename; update is_ignored_file to perform a case-insensitive check by lowercasing the input filename (filename.lower()) before membership testing or by ensuring IGNORED_FILES is a set of lowercase names and comparing against that; modify the is_ignored_file function (and adjust IGNORED_FILES initialization if you choose the latter) so filename case is normalized consistently with is_weight_file and is_image_file.
218-255:_download_with_model_streameris a no-op wrapper — both branches do the same thing.The
SafetensorsStreamerimport on line 234 is unused; the try and except blocks both delegate to_download_with_boto3. The method name is misleading since Model Streamer is never actually invoked. Consider either:
- Removing the method entirely and calling
_download_with_boto3directly fromdownload(), with a TODO for phase 2, or- At minimum, adding a clear log message in the try branch indicating Model Streamer was found but is not yet used.
This avoids confusion for future maintainers who may assume the streamer path is functional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sidecar/src/modelexpress_sidecar/services/downloader.py` around lines 218 - 255, The _download_with_model_streamer function currently never uses the imported SafetensorsStreamer and simply calls _download_with_boto3 in both try and except; either remove this wrapper and call _download_with_boto3 directly from download(), leaving a TODO for phase 2, or keep the function but make its intent explicit by, within the try branch after importing SafetensorsStreamer, logging that the streamer was detected but not yet implemented (e.g., "SafetensorsStreamer available but streamer-based download not implemented; falling back to boto3") before calling _download_with_boto3; update any callers (e.g., download()) accordingly to reflect the chosen approach.modelexpress_common/src/providers/model_streamer.rs (2)
149-165:encode_model_id— straightforwardreplacemay mangle URIs with overlapping scheme substrings.The
replace("s3://", "s3/").replace("gs://", "gs/")approach on line 162-164 applies globally. While extremely unlikely in practice, a URI likegs://bucket/s3://pathwould be double-mangled. The s3+http/https branch correctly usesstrip_prefix— consider using the same approach for the simple schemes for consistency.Optional: use strip_prefix for all schemes
- model_path - .replace("s3://", "s3/") - .replace("gs://", "gs/") + if let Some(rest) = model_path.strip_prefix("s3://") { + format!("s3/{rest}") + } else if let Some(rest) = model_path.strip_prefix("gs://") { + format!("gs/{rest}") + } else { + model_path.to_string() + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelexpress_common/src/providers/model_streamer.rs` around lines 149 - 165, The current global replace can mangle URIs containing scheme substrings; update encode_model_id to use strip_prefix for the simple schemes too: check model_path.strip_prefix("s3://") and if Some(rest) return format!("s3/{}", rest) (return "s3/" if rest is empty), then check strip_prefix("gs://") and return format!("gs/{}", rest) similarly, falling back to the original model_path only if none of the prefixes match; keep the existing s3+http(s) branch untouched.
167-183: Gateis_weight_fileandis_ignoredwith#[cfg(test)].These helpers are unused in production code—they're only called in
test_is_weight_fileandtest_is_ignored. Since they duplicate filtering logic already handled by the Python sidecar and serve no purpose outside tests, wrap their definitions with#[cfg(test)]to remove dead code from release builds.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelexpress_common/src/providers/model_streamer.rs` around lines 167 - 183, The is_weight_file and is_ignored helpers (and their associated constants WEIGHT_EXTENSIONS and IGNORED_FILES) are only used by tests test_is_weight_file and test_is_ignored, so annotate them with #[cfg(test)] to prevent them and the constants from being compiled into release builds; locate the const declarations WEIGHT_EXTENSIONS and IGNORED_FILES and the fn definitions is_weight_file and is_ignored and add #[cfg(test)] above each item (or group them under a #[cfg(test)] mod) so they are only available during testing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 216-218: The release job currently lists needs = [test,
integration-test, security-audit, docker-build] but omits the sidecar-test job;
update the release job's needs array to include "sidecar-test" (i.e., needs =
[test, integration-test, security-audit, docker-build, sidecar-test]) so the
release will be gated by the Sidecar Python Tests when the sidecar is part of
the release artifact; if the sidecar is optional, instead add a conditional
dependency or gating check in the release job to only require "sidecar-test"
when the sidecar is included.
In `@docker-compose.yml`:
- Around line 51-63: The minio-init service hardcodes credentials in the mc
alias set command and uses sleep 5; update minio-init to read MINIO_ROOT_USER
and MINIO_ROOT_PASSWORD environment variables (same names used by the minio
service) when running mc alias set instead of literal "minioadmin minioadmin",
and change the depends_on to use the health condition so minio-init waits for
the minio service healthcheck rather than a fixed sleep; locate the minio-init
service and the mc alias set invocation to implement these changes and ensure
the env var names match the minio service healthcheck usage.
In `@Dockerfile`:
- Around line 39-40: The Dockerfile sets PYTHONPATH=/app/sidecar/src while also
installing the sidecar package into the virtualenv (/app/venv) and using
/app/venv/bin/python in entrypoint.sh; remove the redundant PYTHONPATH
assignment (or add a concise comment) so the container relies on the venv's
site-packages instead—update the PYTHONPATH usage near the ENV or entrypoint
lines or document why PYTHONPATH=/app/sidecar/src is required (e.g., for
unpackaged development-only modules), and ensure entrypoint.sh continues to
invoke /app/venv/bin/python and the installed package name so behavior remains
identical.
In `@entrypoint.sh`:
- Around line 42-53: The current use of exec when launching
/app/modelexpress-server replaces the shell so the cleanup() trap never runs and
the sidecar (SIDECAR_PID) can be orphaned; change to not use exec: launch
/app/modelexpress-server without exec, capture its PID (e.g., SERVER_PID), and
use the existing trap/cleanup to kill both SIDECAR_PID and SERVER_PID on
EXIT/INT/TERM, and also forward signals to the server (kill -SIGTERM
"$SERVER_PID") so the shell can wait (wait "$SERVER_PID") and still perform
cleanup; update references to the launch line, the cleanup() function, trap,
SIDECAR_PID, and the server start logic to implement this behavior.
In `@modelexpress_common/src/config.rs`:
- Around line 456-464: Remove the auto-derived Debug on GcsCredentials and add a
custom fmt::Debug impl that redacts sensitive content in the credentials_json
field (e.g., display None or Some("<redacted>") instead of the actual JSON),
while still showing non-sensitive fields like credentials_file; mirror the
approach used for S3Credentials (keep #[derive(Clone, Serialize, Deserialize,
Default)] but replace #[derive(Debug)] with a manual impl Debug for
GcsCredentials that prints credentials_file and a redacted placeholder for
credentials_json.
- Around line 421-434: S3Credentials currently derives Debug and will expose
sensitive fields; replace the derived Debug with a manual impl for the
S3Credentials struct that redacts access_key_id and secret_access_key (e.g.,
show "<redacted>" or masked value) while still showing non-sensitive fields like
region and endpoint, and remove Debug from the derive list; implement fmt::Debug
for S3Credentials so any logging (tracing::debug!, dbg!(), etc.) uses the
redacted representation.
In `@modelexpress_common/src/download.rs`:
- Around line 119-123: The test test_get_provider_model_streamer relies on
ModelStreamerProvider::from_env which reads environment variables; to isolate
it, clear or set the relevant env vars before calling
get_provider(ModelProvider::ModelStreamer) so external env won't affect the
result — e.g., remove or set vars consumed by SidecarConfig::from_env,
S3Credentials::from_env and GcsCredentials::from_env (or set them to
known-default values) and restore them after the test; alternatively wrap the
test with a helper that sets a clean env for ModelStreamerProvider::from_env to
ensure deterministic behavior.
- Around line 17-31: Change get_provider to return a Result<Box<dyn
ModelProviderTrait>, E> instead of panicking: have get_provider accept
ModelProvider and return Ok(Box::new(HuggingFaceProvider)) for
ModelProvider::HuggingFace and, for ModelProvider::ModelStreamer, call
ModelStreamerProvider::from_env() and propagate its Err (wrap or convert into
your chosen error type) instead of tracing::error! + panic!. Update all callers
(e.g., download_model which already returns Result, and stream_model_files /
list_model_files in services.rs) to use ? to propagate the error or map it to a
tonic::Status (e.g., .map_err(|e| Status::internal(...))) so initialization
failures are handled as errors rather than crashing the process.
In `@modelexpress_common/src/providers/model_streamer.rs`:
- Around line 22-30: The DownloadRequest currently derives Debug and is being
debug-logged, which will expose S3/GCS secrets; remove the automatic Debug
output and either (A) implement a custom Debug for DownloadRequest that redacts
s3_credentials and gcs_credentials (or implements Debug on
S3CredentialsPayload/GcsCredentialsPayload to print "<redacted>"), or (B) stop
logging the whole struct and change the debug call to only log non-sensitive
fields (model_path, cache_dir, ignore_weights) and omit or replace credential
fields with "<redacted>"; update the Debug implementations for
S3CredentialsPayload/GcsCredentialsPayload if you choose centralized redaction.
In `@sidecar/pyproject.toml`:
- Around line 40-49: Update the pytest-asyncio dependency floor to at least
0.24.0 so the pytest.ini option asyncio_default_fixture_loop_scope is supported;
specifically change the pytest-asyncio entry in the dependency list (the string
"pytest-asyncio>=0.21") to "pytest-asyncio>=0.24.0" in the pyproject.toml
dependencies section to avoid configuration warnings/errors.
In `@sidecar/src/modelexpress_sidecar/api/routes.py`:
- Around line 58-68: The except blocks in the get_model and delete_model
endpoints are re-raising HTTPException without chaining the original exception,
which loses traceback context; update the two places that call raise
HTTPException(...) (the handler around downloader.get_model in the get_model
route and the corresponding except in delete_model) to use "raise
HTTPException(...) from e" (i.e., re-raise the HTTPException with the original
exception variable) so the original exception chain is preserved for debugging.
In `@sidecar/src/modelexpress_sidecar/services/downloader.py`:
- Around line 257-321: The _download_with_boto3 method can leave partial files
if s3_client.download_file fails; update the loop that calls
s3_client.download_file to wrap each per-file download in try/except: call
s3_client.download_file inside a try, and on Exception (catch broad Exception)
delete the local_file if it exists (to remove partial/incomplete files), log the
failure with the key/rel_path and exception details, then re-raise the exception
so the existing caller behavior in download() remains unchanged; reference the
symbols _download_with_boto3, s3_client.download_file, local_file and download()
when making this change.
- Around line 109-146: The current approach mutates process-global os.environ
via _setup_credentials and _restore_credentials (and relies on an asyncio.Lock)
which is fragile, serializes downloads, and mishandles empty-string env values;
instead, stop touching os.environ: remove or stop calling _setup_credentials and
_restore_credentials, and construct the storage clients with explicit
credentials (for S3 pass aws_access_key_id, aws_secret_access_key, region_name
and endpoint_url to boto3.client; for GCS supply the credentials file or
credentials object to the GCS client constructor). Also remove the
credential-locking around downloads so downloads can run concurrently; if you
must keep restore code, fix the restore logic in _restore_credentials to
distinguish None from an empty-string (i.e., store presence and original value
explicitly rather than using if value) but prefer the direct-client-credentials
approach to eliminate env mutation and the lock entirely.
- Around line 96-107: The _resolve_model_id implementation allows path-traversal
because it naively concatenates user-controlled model_id parts into a Path;
change it to first reject absolute paths and any path segments containing ".."
or null bytes, then build the candidate path using cache_base / CACHE_SUBDIR /
scheme / bucket / path and call .resolve() on both cache_base/CACHE_SUBDIR and
the candidate, finally verify the resolved candidate is a subpath of the
resolved cache root (raise an exception if not). Apply the same containment
check before performing destructive operations in delete_model and also before
reading in get_model; reference the functions _resolve_model_id, delete_model,
get_model, default_cache_dir, and CACHE_SUBDIR when locating and updating the
code.
In `@sidecar/tests/test_s3_integration.py`:
- Around line 148-153: The test test_download_nonexistent_bucket currently only
checks that the "success" key exists, which is too weak; update the assertion to
validate the expected outcome (e.g., assert data["success"] is False if the
intended behavior is to report failure) or, if behavior can vary, assert that
either data["success"] is False or an "error" key is present (e.g., assert not
data.get("success") or "error" in data) and add a short comment documenting the
chosen expected S3 behavior so future readers know why this check is used.
---
Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 239-254: Update the "Start MinIO for integration tests" step to
pin the MinIO and mc images instead of using minio/minio:latest and
minio/mc:latest (choose a specific release tag like RELEASE.x) and replace the
fragile sleep 5 wait with a readiness loop that polls MinIO's health endpoint
(http://localhost:9000/minio/health/live) until healthy before creating buckets
and uploading test data; keep the existing docker run and mc commands but
reference the pinned image tags and wait for the health check to succeed before
running the mc container.
In @.gitignore:
- Around line 53-58: The .gitignore currently covers .venv/, __pycache__/,
*.pyc, *.pyo, and *.egg-info/ but should also include other common Python
artifacts; update the ignore entries to add patterns such as venv/, env/, .env,
.Python, .pytest_cache/, .mypy_cache/, build/, dist/, pip-wheel-metadata/ and
*.egg to ensure virtual environments, caches, build artifacts, and wheel
metadata (the same area as the existing .venv/ and *.egg-info/ entries) are
excluded.
In `@Dockerfile`:
- Around line 43-46: The Dockerfile is copying test artifacts into the
production runtime image (COPY --from=builder /app/target/release/test_client
and /fallback_test); remove those two COPY lines or move them into a separate
multi-stage "test" stage so only production binaries (modelexpress-server and
modelexpress-cli) are included in the final runtime image; ensure the final
stage only copies /app/target/release/modelexpress-server and modelexpress-cli
and, if needed, add a dedicated test stage that copies test_client and
fallback_test for CI/test images instead of the production image.
In `@entrypoint.sh`:
- Around line 29-36: The health-check loop currently spawns a full Python
interpreter each retry by calling /app/venv/bin/python -c "import httpx;
httpx.get(...)" which is slow; replace this by either (a) using a lightweight
shell tool if available (curl or wget) to hit
http://127.0.0.1:${SIDECAR_PORT}/health inside the loop, or (b) create a small
persistent Python helper that performs the retry loop internally (so only one
interpreter starts) and call that instead; update references to RETRY_COUNT and
MAX_RETRIES to match the new approach and ensure the same success/failure
behavior and exit conditions are preserved.
In `@modelexpress_common/src/config.rs`:
- Around line 383-401: Extract the duplicated default values into constants and
use them from both the Default impl and from_env(): define e.g. const
DEFAULT_SIDECAR_ENDPOINT: &str = "http://127.0.0.1:8002"; and const
DEFAULT_SIDECAR_TIMEOUT: u64 = 300, then replace the literal in impl Default for
SidecarConfig::default() (endpoint and timeout_secs) and the fallbacks in
SidecarConfig::from_env() (the unwrap_or_else endpoint and unwrap_or timeout) to
reference DEFAULT_SIDECAR_ENDPOINT and DEFAULT_SIDECAR_TIMEOUT so the values
remain single-sourced.
In `@modelexpress_common/src/providers/model_streamer.rs`:
- Around line 149-165: The current global replace can mangle URIs containing
scheme substrings; update encode_model_id to use strip_prefix for the simple
schemes too: check model_path.strip_prefix("s3://") and if Some(rest) return
format!("s3/{}", rest) (return "s3/" if rest is empty), then check
strip_prefix("gs://") and return format!("gs/{}", rest) similarly, falling back
to the original model_path only if none of the prefixes match; keep the existing
s3+http(s) branch untouched.
- Around line 167-183: The is_weight_file and is_ignored helpers (and their
associated constants WEIGHT_EXTENSIONS and IGNORED_FILES) are only used by tests
test_is_weight_file and test_is_ignored, so annotate them with #[cfg(test)] to
prevent them and the constants from being compiled into release builds; locate
the const declarations WEIGHT_EXTENSIONS and IGNORED_FILES and the fn
definitions is_weight_file and is_ignored and add #[cfg(test)] above each item
(or group them under a #[cfg(test)] mod) so they are only available during
testing.
In `@modelexpress_server/src/database.rs`:
- Around line 108-112: The provider string/enum conversion logic is duplicated
across get_model_record, get_models_by_last_used, set_status,
try_claim_for_download and get_status; extract this into centralized conversions
by implementing std::str::FromStr for ModelProvider (to parse provider_str →
ModelProvider) and std::fmt::Display (or From<ModelProvider> for &str /
ToString) for enum→string, update those functions to use
provider_str.parse::<ModelProvider>() and provider.to_string() (or .to_string())
instead of repeating match arms, and remove the duplicated match blocks so
adding new variants only requires updating ModelProvider's impls.
In `@sidecar/src/modelexpress_sidecar/api/schemas.py`:
- Around line 11-19: The S3Credentials.schema defines region as Optional[str]
with a default "us-east-1", allowing callers to explicitly pass None which
overrides the default; decide whether None is meaningful and then make the type
match: if region must always have a value, change the type on the region field
in S3Credentials from Optional[str] to str so callers cannot pass null and the
default is enforced, otherwise keep Optional[str] but add a comment or
validation in S3Credentials (e.g., in a validator) clarifying that None is
acceptable for S3-compatible endpoints.
In `@sidecar/src/modelexpress_sidecar/services/downloader.py`:
- Around line 40-54: The is_ignored_file function currently does a
case-sensitive comparison against IGNORED_FILES while is_weight_file and
is_image_file lowercase the filename; update is_ignored_file to perform a
case-insensitive check by lowercasing the input filename (filename.lower())
before membership testing or by ensuring IGNORED_FILES is a set of lowercase
names and comparing against that; modify the is_ignored_file function (and
adjust IGNORED_FILES initialization if you choose the latter) so filename case
is normalized consistently with is_weight_file and is_image_file.
- Around line 218-255: The _download_with_model_streamer function currently
never uses the imported SafetensorsStreamer and simply calls
_download_with_boto3 in both try and except; either remove this wrapper and call
_download_with_boto3 directly from download(), leaving a TODO for phase 2, or
keep the function but make its intent explicit by, within the try branch after
importing SafetensorsStreamer, logging that the streamer was detected but not
yet implemented (e.g., "SafetensorsStreamer available but streamer-based
download not implemented; falling back to boto3") before calling
_download_with_boto3; update any callers (e.g., download()) accordingly to
reflect the chosen approach.
In `@sidecar/tests/test_s3_integration.py`:
- Around line 56-67: Extract the duplicated s3_credentials dict and the repeated
download request payload into a shared fixture or helper used by
test_download_from_minio and test_get_downloaded_model: create a pytest fixture
(e.g., s3_credentials or make_s3_payload) that returns the dict with
access_key_id, secret_access_key, region, and endpoint (pulling from os.environ
fallbacks) and optionally a function that builds the POST JSON including
model_path and cache_dir, then replace the inline dicts in
test_download_from_minio and test_get_downloaded_model with calls to that
fixture/helper to remove duplication and centralize credential setup.
a772438 to
b230fe1
Compare
|
Replying to a few threads below and resolving the ones that are fine to defer. |
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and DownloadRequest so credentials are never leaked in logs - Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar - Chain exceptions in routes.py (raise from err) to preserve tracebacks - Guard against path traversal in _resolve_model_id - Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope was introduced in 0.24) - Strengthen test_download_nonexistent_bucket assertion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and DownloadRequest so credentials are never leaked in logs - Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar - Chain exceptions in routes.py (raise from err) to preserve tracebacks - Guard against path traversal in _resolve_model_id - Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope was introduced in 0.24) - Strengthen test_download_nonexistent_bucket assertion Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
cb50a5c to
b06121a
Compare
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and DownloadRequest so credentials are never leaked in logs - Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar - Chain exceptions in routes.py (raise from err) to preserve tracebacks - Guard against path traversal in _resolve_model_id - Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope was introduced in 0.24) - Strengthen test_download_nonexistent_bucket assertion Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
b06121a to
dd930bf
Compare
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and DownloadRequest so credentials are never leaked in logs - Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar - Chain exceptions in routes.py (raise from err) to preserve tracebacks - Guard against path traversal in _resolve_model_id - Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope was introduced in 0.24) - Strengthen test_download_nonexistent_bucket assertion Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
dd930bf to
bc33c3f
Compare
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and DownloadRequest so credentials are never leaked in logs - Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar - Chain exceptions in routes.py (raise from err) to preserve tracebacks - Guard against path traversal in _resolve_model_id - Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope was introduced in 0.24) - Strengthen test_download_nonexistent_bucket assertion Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
bc33c3f to
f022142
Compare
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and DownloadRequest so credentials are never leaked in logs - Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar - Chain exceptions in routes.py (raise from err) to preserve tracebacks - Guard against path traversal in _resolve_model_id - Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope was introduced in 0.24) - Strengthen test_download_nonexistent_bucket assertion Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
f022142 to
6bd848b
Compare
We provide a method to download from S3 using Model Streamer. Given Model Streamer's API, this ends up as a Python sidecar process for which we add a Rust interface so this becomes completely transparent for the end user, where we only have a new provider that they can use. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
…ect.toml All dependencies were already declared in pyproject.toml. Updated the Dockerfile to pip install the sidecar package directly instead of using the separate requirements.txt. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
The Pydantic model S3Credentials defines the field as 'endpoint', but the tests were sending 'endpoint_url' which was silently ignored. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
The Python sidecar normalizes s3+http/s3+https URIs to scheme "s3" and strips the endpoint, storing cached models under s3/bucket/path. The Rust encode_model_id was producing s3-http/endpoint:port/bucket/path instead, so get_model_path and delete_model would never find the cached files. Now both sides agree on the same path layout. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
Added the missing associated functions so the existing tests compile. The extension/file lists match the Python sidecar definitions. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
Concurrent requests could overwrite each other's AWS/GCS environment variables. Added a lock around the setup/download/restore cycle so only one request mutates os.environ at a time. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
_download_with_boto3 uses synchronous boto3 calls (paginator, download_file) which would block the async event loop. Changed it to a regular sync method and wrapped the calls with asyncio.to_thread so the FastAPI server stays responsive during downloads. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
is_weight_file was missing the case-insensitive check entirely, and is_image_file was calling .lower() on every iteration. Both now lowercase the filename once upfront. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
delete_model was hardcoding self.default_cache_dir while get_model accepted a custom cache_dir. Now both behave consistently - the route and downloader accept an optional cache_dir, falling back to the default when not provided. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
- Extract "model-streamer" to a CACHE_SUBDIR constant - Simplify MinIO URI parsing with str.partition - Use PurePosixPath for S3 key filename extraction - Consolidate duplicate model_id -> cache path logic into _resolve_model_id, used by both get_model and delete_model Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
The exists() check was only applied during size calculation but the unfiltered list was returned in the response. Now we filter once and use the same list for both the size sum and the response payload. Signed-off-by: Nicolas Pixel Noble <pixel@nobis-crew.org>
- Redact secrets in Debug output for S3Credentials, GcsCredentials, and DownloadRequest so credentials are never leaked in logs - Fix entrypoint.sh: drop exec so cleanup trap can fire and kill the sidecar - Chain exceptions in routes.py (raise from err) to preserve tracebacks - Guard against path traversal in _resolve_model_id - Bump pytest-asyncio floor to >=0.24 (asyncio_default_fixture_loop_scope was introduced in 0.24) - Strengthen test_download_nonexistent_bucket assertion Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
6bd848b to
3b24b94
Compare
Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
We provide a method to download from S3 using Model Streamer. Given Model Streamer's API, this ends up as a Python sidecar process for which we add a Rust interface so this becomes completely transparent for the end user, where we only have a new provider that they can use.
This also adds a small test in the form of a minio storage.
I'm not 100% certain of what I'm doing in Python here, so adding @ajcasagrande for reviewing this part too.
Summary by CodeRabbit
Release Notes
New Features
Chores