
feat: replace mx-source/mx-target with unified mx loader#147

Open
nicolasnoble wants to merge 7 commits into main from nnoble/mx-auto

Conversation


@nicolasnoble nicolasnoble commented Feb 24, 2026

Replace the two separate loaders (mx-source, mx-target) with a single
MxModelLoader registered as --load-format mx. It auto-detects whether to
load model weights from disk or receive them via RDMA from an existing
source, eliminating the need to manually label nodes.

  • Add MxModelLoader with one-shot _detect_source() check against MX
    server: if ready source exists, receive via RDMA; otherwise load from
    disk. Both paths register with NIXL and publish metadata so future
    nodes can discover this one

  • Remove MxSourceModelLoader and MxTargetModelLoader classes

  • Extract shared helpers (_collect_cuda_tensors, _init_nixl_manager,
    _log_tensor_summary, _publish_metadata_and_ready) from duplicated code

  • Delete vllm-target.yaml, rename vllm-source.yaml to vllm.yaml with
    --load-format mx

  • Add unit tests for shared helpers and detection logic

  • Update documentation (ARCHITECTURE.md, DEPLOYMENT.md, CONTRIBUTING.md,
    K8s README)

This PR is built on top of #146 (docs: overhaul AI instructions and extract reference documentation) to avoid documentation drift.
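As a rough sketch of the auto-detection flow described above — the MxModelLoader and _detect_source names come from this PR, but the client API, return types, and "rdma:/disk" result strings below are simplified stand-ins for illustration, not the actual vLLM/NIXL integration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReadyResponse:
    """Simplified stand-in for the MX server's 'ready source' reply."""
    found: bool
    worker_id: Optional[str] = None

class MxModelLoader:
    def __init__(self, mx_client):
        self._mx_client = mx_client  # hypothetical MX client with a get_ready() call

    def _detect_source(self, model_name: str, device_id: int) -> Optional[str]:
        """One-shot check: is a ready source already published on the MX server?"""
        resp = self._mx_client.get_ready(model_name, device_id)
        return resp.worker_id if resp.found else None

    def load_model(self, model_name: str, device_id: int) -> str:
        source = self._detect_source(model_name, device_id)
        if source is not None:
            path = f"rdma:{source}"  # a ready source exists: receive weights via RDMA
        else:
            path = "disk"            # no source yet: fall back to loading from disk
        # Both paths then register with NIXL and publish metadata so that
        # future nodes can discover this one (elided here).
        return path
```

The point of the single entry point is that operators no longer label nodes as source or target; whichever node comes up first simply finds no ready source and loads from disk.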

Summary by CodeRabbit

Release Notes

  • Documentation

    • Added Architecture and Deployment reference guides for comprehensive system overview
    • Reorganized Contributing guidelines with development setup, commands, and Git workflow
  • New Features

    • Unified vLLM model loader interface for streamlined P2P weight transfer configuration
    • Enhanced security auditing with dependency vulnerability scanning
  • Chores

    • Updated Kubernetes deployment examples and configuration naming

Consolidated the duplicated project context across CLAUDE.md, Copilot instructions,
and Cursor rules into concise pointers to new standalone reference docs. Extracted
architecture details into docs/ARCHITECTURE.md and deployment/configuration info
into docs/DEPLOYMENT.md. Updated CONTRIBUTING.md, README.md, and example READMEs
to match the new structure.

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
- Fix copy-paste artifact in CONTRIBUTING.md ("The Dynamo project" -> "The ModelExpress project")
- Add ARCHITECTURE.md and DEPLOYMENT.md to the docs/ directory tree in ARCHITECTURE.md
- Add missing min_free_space_bytes field to the ServerConfig YAML example in ARCHITECTURE.md to match DEPLOYMENT.md and the actual config struct
- Add REDIS_URL to the P2P environment variables table in DEPLOYMENT.md, consistent with CONTRIBUTING.md and the K8s examples

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
The "Enable P2P Transfers via NIXL" PR (#135) introduced significant new
functionality without updating all documentation files per the project's
documentation rules. This commit closes the remaining gaps:

ARCHITECTURE.md:
- Fix __init__.py description (no longer claims auto-registration on import)
- Add --worker-cls usage to vllm_worker.py description
- Add check_session_changed() and close() to MxClient methods table
- Fix NixlTransferManager: correct __init__ params, add get_registered_descriptors(), fix destroy() -> shutdown(), update receive_from_source signature
- Correct MxTargetModelLoader base class to DummyModelLoader
- Document transfer-time coalescing of contiguous regions
- Document _raw_tensor_registry and _nixl_managers globals
- Add MODEL_NAME and MX_SERVER_ADDRESS environment variables

DEPLOYMENT.md:
- Add MODEL_NAME and MX_SERVER_ADDRESS to P2P env var table
- Add --worker-cls usage note for vLLM instances

CONTRIBUTING.md:
- Add Python 3.10+ to prerequisites
- Add pip install -e for Python client dev setup
- Add pytest and generate_proto.sh to Available Commands table

Also includes pre-commit auto-fixes: trailing whitespace cleanup and
missing final newlines in several files.

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
- "not complete" -> "incomplete" in CLAUDE.md, copilot-instructions.md, and rust.mdc
- "Read files" -> "Always read files" in copilot-instructions.md and rust.mdc to match CLAUDE.md
- Vary repeated "For" sentence openers in README.md

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>

coderabbitai bot commented Feb 24, 2026

Walkthrough

This PR reorganizes project documentation to emphasize development practices and deployment guides, consolidates the Python vLLM loader implementation from a dual-source/target approach into a unified auto-detecting loader, and simplifies Kubernetes deployment patterns and naming conventions.

Changes

  • Documentation Restructuring (.cursor/rules/rust.mdc, .github/copilot-instructions.md, CLAUDE.md, CONTRIBUTING.md, README.md, SECURITY.md, CODE_OF_CONDUCT.md): Replaced architectural overviews and project narrative with structured development guidelines, prerequisites, build/test workflows, pre-commit hooks, git workflow, and documentation hygiene requirements. Shifts toward external documentation references and concrete operational practices.

  • New Architecture & Deployment Docs (docs/ARCHITECTURE.md, docs/DEPLOYMENT.md): Added comprehensive reference documentation: ARCHITECTURE.md covers project structure, component breakdown, gRPC services, server/client APIs, NIXL integration, FP8 handling, and configuration; DEPLOYMENT.md provides a layered configuration guide, Docker/Kubernetes deployment instructions, P2P GPU transfer setup, and a debugging reference.

  • Python vLLM Loader Refactoring (modelexpress_client/python/modelexpress/vllm_loader.py, modelexpress_client/python/modelexpress/vllm_worker.py, modelexpress_client/python/modelexpress/__init__.py): Consolidated the dual-loader approach (mx-source, mx-target) into a single unified MxModelLoader with auto-detection logic; added shared helper functions for tensor collection, NIXL initialization, and metadata publishing; restructured loader registration to use a single @register_model_loader("mx") entry point.

  • vLLM Loader Tests & Infrastructure (modelexpress_client/python/tests/conftest.py, modelexpress_client/python/tests/test_vllm_loader.py, modelexpress_client/python/tests/__init__.py, modelexpress_client/python/modelexpress/types.py): Added a comprehensive test suite for the vLLM loader with mock vLLM dependencies; introduced conftest with a BaseModelLoader ABC and mock infrastructure; minor formatting cleanup.

  • Kubernetes Deployment Updates (examples/p2p_transfer_k8s/deploy/vllm.yaml, examples/p2p_transfer_k8s/deploy/vllm-target.yaml, examples/p2p_transfer_k8s/deploy/persistence/vllm-target-redis.yaml, examples/p2p_transfer_k8s/README.md, examples/p2p_transfer_k8s/model-download.yaml): Renamed the mx-source deployment to mx-vllm; changed load-format from mx-source to mx; replaced a static sleep wait with dynamic health-endpoint polling for readiness; removed the separate vllm-target.yaml; updated documentation and deployment references.

  • CI & Configuration Updates (.dockerignore, .github/dco.yml, .github/workflows/ci.yml, helm/README.md, helm/deploy.sh): Minor updates: added trailing newlines, integrated a cargo-deny step into the security-audit workflow, added a deployment guide link to the Helm README, whitespace refinements in the deploy script.

  • Protobuf & Minor Edits (modelexpress_common/proto/p2p.proto): Formatting-only changes to protobuf field spacing; no semantic alterations to service or message definitions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hops of joy through docs so clear,
Loaders unified, no source/target fear!
Health checks poll where silence dwelt,
Dev practices felt, deployment knelt,
One loader to bind them all, my dear! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 52.83%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly and accurately summarizes the main change: replacing two separate MX loaders (mx-source/mx-target) with a unified mx loader, which is the primary feature of this PR.



@nv-hwoo nv-hwoo left a comment


Left a few small comments, but it looks good overall 👍
Have you also tried running this on the clusters?

end
A -- "RDMA via NIXL" --> B
A --> S
B --> S

Should the arrow be opposite?

Suggested change
B --> S
B <-- S

target_device = torch.device(load_device)
logger.debug(f"Target device: {target_device}")
device_id = _get_worker_rank(target_device)
model_name = os.environ.get("MODEL_NAME", "")

We can (and probably should?) get the model name from model_config

Comment on lines +340 to +342
if not model_name:
logger.debug(f"[Worker {device_id}] MODEL_NAME not set, defaulting to source")
return None

Model name is always provided through model_config

Comment on lines +455 to +457
transfer_retries = 120
transfer_retry_delay = 30
cached_session_id = None

Can we pull these constants out instead of leaving them as local vars?
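One way to act on this suggestion is to hoist the literals to module level with descriptive names. The constant names and the polling helper below are illustrative sketches, not code from the PR; only the values (120 retries, 30 seconds) come from the snippet above:

```python
import time

# Module-level constants instead of function-local literals (names hypothetical).
TRANSFER_RETRIES = 120       # total attempts before giving up
TRANSFER_RETRY_DELAY_S = 30  # seconds to wait between attempts

def wait_for_transfer(check_ready, retries=TRANSFER_RETRIES, delay_s=TRANSFER_RETRY_DELAY_S):
    """Poll check_ready() until it returns True or the retry budget is exhausted."""
    for _ in range(retries):
        if check_ready():
            return True
        time.sleep(delay_s)
    return False
```

Keeping the budget at module level makes the roughly one-hour worst-case wait (120 × 30 s) visible at a glance and easy to tune in one place.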

def _receive_from_peer(
self,
model: nn.Module,
device: torch.device,

unused parameter

model_config: ModelConfig,
target_device: torch.device,
device_id: int,
model_name: str,

do we need to pass this param if we can get it from model_config?

Comment on lines +409 to +411
logger.info(f"[Worker {device_id}] Processing weights (FP8 transformation)...")
process_weights_after_loading(model, model_config, target_device)
logger.info(f"[Worker {device_id}] Weight processing complete")

nit suggestion:

Suggested change
logger.info(f"[Worker {device_id}] Processing weights (FP8 transformation)...")
process_weights_after_loading(model, model_config, target_device)
logger.info(f"[Worker {device_id}] Weight processing complete")
process_weights_after_loading(model, model_config, target_device)

"""Load weights from disk, then register + publish."""
logger.info(f"[Worker {device_id}] Loading weights from disk...")
self._disk_loader.load_weights(model, model_config)
logger.info(f"[Worker {device_id}] Weights loaded from disk")

nit suggestion:

Suggested change
logger.info(f"[Worker {device_id}] Weights loaded from disk")

Comment on lines +558 to +560
logger.info(f"[Worker {device_id}] Processing weights (FP8 transformation)...")
process_weights_after_loading(model, model_config, target_device)
logger.info(f"[Worker {device_id}] Weight processing complete")

nit suggestion:

Suggested change
logger.info(f"[Worker {device_id}] Processing weights (FP8 transformation)...")
process_weights_after_loading(model, model_config, target_device)
logger.info(f"[Worker {device_id}] Weight processing complete")
process_weights_after_loading(model, model_config, target_device)


@AndyDai-nv AndyDai-nv left a comment


Several questions: if I start a source node with TP8 and then start a second node with exactly the same model but TP4, how will this be resolved? Or if the source node is TP4 and the second node is PP4? Does the metadata contain enough info for such cases? Will it detect that it cannot fetch from the source?

self._receive_from_peer(model, target_device, device_id, model_name, source_worker)

# Register with NIXL + publish so future nodes can discover us
self._register_and_publish(model, target_device, device_id, model_name)

In this scenario we already have a source with the same info (which is how this target can load), so is it necessary to register and publish again?

# Conflicts:
#	examples/p2p_transfer_k8s/deploy/vllm-source.yaml
#	examples/p2p_transfer_k8s/deploy/vllm-target.yaml
#	examples/p2p_transfer_k8s/vllm-source.yaml
#	examples/p2p_transfer_k8s/vllm.yaml
#	modelexpress_client/python/modelexpress/vllm_loader.py
MxModelLoader was missing download_model() and load_weights() from the
BaseModelLoader ABC, causing a TypeError crash at instantiation on k8s.
Both methods delegate to the internal disk loader (DefaultModelLoader).

- Add conftest.py to mock vLLM modules for local test execution
- Add regression tests for abstract method completeness and delegation

Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
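The fix above can be sketched with a simplified stand-in for vLLM's BaseModelLoader ABC, showing why the missing methods crashed instantiation and how the delegation resolves it. The real vLLM signatures differ; the ABC, return strings, and config arguments here are illustrative only:

```python
from abc import ABC, abstractmethod

class BaseModelLoader(ABC):
    """Simplified stand-in for vLLM's BaseModelLoader ABC."""
    @abstractmethod
    def download_model(self, model_config): ...
    @abstractmethod
    def load_weights(self, model, model_config): ...

class DefaultModelLoader(BaseModelLoader):
    """Stand-in for vLLM's default disk loader."""
    def download_model(self, model_config):
        return f"downloaded:{model_config}"
    def load_weights(self, model, model_config):
        return f"loaded:{model_config}"

class MxModelLoader(BaseModelLoader):
    """Without the two delegating methods below, MxModelLoader() raises
    TypeError ("Can't instantiate abstract class ...") at startup."""
    def __init__(self):
        self._disk_loader = DefaultModelLoader()

    def download_model(self, model_config):
        # Delegate to the internal disk loader, as the fix does.
        return self._disk_loader.download_model(model_config)

    def load_weights(self, model, model_config):
        return self._disk_loader.load_weights(model, model_config)
```

This is also what the regression tests mentioned above guard against: instantiating the loader exercises the ABC's completeness check, so a missing abstract method fails in CI rather than on k8s.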

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/p2p_transfer_k8s/deploy/persistence/vllm-target-redis.yaml (2)

121-128: ⚠️ Potential issue | 🔴 Critical

Switch mx-target to unified mx to prevent boot failure.

Line 127 still uses --load-format mx-target. With unified loader registration, this can fail at startup due to an unsupported load format.

🔧 Proposed fix
-              # Start vLLM with mx-target loader
-              # mx-target: Creates dummy weights, receives RAW tensors via RDMA,
-              # THEN runs FP8 processing (identical to source)
-              # This ensures weight_scale_inv is transferred BEFORE being processed
+              # Start vLLM with unified mx loader (auto-detect source vs disk)
               python3 -m vllm.entrypoints.openai.api_server \
                 --model ${MODEL_NAME} \
-                --load-format mx-target \
+                --load-format mx \
                 --tensor-parallel-size 8 \
                 --enable-expert-parallel &
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/p2p_transfer_k8s/deploy/persistence/vllm-target-redis.yaml` around
lines 121 - 128, The startup uses the vLLM OpenAI API server entrypoint (python3
-m vllm.entrypoints.openai.api_server) with the flag --load-format set to the
deprecated/unsupported "mx-target"; change that flag value to the unified loader
name "mx" so the server uses the registered unified loader (replace
--load-format mx-target with --load-format mx in the command that starts the
server).

132-143: ⚠️ Potential issue | 🟠 Major

Add timeout and process-liveness checks to readiness wait loop.

Lines 137-143 loop forever on health failures. If the vLLM process exits early (or never becomes healthy), the pod can hang indefinitely instead of failing fast.

🔧 Proposed fix
               echo "Waiting for vLLM to be ready..."
-              python3 -c '
-              import time
-              import urllib.request
-              while True:
-                  try:
-                      urllib.request.urlopen("http://localhost:8000/health", timeout=3)
-                      break
-                  except Exception:
-                      time.sleep(3)
-              '
+              python3 - "$VLLM_PID" <<'PY'
+              import os
+              import sys
+              import time
+              import urllib.request
+
+              pid = int(sys.argv[1])
+              deadline = time.time() + 900  # 15 min max wait
+
+              while True:
+                  if time.time() > deadline:
+                      raise SystemExit("Timed out waiting for vLLM health endpoint")
+                  try:
+                      urllib.request.urlopen("http://localhost:8000/health", timeout=3)
+                      break
+                  except Exception:
+                      try:
+                          os.kill(pid, 0)  # still alive?
+                      except OSError:
+                          raise SystemExit("vLLM process exited before becoming healthy")
+                      time.sleep(3)
+              PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/p2p_transfer_k8s/deploy/persistence/vllm-target-redis.yaml` around
lines 132 - 143, The current python3 -c health-wait loop that polls
"http://localhost:8000/health" in a while True can hang forever; update that
block to enforce a total timeout (e.g., configurable seconds) and to perform a
process-liveness check so it fails fast if vLLM exits (instead of looping
indefinitely). Concretely, modify the python3 -c script in the readiness block
(the while True loop that calls urllib.request.urlopen) to track elapsed time
and raise SystemExit/non-zero when the timeout is exceeded, and also verify the
target process (e.g., PID 1 or the vLLM process) is still running between
attempts—exit non-zero if the process is gone. Ensure the script prints a clear
error and returns non-zero so the pod fails readiness if health never becomes OK
or the process dies.
🧹 Nitpick comments (3)
modelexpress_client/python/modelexpress/vllm_loader.py (2)

225-229: Use logging.exception for cleaner stack trace logging.

The static analysis correctly identifies that when logging an exception in an except block, logging.exception() automatically includes the traceback without needing manual traceback.format_exc().

♻️ Suggested fix
     except Exception as e:
-        import traceback
-        logger.error(f"[Worker {device_id}] EXCEPTION publishing metadata: {e}")
-        logger.error(f"[Worker {device_id}] Traceback: {traceback.format_exc()}")
+        logger.exception(f"[Worker {device_id}] EXCEPTION publishing metadata: {e}")
         raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_client/python/modelexpress/vllm_loader.py` around lines 225 -
229, Replace the two logger.error calls inside the exception handler that
currently log e and traceback.format_exc() with a single logger.exception call
so the stack trace is automatically included; specifically, in the except
Exception as e block (the handler referencing device_id in
modelexpress.vllm_loader.py) call logger.exception with a clear message like
"[Worker {device_id}] EXCEPTION publishing metadata" to capture the exception
and traceback cleanly.

458-463: Silent exception swallowing loses diagnostic information.

The try/except/pass block silently ignores errors when capturing the session ID. Consider logging at debug level to aid troubleshooting.

♻️ Suggested fix
         try:
             ready_resp = self._mx_client.get_ready(model_name, device_id)
             if ready_resp.found:
                 cached_session_id = ready_resp.session_id
-        except Exception:
-            pass
+        except Exception as e:
+            logger.debug(f"[Worker {device_id}] Could not capture session ID: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_client/python/modelexpress/vllm_loader.py` around lines 458 -
463, The try/except around self._mx_client.get_ready is swallowing exceptions
and losing diagnostics; update the except to capture the exception (except
Exception as e) and log it at debug level using the module/class logger (e.g.,
self._logger.debug or the existing logger) including context like model_name,
device_id and the exception, while preserving the behavior of not rethrowing so
cached_session_id is only set when ready_resp.found.
modelexpress_client/python/tests/test_vllm_loader.py (1)

279-280: Simplify the assertion for publish_ready call.

The dual-check (call_kwargs[1]["model_name"] or call_kwargs.kwargs.get("model_name")) is fragile and depends on how the mock captures arguments. Consider using a cleaner approach.

♻️ Suggested fix
-        call_kwargs = mx_client.publish_ready.call_args
-        assert call_kwargs[1]["model_name"] == "test-model" or call_kwargs.kwargs.get("model_name") == "test-model"
+        mx_client.publish_ready.assert_called_once()
+        _, kwargs = mx_client.publish_ready.call_args
+        assert kwargs["model_name"] == "test-model"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelexpress_client/python/tests/test_vllm_loader.py` around lines 279 - 280,
Replace the fragile dual-access assertion with a single, clear check using the
mock's call_args tuple unpacking or assert_called_with: capture the kwargs via
"_, kwargs = mx_client.publish_ready.call_args" (or directly use
"mx_client.publish_ready.assert_called_with(model_name='test-model')" ) and
assert "kwargs['model_name'] == 'test-model'"; this targets the publish_ready
mock reliably and removes the brittle indexing approach.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3376ee5 and 6067f33.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (27)
  • .cursor/rules/rust.mdc
  • .dockerignore
  • .github/copilot-instructions.md
  • .github/dco.yml
  • .github/workflows/ci.yml
  • CLAUDE.md
  • CODE_OF_CONDUCT.md
  • CONTRIBUTING.md
  • README.md
  • SECURITY.md
  • docs/ARCHITECTURE.md
  • docs/DEPLOYMENT.md
  • examples/p2p_transfer_k8s/README.md
  • examples/p2p_transfer_k8s/deploy/persistence/vllm-target-redis.yaml
  • examples/p2p_transfer_k8s/deploy/vllm-target.yaml
  • examples/p2p_transfer_k8s/deploy/vllm.yaml
  • examples/p2p_transfer_k8s/model-download.yaml
  • helm/README.md
  • helm/deploy.sh
  • modelexpress_client/python/modelexpress/__init__.py
  • modelexpress_client/python/modelexpress/types.py
  • modelexpress_client/python/modelexpress/vllm_loader.py
  • modelexpress_client/python/modelexpress/vllm_worker.py
  • modelexpress_client/python/tests/__init__.py
  • modelexpress_client/python/tests/conftest.py
  • modelexpress_client/python/tests/test_vllm_loader.py
  • modelexpress_common/proto/p2p.proto
💤 Files with no reviewable changes (3)
  • modelexpress_client/python/tests/__init__.py
  • examples/p2p_transfer_k8s/deploy/vllm-target.yaml
  • modelexpress_client/python/modelexpress/types.py

# 2. kubectl apply -f model-download.yaml
# 3. Wait for job to complete: kubectl wait --for=condition=complete job/model-download --timeout=1h
# 4. Deploy vllm-source.yaml and vllm-target.yaml
# 4. Deploy deploy/vllm.yaml

⚠️ Potential issue | 🟡 Minor

Add the Redis flush redeploy step to usage instructions.

Line 10 updates deploy guidance, but the redeploy safety step is still missing. Please add a Redis flush note before redeploy flows to avoid stale metadata.

Suggested doc patch
-#   4. Deploy deploy/vllm.yaml
+#   4. If redeploying vLLM, flush Redis first:
+#      kubectl exec -it <redis-pod> -- redis-cli FLUSHALL
+#   5. Deploy deploy/vllm.yaml

As per coding guidelines "When deploying to Kubernetes, flush Redis with redis-cli FLUSHALL before deleting and redeploying vLLM instances to clear stale metadata".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# 4. Deploy deploy/vllm.yaml
# 4. If redeploying vLLM, flush Redis first:
# kubectl exec -it <redis-pod> -- redis-cli FLUSHALL
# 5. Deploy deploy/vllm.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/p2p_transfer_k8s/model-download.yaml` at line 10, Update the
usage/deploy instructions to include a Redis flush safety step before
redeploying vLLM: add a note immediately after the "Deploy deploy/vllm.yaml"
guidance (the comment line "#   4. Deploy deploy/vllm.yaml") instructing
operators to run redis-cli FLUSHALL to clear Redis metadata before
deleting/redeploying vLLM instances; ensure the note clearly states this is
required to avoid stale metadata during redeploy flows.
