FastCrest · rylinjames · May 17, 2026
@@ -26,10 +26,16 @@ python3 -c "import onnxruntime; print(onnxruntime.__version__, onnxruntime.get_a
 pip install 'onnxruntime-gpu>=1.25.1' 'nvidia-cudnn-cu12>=9.5' 'nvidia-cublas-cu12>=12.6'
 ```
 
-**Jetson fix:** On JetPack 6.0+, use the NVIDIA-provided ORT wheel from the Jetson Zoo (NOT the desktop x86 wheel — that crashes or falls back to CPU silently):
+**Jetson fix:** On JetPack 6.0+, do not install the `[gpu]` extra from standard PyPI; those wheels are `x86_64`-only. Pin `numpy<2`, pull the Jetson-compatible `onnxruntime-gpu` wheel from the Jetson AI Lab index, and only then install `reflex-vla[serve]` (not `[monolithic]`, whose `lerobot` dependency requires Python 3.12+):
 ```bash
-# JetPack 6 / R36 / Python 3.10
-pip install onnxruntime-gpu --extra-index-url https://elinux.org/Jetson_Zoo
+# Pin numpy<2 first: the Jetson AI Lab wheels are built against the NumPy 1.x ABI
+pip install 'numpy<2'
+
+# JetPack 6.0 / 6.1 (cu126)
+pip install onnxruntime-gpu --extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu126
+
+# Or JetPack 6.2+ (cu129)
+pip install onnxruntime-gpu --extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu129
 ```
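+
+To confirm the Jetson wheel actually registered the GPU, a small shell helper can ask `onnxruntime.get_available_providers()` whether `CUDAExecutionProvider` is listed. This is a sketch, not part of `reflex`; `ort_uses_gpu` is a hypothetical name, and any import failure (including the NumPy 1.x/2.x ABI error) is treated as "no GPU":
+```bash
+# Hypothetical helper: exit 0 only if onnxruntime imports cleanly AND lists
+# CUDAExecutionProvider; exit non-zero otherwise (missing, CPU-only, or broken).
+ort_uses_gpu() {
+  "${1:-python3}" - 2>/dev/null <<'EOF'
+import sys
+try:
+    import onnxruntime as ort
+except Exception:
+    sys.exit(1)
+sys.exit(0 if "CUDAExecutionProvider" in ort.get_available_providers() else 1)
+EOF
+}
+
+if ort_uses_gpu python3; then echo "onnxruntime: GPU provider available"; else echo "onnxruntime: CPU-only or failed to import"; fi
+```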
 
 JetPack 5.x (R35 / CUDA 11.4) is not supported — `reflex doctor` will flag this loudly with the upgrade path.
@@ -147,6 +153,45 @@ dpkg -l | grep nvidia-l4t-core
 
 JetPack 5.x is not supported. The v0.9.4 doctor guard parses `/etc/nv_tegra_release`, detects R35, and surfaces the upgrade path loudly — without it, ORT silently falls back to CPU and you get useless latency numbers.
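+
+To run the same check by hand, read the L4T major release from `/etc/nv_tegra_release`, whose first line starts with `# R36 (release), ...` on JetPack 6 and `# R35 (release), ...` on JetPack 5. A minimal sketch assuming that header format; `l4t_major` is a hypothetical helper, not the doctor's own implementation:
+```bash
+# Print the L4T major release number (e.g. 36) parsed from an nv_tegra_release file.
+# Prints nothing if the file is missing or the header does not match.
+l4t_major() {
+  sed -n 's/^# R\([0-9][0-9]*\) (release).*/\1/p' "${1:-/etc/nv_tegra_release}" 2>/dev/null | head -n 1
+}
+
+major="$(l4t_major)"
+if [ -n "$major" ] && [ "$major" -ge 36 ]; then
+  echo "L4T R$major: JetPack 6 or newer, supported"
+else
+  echo "L4T R${major:-unknown}: JetPack 5 or unknown, not supported"
+fi
+```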
 
+### `A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x`
+
+```
+ImportError: A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6
+```
+
+**Cause:** The Jetson AI Lab `torch` and `onnxruntime-gpu` wheels are compiled against the NumPy 1.x C ABI. If `numpy>=2.0` is installed (pip's default when nothing pins it), both libraries fail on import.
+
+**Fix:** Pin `numpy<2` **before** installing torch or onnxruntime-gpu:
+```bash
+pip install 'numpy<2'
+# Then install torch / ort from the Jetson AI Lab index
+```
+
+If you already installed numpy 2.x, downgrade:
+```bash
+pip install 'numpy<2' --force-reinstall
+```
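+
+To confirm the downgrade took effect in the interpreter that will actually run `reflex`, a sketch that compares the NumPy major version against 2 (the helper name `numpy_is_1x` is hypothetical):
+```bash
+# Succeeds if a version string like "1.26.4" has a major version below 2.
+# An empty string (numpy missing) counts as a failure.
+numpy_is_1x() {
+  local major="${1%%.*}"
+  [ -n "$major" ] && [ "$major" -lt 2 ]
+}
+
+ver="$(python3 -c 'import numpy; print(numpy.__version__)' 2>/dev/null)"
+if numpy_is_1x "$ver"; then echo "numpy $ver OK"; else echo "numpy ${ver:-missing}: run pip install 'numpy<2'"; fi
+```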
+
+### `No matching distribution found for lerobot==0.5.1` (Python 3.10)
+
+```
+ERROR: Could not find a version that satisfies the requirement lerobot==0.5.1; extra == "monolithic"
+ERROR: Ignored the following versions that require a different python version: 0.5.0 Requires-Python >=3.12; 0.5.1 Requires-Python >=3.12
+```
+
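+**Cause:** The `[monolithic]`, `[native]`, and `[rtc]` extras depend on `lerobot==0.5.1`, which requires Python ≥ 3.12, and JetPack 6 ships Python 3.10. With the `python_version >= '3.12'` marker on those extras, pip skips `lerobot` on 3.10 instead of failing, but the monolithic export still cannot run there.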
+
+**Fix:** On Jetson, install `[serve]` only — **not** `[monolithic]`:
+```bash
+pip install 'reflex-vla[serve]'
+```
+
+The monolithic ONNX export (`reflex export --monolithic`) requires lerobot and must run on a **Python 3.12+ host** (desktop, cloud GPU, or Docker). Export there, then copy the ONNX to the Jetson and serve it:
+```bash
+# On Jetson — serve a pre-exported model
+reflex serve /path/to/exported/model/
+```
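+
+Before starting an export, a host can be checked against the Python ≥ 3.12 requirement. A minimal sketch (the helper `py_can_export` is hypothetical) that compares a `major.minor[.patch]` version string:
+```bash
+# Succeeds if the given Python version string is >= 3.12.
+py_can_export() {
+  local major="${1%%.*}" rest="${1#*.}"
+  local minor="${rest%%.*}"
+  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }
+}
+
+pyver="$(python3 -c 'import platform; print(platform.python_version())')"
+if py_can_export "$pyver"; then echo "Python $pyver: monolithic export OK"; else echo "Python $pyver: export on a 3.12+ host"; fi
+```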
+
 ### `Thermal throttling during inference`
 
 **Symptoms:** Latency spikes after 5–10 minutes of continuous inference.

@@ -6,82 +6,86 @@
 
 ## Install on the Jetson
 
+> **Two things that will break your install if you skip them:**
+> 1. **Pin `numpy<2` before installing anything else.** The Jetson AI Lab torch and onnxruntime-gpu wheels are compiled against NumPy 1.x. If pip pulls NumPy 2.x, both libraries will crash on import with *"A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"*.
+> 2. **Do NOT use `[gpu]` from standard PyPI.** Those wheels are `x86_64`-only and will fail with `ResolutionImpossible` on `aarch64`.
+
+### Recommended: bootstrap installer
+
 ```bash
-pip install 'reflex-vla[serve,gpu,monolithic]'
+./install.sh
 ```
 
-Why those extras:
-- `serve` — FastAPI + uvicorn for the HTTP inference server
-- `gpu` — `onnxruntime-gpu` (links to CUDA on the Jetson via the nvidia container runtime)
-- `monolithic` — `lerobot` + `transformers==5.3.0` + `onnx-diagnostic`, the cos=+1.0 verified export path
+### Manual install
 
-This pulls ~2 GB of dependencies. Takes 5-10 minutes on the Jetson.
+```bash
+# 0. Create a clean venv (recommended)
+python3 -m venv ~/reflex-orin && source ~/reflex-orin/bin/activate
+pip install -U pip setuptools wheel
 
-## One command — deploy
+# 1. Pin numpy<2 FIRST — before torch or ort
+pip install 'numpy<2'
 
-```bash
-reflex go --model smolvla-base
-```
+# 2. Install Jetson-native torch + ort from the Jetson AI Lab index
+pip install torch torchvision \
+  --index-url https://pypi.jetson-ai-lab.io/jp6/cu126
 
-What this does, step by step:
+pip install onnxruntime-gpu \
+  --index-url https://pypi.jetson-ai-lab.io/jp6/cu126
 
+# 3. Install reflex-vla with [serve] only (NOT [gpu], NOT [monolithic])
+pip install 'reflex-vla[serve]'
 ```
-device:    orin_nano (via tegrastats, GPU=Jetson Orin Nano)
-model:     smolvla-base (lerobot/smolvla_base, 900MB, action_dim=7)
-  strategy: exact-id
-pulling:   lerobot/smolvla_base → ~/.cache/reflex/models/smolvla-base/
-           ↓ 900 MB from HuggingFace (~30 sec)
-exporting: ~/.cache/reflex/models/smolvla-base → ~/.cache/reflex/exports/smolvla-base
-           (target=orin-nano, monolithic, 5-15 min depending on hardware)
-           ↓ Loading PyTorch model
-           ↓ Tracing torch.export (the heaviest step on Orin Nano)
-           ↓ Writing ONNX (~1.6 GB on disk)
-           ↓ Validating cos=+1.0 vs PyTorch reference
-export complete in 612.4s  ONNX=model.onnx (1623 MB)
-
-Starting serve on http://0.0.0.0:8000
-  Loading ONNX into onnxruntime-gpu (CUDAExecutionProvider)...
-  TRT engine build (first time)...   ~60-90 sec
-  Warmup inference...                ~5 sec
-  ✓ Server ready
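+
+`pip install 'numpy<2'` only sets the starting point; a later install can still upgrade NumPy if some dependency asks for 2.x. One way to hold the pin, sketched here as an assumption about your workflow rather than something `reflex` sets up, is a pip constraints file exported through `PIP_CONSTRAINT` (the environment-variable form of `pip install --constraint`). The helper name and file path are hypothetical:
+```bash
+# Write a constraints file that forbids NumPy 2.x and make every later
+# `pip install` in this shell honor it via PIP_CONSTRAINT.
+pin_numpy_1x() {
+  local path="${1:?usage: pin_numpy_1x <constraints-file>}"
+  mkdir -p "$(dirname "$path")"
+  printf 'numpy<2\n' > "$path"
+  export PIP_CONSTRAINT="$path"
+}
+
+# Usage inside the venv from step 0 (path is only an example):
+#   pin_numpy_1x ~/reflex-orin/constraints.txt
+#   pip install 'reflex-vla[serve]'   # resolution errors out rather than pulling numpy 2.x
+```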
+
+> **Why not `[monolithic]`?** The monolithic export extra depends on `lerobot==0.5.1`, which requires **Python ≥ 3.12**. JetPack 6 ships Python 3.10. Export your model on a desktop/cloud machine with Python 3.12+, then copy the ONNX to the Jetson and serve it.
+
+### Adding `reflex` to your PATH (non-venv installs)
+If you installed without a venv and see `reflex: command not found`, add `~/.local/bin` to your `PATH`:
+```bash
+export PATH="$HOME/.local/bin:$PATH"
+echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
 ```
 
-## Hit /act
+### What each piece provides
+- `numpy<2` — ABI compatibility with Jetson AI Lab pre-built wheels
+- `torch` / `onnxruntime-gpu` (from Jetson AI Lab) — GPU-accelerated inference, compiled for `aarch64` + JetPack CUDA
+- `reflex-vla[serve]` — FastAPI + uvicorn HTTP inference server + embodiment validation
 
-From another terminal (or a connected workstation):
+This pulls ~2 GB of dependencies and takes 5-10 minutes on the Jetson.
+
+## Deploy
+
+Since monolithic export requires Python 3.12+ (for `lerobot`), the typical Jetson workflow is **export on a desktop/cloud host, serve on-device**.
+
+### Step 1: Export on a Python 3.12+ machine
 
 ```bash
-curl -X POST http://<jetson-ip>:8000/act \
-  -H 'content-type: application/json' \
-  -d '{
-    "instruction": "pick up the red cup",
-    "image": "<base64-png-or-jpeg>",
-    "state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
-  }'
+# On your desktop / cloud GPU (Python 3.12+)
+pip install 'reflex-vla[serve,monolithic]'
+reflex export --model smolvla-base --out ./smolvla-export/
 ```
 
-Response:
-
-```json
-{
-  "actions": [[...], [...], ...],
-  "latency_ms": 47.3,
-  "inference_mode": "onnx_trt_fp16",
-  "guard_clamped": false
-}
+Then copy the export directory to the Jetson:
+```bash
+scp -r ./smolvla-export/ <user>@<jetson-ip>:~/smolvla-export/
 ```
 
-## What just happened
+### Step 2: Serve on the Jetson
 
-`reflex go` chained:
+```bash
+# On the Jetson (Python 3.10, [serve] only)
+reflex serve ~/smolvla-export/
+```
 
-1. **Hardware probe** — `tegrastats` confirms Orin Nano (8 GB)
-2. **Model resolution** — picked `smolvla-base` from the curated registry; warned if your `--device-class` doesn't match the model's `supported_devices`
-3. **Pull** — `huggingface_hub.snapshot_download`; cached in `~/.cache/reflex/models/`
-4. **Export** — `reflex.exporters.monolithic.export_monolithic` traces PyTorch → ONNX with `num_steps=10` baked in, validates parity at cos=+1.0; output cached in `~/.cache/reflex/exports/`
-5. **Serve** — `reflex.runtime.server.create_app` mounts the ONNX into onnxruntime-gpu, builds a TRT FP16 engine on first run (cached for next time), exposes `/act` and `/health`
+What happens:
+```
+Loading ONNX into onnxruntime-gpu (CUDAExecutionProvider)...
+TRT engine build (first time)...   ~60-90 sec
+Warmup inference...                ~5 sec
+✓ Server ready on http://0.0.0.0:8000
+```
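+
+Because the first start includes the TRT engine build (~60-90 sec), a script that drives the Jetson should wait for the server before sending requests. A minimal sketch, assuming the server exposes a `/health` endpoint that answers without an HTTP error once ready; `wait_for_health` is a hypothetical helper:
+```bash
+# Poll URL until `curl -f` succeeds (no HTTP error status); give up after
+# TIMEOUT_S failed attempts (each attempt: one curl with a 2 s timeout, then a 1 s sleep).
+wait_for_health() {
+  local url="$1" timeout_s="${2:-180}" waited=0
+  until curl -sf -o /dev/null --max-time 2 "$url"; do
+    if [ "$waited" -ge "$timeout_s" ]; then
+      echo "timed out after $timeout_s attempts waiting for $url" >&2
+      return 1
+    fi
+    sleep 1
+    waited=$((waited + 1))
+  done
+  echo "ready: $url"
+}
+
+# Usage (replace <jetson-ip>); 180 attempts leave headroom for the first TRT build:
+#   wait_for_health "http://<jetson-ip>:8000/health" 180
+```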
 
-Re-running `reflex go --model smolvla-base` skips pull (cache hit) and skips export (`VERIFICATION.md` marker hit), goes straight to serve in ~2 sec.
+> **Cached exports:** if an earlier `reflex go --model smolvla-base` run left a verified export in `~/.cache/reflex/exports/` (the `VERIFICATION.md` marker is present), re-running the same command skips export (cache hit) and goes straight to serve in ~2 sec.
 
 ## Or use the chat
 
@@ -96,7 +100,7 @@ Watch it call `list_targets`, `pull_model`, `export_model`, `serve_model` in seq
 
 ## Troubleshooting
 
-- **"Missing dependencies for monolithic export"** — install `[monolithic]`: `pip install 'reflex-vla[monolithic]'`
+- **"Missing dependencies for monolithic export"** — export requires Python 3.12+ with `[monolithic]`; run on a desktop/cloud host
 - **"CUDA unavailable"** — confirm `nvidia-container-runtime` is set up on the Jetson; `reflex doctor` will tell you which check failed
 - **TRT engine build fails** — try `--no-trt` to fall back to plain CUDAExecutionProvider; usually means `trtexec` isn't on PATH
 - **Disk full** — SmolVLA needs ~2 GB free for weights + ONNX. `reflex inspect targets` shows memory budgets per hardware tier.
@@ -172,11 +172,20 @@ if [ -z "$EXTRAS" ]; then
   if [ "$IS_JETSON" -eq 1 ] || [ "$FORCE_JETSON" -eq 1 ]; then
     # NEVER install [gpu] on Jetson — those are x86_64 wheels (nvidia-cu12,
     # tensorrt). They will either fail to install, segfault, or silently
-    # fall back to CPU. Instead we install [serve,monolithic] and then
-    # pull the Jetson Zoo onnxruntime-gpu wheel explicitly.
-    EXTRAS="serve,monolithic"
-    ok "Detected Jetson ($JETSON_MODEL) → installing with [serve,monolithic]"
-    note "  Jetson-specific GPU runtime will be installed separately."
+    # fall back to CPU.
+    #
+    # NEVER install [monolithic] on Jetson — it depends on lerobot==0.5.1
+    # which requires Python >=3.12. JetPack 6 ships Python 3.10. Export
+    # on a Python 3.12+ host, then serve the ONNX on the Jetson.
+    #
+    # Instead we install [serve] only, and pre-install numpy<2, torch, and
+    # onnxruntime-gpu from the Jetson AI Lab index BEFORE reflex-vla so
+    # pip doesn't pull incompatible x86_64 or numpy-2.x-linked wheels.
+    EXTRAS="serve"
+    ok "Detected Jetson ($JETSON_MODEL) → installing with [serve]"
+    note "  Jetson-specific GPU deps (numpy<2, torch, ort) will be installed first."
+    note "  [monolithic] skipped — lerobot requires Python >=3.12 (Jetson has 3.10)."
+    note "  Export on a desktop/cloud host, then serve the ONNX here."
   elif [ "$OS" = "Darwin" ]; then
     EXTRAS="serve,onnx,monolithic"
     ok "Detected macOS → installing with [serve,onnx,monolithic] (CPU runtime)"
@@ -187,7 +196,7 @@ if [ -z "$EXTRAS" ]; then
     EXTRAS="serve,onnx,monolithic"
     ok "No GPU detected → installing with [serve,onnx,monolithic] (CPU runtime)"
   fi
-  if [ "$EXTRAS" != "serve,monolithic" ] && [ "$IS_JETSON" -eq 0 ]; then
+  if [ "$IS_JETSON" -eq 0 ] && echo "$EXTRAS" | grep -q "monolithic"; then
     note "  (monolithic adds the extras 'reflex go' needs to actually deploy a model — not just chat)"
   fi
 fi
@@ -212,18 +221,14 @@ if ! "$PYTHON" -m pip --version >/dev/null 2>&1; then
   fi
 fi
 
-# -- Run pip install ----------------------------------------------------------
-PIP_TARGET="reflex-vla[$EXTRAS]"
-info "Installing: $PIP_TARGET"
-echo
-"$PYTHON" -m pip install --upgrade "$PIP_TARGET"
-
-# -- Jetson: install Jetson Zoo onnxruntime-gpu -------------------------------
-# This must happen AFTER reflex is installed so we override any CPU-only
-# onnxruntime that may have been pulled transitively.
+# -- Jetson: pre-install GPU deps from Jetson AI Lab --------------------------
+# This MUST happen BEFORE `pip install reflex-vla[serve]` so that:
+#   1. numpy<2 is locked in place before torch's transitive dep pulls 2.x
+#   2. torch comes from the Jetson AI Lab aarch64 wheel (not PyPI x86_64)
+#   3. onnxruntime-gpu comes from Jetson AI Lab (not the unresolvable PyPI one)
 if [ "$IS_JETSON" -eq 1 ] || [ "$FORCE_JETSON" -eq 1 ]; then
   echo
-  info "Installing Jetson-compatible onnxruntime-gpu..."
+  info "Pre-installing Jetson GPU dependencies..."
 
   # JetPack version → index URL mapping
   # Default to JP6.0/6.1 (cu126) since that's the current standard.
@@ -244,26 +249,44 @@ if [ "$IS_JETSON" -eq 1 ] || [ "$FORCE_JETSON" -eq 1 ]; then
       ;;
   esac
 
-  # Pin numpy<2 because Jetson Zoo wheels are compiled against numpy 1.x
-  # and will segfault / throw ABI errors with numpy 2.x.
-  # Reflex itself works fine with numpy 1.x.
-  note "  Pinning numpy<2 for Jetson Zoo ABI compatibility..."
+  # 1. Pin numpy<2 FIRST — Jetson AI Lab torch and ort wheels are compiled
+  #    against numpy 1.x C ABI. numpy 2.x causes:
+  #      "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"
+  note "  Step 1/3: Pinning numpy<2 for Jetson AI Lab ABI compatibility..."
   "$PYTHON" -m pip install 'numpy<2' || warn "numpy pin failed — may cause runtime issues"
 
-  # Install the GPU wheel from Jetson Zoo index
-  if "$PYTHON" -m pip install --upgrade --index-url "$JETSON_INDEX" onnxruntime-gpu; then
-    ok "Installed onnxruntime-gpu (Jetson Zoo, $JETSON_INDEX)"
+  # 2. Install Jetson-native torch (aarch64, CUDA-enabled)
+  note "  Step 2/3: Installing torch from Jetson AI Lab index..."
+  if "$PYTHON" -m pip install --index-url "$JETSON_INDEX" torch torchvision; then
+    ok "Installed torch (Jetson AI Lab, $JETSON_INDEX)"
+  else
+    warn "Jetson AI Lab torch install failed — pip will fall back to PyPI torch."
+    note "  PyPI torch may be CPU-only on aarch64."
+  fi
+
+  # 3. Install Jetson-native onnxruntime-gpu
+  note "  Step 3/3: Installing onnxruntime-gpu from Jetson AI Lab index..."
+  if "$PYTHON" -m pip install --index-url "$JETSON_INDEX" onnxruntime-gpu; then
+    ok "Installed onnxruntime-gpu (Jetson AI Lab, $JETSON_INDEX)"
   else
     echo
-    fail "Jetson Zoo onnxruntime-gpu install failed."
+    fail "Jetson AI Lab onnxruntime-gpu install failed."
     info "Manual install command:"
-    note "  $PYTHON -m pip install numpy '<2'"
-    note "  $PYTHON -m pip install --upgrade --index-url $JETSON_INDEX onnxruntime-gpu"
+    note "  $PYTHON -m pip install 'numpy<2'"
+    note "  $PYTHON -m pip install --index-url $JETSON_INDEX torch torchvision"
+    note "  $PYTHON -m pip install --index-url $JETSON_INDEX onnxruntime-gpu"
     echo
-    warn "Reflex is installed but inference will fall back to CPU (slow)."
+    warn "Reflex will be installed but inference will fall back to CPU (slow)."
   fi
+  echo
 fi
 
+# -- Run pip install ----------------------------------------------------------
+PIP_TARGET="reflex-vla[$EXTRAS]"
+info "Installing: $PIP_TARGET"
+echo
+"$PYTHON" -m pip install --upgrade "$PIP_TARGET"
+
 echo
 ok "Installed."
 echo

@@ -111,17 +111,27 @@ safety = ["yourdfpy"]
 # reference PyTorch policy. Required for the native-path-parity regression
 # gate in GOALS.yaml. Heavy (pulls torch, torchvision, accelerate, etc.) —
 # that's why it's a separate extra, not a base dep.
+#
+# lerobot 0.5.x requires Python >=3.12. Jetson (JetPack 6) ships 3.10,
+# so we gate lerobot behind a version marker — pip skips it on 3.10/3.11
+# instead of hard-failing with "No matching distribution found". Native parity
+# verification should be run on a Python 3.12+ host.
 native = [
-    "lerobot==0.5.1",
+    "lerobot==0.5.1; python_version >= '3.12'",
     "num2words",
 ]
 # Monolithic ONNX export — cos=1.0 verified production path. Runs locally
 # via `reflex export --monolithic` (the DEFAULT as of v0.2.1). Requires
 # transformers==5.3.0 EXACTLY (5.4+ has a q_length regression in
 # masking_utils.sdpa_mask that breaks onnx-diagnostic patches). Base pin
 # `transformers>=4.40,<5.4` permits this extra to pin-to-exact.
+#
+# lerobot 0.5.x requires Python >=3.12. On Jetson (Python 3.10) the
+# monolithic export can't run locally — export on a cloud/desktop host
+# (Python 3.12+) and `reflex serve` the resulting ONNX on the Jetson
+# with `pip install 'reflex-vla[serve]'` only.
 monolithic = [
-    "lerobot==0.5.1",
+    "lerobot==0.5.1; python_version >= '3.12'",
     "transformers==5.3.0",
     "onnx-diagnostic>=0.9",
     "onnxscript>=0.1",
@@ -162,8 +172,10 @@ tracing = [
 # Lerobot is heavy (torch + vision + datasets); only install when serving
 # RTC-enabled deployments. Skeleton + config validation work without this
 # extra; only the actual processor construction needs it.
+#
+# lerobot 0.5.x requires Python >=3.12 (skipped on 3.10/3.11).
 rtc = [
-    "lerobot==0.5.1",
+    "lerobot==0.5.1; python_version >= '3.12'",
 ]
 # Curate format-converter extras. Phase 1 ships LeRobot v3 + HDF5 with the
 # core install (parquet via pyarrow which huggingface_hub already pulls; h5py