51 changes: 48 additions & 3 deletions docs/troubleshooting.md
@@ -26,10 +26,16 @@ python3 -c "import onnxruntime; print(onnxruntime.__version__, onnxruntime.get_a
pip install 'onnxruntime-gpu>=1.25.1' 'nvidia-cudnn-cu12>=9.5' 'nvidia-cublas-cu12>=12.6'
```

**Jetson fix:** On JetPack 6.0+, use the NVIDIA-provided ORT wheel from the Jetson Zoo (NOT the desktop x86 wheel — that crashes or falls back to CPU silently):
**Jetson fix:** On JetPack 6.0+, do not install the `[gpu]` extra from standard PyPI (those wheels are `x86_64`-only). Pin `numpy<2`, pull the Jetson-compatible `onnxruntime-gpu` wheel from the Jetson AI Lab index, and then install `reflex-vla[serve]` only (`[monolithic]` needs Python 3.12+, which JetPack 6 does not ship):
```bash
# JetPack 6 / R36 / Python 3.10
pip install onnxruntime-gpu --extra-index-url https://elinux.org/Jetson_Zoo
# Pin numpy<2 first for Jetson AI Lab ABI compatibility
pip install 'numpy<2'

# JetPack 6.0 / 6.1 (cu126)
pip install onnxruntime-gpu --extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu126

# Or JetPack 6.2+ (cu129)
pip install onnxruntime-gpu --extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu129
```

JetPack 5.x (R35 / CUDA 11.4) is not supported — `reflex doctor` will flag this loudly with the upgrade path.
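If you script this step, the JetPack-to-index choice above fits in a small shell function. This is a sketch, not part of reflex: `jetson_ai_lab_index` is a hypothetical name, and the mapping covers only the two indexes listed above (cu126 for JetPack 6.0/6.1, cu129 for 6.2+).

```bash
# Hypothetical helper (not part of reflex): map a JetPack version string
# to the Jetson AI Lab index URL. Only the two indexes documented above
# are covered; anything older than JetPack 6 is rejected.
jetson_ai_lab_index() {
  case "$1" in
    6.0|6.0.*|6.1|6.1.*) echo "https://pypi.jetson-ai-lab.io/jp6/cu126" ;;
    6.*)                 echo "https://pypi.jetson-ai-lab.io/jp6/cu129" ;;  # 6.2+
    *)                   echo "unsupported JetPack '$1' (need 6.0+)" >&2; return 1 ;;
  esac
}

# Usage (only calls pip if the mapping succeeded):
#   idx="$(jetson_ai_lab_index 6.2)" && pip install onnxruntime-gpu --extra-index-url "$idx"
```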
@@ -147,6 +153,45 @@ dpkg -l | grep nvidia-l4t-core

JetPack 5.x is not supported. The v0.9.4 doctor guard parses `/etc/nv_tegra_release`, detects R35, and surfaces the upgrade path loudly — without it, ORT silently falls back to CPU and you get useless latency numbers.
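A minimal sketch of that kind of guard, assuming the usual first-line format of `/etc/nv_tegra_release` (`# R36 (release), REVISION: ...`). The helpers `l4t_major` and `check_l4t` are hypothetical illustrations, not the `reflex doctor` implementation.

```bash
# Hypothetical helper: print the L4T major release (e.g. 35 or 36) parsed
# from the first line of nv_tegra_release. Fails if the file is unreadable.
l4t_major() {
  local file="${1:-/etc/nv_tegra_release}"
  [ -r "$file" ] || return 1
  sed -n '1s/^# R\([0-9][0-9]*\) (release).*/\1/p' "$file"
}

# Hypothetical guard: refuse R35 (JetPack 5.x) loudly instead of letting
# onnxruntime silently fall back to CPU.
check_l4t() {
  local major
  major="$(l4t_major "$@")" || { echo "no readable nv_tegra_release (not a Jetson?)" >&2; return 1; }
  if [ -z "$major" ]; then
    echo "could not parse the L4T release line" >&2
    return 1
  fi
  if [ "$major" -lt 36 ]; then
    echo "L4T R$major (JetPack 5.x) is not supported: upgrade to JetPack 6 (R36)" >&2
    return 1
  fi
  echo "L4T R$major OK"
}
```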

### `A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x`

```
ImportError: A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6
```

**Cause:** The Jetson AI Lab `torch` and `onnxruntime-gpu` wheels are compiled against the NumPy 1.x C ABI. If `numpy>=2.0` is installed (pip picks it by default when nothing pins it), both libraries fail on import with this error.

**Fix:** Pin `numpy<2` **before** installing torch or onnxruntime-gpu:
```bash
pip install 'numpy<2'
# Then install torch / ort from the Jetson AI Lab index
```

If you already installed numpy 2.x, downgrade:
```bash
pip install 'numpy<2' --force-reinstall
```
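To check before reinstalling, a sketch that tests whether a NumPy version string is 1.x. `numpy_is_1x` is a hypothetical helper name; the usage lines assume `numpy` itself still imports, which it does even when it is 2.x.

```bash
# Hypothetical helper: succeed only for a NumPy 1.x version string, the ABI
# the Jetson AI Lab torch / onnxruntime-gpu wheels are built against.
numpy_is_1x() {
  case "$1" in
    1.*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage on the Jetson:
#   ver="$(python3 -c 'import numpy; print(numpy.__version__)')"
#   numpy_is_1x "$ver" || pip install 'numpy<2' --force-reinstall
```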

### `No matching distribution found for lerobot==0.5.1` (Python 3.10)

```
ERROR: Could not find a version that satisfies the requirement lerobot==0.5.1; extra == "monolithic"
ERROR: Ignored the following versions that require a different python version: 0.5.0 Requires-Python >=3.12; 0.5.1 Requires-Python >=3.12
```

**Cause:** The `[monolithic]` (and `[native]`, `[rtc]`) extras depend on `lerobot==0.5.1`, which requires Python ≥ 3.12. JetPack 6 ships Python 3.10.

**Fix:** On Jetson, install `[serve]` only — **not** `[monolithic]`:
```bash
pip install 'reflex-vla[serve]'
```

The monolithic ONNX export (`reflex export --monolithic`) requires lerobot and must run on a **Python 3.12+ host** (desktop, cloud GPU, or Docker). Export there, then copy the ONNX to the Jetson and serve it:
```bash
# On Jetson — serve a pre-exported model
reflex serve /path/to/exported/model/
```
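The decision above reduces to a Python version check. A sketch, assuming only the `lerobot` requirement described here; `pick_reflex_extras` is hypothetical, and it ignores the separate rule against `[gpu]` on Jetson.

```bash
# Hypothetical helper: choose reflex-vla extras from a "MAJOR.MINOR[.PATCH]"
# Python version. [monolithic] needs lerobot==0.5.1, i.e. Python >= 3.12.
pick_reflex_extras() {
  local major="${1%%.*}" rest="${1#*.}"
  local minor="${rest%%.*}"
  if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }; then
    echo "serve,monolithic"
  else
    echo "serve"
  fi
}

# Usage:
#   pyver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
#   pip install "reflex-vla[$(pick_reflex_extras "$pyver")]"
```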

### `Thermal throttling during inference`

**Symptoms:** Latency spikes after 5–10 minutes of continuous inference.
116 changes: 60 additions & 56 deletions examples/02-deploy-smolvla-jetson.md
@@ -6,82 +6,86 @@

## Install on the Jetson

> **Two things that will break your install if you skip them:**
> 1. **Pin `numpy<2` before installing anything else.** The Jetson AI Lab torch and onnxruntime-gpu wheels are compiled against NumPy 1.x. If pip pulls NumPy 2.x, both libraries will crash on import with *"A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"*.
> 2. **Do NOT use `[gpu]` from standard PyPI.** Those wheels are `x86_64`-only and will fail with `ResolutionImpossible` on `aarch64`.

### Recommended: bootstrap installer

```bash
pip install 'reflex-vla[serve,gpu,monolithic]'
./install.sh
```

Why those extras:
- `serve` — FastAPI + uvicorn for the HTTP inference server
- `gpu` — `onnxruntime-gpu` (links to CUDA on the Jetson via the nvidia container runtime)
- `monolithic` — `lerobot` + `transformers==5.3.0` + `onnx-diagnostic`, the cos=+1.0 verified export path
### Manual install

This pulls ~2 GB of dependencies. Takes 5-10 minutes on the Jetson.
```bash
# 0. Create a clean venv (recommended)
python3 -m venv ~/reflex-orin && source ~/reflex-orin/bin/activate
pip install -U pip setuptools wheel

## One command — deploy
# 1. Pin numpy<2 FIRST — before torch or ort
pip install 'numpy<2'

```bash
reflex go --model smolvla-base
```
# 2. Install Jetson-native torch + ort from the Jetson AI Lab index
pip install torch torchvision \
--index-url https://pypi.jetson-ai-lab.io/jp6/cu126

What this does, step by step:
pip install onnxruntime-gpu \
--index-url https://pypi.jetson-ai-lab.io/jp6/cu126

# 3. Install reflex-vla with [serve] only (NOT [gpu], NOT [monolithic])
pip install 'reflex-vla[serve]'
```
device: orin_nano (via tegrastats, GPU=Jetson Orin Nano)
model: smolvla-base (lerobot/smolvla_base, 900MB, action_dim=7)
strategy: exact-id
pulling: lerobot/smolvla_base → ~/.cache/reflex/models/smolvla-base/
↓ 900 MB from HuggingFace (~30 sec)
exporting: ~/.cache/reflex/models/smolvla-base → ~/.cache/reflex/exports/smolvla-base
(target=orin-nano, monolithic, 5-15 min depending on hardware)
↓ Loading PyTorch model
↓ Tracing torch.export (the heaviest step on Orin Nano)
↓ Writing ONNX (~1.6 GB on disk)
↓ Validating cos=+1.0 vs PyTorch reference
export complete in 612.4s ONNX=model.onnx (1623 MB)

Starting serve on http://0.0.0.0:8000
Loading ONNX into onnxruntime-gpu (CUDAExecutionProvider)...
TRT engine build (first time)... ~60-90 sec
Warmup inference... ~5 sec
✓ Server ready

> **Why not `[monolithic]`?** The monolithic export extra depends on `lerobot==0.5.1`, which requires **Python ≥ 3.12**. JetPack 6 ships Python 3.10. Export your model on a desktop/cloud machine with Python 3.12+, then copy the ONNX to the Jetson and serve it.

### Adding `reflex` to your PATH (non-venv installs)
If you installed without a venv and see `reflex: command not found`, add `~/.local/bin`:
```bash
export PATH="$HOME/.local/bin:$PATH"
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
```
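Running the two lines above more than once appends a duplicate `export` line to `~/.bashrc` each time. An idempotent sketch; `path_has` and `ensure_local_bin_in_rc` are hypothetical helper names.

```bash
# Hypothetical helper: succeed if DIR is already an entry of the given
# colon-separated list (defaults to $PATH).
path_has() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Hypothetical helper: append the export line to an rc file only if that
# exact line is not already there.
ensure_local_bin_in_rc() {
  local rc="${1:-$HOME/.bashrc}"
  local line='export PATH="$HOME/.local/bin:$PATH"'
  grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}

# Usage:
#   path_has "$HOME/.local/bin" || export PATH="$HOME/.local/bin:$PATH"
#   ensure_local_bin_in_rc ~/.bashrc
```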

## Hit /act
### What each piece provides
- `numpy<2` — ABI compatibility with Jetson AI Lab pre-built wheels
- `torch` / `onnxruntime-gpu` (from Jetson AI Lab) — GPU-accelerated inference, compiled for `aarch64` + JetPack CUDA
- `reflex-vla[serve]` — FastAPI + uvicorn HTTP inference server + embodiment validation
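A quick way to confirm that `onnxruntime-gpu` came from the Jetson AI Lab index rather than a CPU-only fallback is to look for `CUDAExecutionProvider` in `onnxruntime.get_available_providers()`. A sketch; `has_cuda_provider` is a hypothetical helper that inspects the printed provider list.

```bash
# Hypothetical helper: succeed if the printed provider list includes CUDA.
has_cuda_provider() {
  case "$1" in
    *CUDAExecutionProvider*) return 0 ;;
    *)                       return 1 ;;
  esac
}

# Usage on the Jetson:
#   providers="$(python3 -c 'import onnxruntime as ort; print(ort.get_available_providers())')"
#   has_cuda_provider "$providers" \
#     || echo "onnxruntime is CPU-only: reinstall it from the Jetson AI Lab index" >&2
```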

From another terminal (or a connected workstation):
This pulls ~2 GB of dependencies. Takes 5-10 minutes on the Jetson.

## Deploy

Since monolithic export requires Python 3.12+ (for `lerobot`), the typical Jetson workflow is **export on a desktop/cloud host, serve on-device**.

### Step 1: Export on a Python 3.12+ machine

```bash
curl -X POST http://<jetson-ip>:8000/act \
-H 'content-type: application/json' \
-d '{
"instruction": "pick up the red cup",
"image": "<base64-png-or-jpeg>",
"state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
}'
# On your desktop / cloud GPU (Python 3.12+)
pip install 'reflex-vla[serve,monolithic]'
reflex export --model smolvla-base --out ./smolvla-export/
```

Response:

```json
{
"actions": [[...], [...], ...],
"latency_ms": 47.3,
"inference_mode": "onnx_trt_fp16",
"guard_clamped": false
}
Then copy the export directory to the Jetson:
```bash
scp -r ./smolvla-export/ <user>@<jetson-ip>:~/smolvla-export/
```

## What just happened
### Step 2: Serve on the Jetson

`reflex go` chained:
```bash
# On the Jetson (Python 3.10, [serve] only)
reflex serve ~/smolvla-export/
```

1. **Hardware probe** — `tegrastats` confirms Orin Nano (8 GB)
2. **Model resolution** — picked `smolvla-base` from the curated registry; warned if your `--device-class` doesn't match the model's `supported_devices`
3. **Pull** — `huggingface_hub.snapshot_download`; cached in `~/.cache/reflex/models/`
4. **Export** — `reflex.exporters.monolithic.export_monolithic` traces PyTorch → ONNX with `num_steps=10` baked in, validates parity at cos=+1.0; output cached in `~/.cache/reflex/exports/`
5. **Serve** — `reflex.runtime.server.create_app` mounts the ONNX into onnxruntime-gpu, builds a TRT FP16 engine on first run (cached for next time), exposes `/act` and `/health`
What happens:
```
Loading ONNX into onnxruntime-gpu (CUDAExecutionProvider)...
TRT engine build (first time)... ~60-90 sec
Warmup inference... ~5 sec
✓ Server ready on http://0.0.0.0:8000
```
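Because the first start includes the ~60-90 sec TRT engine build, a client script should wait instead of failing on its first request. A sketch of a retry loop; `retry_until` is hypothetical, and the `/health` route is an assumption (the previous revision of this page listed `/act` and `/health` as the server's endpoints).

```bash
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on the first success, 1 if all fail.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage from a workstation (60 tries x 2 s covers a first-run TRT build):
#   retry_until 60 2 curl -sf "http://<jetson-ip>:8000/health" >/dev/null \
#     && echo "server ready"
```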

Re-running `reflex go --model smolvla-base` skips pull (cache hit) and skips export (`VERIFICATION.md` marker hit), goes straight to serve in ~2 sec.
> **Re-running with `reflex go`:** if a pre-exported ONNX is already cached from a prior session, `reflex go --model smolvla-base` skips export (cache hit) and goes straight to serve in ~2 sec.

## Or use the chat

@@ -96,7 +100,7 @@ Watch it call `list_targets`, `pull_model`, `export_model`, `serve_model` in seq

## Troubleshooting

- **"Missing dependencies for monolithic export"** — install `[monolithic]`: `pip install 'reflex-vla[monolithic]'`
- **"Missing dependencies for monolithic export"** — export requires Python 3.12+ with `[monolithic]`; run on a desktop/cloud host
- **"CUDA unavailable"** — confirm `nvidia-container-runtime` is set up on the Jetson; `reflex doctor` will tell you which check failed
- **TRT engine build fails** — try `--no-trt` to fall back to plain CUDAExecutionProvider; usually means `trtexec` isn't on PATH
- **Disk full** — SmolVLA needs ~2 GB free for weights + ONNX. `reflex inspect targets` shows memory budgets per hardware tier.
77 changes: 50 additions & 27 deletions install.sh
@@ -172,11 +172,20 @@ if [ -z "$EXTRAS" ]; then
if [ "$IS_JETSON" -eq 1 ] || [ "$FORCE_JETSON" -eq 1 ]; then
# NEVER install [gpu] on Jetson — those are x86_64 wheels (nvidia-cu12,
# tensorrt). They will either fail to install, segfault, or silently
# fall back to CPU. Instead we install [serve,monolithic] and then
# pull the Jetson Zoo onnxruntime-gpu wheel explicitly.
EXTRAS="serve,monolithic"
ok "Detected Jetson ($JETSON_MODEL) → installing with [serve,monolithic]"
note " Jetson-specific GPU runtime will be installed separately."
# fall back to CPU.
#
# NEVER install [monolithic] on Jetson — it depends on lerobot==0.5.1
# which requires Python >=3.12. JetPack 6 ships Python 3.10. Export
# on a Python 3.12+ host, then serve the ONNX on the Jetson.
#
# Instead we install [serve] only, and pre-install numpy<2, torch, and
# onnxruntime-gpu from the Jetson AI Lab index BEFORE reflex-vla so
# pip doesn't pull incompatible x86_64 or numpy-2.x-linked wheels.
EXTRAS="serve"
ok "Detected Jetson ($JETSON_MODEL) → installing with [serve]"
note " Jetson-specific GPU deps (numpy<2, torch, ort) will be installed first."
note " [monolithic] skipped — lerobot requires Python >=3.12 (Jetson has 3.10)."
note " Export on a desktop/cloud host, then serve the ONNX here."
elif [ "$OS" = "Darwin" ]; then
EXTRAS="serve,onnx,monolithic"
ok "Detected macOS → installing with [serve,onnx,monolithic] (CPU runtime)"
@@ -187,7 +196,7 @@ if [ -z "$EXTRAS" ]; then
EXTRAS="serve,onnx,monolithic"
ok "No GPU detected → installing with [serve,onnx,monolithic] (CPU runtime)"
fi
if [ "$EXTRAS" != "serve,monolithic" ] && [ "$IS_JETSON" -eq 0 ]; then
if [ "$IS_JETSON" -eq 0 ] && echo "$EXTRAS" | grep -q "monolithic"; then
note " (monolithic adds the extras 'reflex go' needs to actually deploy a model — not just chat)"
fi
fi
@@ -212,18 +221,14 @@ if ! "$PYTHON" -m pip --version >/dev/null 2>&1; then
fi
fi

# -- Run pip install ----------------------------------------------------------
PIP_TARGET="reflex-vla[$EXTRAS]"
info "Installing: $PIP_TARGET"
echo
"$PYTHON" -m pip install --upgrade "$PIP_TARGET"

# -- Jetson: install Jetson Zoo onnxruntime-gpu -------------------------------
# This must happen AFTER reflex is installed so we override any CPU-only
# onnxruntime that may have been pulled transitively.
# -- Jetson: pre-install GPU deps from Jetson AI Lab --------------------------
# This MUST happen BEFORE `pip install reflex-vla[serve]` so that:
# 1. numpy<2 is locked in place before torch's transitive dep pulls 2.x
# 2. torch comes from the Jetson AI Lab aarch64 wheel (not PyPI x86_64)
# 3. onnxruntime-gpu comes from Jetson AI Lab (not the unresolvable PyPI one)
if [ "$IS_JETSON" -eq 1 ] || [ "$FORCE_JETSON" -eq 1 ]; then
echo
info "Installing Jetson-compatible onnxruntime-gpu..."
info "Pre-installing Jetson GPU dependencies..."

# JetPack version → index URL mapping
# Default to JP6.0/6.1 (cu126) since that's the current standard.
@@ -244,26 +249,44 @@ if [ "$IS_JETSON" -eq 1 ] || [ "$FORCE_JETSON" -eq 1 ]; then
;;
esac

# Pin numpy<2 because Jetson Zoo wheels are compiled against numpy 1.x
# and will segfault / throw ABI errors with numpy 2.x.
# Reflex itself works fine with numpy 1.x.
note " Pinning numpy<2 for Jetson Zoo ABI compatibility..."
# 1. Pin numpy<2 FIRST — Jetson AI Lab torch and ort wheels are compiled
# against numpy 1.x C ABI. numpy 2.x causes:
# "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"
note " Step 1/3: Pinning numpy<2 for Jetson AI Lab ABI compatibility..."
"$PYTHON" -m pip install 'numpy<2' || warn "numpy pin failed — may cause runtime issues"

# Install the GPU wheel from Jetson Zoo index
if "$PYTHON" -m pip install --upgrade --index-url "$JETSON_INDEX" onnxruntime-gpu; then
ok "Installed onnxruntime-gpu (Jetson Zoo, $JETSON_INDEX)"
# 2. Install Jetson-native torch (aarch64, CUDA-enabled)
note " Step 2/3: Installing torch from Jetson AI Lab index..."
if "$PYTHON" -m pip install --index-url "$JETSON_INDEX" torch torchvision; then
ok "Installed torch (Jetson AI Lab, $JETSON_INDEX)"
else
warn "Jetson AI Lab torch install failed — pip will fall back to PyPI torch."
note " PyPI torch may be CPU-only on aarch64."
fi

# 3. Install Jetson-native onnxruntime-gpu
note " Step 3/3: Installing onnxruntime-gpu from Jetson AI Lab index..."
if "$PYTHON" -m pip install --index-url "$JETSON_INDEX" onnxruntime-gpu; then
ok "Installed onnxruntime-gpu (Jetson AI Lab, $JETSON_INDEX)"
else
echo
fail "Jetson Zoo onnxruntime-gpu install failed."
fail "Jetson AI Lab onnxruntime-gpu install failed."
info "Manual install command:"
note " $PYTHON -m pip install numpy '<2'"
note " $PYTHON -m pip install --upgrade --index-url $JETSON_INDEX onnxruntime-gpu"
note " $PYTHON -m pip install 'numpy<2'"
note " $PYTHON -m pip install --index-url $JETSON_INDEX torch torchvision"
note " $PYTHON -m pip install --index-url $JETSON_INDEX onnxruntime-gpu"
echo
warn "Reflex is installed but inference will fall back to CPU (slow)."
warn "Reflex will be installed but inference will fall back to CPU (slow)."
fi
echo
fi

# -- Run pip install ----------------------------------------------------------
PIP_TARGET="reflex-vla[$EXTRAS]"
info "Installing: $PIP_TARGET"
echo
"$PYTHON" -m pip install --upgrade "$PIP_TARGET"

echo
ok "Installed."
echo
18 changes: 15 additions & 3 deletions pyproject.toml
@@ -111,17 +111,27 @@ safety = ["yourdfpy"]
# reference PyTorch policy. Required for the native-path-parity regression
# gate in GOALS.yaml. Heavy (pulls torch, torchvision, accelerate, etc.) —
# that's why it's a separate extra, not a base dep.
#
# lerobot 0.5.x requires Python >=3.12. Jetson (JetPack 6) ships 3.10,
# so we gate lerobot behind a version marker; pip skips it on 3.10/3.11
# instead of hard-failing with "No matching distribution found". Native parity
# verification should be run on a Python 3.12+ host.
native = [
"lerobot==0.5.1",
"lerobot==0.5.1; python_version >= '3.12'",
"num2words",
]
# Monolithic ONNX export — cos=1.0 verified production path. Runs locally
# via `reflex export --monolithic` (the DEFAULT as of v0.2.1). Requires
# transformers==5.3.0 EXACTLY (5.4+ has a q_length regression in
# masking_utils.sdpa_mask that breaks onnx-diagnostic patches). Base pin
# `transformers>=4.40,<5.4` permits this extra to pin-to-exact.
#
# lerobot 0.5.x requires Python >=3.12. On Jetson (Python 3.10) the
# monolithic export can't run locally — export on a cloud/desktop host
# (Python 3.12+) and `reflex serve` the resulting ONNX on the Jetson
# with `pip install 'reflex-vla[serve]'` only.
monolithic = [
"lerobot==0.5.1",
"lerobot==0.5.1; python_version >= '3.12'",
"transformers==5.3.0",
"onnx-diagnostic>=0.9",
"onnxscript>=0.1",
@@ -162,8 +172,10 @@ tracing = [
# Lerobot is heavy (torch + vision + datasets); only install when serving
# RTC-enabled deployments. Skeleton + config validation work without this
# extra; only the actual processor construction needs it.
#
# lerobot 0.5.x requires Python >=3.12 (skipped on 3.10/3.11).
rtc = [
"lerobot==0.5.1",
"lerobot==0.5.1; python_version >= '3.12'",
]
# Curate format-converter extras. Phase 1 ships LeRobot v3 + HDF5 with the
# core install (parquet via pyarrow which huggingface_hub already pulls; h5py