Background
After PRs #270 (Slots wire-up) + #271 (Models wire-up) + #274 (unload-noop fix) landed, a systematic CRUD sweep on the hal0 LXC against the live dashboard surfaced 7 stacking bugs beyond the unload-404 case that #274 already fixed.
Bugs (independent, file as separate sub-issues or fix in one cleanup PR)
1. POST /api/slots schema mismatch. The body accepts top-level `{"model": "Qwen3-Embedding-0.6B-GGUF"}` (per audit + Lemonade-shape SlotConfig) and writes a top-level `model = "..."` to TOML, but the serializer in `slots.py:191-195` reads `cfg["model"]["default"]` (nested dict). Result: `model_default` is missing from the /api/slots response for any slot created via POST. Workaround: hand-write the TOML in the v0.1 nested shape (`[model] default = "..."`).
2. POST /api/slots doesn't auto-assign a port. New slots get `port=0` in state.json. Per `config.next_free_port()`, the port should be auto-assigned in 8081-8099. The POST handler at `slots.py:327` calls `sm.create(name, body)` without injecting a port when the body lacks one.
3. `hal0 slot create` CLI uses the v0.1 schema. There is no `--type` flag, so embedding / reranking / transcription / tts slots are not creatable. The `--hardware` enum is `[vulkan|rocm|cpu]` (legacy backend), not the Lemonade `[gpu-vulkan|gpu-rocm|cpu|npu]` device enum. The CLI needs a Lemonade-shape update.
4. Installer doesn't pre-create `/var/lib/hal0/.cache/huggingface/hub`. Lemonade (running as the `hal0:hal0` user) needs this directory to be writable for /v1/pull, but /var/lib/hal0 itself is owned by root:root with no group-write. The first pull fails with "filesystem error: cannot create directories: Permission denied".
5. Dispatcher /v1/chat/completions doesn't route to Lemonade-loaded models. Models pulled via `POST /v1/pull` to Lemonade aren't in hal0's model registry. The dispatcher's `/v1/chat/completions` route fires first (before PR #248's /v1/* proxy catch-all) and returns `dispatch.no_route`, so proxy fallthrough never happens. Options: (a) the dispatcher consults Lemonade /v1/health.loaded[] before failing, (b) on no_route, the dispatcher delegates to the Lemonade proxy instead of returning 404, or (c) Lemonade-managed routes are removed from the dispatcher in Lemonade mode (cleanest).
6. Slot watcher flips state to ERROR when Lemonade evicts. The primary slot loaded successfully; seconds later the watcher saw Lemonade's loaded[] empty and set state=error "model not loaded in lemond". The eviction itself is plausible (memory `hal0_lemonade_gotchas` notes nuclear evict-all on load failure), but the slot state machine should track "model_default_assigned" separately from "model_currently_loaded_in_lemonade" and show OFFLINE + a non-error "not loaded" message instead of ERROR.
7. Lemonade evicts loaded models spontaneously. Need to confirm whether this is an implicit idle-TTL, nuclear-evict-on-other-load, or something else. Memory `hal0_lemonade_gotchas` says there is no idle TTL by default, so the trigger may be something different. Needs a trace.
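For bugs 1 and 2, one possible shape for the create path is to normalize the POST body before it reaches `sm.create(name, body)`. The sketch below is illustrative only: `normalize_model`, `prepare_slot_body`, and the local `next_free_port` are hypothetical helpers (the real `config.next_free_port()` signature is not shown in this issue), and treating `port=0` as "unset" is an assumption.

```python
from typing import Any, Iterable

# 8081-8099 inclusive, the range named in this issue.
PORT_RANGE = range(8081, 8100)


def normalize_model(cfg: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of cfg with `model` in the nested v0.1 shape."""
    out = dict(cfg)
    model = out.get("model")
    if isinstance(model, str):
        # Lemonade-shape body {"model": "name"} becomes {"model": {"default": "name"}},
        # which is what the serializer's cfg["model"]["default"] lookup expects.
        out["model"] = {"default": model}
    return out


def next_free_port(used: Iterable[int]) -> int:
    """Hypothetical stand-in for config.next_free_port()."""
    taken = set(used)
    for port in PORT_RANGE:
        if port not in taken:
            return port
    raise RuntimeError("no free slot port in 8081-8099")


def prepare_slot_body(body: dict[str, Any], used_ports: Iterable[int]) -> dict[str, Any]:
    """Normalize the model shape and inject a port when the body lacks one."""
    cfg = normalize_model(body)
    if not cfg.get("port"):  # assumption: a missing port and port=0 both mean "unset"
        cfg["port"] = next_free_port(used_ports)
    return cfg
```

With ports 8081 and 8082 already taken, `prepare_slot_body({"model": "Qwen3-Embedding-0.6B-GGUF"}, [8081, 8082])` would yield a nested `model.default` and `port=8083`. The alternative fix, making the serializer accept both shapes on read, would also keep already-written top-level TOML files working.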
Triage
Bug 5 (dispatcher routing) is the biggest user-visible gap: chat doesn't work end-to-end through hal0-api despite Lemonade serving fine direct on :13305.
Bug 1 (schema mismatch) is the biggest dev-experience gap: anyone creating slots via the API gets broken card data.
Bugs 2, 3 are CLI / installer polish.
Bugs 4, 6, 7 are operational.
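For bug 6, the separation between "assigned" and "currently loaded" could be sketched roughly as follows. This is a minimal illustration, not the existing watcher: `SlotState`, `SlotView`, and `derive_slot_view` are hypothetical names, `READY` is an assumed name for the healthy state, and `loaded` stands in for whatever the watcher reads from Lemonade's loaded[] list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class SlotState(Enum):
    READY = "ready"      # assumed name for the healthy state
    OFFLINE = "offline"
    ERROR = "error"


@dataclass
class SlotView:
    state: SlotState
    message: str


def derive_slot_view(
    model_default: Optional[str],
    loaded: Iterable[str],
    load_error: Optional[str] = None,
) -> SlotView:
    """Derive the displayed state from assignment and load status separately."""
    if load_error:
        # Only an actual load failure is reported as ERROR.
        return SlotView(SlotState.ERROR, load_error)
    if model_default is None:
        return SlotView(SlotState.OFFLINE, "no model assigned")
    if model_default in set(loaded):
        return SlotView(SlotState.READY, "")
    # Assigned but evicted or never loaded: offline, not an error.
    return SlotView(SlotState.OFFLINE, "not loaded")
```

Under this sketch an eviction by Lemonade would move the slot to OFFLINE with a "not loaded" message, and ERROR would be reserved for loads that actually failed.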
Discovered
Direct manual testing of every CRUD path on the live LXC after merging the wire-up PRs. Per memory `feedback_test_ui_on_real_hardware`.