perf(inference): broader Candle inference deletion (deferred from #1273) — audit plasticity LoRA reachability first #1280

@joelteply

Follow-on to #1273 (Candle Qwen3.5 deletion, PR #1279) — broader Candle inference cleanup deferred from #1273's original scope.

PR #1279 deleted the Qwen3.5-specific Candle path (Qwen35GgufBackend + vendored quantized_qwen35), but the broader Candle inference chain remains:

  • inference/candle_adapter.rs (~1500 LOC) — CandleAdapter is imported but never instantiated by AIProviderModule::register_adapters. Dead in production.
  • inference/model.rs (~900 LOC) — ContinuumModel has no callers outside the inference module.
  • inference/quantized.rs (~300 LOC) — load_quantized_model, load_default_quantized only called by candle_adapter.rs.
  • inference/backends/{generate, load_gguf_backend} — only called by the dead chain above + bin/* debug binaries.
  • inference/vendored/{quantized_llama, qwen2, compact_llama}.rs — Candle backends not used by any registered adapter.
  • inference/backends/{llama_gguf, llama_safetensors, qwen2_safetensors}.rs — used only by the dead chain or by plasticity tests.
  • bin/inference_test.rs, bin/test_qwen_gguf.rs, bin/diagnose_prefill.rs, bin/mixed_quant.rs — not exposed as Cargo binaries; orphaned source files.

Why deferred from #1273

The initial #1273 plan was a single atomic deletion. Running cargo check after deleting model.rs revealed that backends/llama_safetensors.rs:20 uses model::rebuild_with_stacked_lora, and that modules/plasticity/validation.rs:766 and :802 use compact_llama_safetensors::CompactLlamaSafetensorsBackend. These references are test code (#[test] blocks), but they are plasticity tests for the LoRA training infrastructure. Whether plasticity LoRA training is itself a live production path or a vestigial training surface is the question that determines the deletion blast radius.

Acceptance

  1. Audit plasticity LoRA training reachability. Is crate::modules::plasticity exercised by any live IPC handler / CLI command / chat hot path? Or is it test-only / experimental infrastructure that's never invoked in production?
  2. If plasticity is dead in production: delete the entire Candle chain — candle_adapter.rs, ContinuumModel (and model.rs if everything in it depends on it), quantized.rs, qwen2/llama vendored backends, llama_gguf/llama_safetensors/qwen2_safetensors backends, and the orphaned bin/* files. ~4000+ LOC.
  3. If plasticity is live in production: scope deletion to only the chain that's truly unreachable — likely candle_adapter.rs + quantized.rs (since plasticity uses safetensors backends directly, not via CandleAdapter). Document which Candle infrastructure stays and why.
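A first pass at the reachability audit in step 1 could be a plain textual scan, sketched below under loud assumptions: `find_references`, `rust_files`, `audit`, and the `Hit` struct are hypothetical helpers, not part of the crate, and the sample paths in the usage note are guesses at the layout. The scan treats everything after the first `#[cfg(test)]` in a file as test code, which only fits the "tests module at the bottom" convention, and it cannot see re-exports or aliased imports. It narrows where to look; it does not replace cargo check.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One source line that mentions the needle (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct Hit {
    pub line_no: usize,
    pub text: String,
    /// True when the line comes after a `#[cfg(test)]` marker in the same file.
    pub in_test_region: bool,
}

/// Scan one file's text for `needle`, skipping `//` comment lines.
/// Heuristic only: everything after the first `#[cfg(test)]` counts as test code.
pub fn find_references(source: &str, needle: &str) -> Vec<Hit> {
    let mut in_test_region = false;
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("#[cfg(test)]") {
            in_test_region = true;
        }
        if trimmed.starts_with("//") {
            continue;
        }
        if line.contains(needle) {
            hits.push(Hit {
                line_no: idx + 1,
                text: trimmed.to_string(),
                in_test_region,
            });
        }
    }
    hits
}

/// Recursively collect `.rs` files under `root`.
pub fn rust_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path.extension().and_then(|e| e.to_str()) == Some("rs") {
                out.push(path);
            }
        }
    }
    Ok(out)
}

/// Report references to `needle` that sit outside test regions and outside
/// `skip_dir` (the module under audit, which naturally refers to itself).
pub fn audit(root: &Path, skip_dir: &Path, needle: &str) -> io::Result<Vec<(PathBuf, Hit)>> {
    let mut live = Vec::new();
    for file in rust_files(root)? {
        if file.starts_with(skip_dir) {
            continue;
        }
        let source = fs::read_to_string(&file)?;
        for hit in find_references(&source, needle) {
            if !hit.in_test_region {
                live.push((file.clone(), hit));
            }
        }
    }
    Ok(live)
}
```

Assuming the crate sources live under `src/`, a call such as `audit(Path::new("src"), Path::new("src/modules/plasticity"), "modules::plasticity")` would list candidate call sites to inspect by hand. An empty result would suggest, but not prove, that step 2 applies; any hit from an IPC handler, CLI command, or chat path would point toward step 3.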

Why this matters

Per Joel's "if not UI/UX it is rust" rule and the no-CPU-fallback alpha contract, dead Candle infrastructure that contradicts these rules on paper but never executes is exactly the anti-pattern the #1262 audit was filed to address. The deferred scope is the bulk of the dead-code mass.

Lane: alpha flywheel #1272 lane 6 (continues #1262 audit work).
