Follow-on to #1273 (Candle Qwen3.5 deletion, PR #1279) — broader Candle inference cleanup deferred from #1273's original scope.
PR #1279 deleted the Qwen3.5-specific Candle path (Qwen35GgufBackend + vendored quantized_qwen35), but the broader Candle inference chain remains:
- inference/candle_adapter.rs (~1500 LOC): CandleAdapter is imported but never instantiated by AIProviderModule::register_adapters. Dead in production.
- inference/model.rs (~900 LOC): ContinuumModel has no callers outside the inference module.
- inference/quantized.rs (~300 LOC): load_quantized_model and load_default_quantized are only called by candle_adapter.rs.
- inference/backends/{generate, load_gguf_backend}: only called by the dead chain above and the bin/* debug binaries.
- inference/vendored/{quantized_llama, qwen2, compact_llama}.rs: Candle backends not used by any registered adapter.
- inference/backends/{llama_gguf, llama_safetensors, qwen2_safetensors}.rs: used only by the dead chain or by plasticity tests.
- bin/inference_test.rs, bin/test_qwen_gguf.rs, bin/diagnose_prefill.rs, bin/mixed_quant.rs: not exposed as Cargo binaries, so they are orphaned source files.
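The "no callers outside the module" claims above amount to a whole-word search for an identifier across every file outside a path prefix. A minimal sketch, assuming the sources are already loaded into memory as (path, contents) pairs; `callers_outside`, `contains_word`, and the example paths are hypothetical, and because this is a plain token match it also counts comments and string literals, so it can only over-report callers, never miss a textual one.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: every file outside `module_prefix` whose text contains
/// `ident` as a whole word (plain token match; comments and strings count too).
fn callers_outside(
    files: &BTreeMap<String, String>,
    module_prefix: &str,
    ident: &str,
) -> Vec<String> {
    files
        .iter()
        .filter(|(path, _)| !path.starts_with(module_prefix))
        .filter(|(_, src)| contains_word(src, ident))
        .map(|(path, _)| path.clone())
        .collect()
}

/// True if `ident` occurs in `src` with no identifier character on either side,
/// so `ContinuumModel` does not match inside `ContinuumModelExt`.
fn contains_word(src: &str, ident: &str) -> bool {
    let is_ident = |c: char| c.is_alphanumeric() || c == '_';
    src.match_indices(ident).any(|(i, _)| {
        let before = src[..i].chars().next_back();
        let after = src[i + ident.len()..].chars().next();
        !before.map_or(false, is_ident) && !after.map_or(false, is_ident)
    })
}

fn main() {
    // Hypothetical file contents, for illustration only.
    let mut files = BTreeMap::new();
    files.insert("src/inference/model.rs".to_string(), "pub struct ContinuumModel;".to_string());
    files.insert("src/inference/candle_adapter.rs".to_string(), "use super::model::ContinuumModel;".to_string());
    files.insert("src/modules/ai_provider.rs".to_string(), "fn register_adapters() {}".to_string());
    let hits = callers_outside(&files, "src/inference/", "ContinuumModel");
    println!("{:?}", hits); // prints []
}
```

An empty result only says there is no textual reference outside the prefix; a compiler-backed check (deleting the item and running cargo check, as was done for #1273) remains the authoritative test.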
Why deferred from #1273
The initial #1273 plan was a single atomic deletion. Running cargo check after deleting model.rs revealed that backends/llama_safetensors.rs:20 uses model::rebuild_with_stacked_lora, and that modules/plasticity/validation.rs:766 and :802 use compact_llama_safetensors::CompactLlamaSafetensorsBackend. The plasticity call sites are test code (#[test] blocks), but they are plasticity tests for the LoRA training infrastructure. Whether plasticity LoRA training is a live production path or a vestigial training surface is therefore the question that determines the deletion blast radius.
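The blast-radius question can be framed as a reverse-dependency walk over a file-level "uses" graph, with test-only edges tracked separately so build breakage and test breakage can be told apart. A minimal sketch under stated assumptions: `Edge`, `blast_radius`, and the path spellings are hypothetical, only the two edges in `main` come from the cargo check findings above, and the walk assumes every broken file gets deleted rather than patched (which is why it is transitive).

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical edge: file `from` uses an item defined in `to`.
/// `test_only` marks uses that live inside #[test] / #[cfg(test)] code.
struct Edge {
    from: &'static str,
    to: &'static str,
    test_only: bool,
}

/// Files outside `deleted` that break once `deleted` is removed, assuming each
/// broken file is deleted in turn (so its own dependents break as well).
/// With `include_tests == false`, test-only edges are ignored.
/// File-level granularity: a broken #[test] block counts its whole file.
fn blast_radius(edges: &[Edge], deleted: &[&'static str], include_tests: bool) -> BTreeSet<&'static str> {
    let mut broken = BTreeSet::new();
    let mut queue: VecDeque<&'static str> = deleted.iter().copied().collect();
    while let Some(gone) = queue.pop_front() {
        for e in edges.iter().filter(|e| e.to == gone && (include_tests || !e.test_only)) {
            // Record each newly broken dependent once and walk its dependents next.
            if !deleted.contains(&e.from) && broken.insert(e.from) {
                queue.push_back(e.from);
            }
        }
    }
    broken
}

fn main() {
    // Edges from the #1273 cargo check findings; path spellings are illustrative.
    let edges = [
        Edge { from: "backends/llama_safetensors.rs", to: "model.rs", test_only: false },
        Edge { from: "plasticity/validation.rs", to: "compact_llama_safetensors", test_only: true },
    ];
    let deleted = ["model.rs", "compact_llama_safetensors"];
    println!("{:?}", blast_radius(&edges, &deleted, false)); // {"backends/llama_safetensors.rs"}
    println!("{:?}", blast_radius(&edges, &deleted, true)); // {"backends/llama_safetensors.rs", "plasticity/validation.rs"}
}
```

The gap between the two outputs is exactly the open question: the plasticity file only enters the blast radius through test edges.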
Acceptance
- Audit plasticity LoRA training reachability. Is crate::modules::plasticity exercised by any live IPC handler, CLI command, or chat hot path, or is it test-only / experimental infrastructure that is never invoked in production?
- If plasticity is dead in production, delete the entire Candle chain: candle_adapter.rs, ContinuumModel (and the rest of model.rs, if everything else in it depends on ContinuumModel), quantized.rs, the qwen2/llama vendored backends, the llama_gguf/llama_safetensors/qwen2_safetensors backends, and the orphaned bin/* files. Estimated at 4000+ LOC.
- If plasticity is live in production, scope the deletion to only the chain that is truly unreachable, likely candle_adapter.rs + quantized.rs (plasticity uses the safetensors backends directly, not via CandleAdapter). Document which Candle infrastructure stays and why.
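The acceptance criteria reduce to a reachability check from production entry points that ignores test-only edges, followed by a choice between the two deletion scopes. A minimal sketch, assuming a module-level "uses" graph has already been extracted by some other means; the root names ("ipc::handlers", "cli", "chat::hot_path"), the graph contents, and the `Scope`, `reachable`, `decide`, and `unreachable_modules` names are all hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The two deletion scopes named in the acceptance criteria.
#[derive(Debug, PartialEq)]
enum Scope {
    FullCandleChain,
    ScopedToUnreachable,
}

/// Hypothetical graph shape: module -> [(used module, edge is test-only)].
type Uses<'a> = BTreeMap<&'a str, Vec<(&'a str, bool)>>;

/// Modules reachable from `roots`, following only non-test edges (iterative DFS).
fn reachable<'a>(uses: &Uses<'a>, roots: &[&'a str]) -> BTreeSet<&'a str> {
    let mut seen: BTreeSet<&'a str> = BTreeSet::new();
    let mut stack: Vec<&'a str> = roots.to_vec();
    while let Some(m) = stack.pop() {
        if !seen.insert(m) {
            continue; // already visited
        }
        for &(dep, test_only) in uses.get(&m).map(|v| v.as_slice()).unwrap_or(&[]) {
            if !test_only {
                stack.push(dep);
            }
        }
    }
    seen
}

/// Full deletion only if `plasticity` is unreachable from every production root.
fn decide<'a>(uses: &Uses<'a>, prod_roots: &[&'a str], plasticity: &str) -> Scope {
    if reachable(uses, prod_roots).contains(plasticity) {
        Scope::ScopedToUnreachable
    } else {
        Scope::FullCandleChain
    }
}

/// Graph modules that no production root reaches: the deletion candidates.
fn unreachable_modules<'a>(uses: &Uses<'a>, prod_roots: &[&'a str]) -> Vec<&'a str> {
    let live = reachable(uses, prod_roots);
    uses.keys().copied().filter(|m| !live.contains(m)).collect()
}

fn main() {
    // Hypothetical graph: plasticity is only reached through a test edge.
    let mut uses: Uses = BTreeMap::new();
    uses.insert("ipc::handlers", vec![("ai_provider::register_adapters", false)]);
    uses.insert("ai_provider::register_adapters", vec![]);
    uses.insert("plasticity_tests", vec![("modules::plasticity", true)]);
    let roots = ["ipc::handlers", "cli", "chat::hot_path"];
    println!("{:?}", decide(&uses, &roots, "modules::plasticity")); // FullCandleChain
}
```

Test entry points are deliberately left out of `roots`; that omission is what lets a module reached only from #[test] code count as dead in production.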
Why this matters
Per Joel's "if not UI/UX it is rust" rule and the no-CPU-fallback alpha contract, dead Candle infrastructure that contradicts these rules on paper but never executes is the same anti-pattern this audit (#1262) was filed for. The deferred scope is the bulk of the dead-code mass.
Lane: alpha flywheel #1272 lane 6 (continues #1262 audit work).