chore(O): remove SIMD/NEON backend and BackendType::Metal #21

Merged
dexwritescode merged 5 commits into main from chore-remove-simd-neon-backend
May 6, 2026

@dexwritescode
Owner

Summary

  • Delete cpu_buffer.h, compute_backend.mm (abandoned Obj-C++ draft), and the simd/ directory
  • Remove simd_graph() and metal_graph() convenience helpers from graph.h
  • Drop BackendType::Metal from the enum — updates compute_backend.cpp, neurons_service.cpp, and the non-MLX mock in test_model_loader.cpp

Apple Silicon → MLX only. Linux/Windows → CUDA/ROCm (future phases). No CPU SIMD fallback or bare Metal backend will exist for LLM-scale inference.

Delete cpu_buffer.h, compute_backend.mm (abandoned Obj-C++ draft), and
the simd/ directory. Remove the misleadingly-named simd_graph() helper.

Apple Silicon → MLX only. Linux/Windows → CUDA/ROCm (future phases).
No CPU SIMD fallback path will ever be needed for LLM-scale inference.

Delete cpu_buffer.h, compute_backend.mm (abandoned Obj-C++ draft), and
the simd/ directory. Remove simd_graph() and metal_graph() helpers, and
drop BackendType::Metal from the enum.

Apple Silicon → MLX only. Linux/Windows → CUDA/ROCm (future phases).
No CPU SIMD or bare Metal backend will be added for LLM-scale inference.
@dexwritescode dexwritescode added the release:skip Skips release creation on merge label May 6, 2026
…backs

Phase O cleanup: the Tensor/BackendBuffer and ComputeGraph/ComputeGraphBuilder
abstractions were bypassed entirely on Apple Silicon (all three model families
used mlx_weights_ directly). Removing them deletes ~7,100 lines of dead code and
leaves ComputeBackend as a thin lifecycle handle only.

Removed:
- core/tensor.{h,cpp}, core/graph.{h,cpp}
- backends/mlx/mlx_buffer.h, mlx_utils.h
- model/kv_cache.h
- model/gemma_model{,_base}.{h,cpp}
- model/qwen3_moe_model{,_base}.{h,cpp}
- tests/compute/test_symbolic_api.cpp, test_mlx_backend.cpp

Simplified:
- ComputeBackend: 5 lifecycle methods only (type/name/is_available/initialize/cleanup)
- MlxBackend: implements those 5 methods; ~730 lines of Tensor ops deleted
- LlamaModel, GemmaModelMLX, Qwen3MoeModelMLX: removed inheritance from base
  Tensor-path classes; MLX classes own config_ and tokenizer_ directly
- ModelLoader: load_model()/load_all_safetensors() removed; load_model_mlx() kept
- language_model.cpp: Gemma/Qwen3MoE dispatch is now MLX-only
- BackendType::Metal removed (vestigial, never instantiated)
- Tests updated to remove calls to deleted APIs (forward(), attention_layer(),
  wrap_native_tensor(), load_model(backend))
…ethods

ComputeBackend is now a pure lifecycle abstraction: type(), name(), is_available(),
initialize(), cleanup(). All ~40 Tensor-based math methods (matmul, dequantize,
rope, softmax, sdpa, etc.) are removed from the interface and MlxBackend.
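
For illustration, a lifecycle-only interface of the kind described above could look roughly like the sketch below. This is not the code from this PR: the five method names come from the commit message, but the return types, the single-member BackendType enum, the initialized() accessor, and the lifecycle_demo() helper are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum: only MLX is listed because this PR removes Metal.
// Any other members of the real enum are not known from this page.
enum class BackendType { MLX };

// Sketch of a lifecycle-only backend interface: no tensor math methods.
class ComputeBackend {
public:
    virtual ~ComputeBackend() = default;
    virtual BackendType type() const = 0;
    virtual std::string name() const = 0;
    virtual bool is_available() const = 0;
    virtual bool initialize() = 0;  // bool return type is an assumption
    virtual void cleanup() = 0;
};

// Stand-in implementation; a real backend would set up and tear down MLX here.
class MlxBackend final : public ComputeBackend {
public:
    BackendType type() const override { return BackendType::MLX; }
    std::string name() const override { return "MLX"; }
    bool is_available() const override { return true; }  // placeholder check
    bool initialize() override { initialized_ = true; return true; }
    void cleanup() override { initialized_ = false; }
    bool initialized() const { return initialized_; }  // hypothetical accessor

private:
    bool initialized_ = false;
};

// Hypothetical helper: walks the full lifecycle and returns the backend name,
// or an empty string if the backend could not be brought up.
std::string lifecycle_demo() {
    MlxBackend backend;
    std::string used;
    if (backend.is_available() && backend.initialize()) {
        assert(backend.initialized());
        used = backend.name();
        backend.cleanup();
    }
    assert(!backend.initialized());
    return used;
}
```

With an interface this small, callers only ask whether a backend exists and bring it up or down; the math itself lives in the model classes.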

GemmaModelMLX and Qwen3MoeModelMLX no longer inherit from their Tensor-based base
classes; config_ and tokenizer_ are owned directly. ModelLoader no longer exposes
load_model() or load_all_safetensors().
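
As a rough sketch of what "owned directly" means here, the model class can hold its config and tokenizer as plain members instead of inheriting them from a base class. The ModelConfig and Tokenizer structs below are hypothetical stand-ins with made-up fields, and make_demo_model() exists only for the example; only the class name and the config_/tokenizer_ member names come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real config and tokenizer types are richer.
struct ModelConfig {
    std::string model_type;
    std::size_t hidden_size = 0;
};

struct Tokenizer {
    std::string vocab_path;
};

// Before (sketch): the MLX class inherited config and tokenizer from a
// Tensor-path base class.
// After (sketch): no base class; the state is held as ordinary members.
class GemmaModelMLX {
public:
    GemmaModelMLX(ModelConfig config, Tokenizer tokenizer)
        : config_(std::move(config)), tokenizer_(std::move(tokenizer)) {}

    const ModelConfig& config() const { return config_; }
    const Tokenizer& tokenizer() const { return tokenizer_; }

private:
    ModelConfig config_;
    Tokenizer tokenizer_;
};

// Hypothetical helper that builds a model from example values.
GemmaModelMLX make_demo_model() {
    ModelConfig cfg;
    cfg.model_type = "gemma";
    cfg.hidden_size = 2048;
    GemmaModelMLX model(cfg, Tokenizer{"tokenizer.json"});
    assert(model.config().hidden_size == 2048);
    return model;
}
```
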
- ErrorCode: remove InvalidArgument, InsufficientMemory, TensorNotFound,
  NotImplemented — none were ever returned in production code
- ComputeBackend: remove preferred_batch_size() and supports_async() — declared
  and overridden in MlxBackend but never called by any client
- ModelConfig: remove name_or_path and transformers_version — parsed from JSON
  but never read after parsing
- LlamaModel: remove context_size_ member — set in mlx_setup(), never read
- Qwen3MoeModelMLX: remove context_size_ member — same pattern
- Delete tinyllama_inference.h/.cpp — Phase D compatibility alias no longer
  needed; update 5 test files to use LlamaModel directly
- Delete test_attention_qkv_trace.cpp — became an empty placeholder after
  attention_layer() was removed in Phase O
@dexwritescode dexwritescode merged commit 7ff379e into main May 6, 2026
3 checks passed
@dexwritescode dexwritescode deleted the chore-remove-simd-neon-backend branch May 6, 2026 23:47