DiSpec — a from-scratch LLM inference engine: paged attention, continuous batching, CUDA-graph decode, speculative decoding, and prefill/decode disaggregation
cuda inference kv-cache llm-inference speculative-decoding paged-attention cuda-graphs disaggregated-inference prefill-decode continuous-batch
Updated Jun 24, 2026 - Python