SparseServe

Paper: SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving

Thank you for your interest in SparseServe! Please star our repository and stay tuned: we will release the code here soon.

Citation

@misc{zhou2025sparseserveunlockingparallelismdynamic,
      title={SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving}, 
      author={Qihui Zhou and Peiqi Yin and Pengfei Zuo and James Cheng},
      year={2025},
      eprint={2509.24626},
      archivePrefix={arXiv},
      primaryClass={cs.DC},
      url={https://arxiv.org/abs/2509.24626}, 
}
