SparseServe

Paper: SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving

Thank you for your interest in SparseServe! Please star our repository and stay tuned; we will release the code here soon.

Citation

@misc{zhou2025sparseserveunlockingparallelismdynamic,
      title={SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving}, 
      author={Qihui Zhou and Peiqi Yin and Pengfei Zuo and James Cheng},
      year={2025},
      eprint={2509.24626},
      archivePrefix={arXiv},
      primaryClass={cs.DC},
      url={https://arxiv.org/abs/2509.24626}, 
}