Commit 14035a2

update readme
1 parent 20c63bc commit 14035a2

1 file changed

Lines changed: 14 additions & 1 deletion

README.md

@@ -1,4 +1,7 @@
 # BitDecoding
+[![arXiv](https://img.shields.io/badge/arXiv-2503.18773-b31b1b.svg)](https://arxiv.org/abs/2503.18773)
+[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+
 BitDecoding is a high-performance, GPU-optimized system
 designed to accelerate long-context LLM decoding with a low-bit KV
 cache. It achieves more than a **3x speedup** over FlashDecoding-v2.
@@ -43,7 +46,17 @@ python setup.py install
 
 ## Citation
 If you find BitDecoding useful or want to use it in your projects, please cite our paper:
-
+```
+@misc{du2025bitdecodingunlockingtensorcores,
+      title={BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache},
+      author={Dayou Du and Shijie Cao and Jianyi Cheng and Ting Cao and Mao Yang},
+      year={2025},
+      eprint={2503.18773},
+      archivePrefix={arXiv},
+      primaryClass={cs.AR},
+      url={https://arxiv.org/abs/2503.18773},
+}
+```
 
 ## Acknowledgement
 BitDecoding is inspired by many open-source libraries, including (but not limited to) [flash-attention](https://github.com/Dao-AILab/flash-attention/tree/main), [flute](https://github.com/HanGuo97/flute), [Atom](https://github.com/efeslab/Atom), and [omniserve](https://github.com/mit-han-lab/omniserve).
