Commit 8dd55df

update readme
1 parent 7c2de53 commit 8dd55df

2 files changed

Lines changed: 28 additions & 12 deletions

README.md

Lines changed: 22 additions & 6 deletions
@@ -1,11 +1,14 @@
-# BitLadder
+# [HPCA 2025] BitDecoding
+[![arXiv](https://img.shields.io/badge/arXiv-2410.13276-b31b1b.svg)](https://arxiv.org/abs/2503.18773)
 [![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
 
-BitLadder is a high-performance, GPU-optimized system
+BitDecoding is a high-performance, GPU-optimized system
 designed to accelerate long-context LLMs decoding with a low-bit KV
 cache. Achieve **3-9x speedup** than Flash Attention-v2.
 ![overview](imgs/overview.png)
-![scheme](imgs/scheme.png)
+
+## News
+* [2025.11] 🔥 BitDecoding has been accepted to HPCA 2025!
 
 ## Benchmark
 * Kernel Performance in RTX4090
@@ -15,9 +18,9 @@ cache. Achieve **3-9x speedup** than Flash Attention-v2.
 
 ## Installation
 ```
-git clone --recursive https://github.com/DD-DuDa/BitLadder.git
-conda create -n bitladder python=3.10
-conda activate bitladder
+git clone --recursive https://github.com/DD-DuDa/BitDecoding.git
+conda create -n bitdecode python=3.10
+conda activate bitdecode
 pip install -r requirements.txt
 python setup.py install
 ```
@@ -35,6 +38,19 @@ python setup.py install
 ```
 3. End2end inference example, please see [e2e](https://github.com/DD-DuDa/BitDecoding/tree/e2e)
 
+## Citation
+If you find BitDecoding useful or want to use in your projects, please kindly cite our paper:
+```
+@misc{du2025bitdecodingunlockingtensorcores,
+title={BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache},
+author={Dayou Du and Shijie Cao and Jianyi Cheng and Luo Mai and Ting Cao and Mao Yang},
+year={2025},
+eprint={2503.18773},
+archivePrefix={arXiv},
+primaryClass={cs.AR},
+url={https://arxiv.org/abs/2503.18773},
+}
+```
 
 ## Acknowledgement
 BitLadder is inspired by many open-source libraries, including (but not limited to) [flash-attention](https://github.com/Dao-AILab/flash-attention/tree/main), [flute](https://github.com/HanGuo97/flute), [Atom](https://github.com/efeslab/Atom), [omniserve](https://github.com/mit-han-lab/omniserve), [KIVI](https://github.com/jy-yuan/KIVI).

benchmark/bench_single_decode.ipynb

Lines changed: 6 additions & 6 deletions
@@ -2,7 +2,7 @@
 "cells": [
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -13,7 +13,7 @@
 "from einops import rearrange, repeat\n",
 "import numpy as np\n",
 "\n",
-"from flash_attn import flash_attn_with_kvcache\n",
+"# from flash_attn import flash_attn_with_kvcache\n",
 "from bit_decode import kvcache_pack_int, fwd_kvcache_int"
 ]
 },
@@ -26,7 +26,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 3,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -59,7 +59,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 4,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -82,7 +82,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
@@ -392,7 +392,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "bitdecode",
+"display_name": "vllm",
 "language": "python",
 "name": "python3"
 },
