|
1 | 1 | # RSR-core |
2 | 2 |
|
3 | | -**RSR (Redundant Segment Reduction)** algorithm. |
| 3 | +**RSR (Redundant Segment Reduction)** for efficient low-bit inference (matrix-vector multiplication). |
4 | 4 |
|
5 | | -Reference: [UIC-InDeXLab/RSR](https://github.com/UIC-InDeXLab/RSR) |
| 5 | +This repository contains the core kernels, model integrations, and benchmarking code for **RSR** across CPU and CUDA backends. RSR speeds up matrix-vector multiplication with a low-bit quantized matrix in three steps: it groups repeated column patterns, aggregates the corresponding input values once per group, and scatters each aggregated value to the affected output rows. |
6 | 6 |
|
7 | | -## Installation |
| 7 | +This is especially useful for workloads such as low-bit LLM inference, where decoding repeatedly applies quantized matvec operations. For the original algorithm, see [UIC-InDeXLab/RSR](https://github.com/UIC-InDeXLab/RSR) and [docs/ALGORITHM.md](docs/ALGORITHM.md). |
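The grouping, aggregation, and scatter steps described above can be sketched in plain Python for the 1-bit (0/1) case. This is only an illustrative sketch, not the repository's kernel: the function name `rsr_matvec_binary`, the `block_rows` parameter, and the choice to split the matrix into fixed-height row blocks are assumptions made for the example.

```python
from collections import defaultdict


def rsr_matvec_binary(matrix, vector, block_rows=4):
    """Compute matrix @ vector for a 0/1 matrix (hypothetical helper).

    Within each block of `block_rows` rows, columns that share the same
    bit pattern are grouped so their input values are summed only once.
    """
    n_rows = len(matrix)
    n_cols = len(vector)
    result = [0.0] * n_rows

    for start in range(0, n_rows, block_rows):
        block = matrix[start:start + block_rows]

        # Group + aggregate: sum the input values of all columns
        # that have the same bit pattern inside this row block.
        sums = defaultdict(float)
        for j in range(n_cols):
            pattern = tuple(row[j] for row in block)
            sums[pattern] += vector[j]

        # Scatter: add each group's sum to every output row whose
        # bit is set in that pattern.
        for pattern, total in sums.items():
            for i, bit in enumerate(pattern):
                if bit:
                    result[start + i] += total

    return result


M = [
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
]
v = [2.0, 3.0, 5.0, 7.0]
print(rsr_matvec_binary(M, v, block_rows=2))  # [14.0, 7.0, 7.0]
```

In this sketch the grouping is recomputed on every call. Since the quantized weights do not change between decoding steps, the column-to-pattern mapping could in principle be computed once and reused, leaving only the aggregation and scatter work per input vector.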
8 | 8 |
|
9 | | -**Prerequisites:** Python >= 3.10, a C compiler (for CPU kernels), and optionally CUDA for GPU support. |
| 9 | +## Installation 🛠️ |
| 10 | + |
| 11 | +**Prerequisites:** Python >= 3.10, a C compiler for CPU kernels, and optionally CUDA for GPU support. |
10 | 12 |
|
11 | 13 | ```bash |
12 | 14 | git clone https://github.com/UIC-InDeXLab/RSR-Core.git |
13 | 15 | cd RSR-Core |
14 | 16 | pip install -e . |
15 | 17 | ``` |
16 | 18 |
|
17 | | -## Structure |
18 | | - |
19 | | -``` |
20 | | -RSR-core/ |
21 | | -├── multiplier/ # Python wrappers for kernels |
22 | | -│ ├── bit_1/ # 1-bit (binary) multipliers (CPU/CUDA) |
23 | | -│ └── bit_1_58/ # 1.58-bit (ternary) multipliers (CPU/CUDA) |
24 | | -├── kernels/ # Low-level C/CUDA kernel source |
25 | | -│ ├── bit_1/ |
26 | | -│ │ ├── cpu/ # C kernels |
27 | | -│ │ └── cuda/ # CUDA kernels (.cu) |
28 | | -│ └── bit_1_58/ |
29 | | -│ ├── cpu/ # C kernels |
30 | | -│ └── cuda/ # CUDA kernels (.cu) |
31 | | -├── integrations/ # Model integrations |
32 | | -│ └── hf/ # HuggingFace integration |
33 | | -├── benchmarking/ # Benchmarking scripts & results |
34 | | -└── tests/ # Unit and integration tests |
35 | | -``` |
36 | | - |
37 | | - |
38 | | -## Demo |
| 19 | +## Demo 🎬 |
| 20 | +Inference on CPU for a 1.58-bit LLM decoding step. Click the image to view the original high-quality video. `HF` denotes the Hugging Face baseline running `bfloat16` on PyTorch. |
39 | 21 |
|
40 | | -<!-- <p align="center"> |
41 | | - <a href="assets/rsr_baseline_compare.mp4"> |
42 | | - <img src="assets/rsr_baseline_compare.webp" alt="Comparison of the Hugging Face baseline and RSR inference on 1.58-bit LLM inference. Click to open the MP4 version." width="900" /> |
43 | | - </a> |
44 | | -</p> --> |
| 22 | +`PROMPT: "Write the numbers from one to sixty in words separated by commas only:"` |
45 | 23 |
|
46 | 24 | [](https://drive.google.com/file/d/1ub-MITJUepmfBLkyUZFb50hbJsuhgwCH/view?usp=sharing) |
47 | 25 |
|
48 | | -## Benchmark Results |
| 26 | +## Benchmark Results 📊 |
49 | 27 |
|
50 | 28 | ### Matrix-Vector Multiplication |
51 | 29 |
|
@@ -82,3 +60,29 @@ Speedup is computed against the HuggingFace `bfloat16` baseline for the same mod |
82 | 60 | | Llama3-8B-1.58-100B-tokens | 31.9 | **59.3** | **1.9x** | |
83 | 61 | | bitnet-b1.58-2B-4T-bf16 | 33.1 | **57.4** | **1.7x** | |
84 | 62 | | bitnet-b1.58-2B-4T | 41.6 | **57.1** | **1.4x** | |
| 63 | + |
| 64 | +## Updates 📝 |
| 65 | + |
| 66 | +<!-- |
| 67 | +- Add project updates here. |
| 68 | +--> |
| 69 | + |
| 70 | +## Project Structure 🗂️ |
| 71 | + |
| 72 | +```text |
| 73 | +RSR-core/ |
| 74 | +├── multiplier/ # Python wrappers for kernels |
| 75 | +│ ├── bit_1/ # 1-bit (binary) multipliers (CPU/CUDA) |
| 76 | +│ └── bit_1_58/ # 1.58-bit (ternary) multipliers (CPU/CUDA) |
| 77 | +├── kernels/ # Low-level C/CUDA kernel source |
| 78 | +│ ├── bit_1/ |
| 79 | +│ │ ├── cpu/ # C kernels |
| 80 | +│ │ └── cuda/ # CUDA kernels (.cu) |
| 81 | +│ └── bit_1_58/ |
| 82 | +│ ├── cpu/ # C kernels |
| 83 | +│ └── cuda/ # CUDA kernels (.cu) |
| 84 | +├── integrations/ # Model integrations |
| 85 | +│ └── hf/ # Hugging Face integration |
| 86 | +├── benchmarking/ # Benchmarking scripts & results |
| 87 | +└── tests/ # Unit and integration tests |
| 88 | +``` |