Add DeepSeek V4 prefill indexer compressor #384
## Summary

- Promote the DeepSeek V4 prefill indexer compressor from draft to a standalone kernel.
- Implement B=1, S=128 ratio-4 overlap compression with projected KV/score scratch, final state writes, pooled KV cache output, RoPE, and optional Hadamard rotation.
- Batch RMSNorm, RoPE, and Hadamard over all 32 compressed rows to satisfy A2/A3 tile alignment constraints.

## Verification

- task-submit task_20260526_111825_17581997185: `python models/deepseek/v4/prefill_indexer_compressor_draft.py -p a2a3 --device 5 --enable-l2-swimlane`
- PASS: kv, kv_state, score_state, kv_cache

## Related Issues

None
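The row count in the summary follows from S=128 and ratio 4: 128 // 4 = 32 compressed rows. The sketch below is a plain-Python illustration of applying RMSNorm to all of those rows in one call. It is not the kernel's code: `HEAD_DIM`, `EPS`, the unit weight, and the helper name `rms_norm_rows` are assumptions chosen for illustration, and the sketch says nothing about A2/A3 tile alignment.

```python
import math

SEQ_LEN = 128        # S=128, from the PR summary
COMPRESS_RATIO = 4   # ratio-4, from the PR summary
HEAD_DIM = 8         # assumption: small width, for illustration only
EPS = 1e-6           # assumption: a typical RMSNorm epsilon


def rms_norm_rows(rows, weight, eps=EPS):
    """Divide every row by its root-mean-square, then scale by weight."""
    normed = []
    for row in rows:
        mean_sq = sum(v * v for v in row) / len(row)
        inv_rms = 1.0 / math.sqrt(mean_sq + eps)
        normed.append([v * inv_rms * w for v, w in zip(row, weight)])
    return normed


num_rows = SEQ_LEN // COMPRESS_RATIO  # 128 // 4 = 32 compressed rows

# Hypothetical pooled rows; real values would come from the pooling step.
pooled = [[float(r + c + 1) for c in range(HEAD_DIM)] for r in range(num_rows)]
normed = rms_norm_rows(pooled, [1.0] * HEAD_DIM)
```

With a unit weight, each normalized row has a mean square very close to 1, which is a quick sanity check when comparing against a golden reference.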
Caution: Review failed. Pull request was closed or merged during review.

📝 Walkthrough

Replaces a draft stub with a production DeepSeek-V4 prefill indexer compressor JIT kernel that projects inputs into key/value and score spaces, performs ratio-4 overlapping softmax pooling, applies RMSNorm and RoPE rotation, optionally applies a Hadamard transform, and writes compressed results to …

Changes: Kernel Implementation and Testing
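One way to read "ratio-4 overlapping softmax pooling" is sketched below in plain Python. This is an assumption, not the kernel's confirmed behavior: each compressed row is taken to pool over its own block of `ratio` positions plus the previous block, with a per-channel softmax over the score logits used as pooling weights. The first block has no predecessor, so it pools over its own positions only. The function name and the list-of-lists layout are hypothetical.

```python
import math


def overlap_softmax_pool(kv, score, ratio=4):
    """Pool S rows into S // ratio rows.

    kv, score: lists of S rows, each a list of D floats.
    Assumption: window = current block plus the previous block (overlap).
    """
    seq_len, dim = len(kv), len(kv[0])
    pooled = []
    for block in range(seq_len // ratio):
        start = max(0, (block - 1) * ratio)  # reach back one block if present
        stop = (block + 1) * ratio
        row = []
        for d in range(dim):
            logits = [score[t][d] for t in range(start, stop)]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            weighted = sum(e * kv[t][d] for e, t in zip(exps, range(start, stop)))
            row.append(weighted / total)
        pooled.append(row)
    return pooled


# Uniform scores turn the softmax into a plain mean over each window.
kv = [[float(t), 1.0] for t in range(128)]
score = [[0.0, 0.0] for _ in range(128)]
pooled = overlap_softmax_pool(kv, score)
```

With S=128 and ratio 4 this yields 32 rows, matching the row count in the PR summary. Uniform scores make the result easy to check by hand, since each output is then the mean of its window.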
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Code Review
This pull request replaces the empty scaffold file with a complete implementation of the DeepSeek-V4 prefill indexer compressor for ratio-4 overlapping KV cache in models/deepseek/v4/prefill_indexer_compressor.py. The implementation includes the JIT-compiled kernel, a test wrapper, and a PyTorch golden reference. Feedback suggests replacing wildcard imports with explicit imports to prevent namespace pollution and improve code maintainability.
```python
from config import FP32_NEG_INF
from decode_indexer_compressor import *  # noqa: F401,F403
```
Wildcard imports (from decode_indexer_compressor import *) should be avoided as they pollute the namespace and make it difficult to track the origin of constants and variables. Additionally, FP32_NEG_INF is imported from config but never used in this file. It is highly recommended to explicitly import only the required names to improve code readability and maintainability.
```diff
-from config import FP32_NEG_INF
-from decode_indexer_compressor import *  # noqa: F401,F403
+from decode_indexer_compressor import (
+    COMPRESS_RATIO,
+    STATE_LEN,
+    OUT_DIM,
+    HEAD_DIM,
+    ROPE_HEAD_DIM,
+    NOPE_HEAD_DIM,
+    IDX_KV_LEN,
+    ROPE_CHUCK,
+    HEAD_DIM_INV,
+    EPS,
+    ROTATE,
+    D,
+)
```
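When converting a wildcard import into an explicit one, it helps to know which exported names the file actually reads. The following is a minimal sketch using the standard-library `ast` module. The helper name `names_used_from`, the sample source string, and the `exported` set are hypothetical. In practice the set would come from the real module's public names.

```python
import ast


def names_used_from(source: str, exported: set[str]) -> list[str]:
    """Return the exported names that the source reads at least once."""
    tree = ast.parse(source)  # parses only; the import is never executed
    read_names = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return sorted(read_names & exported)


# Hypothetical file contents and export list, for illustration only.
example_source = """
from decode_indexer_compressor import *
rows = STATE_LEN // COMPRESS_RATIO
"""
exported = {"COMPRESS_RATIO", "STATE_LEN", "OUT_DIM"}

used = names_used_from(example_source, exported)
print(used)  # ['COMPRESS_RATIO', 'STATE_LEN']
```

The sketch does not account for names that the file redefines locally, so the result is a starting list to review rather than a guaranteed minimal import set.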