Add DeepSeek V4 prefill indexer compressor by wuzhf9 · Pull Request #384 · hw-native-sys/pypto-lib

wuzhf9 · 2026-05-26T03:42:27Z

Summary

Promote the DeepSeek V4 prefill indexer compressor from draft to a standalone kernel.
Implement B=1, S=128 ratio-4 overlap compression with projected KV/score scratch, final state writes, pooled KV cache output, RoPE, and optional Hadamard rotation.
Batch RMSNorm, RoPE, and Hadamard over all 32 compressed rows to satisfy A2/A3 tile alignment constraints.
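
The ratio-4 overlap pooling described in the summary can be sketched in plain Python. This is a minimal illustration rather than the kernel itself. It assumes that each compressed row softmax-weights its own window of 4 tokens together with the preceding window, and that there is one scalar score per token. The names `overlap_pool` and `softmax` are hypothetical. With S=128 and ratio 4 it yields the 32 compressed rows mentioned above.

```python
import math

RATIO = 4  # compression ratio from the PR summary


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def overlap_pool(kv, score, ratio=RATIO):
    """Pool len(kv) token rows into len(kv) // ratio compressed rows.

    Assumption (not confirmed by the PR): output row i pools window i
    together with window i - 1; row 0 only sees window 0.
    kv    : list of rows, each a list of floats
    score : one scalar score per token (a simplification)
    """
    n_out = len(kv) // ratio
    dim = len(kv[0])
    out = []
    for i in range(n_out):
        lo = max(0, (i - 1) * ratio)  # start of the previous window, clamped
        hi = (i + 1) * ratio          # end of the current window
        w = softmax(score[lo:hi])
        out.append(
            [sum(w[j] * kv[lo + j][d] for j in range(hi - lo)) for d in range(dim)]
        )
    return out


# S=128 tokens with ratio 4 give 32 compressed rows.
rows = overlap_pool([[1.0, 2.0]] * 128, [0.0] * 128)
print(len(rows))  # 32
```

Clamping the first window is one possible boundary treatment; the real kernel may instead read the previous window from `kv_state` and `score_state`.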

Verification

task-submit task_20260526_111825_17581997185: python models/deepseek/v4/prefill_indexer_compressor_draft.py -p a2a3 --device 5 --enable-l2-swimlane
PASS: kv, kv_state, score_state, kv_cache

Related Issues

None


coderabbitai · 2026-05-26T03:42:39Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Replaces a draft stub with a production DeepSeek-V4 prefill indexer compressor JIT kernel that projects inputs into key/value and score spaces, performs ratio-4 overlapping softmax pooling, applies RMSNorm and RoPE rotation, optionally applies Hadamard transform, and writes compressed results to kv_cache; includes Torch golden reference and test harness.
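
The post-pooling stage in the walkthrough (RMSNorm followed by an optional Hadamard transform) can be sketched as follows. This is an illustrative sketch only: the function names, the `eps` default, and the orthonormal `1/sqrt(n)` Hadamard scaling are assumptions, not values taken from the kernel.

```python
import math


def rms_norm(row, weight, eps=1e-6):
    """Scale a row by the inverse of its root mean square, then by weight.

    eps is an assumed default; the kernel imports its own EPS constant.
    """
    mean_sq = sum(x * x for x in row) / len(row)
    inv = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv * w for x, w in zip(row, weight)]


def fwht(row):
    """Fast Walsh-Hadamard transform with orthonormal scaling (assumed).

    The row length must be a power of two.
    """
    y = list(row)
    n = len(y)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def post_process(row, norm_w, rotate):
    """RMSNorm, then Hadamard when rotate is true, else pass through."""
    normed = rms_norm(row, norm_w)
    return fwht(normed) if rotate else normed


print(post_process([3.0, 4.0, 0.0, 0.0], [1.0] * 4, rotate=True))
```

With orthonormal scaling the transform preserves the row's norm and is its own inverse, which makes the rotated and unrotated paths easy to compare in a reference test.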

Changes

Kernel Implementation and Testing

Layer / File(s)	Summary
Kernel implementation and setup `models/deepseek/v4/prefill_indexer_compressor.py`, `models/deepseek/v4/prefill_indexer_compressor_draft.py`	Defines compression ratio and block parameters; implements `prefill_indexer_compressor` JIT kernel that flattens/tiles inputs, projects to KV/score spaces, writes intermediate state buffers, computes overlapping softmax-pooled KV across consecutive windows, applies RMSNorm using `norm_w`, slices and rotates heads with RoPE using cosine/sine and selection matrices, conditionally applies Hadamard vs. direct cast, and writes compressed KV to `kv_cache` at an offset derived from `start_pos`. Removes draft stub in favor of full implementation.
Test wrapper and golden reference `models/deepseek/v4/prefill_indexer_compressor.py`	Exports `prefill_indexer_compressor_test` wrapper that forwards all kernel inputs to the JIT kernel for integration with golden test harness. Adds `golden_prefill_indexer_compressor` Torch reference that reproduces the same compression overlap, RMSNorm, RoPE rotation, and optional Hadamard logic for numerical validation and comparison.
Tensor specifications and test runner `models/deepseek/v4/prefill_indexer_compressor.py`	Defines `build_tensor_specs()` factory creating `TensorSpec` and `ScalarSpec` entries for all inputs/outputs including shape, dtype, and initializer metadata. Adds `__main__` runner wiring `run_jit` with CLI options (platform, device, start-pos override, l2-swimlane toggle) and configures per-output comparison tolerances and failure behavior for numerical validation.
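
The RoPE step with selection matrices mentioned in the kernel row can be illustrated with a small sketch. It assumes an interleaved pairing (elements `2k` and `2k+1` form one rotated pair) and a single pair-swap matrix; the kernel's actual layout and matrices may differ, and all names here are hypothetical.

```python
import math


def pair_swap_matrix(dim):
    """Build P so that (x @ P)[2k] = -x[2k+1] and (x @ P)[2k+1] = x[2k]."""
    p = [[0.0] * dim for _ in range(dim)]
    for k in range(0, dim, 2):
        p[k + 1][k] = -1.0  # x[2k+1] contributes negatively to slot 2k
        p[k][k + 1] = 1.0   # x[2k] contributes to slot 2k+1
    return p


def vec_mat(x, m):
    """Row vector times matrix."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def rope_rotate(x, cos, sin):
    """Rotate each interleaved pair: y = x * cos + (x @ P) * sin."""
    swapped = vec_mat(x, pair_swap_matrix(len(x)))
    return [x[j] * cos[j] + swapped[j] * sin[j] for j in range(len(x))]


angle = math.pi / 2
y = rope_rotate([1.0, 0.0], [math.cos(angle)] * 2, [math.sin(angle)] * 2)
print(y)  # approximately [0.0, 1.0]
```

Expressing the swap as a matrix product turns the rotation into a matmul plus elementwise operations, which is one plausible reason a tiled kernel would use selection matrices instead of strided element shuffles.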

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

hw-native-sys/pypto-lib#270: Implements earlier DeepSeek-V4 KV compressor logic with KV/score projection, selector-matrix RoPE, optional Hadamard, ratio pooling, and state/cache updates at offsets from start_pos.
hw-native-sys/pypto-lib#346: Directly replaces the stub prefill_indexer_compressor_draft.py with the production implementation in prefill_indexer_compressor.py and updates module exports.

Poem

🐰 A kernel crystallized from drafty dreams,
With overlapping pooling, RoPE at the seams,
Golden reference dancing, Hadamard's grace—
Compression wisdom in a JIT-compiled space! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
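
A rough way to reproduce a docstring coverage figure locally is sketched below using the standard `ast` module. How CodeRabbit counts items is not stated here, so the choice to count the module, classes, and functions is an assumption, and `docstring_coverage` is a hypothetical helper.

```python
import ast


def docstring_coverage(source):
    """Return the percentage of modules, classes, and functions with docstrings."""
    tree = ast.parse(source)
    kinds = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    nodes = [n for n in ast.walk(tree) if isinstance(n, kinds)]
    documented = sum(1 for n in nodes if ast.get_docstring(n) is not None)
    return 100.0 * documented / len(nodes)


SAMPLE = (
    "def a():\n"
    '    """Documented."""\n'
    "    return 1\n"
    "\n"
    "def b():\n"
    "    return 2\n"
)

# One documented function out of three counted items (module, a, b).
print(round(docstring_coverage(SAMPLE), 2))  # 33.33
```

Running a check like this on `prefill_indexer_compressor.py` before pushing would show which definitions still need docstrings to reach the 80% threshold.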

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: promoting and implementing the DeepSeek V4 prefill indexer compressor kernel.
Description check	✅ Passed	The description is directly related to the changeset, providing summary of promotion objectives, implementation details, and verification results.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


gemini-code-assist

Code Review

This pull request replaces the empty scaffold file with a complete implementation of the DeepSeek-V4 prefill indexer compressor for ratio-4 overlapping KV cache in models/deepseek/v4/prefill_indexer_compressor.py. The implementation includes the JIT-compiled kernel, a test wrapper, and a PyTorch golden reference. Feedback suggests replacing wildcard imports with explicit imports to prevent namespace pollution and improve code maintainability.

gemini-code-assist · 2026-05-26T03:44:29Z

+from config import FP32_NEG_INF
+from decode_indexer_compressor import *  # noqa: F401,F403


Wildcard imports (from decode_indexer_compressor import *) should be avoided as they pollute the namespace and make it difficult to track the origin of constants and variables. Additionally, FP32_NEG_INF is imported from config but never used in this file. It is highly recommended to explicitly import only the required names to improve code readability and maintainability.

Suggested change

-from config import FP32_NEG_INF
-from decode_indexer_compressor import *  # noqa: F401,F403
+from decode_indexer_compressor import (
+    COMPRESS_RATIO,
+    STATE_LEN,
+    OUT_DIM,
+    HEAD_DIM,
+    ROPE_HEAD_DIM,
+    NOPE_HEAD_DIM,
+    IDX_KV_LEN,
+    ROPE_CHUCK,
+    HEAD_DIM_INV,
+    EPS,
+    ROTATE,
+    D,
+)
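
The two problems raised in this review comment, a wildcard import and an unused name, can be detected mechanically. The sketch below uses only the standard `ast` module; `audit_imports` is a hypothetical helper, and it is a simplified check compared with what linters such as those behind the `F401` and `F403` codes do.

```python
import ast


def audit_imports(source):
    """Return (modules imported with *, explicitly imported names never used)."""
    tree = ast.parse(source)
    wildcard, imported = [], {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "*":
                    wildcard.append(node.module)
                else:
                    imported[alias.asname or alias.name] = node.module
        elif isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                imported[alias.asname or alias.name.split(".")[0]] = alias.name
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    unused = sorted(name for name in imported if name not in used)
    return wildcard, unused


SNIPPET = (
    "from config import FP32_NEG_INF\n"
    "from decode_indexer_compressor import *  # noqa: F401,F403\n"
    "rows = 128 // COMPRESS_RATIO\n"
)

print(audit_imports(SNIPPET))
# (['decode_indexer_compressor'], ['FP32_NEG_INF'])
```

Note that `COMPRESS_RATIO` in the snippet has no visible origin, which is exactly the traceability problem the explicit import list is meant to solve.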

gemini-code-assist Bot reviewed May 26, 2026

zhangqi-chen merged commit a13fc73 into hw-native-sys:main May 26, 2026
4 of 7 checks passed

zhangqi-chen merged 1 commit into hw-native-sys:main from wuzhf9:dev
