Commit bd3dd35

JonnaMat and claude committed
Bump version to 0.1.2, fix README images for PyPI
Use absolute URLs for images so they render on PyPI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 8c22613 commit bd3dd35

2 files changed

Lines changed: 5 additions & 5 deletions

README.md

Lines changed: 4 additions & 4 deletions
@@ -38,16 +38,16 @@ The dense classification head accounts for up to 60% of parameters in small LLMs
 The standard LM head computes a dense matrix multiplication $h_t × W_{vocab}$ at every decode step, scoring all Vocabulary tokens regardless of relevance. FlashHead reframes this as a two-stage retrieval problem over clustered token embeddings: first identify which regions of vocabulary space are relevant, then score only those candidates.
 
 <p align="center">
-<img src="docs/flash_head_flow_diagram.svg" width="75%" />
+<img src="https://raw.githubusercontent.com/Embedl/flash-head/master/docs/flash_head_flow_diagram.svg" width="75%" />
 </p>
 
 > **⚡ Key Tradeoff** A dense head scores **128,256 tokens per step** (for a 128K vocabulary). With *c = 8,016* clusters and *p = 256* probes, FlashHead scores only **8,016 + 256 × 16 = 12,112 tokens**, a <span style="color:#22c55e; font-weight:600;">10× reduction</span> in scored tokens, while multi-probe retrieval maintains near-perfect recall of the correct next token.
 
 
 <p align="center" width="100%">
-<img src="docs/dense_head_scoring.svg" width="30%"/>
-<img src="docs/arrow.svg" width="4%"/>
-<img src="docs/flash_head_scoring.svg" width="30%"/>
+<img src="https://raw.githubusercontent.com/Embedl/flash-head/master/docs/dense_head_scoring.svg" width="30%"/>
+<img src="https://raw.githubusercontent.com/Embedl/flash-head/master/docs/arrow.svg" width="4%"/>
+<img src="https://raw.githubusercontent.com/Embedl/flash-head/master/docs/flash_head_scoring.svg" width="30%"/>
 </p>
 
 <strong>Note.</strong> The offline clustering step runs once per model and adds zero overhead at inference time.
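
The token count in the Key Tradeoff note of the README hunk above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not FlashHead's code: the helper name `scored_tokens` is hypothetical, and it assumes equal-sized clusters, which is what the factor of 16 in the README suggests (128,256 / 8,016 = 16 tokens per cluster).

```python
def scored_tokens(vocab_size: int, num_clusters: int, num_probes: int) -> int:
    """Tokens scored per decode step, assuming equal-sized clusters (an assumption)."""
    if vocab_size % num_clusters != 0:
        raise ValueError("this sketch assumes equal-sized clusters")
    tokens_per_cluster = vocab_size // num_clusters
    # Stage 1 scores every cluster centroid; stage 2 scores only the tokens
    # that belong to the probed clusters.
    return num_clusters + num_probes * tokens_per_cluster


VOCAB_SIZE = 128_256  # dense head: every token is scored at each step
flash = scored_tokens(VOCAB_SIZE, num_clusters=8_016, num_probes=256)
reduction = VOCAB_SIZE / flash

print(flash, round(reduction, 2))  # prints: 12112 10.59
```

Under these assumptions the ratio is about 10.6, which matches the README's rounded "10× reduction".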

src/flash_head/_version.py

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 
 """FlashHead package version. Bump this to trigger a PyPI release."""
 
-__version__ = "0.1.1"
+__version__ = "0.1.2"
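
The README change in this commit was made by hand, but the same rewrite can be sketched as a small script. PyPI renders the README from package metadata, so relative paths such as `docs/arrow.svg` have nothing to resolve against. The helper `absolutize_img_srcs` below is hypothetical and not part of the repository; the base URL is the one used in the diff.

```python
import re
from urllib.parse import urljoin, urlparse

# Base URL taken from the diff above; the trailing slash matters for urljoin.
RAW_BASE = "https://raw.githubusercontent.com/Embedl/flash-head/master/"

# Captures the text before, inside, and after a double-quoted src attribute of an <img> tag.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")')


def absolutize_img_srcs(text: str, base: str = RAW_BASE) -> str:
    """Rewrite relative <img src="..."> paths to absolute URLs under `base`."""

    def fix(match: re.Match) -> str:
        src = match.group(2)
        if urlparse(src).scheme:  # already absolute, leave untouched
            return match.group(0)
        return match.group(1) + urljoin(base, src) + match.group(3)

    return IMG_SRC.sub(fix, text)


before = '<img src="docs/arrow.svg" width="4%"/>'
after = absolutize_img_srcs(before)
print(after)
```

Running the helper a second time on its own output leaves it unchanged, because URLs that already carry a scheme are skipped.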
