[Chapter 8] Correction regarding the ALiBi slope calculation formula (Page 57)

**Location**
* **File:** `chapters/nlp-book-chapter8.pdf`
* **Page:** 57
* **Section:** 8.3.5.2 Attention with Non-learned Biases

**Problem Description**
I noticed a discrepancy in the formula for the ALiBi slopes ($m$) presented in the book compared to the original paper (*Press et al., 2022*).

The book appears to present the slope formula roughly as:
$$\beta_k = \frac{1}{2^{\frac{8}{k}}} \quad (\text{or similar incorrect form where } k \text{ is in the denominator})$$

However, according to the ALiBi paper, for a model with $n$ heads, the slopes form a **geometric sequence**:
> "In general, for $n$ heads, our set of slopes is the geometric sequence that starts at $2^{\frac{-8}{n}}$ and uses that same value as its ratio."

**Reasoning & Derivation**
Based on the definition provided in the original paper:
* **Start term:** $2^{-8/n}$
* **Ratio:** $2^{-8/n}$
* **Head index:** $k$ ($1, \dots, n$)

The slope $m_k$ for the $k$-th head should be derived as:

$$
\begin{aligned}
m_k &= (\text{Start}) \times (\text{Ratio})^{k-1} \\
&= 2^{-8/n} \times \left(2^{-8/n}\right)^{k-1} \\
&= \left(2^{-8/n}\right)^{k} \\
&= 2^{-\frac{8k}{n}} \\
&= \frac{1}{2^{\frac{8k}{n}}}
\end{aligned}
$$

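The variable $k$ (head index) must be in the **numerator** of the exponent, so that the slopes decrease geometrically from $2^{-8/n}$ (at $k = 1$) to $2^{-8}$ (at $k = n$). For $n = 8$ this gives $\frac{1}{2}, \frac{1}{4}, \dots, \frac{1}{256}$. With $k$ in the denominator, as in $2^{-8/k}$, the values instead increase with $k$ and do not form a geometric sequence.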

**Suggested Fix**
Please update the formula to reflect the correct definition:

$$
m_k = \frac{1}{2^{\frac{8 \cdot k}{n}}}
$$

(where $n$ is the total number of heads and $k \in \{1, \dots, n\}$ is the head index)
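
As a quick sanity check, the suggested formula can be evaluated numerically. The snippet below is only a minimal sketch: the function name `alibi_slopes` is my own (it is not from the book or the paper), and it applies the closed form $2^{-8k/n}$ for any $n$ without special-casing particular head counts.

```python
def alibi_slopes(n_heads: int) -> list[float]:
    """Return the slopes m_k = 2^(-8k/n) for k = 1..n.

    Hypothetical helper for illustration; it evaluates the closed form
    from the ALiBi paper's description of the geometric sequence.
    """
    if n_heads < 1:
        raise ValueError("n_heads must be a positive integer")
    # k sits in the numerator of the exponent, so slopes shrink as k grows.
    return [2.0 ** (-8.0 * k / n_heads) for k in range(1, n_heads + 1)]


if __name__ == "__main__":
    # For 8 heads this reproduces the sequence 1/2, 1/4, ..., 1/256.
    print(alibi_slopes(8))
```

For `n_heads = 8`, the first value is $2^{-1}$ and the last is $2^{-8}$, which matches the start term and ratio quoted from the paper.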

Thank you for maintaining this excellent resource.
