[Chapter 8] Correction regarding the ALiBi slope calculation formula (Page 57) #9

@Sameta-cani

Location

  • File: chapters/nlp-book-chapter8.pdf
  • Page: 57
  • Section: 8.3.5.2 Attention with Non-learned Biases

Problem Description
I noticed a discrepancy in the formula for the ALiBi slopes ($m$) presented in the book compared to the original paper (Press et al., 2022).

The book appears to present the slope formula roughly as:
$$\beta_k = \frac{1}{2^{\frac{8}{k}}} \quad (\text{or similar incorrect form where } k \text{ is in the denominator})$$

However, according to the ALiBi paper, for a model with $n$ heads, the slopes form a geometric sequence:

"In general, for $n$ heads, our set of slopes is the geometric sequence that starts at $2^{\frac{-8}{n}}$ and uses that same value as its ratio."

Reasoning & Derivation
Based on the definition provided in the original paper:

  • Start term: $2^{-8/n}$
  • Ratio: $2^{-8/n}$
  • Head index: $k$ ($1, \dots, n$)

Because the start term and the ratio are the same value, the slope $m_k$ for the $k$-th head follows from the standard geometric-sequence formula:

$$ \begin{align} m_k &= (\text{Start}) \times (\text{Ratio})^{k-1} \\ &= 2^{-8/n} \times (2^{-8/n})^{k-1} \\ &= (2^{-8/n})^k \\ &= 2^{-\frac{8k}{n}} \\ &= \frac{1}{2^{\frac{8k}{n}}} \end{align} $$

The variable $k$ (head index) must be in the numerator of the exponent so that the slopes decrease geometrically from $2^{-8/n}$ for head $1$ to $2^{-8}$ for head $n$.
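
As a quick sanity check of the derivation, here is the 8-head case ($n = 8$) worked out. The second line assumes the book's form really is $2^{-8/k}$, as transcribed in the problem description; if the printed formula differs, only the first line applies.

$$ \begin{align} \text{corrected: } \quad m_k &= 2^{-\frac{8k}{8}} = 2^{-k} \quad \Rightarrow \quad \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dots, \frac{1}{256} \\ \text{with } k \text{ in the denominator: } \quad \beta_k &= 2^{-\frac{8}{k}} \quad \Rightarrow \quad \frac{1}{256}, \frac{1}{16}, 2^{-\frac{8}{3}}, \dots, \frac{1}{2} \end{align} $$

The first sequence has a constant ratio of $\frac{1}{2}$ between consecutive heads. The second has ratios $2^{4}$, then $2^{4/3}$, and so on, so it is not a geometric sequence at all.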

Suggested Fix
Please update the formula to reflect the correct definition:

$$ m_k = \frac{1}{2^{\frac{8 \cdot k}{n}}} $$

(where $n$ is the total number of heads and $k \in \{1, \dots, n\}$ is the head index)
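
For anyone who wants to check the suggested formula numerically, here is a minimal sketch in Python. The function name `alibi_slopes` is my own and is not taken from the book or the paper's code; it simply evaluates the closed form $m_k = 2^{-8k/n}$ for $k = 1, \dots, n$.

```python
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """Return [m_1, ..., m_n] with m_k = 2^(-8k/n).

    Hypothetical helper for illustration: it applies the closed form
    directly and is not a copy of any reference implementation.
    """
    if n_heads < 1:
        raise ValueError("n_heads must be a positive integer")
    start = 2.0 ** (-8.0 / n_heads)  # first term, which is also the ratio
    return [start ** k for k in range(1, n_heads + 1)]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    print(slopes)
    # Consecutive ratios should all equal the start term (0.5 for n = 8).
    print([slopes[i + 1] / slopes[i] for i in range(len(slopes) - 1)])
```

With `n_heads=8` the first line printed is `[0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]`, i.e. $\frac{1}{2}, \frac{1}{4}, \dots, \frac{1}{256}$, and every consecutive ratio is `0.5`.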

Thank you for maintaining this excellent resource.
