Description
Location
- File: chapters/nlp-book-chapter8.pdf
- Page: 57
- Section: 8.3.5.2 Attention with Non-learned Biases
Problem Description
I noticed a discrepancy in the formula for the ALiBi slopes.
The book seemingly presents the slope formula as roughly:
However, according to the ALiBi paper, for a model with $n$ heads:
"In general, for $n$ heads, our set of slopes is the geometric sequence that starts at $2^{\frac{-8}{n}}$ and uses that same value as its ratio."
Reasoning & Derivation
Based on the definition provided in the original paper:
- Start term: $2^{-8/n}$
- Ratio: $2^{-8/n}$
- Head index: $k$ ($1, \dots, n$)
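The slope $m_k$ for head $k$ is the $k$-th term of this geometric sequence, that is, the start term multiplied by the ratio $k - 1$ times:

$$m_k = 2^{-8/n} \cdot \left(2^{-8/n}\right)^{k-1} = 2^{-8k/n}$$

The variable $n$ is the total number of attention heads, and $k$ is the index of the head.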
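As a quick sanity check of the definition quoted above, here is a minimal Python sketch that builds the sequence directly from the start term and ratio $2^{-8/n}$. The function name `alibi_slopes` is hypothetical and used only for illustration; it follows the quoted sentence literally and is not taken from the book or from any reference implementation.

```python
def alibi_slopes(n_heads: int) -> list[float]:
    """Geometric sequence whose start term and ratio are both 2^(-8/n).

    Hypothetical helper for illustration: it applies the quoted
    definition literally, so head k (1-indexed) gets slope 2^(-8k/n).
    """
    base = 2.0 ** (-8.0 / n_heads)  # start term, also the common ratio
    return [base ** k for k in range(1, n_heads + 1)]


print(alibi_slopes(8))
# [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
```

For $n = 8$ this gives $1/2, 1/4, \dots, 1/256$, which matches the slopes the ALiBi paper lists for 8-head models.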
Suggested Fix
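Please update the formula to reflect the correct definition:

$$m_k = \left(2^{-8/n}\right)^{k} = 2^{-8k/n}$$

(Where $m_k$ is the slope for head $k$, $n$ is the number of heads, and $k = 1, \dots, n$.)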
Thank you for maintaining this excellent resource.