⚡ Bolt: Optimize Typical Sampling by eliminating redundant logarithm computations#3491
EffortlessSteven wants to merge 1 commit into `main`.
Conversation
Code Review
This pull request optimizes the `apply_typical` function by fusing logarithmic calculations, reducing the number of expensive `.ln()` calls from two to one per token. The changes also add a new entry to the learning log documenting this performance improvement. Feedback was provided to correct the date in the new learning log entry, as it is currently set in the future.
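The fused computation the review describes might look like the following sketch. This is hypothetical code, not the PR's actual implementation: the function name `typical_deviations` and its signature are assumptions; the idea is simply to compute `-p.ln()` once per non-zero token and reuse it both for the entropy sum and for the per-token deviation.

```rust
// Sketch: one pass computes each token's point surprise (-ln p) and
// accumulates the entropy at the same time, so .ln() runs once per
// non-zero probability instead of twice.
fn typical_deviations(probs: &[f32]) -> Vec<f32> {
    let mut surprises = vec![0.0_f32; probs.len()];
    let mut entropy = 0.0_f32;
    for (i, &p) in probs.iter().enumerate() {
        if p > 0.0 {
            let s = -p.ln(); // single transcendental call per token
            surprises[i] = s;
            entropy += p * s; // reuse the same value for the entropy sum
        }
    }
    // Deviation from the expected surprise drives typical-sampling filtering.
    surprises.iter().map(|&s| (s - entropy).abs()).collect()
}

fn main() {
    // For a uniform two-token distribution, both deviations are exactly 0:
    // each token's surprise equals the entropy.
    let devs = typical_deviations(&[0.5, 0.5]);
    println!("{:?}", devs);
}
```

A naive version would map `probs` to a log-prob array first and then loop again for the deviations; fusing the two passes is what removes the intermediate mapping step mentioned above.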
**Learning:** Allocating memory in the hot path of LLM token generation (e.g., `logits.to_vec()` or creating `HashMap`s per token) significantly degrades performance due to repeated allocation overhead of vocabulary-sized vectors (often 128K+ elements). Additionally, mathematically equivalent iterative multiplication (`logit *= inv_penalty`) can replace `HashMap` counting and `.powi(count)`, completely eliminating O(N) memory allocations per token.

**Action:** When working on generation loops, use buffer pooling (e.g. storing a `Vec` in the generator state and using `std::mem::take` to bypass borrow checker limitations) and avoid `HashMap` allocations for simple counting if an iterative scalar approach is mathematically equivalent.
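The allocation-free penalty pattern from this learning entry can be sketched as follows. The function name `apply_repetition_penalty` and the sign-dependent handling are assumptions for illustration; the point is that multiplying in place once per occurrence replaces counting occurrences in a `HashMap` and calling `.powi(count)`.

```rust
// Sketch: repetition penalty without per-token HashMap allocations.
// Multiplying by inv_penalty once per occurrence of a token is equivalent
// to dividing by penalty.powi(count) after counting, but allocates nothing.
fn apply_repetition_penalty(logits: &mut [f32], prev_tokens: &[u32], penalty: f32) {
    let inv_penalty = 1.0 / penalty;
    for &tok in prev_tokens {
        let logit = &mut logits[tok as usize];
        // Samplers commonly shrink positive logits and scale negative ones
        // so the penalty always reduces the token's probability.
        if *logit > 0.0 {
            *logit *= inv_penalty;
        } else {
            *logit *= penalty;
        }
    }
}

fn main() {
    let mut logits = vec![2.0_f32, -1.0, 0.5];
    // Token 0 appeared twice, token 1 once, token 2 never.
    apply_repetition_penalty(&mut logits, &[0, 0, 1], 1.5);
    println!("{:?}", logits); // [2.0/2.25, -1.5, 0.5]
}
```

Compared with the `HashMap` version, this loop touches only the slots for previously seen tokens and performs no heap allocation, which matters when it runs once per generated token over a 128K-entry logit vector.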
## 2025-04-02 - Fusing Logarithmic Calculations in Typical Sampling
💡 What: Fused the computation of expected surprise (entropy) and point surprise in `apply_typical` to require only a single `.ln()` evaluation per non-zero token instead of two.

🎯 Why: `p.ln()` is a computationally expensive transcendental function. In typical sampling, it was being calculated twice per valid token (once to sum the entropy, once to find the deviation). Fusing the two halves the required expensive math operations and removes an entire intermediate array mapping step.

📊 Impact: Reduces transcendental function calls by 50% and removes O(n) mapping passes during typical-sampling probability filtering.

🔬 Measurement: Verified with `cargo test -p bitnet-logits-filters` and formatting checks.

PR created automatically by Jules for task 16462937900351441391 started by @EffortlessSteven