fix(gateway): rate-limit compression warning messages to once per hour #3786
Closed
dlkakbs wants to merge 1 commit into NousResearch:main
The post-compression warning ("Session is still very large") had no
cooldown, so it fired on every message as long as the session remained
above the 95% token threshold — spamming users on long-running bots
(Telegram, Discord, etc.).
Adds _compression_warn_sent (dict keyed by chat_id) and a 1-hour
cooldown on GatewayRunner. Both warn paths (compression ran but still
large, and compression failed) are gated by the same rate-limit.
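
A minimal sketch of the per-chat cooldown described above, not the actual diff. Only `_compression_warn_sent`, `chat_id`, `GatewayRunner`, and the 1-hour window come from this PR; the class name `GatewayRunnerSketch`, the constant `COMPRESSION_WARN_COOLDOWN`, the helper `_should_send_compression_warning`, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time

# 1 hour, per the PR description. The constant name is an assumption.
COMPRESSION_WARN_COOLDOWN = 3600  # seconds


class GatewayRunnerSketch:
    """Hypothetical stand-in for GatewayRunner; only the cooldown logic is shown."""

    def __init__(self, clock=time.monotonic):
        # chat_id -> time the last compression warning was sent to that chat
        self._compression_warn_sent = {}
        # Injectable clock (an assumption) so the cooldown can be tested
        self._clock = clock

    def _should_send_compression_warning(self, chat_id):
        """Return True at most once per cooldown window for each chat_id."""
        now = self._clock()
        last = self._compression_warn_sent.get(chat_id)
        if last is not None and now - last < COMPRESSION_WARN_COOLDOWN:
            return False  # still inside the cooldown window: stay silent
        self._compression_warn_sent[chat_id] = now
        return True
```

Both warn paths ("still large after compression" and "compression failed") would call the same helper before sending, so a chat sees at most one warning per hour regardless of which path triggered it.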
Fixes NousResearch#3784
teknium1 added a commit that referenced this pull request on Mar 30, 2026
…-limit)

Two complementary fixes for repeated context pressure warnings spamming gateway users (Telegram, Discord, etc.):

1. Agent-level loop fix (run_agent.py): After compression, only reset _context_pressure_warned if the post-compression estimate is actually below the 85% warning level. Previously the flag was unconditionally reset, causing the warning to re-fire every loop iteration when compression couldn't reduce below 85% of the threshold (e.g. very low threshold like 15%, or system prompt alone exceeds the warning level).

2. Gateway-level rate-limit (gateway/run.py, salvaged from PR #3786): Per-chat_id cooldown of 1 hour on compression warning messages. Both warning paths ('still large after compression' and 'compression failed') are gated. Defense in depth: even if the agent-level fix has edge cases, users won't see more than one warning per hour.

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
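
The agent-level fix in this commit message can be sketched as follows. This is an illustration under assumptions, not the code in run_agent.py: only `_context_pressure_warned` and the 85% warning level come from the commit message, while `AgentSketch`, `WARNING_LEVEL`, `should_reset_pressure_flag`, and `after_compression` are hypothetical names, and token counts are assumed to be plain integers.

```python
# Fraction of the compression threshold at which the warning fires,
# per the commit message. The constant name is an assumption.
WARNING_LEVEL = 0.85


def should_reset_pressure_flag(post_compression_tokens, threshold_tokens):
    """Reset the warned flag only if compression got below the warning level."""
    return post_compression_tokens < WARNING_LEVEL * threshold_tokens


class AgentSketch:
    """Hypothetical stand-in for the agent; only the flag handling is shown."""

    def __init__(self, threshold_tokens):
        self.threshold_tokens = threshold_tokens
        self._context_pressure_warned = False

    def after_compression(self, post_compression_tokens):
        # Previously the flag was reset unconditionally here, so the warning
        # re-fired on every loop iteration when compression could not get
        # the session below the warning level.
        if should_reset_pressure_flag(post_compression_tokens, self.threshold_tokens):
            self._context_pressure_warned = False
```

With a threshold of 1000 tokens, a post-compression estimate of 900 leaves the flag set (no repeat warning), while an estimate of 800 clears it so a later warning can fire again.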
teknium1 added a commit that referenced this pull request on Mar 30, 2026
* feat: add /yolo slash command to toggle dangerous command approvals

Adds a /yolo command that toggles HERMES_YOLO_MODE at runtime, skipping all dangerous command approval prompts for the current session. Works in both CLI and gateway (Telegram, Discord, etc.).

- /yolo -> ON: all commands auto-approved, no confirmation prompts
- /yolo -> OFF: normal approval flow restored

The --yolo CLI flag already existed for launch-time opt-in. This adds the ability to toggle mid-session without restarting. Session-scoped: resets when the process ends. Uses the existing HERMES_YOLO_MODE env var that check_all_command_guards() already respects.

* fix: prevent context pressure warning spam (agent loop + gateway rate-limit)

Two complementary fixes for repeated context pressure warnings spamming gateway users (Telegram, Discord, etc.):

1. Agent-level loop fix (run_agent.py): After compression, only reset _context_pressure_warned if the post-compression estimate is actually below the 85% warning level. Previously the flag was unconditionally reset, causing the warning to re-fire every loop iteration when compression couldn't reduce below 85% of the threshold (e.g. very low threshold like 15%, or system prompt alone exceeds the warning level).

2. Gateway-level rate-limit (gateway/run.py, salvaged from PR #3786): Per-chat_id cooldown of 1 hour on compression warning messages. Both warning paths ('still large after compression' and 'compression failed') are gated. Defense in depth: even if the agent-level fix has edge cases, users won't see more than one warning per hour.

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>

---------

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
What does this PR do?
The post-compression warning had no cooldown. When a session stayed above the 95% token threshold, it fired on every message — spamming users on long-running bots with no way to stop it.
The root cause was a missing rate-limit, not a missing config toggle. This PR adds a 1-hour cooldown per chat_id on GatewayRunner, gating both warning paths: "still large after compression" and "compression failed".
Why rate-limit instead of an on/off toggle? A toggle would silence a genuinely useful signal entirely, including the first occurrence. Rate-limiting preserves the warning for users who haven't seen it yet while eliminating the spam for those who have — making the config option unnecessary.
Related Issue
#3784