
fix(gateway): rate-limit compression warning messages to once per hour #3786

Closed
dlkakbs wants to merge 1 commit into NousResearch:main from dlkakbs:fix/compression-warn-rate-limit

@dlkakbs (Contributor) commented Mar 29, 2026

What does this PR do?

The post-compression warning had no cooldown. When a session stayed above the 95% token threshold, it fired on every message, spamming users on long-running bots with no way to stop it.

The root cause was a missing rate-limit, not a missing config toggle. This PR adds a 1-hour cooldown per chat_id on GatewayRunner, gating both warning paths: "still large after compression" and "compression failed".

Why rate-limit instead of an on/off toggle? A toggle would silence a genuinely useful signal entirely, including the first occurrence. Rate-limiting still delivers the first warning in each chat and suppresses repeats within the cooldown window, which makes a config option unnecessary.

Related Issue

#3784

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • gateway/run.py: added _compression_warn_sent dict and _compression_warn_cooldown (3600s) to GatewayRunner.__init__; both warn sites now check and update the rate-limit before sending
  • tests/gateway/test_session_hygiene.py: added TestCompressionWarnRateLimit with 4 tests covering first warn allowed, suppression within cooldown, allowed after cooldown, and per-chat isolation
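A minimal sketch of the gate and tests described in the bullets above, assuming a wall-clock time.time() source and string chat IDs. Only the attribute names _compression_warn_sent and _compression_warn_cooldown come from the PR; GatewayRunnerSketch, _should_send_compression_warn, and FakeClock are hypothetical, and the four test functions only mirror the cases listed for TestCompressionWarnRateLimit rather than reproduce them.

```python
import time
from typing import Callable, Dict


class GatewayRunnerSketch:
    """Hypothetical stand-in for GatewayRunner, holding only the rate-limit state."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._compression_warn_sent: Dict[str, float] = {}  # chat_id -> last warn time
        self._compression_warn_cooldown = 3600              # seconds (1 hour)
        self._clock = clock                                 # injectable for tests

    def _should_send_compression_warn(self, chat_id: str) -> bool:
        """Check and update the per-chat rate limit in one step."""
        now = self._clock()
        last = self._compression_warn_sent.get(chat_id)
        if last is not None and now - last < self._compression_warn_cooldown:
            return False  # this chat was warned less than an hour ago
        self._compression_warn_sent[chat_id] = now  # record only when sending
        return True


class FakeClock:
    """Manually advanced clock so tests never sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def test_first_warn_allowed() -> None:
    runner = GatewayRunnerSketch(clock=FakeClock())
    assert runner._should_send_compression_warn("chat-1")


def test_suppressed_within_cooldown() -> None:
    clock = FakeClock()
    runner = GatewayRunnerSketch(clock=clock)
    assert runner._should_send_compression_warn("chat-1")
    clock.now = 1800.0  # half an hour later
    assert not runner._should_send_compression_warn("chat-1")


def test_allowed_after_cooldown() -> None:
    clock = FakeClock()
    runner = GatewayRunnerSketch(clock=clock)
    assert runner._should_send_compression_warn("chat-1")
    clock.now = 3600.0  # exactly one cooldown later
    assert runner._should_send_compression_warn("chat-1")


def test_per_chat_isolation() -> None:
    runner = GatewayRunnerSketch(clock=FakeClock())
    assert runner._should_send_compression_warn("chat-1")
    assert runner._should_send_compression_warn("chat-2")  # separate cooldown
```

Both warning sites would call the helper and send only when it returns True, so "still large after compression" and "compression failed" share one cooldown per chat. Recording the timestamp in the same call that grants permission is what stops the second site from firing right after the first.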

How to Test

  1. Configure a session to stay above the 95% token threshold after compression
  2. Send multiple messages — warning should appear only once per hour, not every message
  3. Run pytest tests/gateway/test_session_hygiene.py -q (expected: 23 passed)

Checklist

  • Commit messages follow Conventional Commits
  • PR contains only changes related to this fix
  • All tests pass
  • Tests added for new behaviour

Documentation & Housekeeping

  • No config changes — N/A
  • Cross-platform impact considered — N/A

The post-compression warning ("Session is still very large") had no
cooldown, so it fired on every message as long as the session remained
above the 95% token threshold — spamming users on long-running bots
(Telegram, Discord, etc.).

Adds _compression_warn_sent (dict keyed by chat_id) and a 1-hour
cooldown on GatewayRunner. Both warn paths (compression ran but still
large, and compression failed) are gated by the same rate-limit.

Fixes NousResearch#3784
teknium1 added a commit that referenced this pull request Mar 30, 2026
…-limit)

Two complementary fixes for repeated context pressure warnings spamming
gateway users (Telegram, Discord, etc.):

1. Agent-level loop fix (run_agent.py):
   After compression, only reset _context_pressure_warned if the
   post-compression estimate is actually below the 85% warning level.
   Previously the flag was unconditionally reset, causing the warning
   to re-fire every loop iteration when compression couldn't reduce
   below 85% of the threshold (e.g. very low threshold like 15%,
   or system prompt alone exceeds the warning level).

2. Gateway-level rate-limit (gateway/run.py, salvaged from PR #3786):
   Per-chat_id cooldown of 1 hour on compression warning messages.
   Both warning paths ('still large after compression' and 'compression
   failed') are gated. Defense-in-depth — even if the agent-level fix
   has edge cases, users won't see more than one warning per hour.

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
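A minimal sketch of the agent-level change in item 1, assuming a single boolean flag standing in for _context_pressure_warned and a warning level of 85% of the compression threshold measured in tokens. PressureState, maybe_warn, and after_compression are hypothetical names, and the >= comparison used to trigger the warning is an assumption.

```python
from dataclasses import dataclass

# Warning level as a fraction of the compression threshold (85%, per the commit message).
WARN_FRACTION = 0.85


@dataclass
class PressureState:
    """Hypothetical stand-in for the agent's context-pressure bookkeeping."""

    threshold_tokens: int  # token count at which compression triggers
    warned: bool = False   # plays the role of _context_pressure_warned

    def warn_level(self) -> float:
        return self.threshold_tokens * WARN_FRACTION

    def maybe_warn(self, estimated_tokens: int) -> bool:
        """Return True if this loop iteration should emit a pressure warning."""
        if not self.warned and estimated_tokens >= self.warn_level():
            self.warned = True
            return True
        return False

    def after_compression(self, post_estimate: int) -> None:
        # Fixed behaviour: re-arm the warning only if compression actually got
        # the estimate below the warning level. The old code reset the flag
        # unconditionally, so the next iteration warned again.
        if post_estimate < self.warn_level():
            self.warned = False
```

With a threshold of 1000 tokens the warning level is 850: a post-compression estimate of 870 leaves the flag set, so the next iteration stays quiet, while an estimate of 500 re-arms it.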
teknium1 added a commit that referenced this pull request Mar 30, 2026
* feat: add /yolo slash command to toggle dangerous command approvals

Adds a /yolo command that toggles HERMES_YOLO_MODE at runtime, skipping
all dangerous command approval prompts for the current session. Works in
both CLI and gateway (Telegram, Discord, etc.).

- /yolo -> ON: all commands auto-approved, no confirmation prompts
- /yolo -> OFF: normal approval flow restored

The --yolo CLI flag already existed for launch-time opt-in. This adds
the ability to toggle mid-session without restarting.

Session-scoped — resets when the process ends. Uses the existing
HERMES_YOLO_MODE env var that check_all_command_guards() already
respects.

* fix: prevent context pressure warning spam (agent loop + gateway rate-limit)


---------

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
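A minimal sketch of a session-scoped toggle like the /yolo command described in the squashed commit, assuming that setting HERMES_YOLO_MODE to "1" enables the mode and that removing the variable disables it. The exact value check_all_command_guards() tests for is not shown in this PR, and toggle_yolo is a hypothetical name.

```python
import os

# Env var named in the commit message; the "1" value is an assumption.
YOLO_ENV = "HERMES_YOLO_MODE"


def toggle_yolo() -> bool:
    """Flip YOLO mode for this process and return the new state (True = ON)."""
    if os.environ.get(YOLO_ENV):
        # ON -> OFF: remove the variable so the normal approval flow applies.
        del os.environ[YOLO_ENV]
        return False
    # OFF -> ON: later reads of os.environ in this process see the flag.
    os.environ[YOLO_ENV] = "1"
    return True
```

Because changes to os.environ affect only the current process (and any child processes it starts afterwards), the setting disappears when the process exits, which matches the session-scoped behaviour the commit describes.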
@teknium1 (Contributor) commented:

Merged via PR #4012 which incorporates your gateway rate-limit alongside an agent-level loop fix. Your per-chat cooldown logic and tests were cherry-picked with authorship preserved. Thanks @dlkakbs!

@teknium1 closed this Mar 30, 2026