fix(gateway): rate-limit compression warning messages to once per hour by dlkakbs · Pull Request #3786 · NousResearch/hermes-agent

dlkakbs · 2026-03-29T20:36:24Z

What does this PR do?

The post-compression warning had no cooldown. When a session stayed above the 95% token threshold, it fired on every message — spamming users on long-running bots with no way to stop it.

The root cause was a missing rate-limit, not a missing config toggle. This PR adds a 1-hour cooldown per chat_id on GatewayRunner, gating both warning paths: "still large after compression" and "compression failed".

Why rate-limit instead of an on/off toggle? A toggle would silence a genuinely useful signal entirely, including the first occurrence. Rate-limiting preserves the warning for users who haven't seen it yet while eliminating the spam for those who have — making the config option unnecessary.

Related Issue

#3784

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✅ Tests (adding or improving test coverage)

Changes Made

gateway/run.py: added the _compression_warn_sent dict and _compression_warn_cooldown (3600s) to GatewayRunner.__init__; both warn sites now check and update the rate-limit state before sending
tests/gateway/test_session_hygiene.py: added TestCompressionWarnRateLimit with 4 tests covering first warn allowed, suppression within cooldown, allowed after cooldown, and per-chat isolation
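
The per-chat cooldown described above can be sketched roughly as follows. This is a minimal illustration, not the code from gateway/run.py: the class name CompressionWarnLimiter, the should_warn method, and the injectable clock are assumptions made for the example. Only the attribute names _compression_warn_sent and _compression_warn_cooldown and the 3600-second value come from the PR description.

```python
import time
from typing import Callable, Dict, Hashable


class CompressionWarnLimiter:
    """Hypothetical stand-in for the cooldown state the PR adds to GatewayRunner."""

    def __init__(
        self,
        cooldown: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Attribute names follow the PR description; everything else is assumed.
        self._compression_warn_cooldown = cooldown
        self._compression_warn_sent: Dict[Hashable, float] = {}
        self._clock = clock  # injectable so tests do not have to sleep

    def should_warn(self, chat_id: Hashable) -> bool:
        """Return True and record the send time if chat_id is outside its cooldown."""
        now = self._clock()
        last = self._compression_warn_sent.get(chat_id)
        if last is not None and now - last < self._compression_warn_cooldown:
            return False  # warned recently: suppress
        self._compression_warn_sent[chat_id] = now
        return True
```

In this sketch both warn sites ("still large after compression" and "compression failed") would call should_warn(chat_id) before sending, so they share one cooldown per chat. Checking and updating in a single call keeps the two sites from drifting apart.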

How to Test

Configure a session to stay above the 95% token threshold after compression
Send multiple messages — warning should appear only once per hour, not every message
pytest tests/gateway/test_session_hygiene.py -q — 23 passed

Checklist

Commit messages follow Conventional Commits
PR contains only changes related to this fix
All tests pass
Tests added for new behaviour

Documentation & Housekeeping

No config changes — N/A
Cross-platform impact considered — N/A

The post-compression warning ("Session is still very large") had no cooldown, so it fired on every message as long as the session remained above the 95% token threshold, spamming users on long-running bots (Telegram, Discord, etc.).

Adds _compression_warn_sent (dict keyed by chat_id) and a 1-hour cooldown on GatewayRunner. Both warn paths (compression ran but still large, and compression failed) are gated by the same rate-limit.

Fixes NousResearch#3784

…-limit)

Two complementary fixes for repeated context pressure warnings spamming gateway users (Telegram, Discord, etc.):

1. Agent-level loop fix (run_agent.py): After compression, only reset _context_pressure_warned if the post-compression estimate is actually below the 85% warning level. Previously the flag was unconditionally reset, causing the warning to re-fire every loop iteration when compression couldn't reduce below 85% of the threshold (e.g. very low threshold like 15%, or system prompt alone exceeds the warning level).

2. Gateway-level rate-limit (gateway/run.py, salvaged from PR #3786): Per-chat_id cooldown of 1 hour on compression warning messages. Both warning paths ('still large after compression' and 'compression failed') are gated. Defense in depth: even if the agent-level fix has edge cases, users won't see more than one warning per hour.

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
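
The agent-level change in this commit message (reset the warned flag only when compression actually brings the estimate back under the warning level) can be illustrated with a small sketch. The class PressureState and its methods are hypothetical and are not taken from run_agent.py. Only the flag name _context_pressure_warned and the 85% figure come from the commit message, and treating the warning level as 0.85 times a token threshold is an assumption about how that figure is applied.

```python
WARN_FRACTION = 0.85  # warning level as a fraction of the compression threshold


class PressureState:
    """Hypothetical model of the warn-once flag described in the commit message."""

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens
        self._context_pressure_warned = False

    def warn_level(self) -> float:
        return WARN_FRACTION * self.threshold_tokens

    def maybe_warn(self, estimated_tokens: int) -> bool:
        """Return True if a warning should be emitted for this loop iteration."""
        if estimated_tokens >= self.warn_level() and not self._context_pressure_warned:
            self._context_pressure_warned = True
            return True
        return False

    def after_compression(self, post_estimate: int) -> None:
        # Conditional reset: re-arm the warning only if compression got the
        # estimate below the warning level. An unconditional reset here is the
        # behaviour the commit message describes as the cause of repeated warnings.
        if post_estimate < self.warn_level():
            self._context_pressure_warned = False
```

With a threshold of 1000 tokens, a post-compression estimate of 870 leaves the flag set, so the next iteration stays quiet. An estimate of 500 re-arms the warning.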

* feat: add /yolo slash command to toggle dangerous command approvals

Adds a /yolo command that toggles HERMES_YOLO_MODE at runtime, skipping all dangerous command approval prompts for the current session. Works in both CLI and gateway (Telegram, Discord, etc.).

- /yolo -> ON: all commands auto-approved, no confirmation prompts
- /yolo -> OFF: normal approval flow restored

The --yolo CLI flag already existed for launch-time opt-in. This adds the ability to toggle mid-session without restarting. Session-scoped: resets when the process ends. Uses the existing HERMES_YOLO_MODE env var that check_all_command_guards() already respects.

* fix: prevent context pressure warning spam (agent loop + gateway rate-limit)

Two complementary fixes for repeated context pressure warnings spamming gateway users (Telegram, Discord, etc.):

1. Agent-level loop fix (run_agent.py): After compression, only reset _context_pressure_warned if the post-compression estimate is actually below the 85% warning level. Previously the flag was unconditionally reset, causing the warning to re-fire every loop iteration when compression couldn't reduce below 85% of the threshold (e.g. very low threshold like 15%, or system prompt alone exceeds the warning level).

2. Gateway-level rate-limit (gateway/run.py, salvaged from PR #3786): Per-chat_id cooldown of 1 hour on compression warning messages. Both warning paths ('still large after compression' and 'compression failed') are gated. Defense in depth: even if the agent-level fix has edge cases, users won't see more than one warning per hour.

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>

---------

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
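
For the /yolo commit listed above, a runtime toggle of an environment variable might look like the sketch below. The function toggle_yolo is hypothetical, and the value "1" for the ON state is an assumption: the commit message names the variable HERMES_YOLO_MODE but does not say what value it holds or how check_all_command_guards() reads it.

```python
import os
from typing import MutableMapping

YOLO_ENV_VAR = "HERMES_YOLO_MODE"  # variable name taken from the commit message


def toggle_yolo(env: MutableMapping[str, str] = os.environ) -> bool:
    """Flip the flag in env and return the new state (True means ON)."""
    if env.get(YOLO_ENV_VAR):
        # Currently ON: remove the variable to restore the normal approval flow.
        env.pop(YOLO_ENV_VAR, None)
        return False
    # Currently OFF: set it. The value "1" is an assumed convention.
    env[YOLO_ENV_VAR] = "1"
    return True
```

Because the state lives in the process environment, it disappears when the process exits, which matches the session-scoped behaviour the commit message describes.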

teknium1 · 2026-03-30T20:18:36Z

Merged via PR #4012 which incorporates your gateway rate-limit alongside an agent-level loop fix. Your per-chat cooldown logic and tests were cherry-picked with authorship preserved. Thanks @dlkakbs!

teknium1 closed this Mar 30, 2026

dlkakbs wants to merge 1 commit into NousResearch:main from dlkakbs:fix/compression-warn-rate-limit
