Fix flaky timing-dependent security tests #207
Open
Clawdy-ast wants to merge 10 commits into
Add /model command with inline keyboard UI for switching between Opus/Sonnet/Haiku models and effort levels (low/medium/high/max) at runtime. Model changes force a new session since the CLI doesn't support model switching on resumed sessions.

- Effort levels are model-aware: Haiku has none, Sonnet excludes "max", Opus supports all including "max"
- Override is per-user via context.user_data (in-memory, resets on bot restart)
- Threaded through all run_command call sites (orchestrator, classic message handler) into the SDK layer
- Registered in both agentic and classic handler modes
- Added to bot command menu and /help text
- 17 new tests covering keyboard display, model/effort selection, label formatting, and effort-per-model configuration

Closes RichardAtCT#138
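A minimal sketch of the model-aware effort rules above, assuming a plain dict stands in for `context.user_data`. The table `EFFORT_LEVELS`, the `"model"`/`"effort"` keys, and the helpers `effort_options`, `select_effort`, and `select_model` are hypothetical names, not the PR's actual code. `select_model` returns `True` when the model actually changes, mirroring the rule that a model change forces a new session.

```python
from typing import Dict, Tuple

# Hypothetical table encoding the rules above: Haiku has no effort levels,
# Sonnet excludes "max", Opus supports all four.
EFFORT_LEVELS: Dict[str, Tuple[str, ...]] = {
    "haiku": (),
    "sonnet": ("low", "medium", "high"),
    "opus": ("low", "medium", "high", "max"),
}


def effort_options(model: str) -> Tuple[str, ...]:
    """Return the effort levels the keyboard should offer for a model."""
    return EFFORT_LEVELS.get(model, ())


def select_effort(user_data: dict, effort: str) -> None:
    """Store an effort override, rejecting levels the current model lacks."""
    model = user_data.get("model")
    if model is None or effort not in effort_options(model):
        raise ValueError(f"effort {effort!r} not supported for model {model!r}")
    user_data["effort"] = effort


def select_model(user_data: dict, model: str) -> bool:
    """Store a per-user model override; return True if a new session is needed."""
    if model not in EFFORT_LEVELS:
        raise ValueError(f"unknown model {model!r}")
    changed = user_data.get("model") != model
    user_data["model"] = model
    # Assumption: drop a stored effort level the new model does not support.
    if user_data.get("effort") not in effort_options(model):
        user_data.pop("effort", None)
    return changed
```

Keeping the rules in one table means the keyboard builder and the validator cannot drift apart; an empty tuple for Haiku simply yields no effort buttons.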
Instead of just "Default", show "Default (claude-sonnet-4-6)" or "Default (CLI default)" so users can verify what model is active after resetting.
…ases
- callback.py: replace shared _model_effort_handler closure with two
explicit lambdas (model:/effort:) — eliminates outer-scope capture;
move import to module level
- command.py: drop _MODELS dict with hardcoded version IDs; use
_MODEL_FAMILIES list of short CLI aliases ("opus"/"sonnet"/"haiku")
which the CLI resolves to current latest automatically
- command.py: add CallbackQuery + ContextTypes type annotations to
_handle_model_selection (fixes mypy strict mode)
- command.py: simplify _current_model_label (no reverse-map needed)
- command.py: add PR RichardAtCT#165 compatibility comment on force_new_session
- tests: update imports/assertions for _MODEL_FAMILIES; add 3 tests:
closure regression guard, effort: prefix isolation, force_new_session
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
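The prefix split described in this commit can be sketched as a routing table with one explicit entry per callback prefix. `ROUTES`, `dispatch`, `handle_model`, and `handle_effort` are hypothetical stand-ins, not python-telegram-bot APIs or the PR's real handlers. The point illustrated is that each lambda refers only to its own handler and prefix, so `effort:` data can never be routed to the model handler.

```python
from typing import Callable, Dict

# Short CLI aliases, as in the commit; the CLI resolves each to its latest version.
_MODEL_FAMILIES = ["opus", "sonnet", "haiku"]


def handle_model(family: str) -> str:
    # Hypothetical stand-in for the model-selection handler.
    if family not in _MODEL_FAMILIES:
        raise ValueError(f"unknown model family {family!r}")
    return f"model={family}"


def handle_effort(level: str) -> str:
    # Hypothetical stand-in for the effort-selection handler.
    return f"effort={level}"


# One explicit lambda per prefix; neither depends on a shared variable from an
# enclosing scope, so each prefix is always bound to its own handler.
ROUTES: Dict[str, Callable[[str], str]] = {
    "model:": lambda data: handle_model(data[len("model:"):]),
    "effort:": lambda data: handle_effort(data[len("effort:"):]),
}


def dispatch(callback_data: str) -> str:
    """Route callback data to the handler registered for its prefix."""
    for prefix, handler in ROUTES.items():
        if callback_data.startswith(prefix):
            return handler(callback_data)
    raise KeyError(f"no handler for {callback_data!r}")
```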
Cumulative session cost tracking with warnings at $5 / $10 / $20 (configurable via SESSION_COST_TIERS env var). Fires once per tier per session, resets on /new or model swap. Also logs the actual model returned by Claude at turn complete — useful for verifying /model swaps on this branch. Smoke-tested 2026-05-07 with lowered thresholds; all 3 tiers fired correctly in sequence. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
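A minimal sketch of once-per-tier cost warnings, assuming `SESSION_COST_TIERS` holds comma-separated dollar amounts (the actual format is not stated here) and that each completed turn reports its cost in dollars. `SessionCostTracker` and `parse_tiers` are hypothetical names; `reset()` corresponds to `/new` or a model swap.

```python
import os
from typing import List, Optional, Set, Tuple

DEFAULT_TIERS: Tuple[float, ...] = (5.0, 10.0, 20.0)


def parse_tiers(raw: Optional[str]) -> Tuple[float, ...]:
    """Parse an assumed comma-separated list like "5,10,20"; fall back to defaults."""
    if not raw:
        return DEFAULT_TIERS
    return tuple(sorted(float(part) for part in raw.split(",") if part.strip()))


class SessionCostTracker:
    """Accumulates session cost and reports each tier at most once per session."""

    def __init__(self, tiers: Tuple[float, ...]) -> None:
        self.tiers = tiers
        self.reset()

    def reset(self) -> None:
        # Called on /new or a model swap: start a fresh session total.
        self.total = 0.0
        self.fired: Set[float] = set()

    def add(self, turn_cost: float) -> List[float]:
        """Add one turn's cost; return the tiers crossed for the first time."""
        self.total += turn_cost
        crossed = [t for t in self.tiers if self.total >= t and t not in self.fired]
        self.fired.update(crossed)
        return crossed


tracker = SessionCostTracker(parse_tiers(os.environ.get("SESSION_COST_TIERS")))
```

A single large turn can cross several tiers at once; in this sketch `add` returns all of them so each warning still fires exactly once.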
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three security tests relied on microsecond-precision timestamp ordering that is not guaranteed:

- test_log_command_risk_assessment / test_log_file_access_risk_assessment: two audit events logged in the same microsecond fall back to insertion order in the stable sort, so the high-risk event stayed at events[0]. Now assert by filtering events on their details content, not position.
- test_session_management: refresh_session could run in the same microsecond as authenticate_user, making last_activity == created_at and the `last_activity > old_activity` assertion fail. Backdate the session by 1s before refreshing so the comparison is deterministic.

Production risk-assessment and session logic are unchanged; these were test-side timing bugs only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
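The stable-sort behavior behind the audit fix can be reproduced in isolation. This sketch assumes a simplified `AuditEvent` with `timestamp`, `risk_level`, and `details` fields and a `get_events` that sorts newest-first; these are hypothetical stand-ins, not the real `InMemoryAuditStorage`. Python's `sorted(..., reverse=True)` remains stable, so two events with identical timestamps keep their insertion order.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class AuditEvent:
    # Hypothetical, simplified event with only the fields the tests inspect.
    timestamp: datetime
    risk_level: str
    details: Dict[str, Any]


def get_events(stored: List[AuditEvent]) -> List[AuditEvent]:
    """Newest-first. sorted() is stable even with reverse=True, so events
    with equal timestamps keep their insertion order."""
    return sorted(stored, key=lambda e: e.timestamp, reverse=True)


def find_event(events: List[AuditEvent], **details: Any) -> AuditEvent:
    """Locate exactly one event whose details match, regardless of position."""
    matches = [
        e for e in events
        if all(e.details.get(k) == v for k, v in details.items())
    ]
    if len(matches) != 1:
        raise LookupError(f"expected one event matching {details}, got {len(matches)}")
    return matches[0]


# Both events are logged in the same microsecond.
same_instant = datetime(2026, 1, 1, 12, 0, 0, 123456, tzinfo=timezone.utc)
events = get_events([
    AuditEvent(same_instant, "high", {"command": "rm -rf /"}),
    AuditEvent(same_instant, "low", {"command": "ls"}),
])

# events[0] is the first-inserted high-risk event, not the most recent one,
# so a positional check like `events[0].risk_level == "low"` fails here.
low = find_event(events, command="ls")
high = find_event(events, command="rm -rf /")
```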
Summary
Three security tests in `tests/unit/test_security/` were flaky because they depended on microsecond-precision timestamp ordering that isn't guaranteed.
All fixes are test-side only — production risk-assessment and session logic
are unchanged.
- `test_log_command_risk_assessment` and `test_log_file_access_risk_assessment` (`test_audit.py`): when two audit events are logged within the same microsecond, `InMemoryAuditStorage.get_events` (a stable sort by timestamp, newest-first) falls back to insertion order, so the first-inserted high-risk event stays at `events[0]` instead of the most recent low-risk one. Changed the assertions to locate each event by filtering on its `details` content (command name / file path) rather than by list position.
- `test_session_management` (`test_auth.py`): `refresh_session` could run in the same microsecond as `authenticate_user`, making `last_activity == created_at`, so `last_activity > old_activity` failed. The test now backdates the session 1 second before refreshing, making the comparison deterministic.
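The session fix in miniature, assuming a bare `Session` dataclass and a `refresh_session` that stamps `last_activity` with the current UTC time. Both are hypothetical stand-ins for the real auth code, and backdating both `created_at` and `last_activity` is an assumption about what the actual test changes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    # Hypothetical, simplified session record.
    created_at: datetime
    last_activity: datetime


def make_session() -> Session:
    # Stand-in for authenticate_user creating a session.
    now = datetime.now(timezone.utc)
    return Session(created_at=now, last_activity=now)


def refresh_session(session: Session) -> None:
    # Stand-in for the real refresh: stamp the current time.
    session.last_activity = datetime.now(timezone.utc)


session = make_session()
# Backdate by one second so the refresh cannot produce an equal timestamp,
# even if it runs in the same microsecond as session creation.
session.created_at -= timedelta(seconds=1)
session.last_activity -= timedelta(seconds=1)
old_activity = session.last_activity

refresh_session(session)
```

Without the backdating step, both `datetime.now()` calls can return the same value on a fast machine or a coarse system clock, and the strict `>` comparison fails.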
Test plan
`.venv\Scripts\python.exe -m pytest tests/unit/test_security/ -q` → 85 passed