Fix flaky timing-dependent security tests by Clawdy-ast · Pull Request #207 · RichardAtCT/claude-code-telegram

Clawdy-ast · 2026-05-29T04:39:41Z

Summary

Three security tests in tests/unit/test_security/ were flaky because they
depended on microsecond-precision timestamp ordering that isn't guaranteed.
All fixes are test-side only — production risk-assessment and session logic
are unchanged.

- test_log_command_risk_assessment and test_log_file_access_risk_assessment
  (test_audit.py): when two audit events are logged within the same
  microsecond, InMemoryAuditStorage.get_events (a stable sort by timestamp,
  newest-first) falls back to insertion order, so the first-inserted high-risk
  event stays at events[0] instead of the most-recent low-risk one. Changed
  the assertions to locate each event by filtering on its details content
  (command name / file path) rather than by list position.
- test_session_management (test_auth.py): refresh_session could run
  in the same microsecond as authenticate_user, making
  last_activity == created_at, so the last_activity > old_activity assertion
  failed. The test now backdates the session by 1 second before refreshing,
  making the comparison deterministic.
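
The tie-break behaviour behind the two test_audit.py failures can be reproduced in isolation. The sketch below is illustrative only: AuditEvent and this get_events are hypothetical stand-ins, not the project's InMemoryAuditStorage, and the field names are assumptions. It shows that a stable newest-first sort leaves equal-timestamp events in insertion order, and that looking events up by their details content avoids the dependency on position.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEvent:
    # Hypothetical stand-in for the project's audit event type.
    timestamp: datetime
    risk_level: str
    details: dict = field(default_factory=dict)


def get_events(events):
    # sorted() is stable: events with equal timestamps keep their
    # insertion order, and that holds with reverse=True as well.
    return sorted(events, key=lambda e: e.timestamp, reverse=True)


# Two events that share the exact same timestamp (the flaky case).
same_instant = datetime(2026, 5, 29, 4, 39, 41, tzinfo=timezone.utc)
stored = [
    AuditEvent(same_instant, "high", {"command": "rm"}),
    AuditEvent(same_instant, "low", {"command": "ls"}),
]
events = get_events(stored)

# Fragile: on a tie, the first-inserted (high-risk) event is at index 0,
# not the most recently logged one.
first_by_position = events[0]

# Robust: select each event by its content instead of its position.
low = next(e for e in events if e.details.get("command") == "ls")
high = next(e for e in events if e.details.get("command") == "rm")
```

Filtering by content keeps the assertion meaningful regardless of how ties are ordered, which is why the fix does not need to touch the storage code.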

Test plan

- .venv\Scripts\python.exe -m pytest tests/unit/test_security/ -q → 85 passed
- The three previously-flaky tests pass when run explicitly

Add /model command with inline keyboard UI for switching between Opus/Sonnet/Haiku models and effort levels (low/medium/high/max) at runtime. Model changes force a new session since the CLI doesn't support model switching on resumed sessions.

- Effort levels are model-aware: Haiku has none, Sonnet excludes "max", Opus supports all including "max"
- Override is per-user via context.user_data (in-memory, resets on bot restart)
- Threaded through all run_command call sites (orchestrator, classic message handler) into the SDK layer
- Registered in both agentic and classic handler modes
- Added to bot command menu and /help text
- 17 new tests covering keyboard display, model/effort selection, label formatting, and effort-per-model configuration

Closes RichardAtCT#138
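
The model-aware effort rule described in this commit can be pictured as a small lookup table. This is a minimal sketch under assumptions: EFFORT_LEVELS, efforts_for, and is_valid_effort are hypothetical names chosen for illustration, not identifiers from command.py. Only the per-model level sets come from the commit message.

```python
# Hypothetical table: which effort levels each model family offers,
# as stated in the commit message.
EFFORT_LEVELS = {
    "haiku": [],                                # no effort levels
    "sonnet": ["low", "medium", "high"],        # excludes "max"
    "opus": ["low", "medium", "high", "max"],   # supports all levels
}


def efforts_for(model: str) -> list[str]:
    """Return the effort levels to offer for a model family (empty if none)."""
    return list(EFFORT_LEVELS.get(model, []))


def is_valid_effort(model: str, effort: str) -> bool:
    """Check whether an effort level may be selected for a model family."""
    return effort in EFFORT_LEVELS.get(model, [])
```

A keyboard builder could then skip the effort row entirely whenever efforts_for returns an empty list, which matches the "Haiku has none" case.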

…ases

- callback.py: replace shared _model_effort_handler closure with two explicit lambdas (model:/effort:), which eliminates outer-scope capture; move import to module level
- command.py: drop _MODELS dict with hardcoded version IDs; use _MODEL_FAMILIES list of short CLI aliases ("opus"/"sonnet"/"haiku") which the CLI resolves to current latest automatically
- command.py: add CallbackQuery + ContextTypes type annotations to _handle_model_selection (fixes mypy strict mode)
- command.py: simplify _current_model_label (no reverse-map needed)
- command.py: add PR RichardAtCT#165 compatibility comment on force_new_session
- tests: update imports/assertions for _MODEL_FAMILIES; add 3 tests: closure regression guard, effort: prefix isolation, force_new_session

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Cumulative session cost tracking with warnings at $5 / $10 / $20 (configurable via SESSION_COST_TIERS env var). Fires once per tier per session, resets on /new or model swap. Also logs the actual model returned by Claude at turn complete — useful for verifying /model swaps on this branch. Smoke-tested 2026-05-07 with lowered thresholds; all 3 tiers fired correctly in sequence. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
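
The "fires once per tier per session" behaviour can be sketched as a small tracker. Everything here is an assumption for illustration: SessionCostTracker and its methods are hypothetical names, and parsing of the SESSION_COST_TIERS env var is omitted because its format is not given in the commit message. Only the default thresholds ($5 / $10 / $20) and the reset-on-/new-or-model-swap rule come from the source.

```python
class SessionCostTracker:
    """Hypothetical sketch: accumulate cost and report each tier at most once."""

    def __init__(self, tiers=(5.0, 10.0, 20.0)):
        self.tiers = sorted(tiers)
        self.total = 0.0
        self.fired = set()  # tiers already warned about in this session

    def add(self, cost: float) -> list[float]:
        """Add a turn's cost; return tiers newly crossed by the running total."""
        self.total += cost
        newly_crossed = [
            t for t in self.tiers if self.total >= t and t not in self.fired
        ]
        self.fired.update(newly_crossed)
        return newly_crossed

    def reset(self) -> None:
        """Start a fresh session (for example on /new or a model swap)."""
        self.total = 0.0
        self.fired.clear()
```

Returning the list of newly crossed tiers lets the caller decide how to word the warning, and a single expensive turn that jumps past several thresholds reports all of them at once.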

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three security tests relied on microsecond-precision timestamp ordering that is not guaranteed:

- test_log_command_risk_assessment / test_log_file_access_risk_assessment: two audit events logged in the same microsecond fall back to insertion order in the stable sort, so the high-risk event stayed at events[0]. Now assert by filtering events on their details content, not position.
- test_session_management: refresh_session could run in the same microsecond as authenticate_user, making last_activity == created_at and the `last_activity > old_activity` assertion fail. Backdate the session by 1s before refreshing so the comparison is deterministic.

Production risk-assessment and session logic are unchanged; these were test-side timing bugs only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
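
The backdating approach for test_session_management can be illustrated with a toy session object. This is a sketch, not the project's code: Session and refresh are hypothetical stand-ins, and backdating last_activity specifically is an assumption, since the commit message only says the session is backdated by 1s.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    # Hypothetical stand-in for the project's session type.
    created_at: datetime
    last_activity: datetime

    def refresh(self, now: datetime) -> None:
        self.last_activity = now


# Simulate the worst case: creation and refresh see the same clock value.
now = datetime(2026, 5, 29, tzinfo=timezone.utc)
session = Session(created_at=now, last_activity=now)

# Backdate before refreshing so the comparison no longer depends on
# the clock advancing between two calls.
session.last_activity -= timedelta(seconds=1)
old_activity = session.last_activity

session.refresh(now)

# Holds even though refresh used the same instant as creation.
assert session.last_activity > old_activity
```

Backdating keeps the strict `>` comparison in the test, so it still verifies that a refresh moves last_activity forward rather than weakening the check to `>=`.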

Talla and others added 10 commits April 27, 2026 15:27

fix: show server default model in /model display

3478489

Instead of just "Default", show "Default (claude-sonnet-4-6)" or "Default (CLI default)" so users can verify what model is active after resetting.

feat(dex): add /yes /no /<custom> /now decision handlers

4f33122

feat(dex): register decision handlers in bot orchestrator

54fe187

feat(dex): echo verb meaning + project context in confirmation replies

136b5f5

feat(builder): add /builder status|kill|queue handlers + register

b39aca5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'feat/builder-handlers'

42e125c

Clawdy-ast wants to merge 10 commits into RichardAtCT:main from Clawdy-ast:fix/flaky-security-test-timing

2 participants
