Review and improve test suite to minimize real-world gaps by Claude · Pull Request #9 · poseidonchan/codex-telegram-bot

Claude · 2026-02-14T03:50:34Z

The test suite included tests that don't catch real-world issues while missing coverage for actual production failure modes. This PR removes trivial tests and adds tests for critical error paths.

Changes

Removed (1 test)

test_daemon_process::test_is_pid_running_non_positive_false - Trivial edge case (negative PIDs) that cannot occur in practice

Added (5 tests)

test_output_stream::test_edit_failure_falls_back_to_new_message - Validates Telegram API edit failure handling (rate limits, deleted messages)
test_command_intent::test_script_execution_limitation - Documents known security gap: script execution not detected as potentially dangerous
test_command_intent::test_needs_write_approval_detects_piped_writes - Tests tee and piped redirections
test_command_intent::test_needs_write_approval_detects_redirection - Tests > and >> operators
test_command_intent::test_readonly_commands_do_not_need_approval - Validates read-only operations

Documentation

TEST_IMPROVEMENTS.md - Analysis of test quality, high vs low value tests, and remaining gaps

Key Improvements

The new tests validate actual failure scenarios users encounter:

# Before: Only tested happy path buffering
# After: Tests Telegram API failures
bot.edit_should_fail = True
w.append(" - updated content")
await w.flush()
# Validates fallback to new message instead of silent failure

The command intent tests now explicitly document security limitations:

def test_script_execution_limitation(self):
    """KNOWN LIMITATION: Script execution not detected as dangerous."""
    # Currently returns False (not detected)
    self.assertFalse(needs_write_approval("python script.py"))
    # Defense relies on Codex sandbox, not bot heuristic

Results

Tests: 65 → 69 (+6.2% meaningful coverage)
All tests passing ✓
Execution time: 3.95s

See TEST_IMPROVEMENTS.md for detailed analysis of test quality and recommendations.

Original prompt

This section details on the original issue you should resolve

<issue_title>Are these tests really helpful?</issue_title>
<issue_description>The codebase includes many tests, however the real world problem still exists after passing all tests. Do we still need to keep all tests?</issue_description>

<agent_instructions>Try to review all tests and minimize the gap between real-world user's experience and tests. </agent_instructions>

Comments on the Issue (you are @claude[agent] in this section)

Fixes Are these tests really helpful? #7

- Remove 1 trivial test (negative PID check that can't happen) - Add test for Telegram edit_message failure fallback - Add command intent tests for real-world edge cases - Document known security limitation in script execution detection - Create comprehensive TEST_IMPROVEMENTS.md analysis Tests went from 65 → 69 (+6.2% meaningful coverage) Co-authored-by: poseidonchan <43901301+poseidonchan@users.noreply.github.com>

poseidonchan · 2026-02-14T03:58:59Z

@codex someone says

"
Tests missed real-world exec_command events that arrive with structured arguments; these commands were dropped, leaving the bot blind to tool starts and approval prompts.

Parsing: parse_event_obj now accepts function_call arguments as dicts (and list cmd values), joining argv into a shell string and still detecting escalation/justification metadata for approvals.
Coverage: Added regression tests for dict-based exec_command events with and without required approvals.
Example:

evs = parse_event_obj({
"type": "response_item",
"payload": {
"type": "function_call",
"name": "exec_command",
"arguments": {"cmd": ["ls", "-la"]},
"call_id": "call_abc123",
},
})

=> [ToolStarted(command="ls -la")]

"
So, does this PR resolve the issue?

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a35acb18e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-02-14T04:01:27Z

tests/test_command_intent.py

    def test_needs_write_approval_bash_lc_readonly(self) -> None:
        self.assertFalse(needs_write_approval("bash -lc 'cd /tmp && ls -la'"))

+    def test_script_execution_limitation(self) -> None:


Add coverage for dict-based exec_command arguments

These new tests improve command-intent and output-stream behavior, but they do not cover (or fix) the reported production failure mode where parse_event_obj receives function_call.arguments as a dict and cmd as argv list; that path is still dropped because the parser currently only enters the exec_command branch when arguments is a string (src/tgcodex/codex/events.py around the isinstance(args_s, str) check). In that scenario the bot remains blind to tool starts and escalation prompts, so this PR does not resolve the issue it is being used to validate.

Useful? React with 👍 / 👎.

poseidonchan · 2026-02-14T04:08:42Z

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a35acb18e

ℹ️ About Codex in GitHub

@codex So, is this PR helpful?

chatgpt-codex-connector · 2026-02-14T04:10:50Z

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Initial plan

a1e4872

Claude AI assigned Claude and poseidonchan Feb 14, 2026

Claude started work on behalf of poseidonchan February 14, 2026 03:50 View session

Claude AI changed the title ~~[WIP] Review necessity of existing tests~~ Review and improve test suite to minimize real-world gaps Feb 14, 2026

Claude AI requested a review from poseidonchan February 14, 2026 03:57

Claude finished work on behalf of poseidonchan February 14, 2026 03:57

Codex started work on behalf of poseidonchan February 14, 2026 03:59 View session

Copilot stopped work on behalf of poseidonchan due to an error February 14, 2026 03:59
Copilot has encountered an error. See logs for additional details.

chatgpt-codex-connector bot reviewed Feb 14, 2026

View reviewed changes

poseidonchan marked this pull request as ready for review February 14, 2026 04:13

poseidonchan merged commit 949615d into master Feb 14, 2026
1 check failed

Copilot AI mentioned this pull request Feb 15, 2026

Fix bot startup conflict by dropping pending Telegram updates #11

Draft

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review and improve test suite to minimize real-world gaps#9

Review and improve test suite to minimize real-world gaps#9
poseidonchan merged 2 commits intomasterfrom
claude/review-and-update-tests

Claude AI commented Feb 14, 2026 •

edited

Loading

Uh oh!

poseidonchan commented Feb 14, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Feb 14, 2026

Uh oh!

poseidonchan commented Feb 14, 2026

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot commented Feb 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Claude AI commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Key Improvements

Results

Comments on the Issue (you are @claude[agent] in this section)

Uh oh!

poseidonchan commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

=> [ToolStarted(command="ls -la")]

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Feb 14, 2026

Choose a reason for hiding this comment

Uh oh!

poseidonchan commented Feb 14, 2026

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot commented Feb 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Claude AI commented Feb 14, 2026 •

edited

Loading

poseidonchan commented Feb 14, 2026 •

edited

Loading