ep local-test#327

Merged
xzrderek merged 11 commits into main from derekx/local-test
Nov 11, 2025

xzrderek commented Nov 11, 2025

Note

Adds a local-test CLI to run evaluation tests locally or in Docker with extra flags, updates EvaluationRow.created_at to UTC, tweaks upload prompts, and adds comprehensive tests.

  • CLI:
    • New local-test command in eval_protocol/cli.py and cli_commands/local_test.py to run a selected evaluation test via pytest.
      • Resolves --entry (path or path::function) or uses selector; enforces single selection.
      • Auto-detects Dockerfile; runs in Docker if present (or on host with --ignore-docker).
      • Supports --docker-build-extra and --docker-run-extra; mounts project/logs and maps user IDs.
    • Command dispatch wired into main CLI.
  • Upload UX (cli_commands/upload.py): change prompts to say "Select this test?" and "Enter the number to select:".
  • Models (models.py): EvaluationRow.created_at now defaults to datetime.now(timezone.utc) (UTC timestamp).
  • Tests: add tests/test_cli_local_test.py covering host/Docker execution, multiple Dockerfiles error, extra flag passing, selector behavior, and path normalization.
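
As an illustration of the --entry form described above (a path, or path::function), the split could look roughly like the sketch below. This is an assumption about the behavior, not the code in cli_commands/local_test.py; parse_entry is a hypothetical helper name.

```python
from typing import Optional, Tuple


def parse_entry(entry: str) -> Tuple[str, Optional[str]]:
    """Split 'path' or 'path::function' into (path, function_or_None).

    Hypothetical helper for illustration only; the real CLI may
    normalize paths or validate the target differently.
    """
    # partition splits on the first '::' only, so a pytest-style
    # 'path::Class::test' keeps 'Class::test' as the function part.
    path, sep, func = entry.partition("::")
    if not path:
        raise ValueError(f"--entry is missing a file path: {entry!r}")
    if sep and not func:
        raise ValueError(f"--entry has '::' but no function name: {entry!r}")
    return path, (func or None)
```

With this sketch, an entry without "::" yields no function name, which is the case where the CLI would presumably still need to narrow the selection to a single test.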

Written by Cursor Bugbot for commit 9b476dc.

xzrderek merged commit 12d6b73 into main Nov 11, 2025
2 checks passed
xzrderek deleted the derekx/local-test branch November 11, 2025 23:57
    return 1
if len(selected) != 1:
    print("Error: Please select exactly one evaluation test for 'local-test'.")
    return 1

Bug: Non-interactive --yes fails when multiple tests exist.

When --yes is used without --entry and multiple tests exist, _prompt_select returns all tests (because non_interactive=True), so the check if len(selected) != 1 always triggers and the command exits with an error. The error message doesn't tell users to pass --entry, which makes the --yes flag unusable in multi-test scenarios. The function should either fail earlier with a helpful message about requiring --entry, or handle the non-interactive case differently.
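
One way the first suggestion (fail earlier with a message that points to --entry) might look is sketched below. This is not the fix that was merged; check_single_selection and its parameters are hypothetical names, and the discovered tests are represented as plain strings for simplicity.

```python
from typing import Optional, Sequence, Tuple


def check_single_selection(
    discovered: Sequence[str],
    entry: Optional[str],
    non_interactive: bool,
) -> Tuple[int, str]:
    """Return (exit_code, message); exit code 0 means selection may proceed.

    Hypothetical guard, meant to run before any prompt-based selection,
    so that --yes without --entry fails with actionable guidance.
    """
    if entry is None and non_interactive and len(discovered) > 1:
        return 1, (
            f"Error: found {len(discovered)} evaluation tests, but "
            "'local-test' runs exactly one. Pass --entry path::function "
            "to choose a test when using --yes."
        )
    return 0, ""
```

The guard leaves the single-test case untouched, so --yes would still work without --entry when only one test is discovered.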
