ep local-test by xzrderek · Pull Request #327 · eval-protocol/python-sdk

xzrderek · 2025-11-11T23:37:20Z

Note

Adds a local-test CLI to run evaluation tests locally or in Docker with extra flags, updates EvaluationRow.created_at to UTC, tweaks upload prompts, and adds comprehensive tests.

- CLI:
  - New local-test command in eval_protocol/cli.py and cli_commands/local_test.py to run a selected evaluation test via pytest.
    - Resolves --entry (path or path::function) or uses selector; enforces single selection.
    - Auto-detects Dockerfile; runs in Docker if present (or on host with --ignore-docker).
    - Supports --docker-build-extra and --docker-run-extra; mounts project/logs and maps user IDs.
  - Command dispatch wired into main CLI.
- Upload UX (cli_commands/upload.py): change prompts to say "Select this test?" and "Enter the number to select:".
- Models (models.py): EvaluationRow.created_at now defaults to datetime.now(timezone.utc) (UTC timestamp).
- Tests: add tests/test_cli_local_test.py covering host/Docker execution, multiple Dockerfiles error, extra flag passing, selector behavior, and path normalization.
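
The Models change can be illustrated with a minimal sketch. The real EvaluationRow definition in models.py is not shown in this summary, so the dataclass below (RowSketch) is a hypothetical stand-in that models only the created_at default described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class RowSketch:
    """Hypothetical stand-in for EvaluationRow; only created_at is modeled."""

    # default_factory is called on every instantiation, so each row gets its
    # own timezone-aware UTC timestamp instead of a naive local-time value.
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


row = RowSketch()
# The timestamp carries an explicit UTC offset of zero.
print(row.created_at.isoformat())
```

A timezone-aware value compares and serializes unambiguously across hosts and containers, which is presumably why the default was switched to UTC.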

Written by Cursor Bugbot for commit 9b476dc.

eval_protocol/cli_commands/local_test.py

tests/test_cli_local_test.py

cursor · 2025-11-11T23:58:58Z

eval_protocol/cli_commands/local_test.py

+            return 1
+        if len(selected) != 1:
+            print("Error: Please select exactly one evaluation test for 'local-test'.")
+            return 1


Bug: Non-interactive --yes fails multiple tests.

When --yes is used without --entry and multiple tests exist, _prompt_select returns all tests (because non_interactive=True), so the len(selected) != 1 check always fails. The error message does not tell users to pass --entry, which makes --yes unusable whenever more than one test is discovered. The function should either fail earlier with a message that points to --entry, or handle the non-interactive case differently.
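
One of the options suggested above, failing early with a message that points to --entry, might look like the sketch below. The helper name early_selection_error and its signature are hypothetical and are not taken from the SDK; the merged code is not shown in this thread.

```python
from typing import Optional, Sequence


def early_selection_error(
    discovered: Sequence[str],
    non_interactive: bool,
) -> Optional[str]:
    """Return an error message if selection cannot succeed, else None.

    Hypothetical helper: `discovered` holds the names of the evaluation
    tests that were found, and `non_interactive` mirrors the --yes flag.
    """
    if not discovered:
        return "Error: No evaluation tests found."
    if non_interactive and len(discovered) > 1:
        # Without a prompt there is no way to narrow the choice, so tell
        # the user how to pick one test explicitly.
        names = ", ".join(discovered)
        return (
            "Error: Multiple evaluation tests found "
            f"({names}). Use --entry to select exactly one with --yes."
        )
    return None


# Two tests plus --yes: the check fails before any prompt logic runs.
print(early_selection_error(["test_a", "test_b"], non_interactive=True))
```

Running such a check before the selection step would keep the existing len(selected) != 1 guard as a fallback for the interactive path.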

xzrderek added 8 commits November 10, 2025 17:41

local test command

cd9cc91

mount for ep logs

e7615d7

update

72b9178

try to force linux/amd64

2907cf8

revert

4f1ff85

set home

99169ab

try

75d4cb6

store in utc

41b79da

cursor bot reviewed Nov 11, 2025

eval_protocol/cli_commands/local_test.py (outdated)

tests/test_cli_local_test.py (outdated)

tests/test_cli_local_test.py (outdated)

xzrderek added 2 commits November 11, 2025 15:43

tests

5eb5fac

fix bug

4a17784

cursor bot reviewed Nov 11, 2025

tests/test_cli_local_test.py (outdated)

test fix

9b476dc

xzrderek merged commit 12d6b73 into main Nov 11, 2025
2 checks passed

xzrderek deleted the derekx/local-test branch November 11, 2025 23:57

cursor bot reviewed Nov 11, 2025

xzrderek merged 11 commits into main from derekx/local-test
