test(maestro-case): increase case smoke test timeout #622

Open
charlesliu9 wants to merge 1 commit into main from case-smoke-increase-timeout

@charlesliu9

No description provided.

github-actions Bot commented May 7, 2026

Claude finished @charlesliu9's task in 1m 47s.


Coder-eval task lint (advisory)

1 task YAML changed; verdict counts: 0 Critical, 0 High, 1 Medium, 0 Low, 0 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

High — PR body does not claim the changed tasks have been run and passed. Please edit the PR description to add a line like: Ran skill-case-init-validate locally and it passed.

Per-task lint

tests/tasks/uipath-maestro-case/init_validate.yaml — verdict: Medium

Change in this PR: adds turn_timeout: 1200 (line 14). Lint evaluates the full task file per the rubric.

Issues:

  • [Medium] Meaningful coverage: command_executed matching uip\s+maestro\s+case\s+validate (line 55) only checks the command was invoked, not that validation succeeded. run_command for caseplan.json (line 76) checks existence but not content. The test passes even if uip maestro case validate returns errors or if caseplan.json is malformed.

Suggested fixes:

  • Consider adding a run_command criterion that runs uip maestro case validate --project-dir <dir> --output json and asserts expected_exit_code: 0, or uses expected_stdout / stdout_match to confirm a passing validation message. This would verify the scaffold is actually valid, not just that the agent attempted validation.
  • Consider adding a json_check or file_contains on caseplan.json to assert it contains expected keys (e.g. "OnboardingCase", "Intake", "manual").
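A rough sketch of what those two criteria might look like in the task file. The criterion names (run_command, expected_exit_code, file_contains) and the command come from the suggestions above; the surrounding keys (success_criteria, type, command, path, includes) and the <dir> placeholder are assumptions about the task schema and are not confirmed against this repo, so check them against an existing task YAML before copying.

```yaml
# Hypothetical layout: only run_command, expected_exit_code, and file_contains
# are named in the lint output; every other key here is an assumed placeholder.
success_criteria:
  # Fails the task unless validation itself exits cleanly.
  - type: run_command
    command: "uip maestro case validate --project-dir <dir> --output json"
    expected_exit_code: 0
  # Fails the task if the generated plan lacks the expected names.
  - type: file_contains
    path: "<dir>/caseplan.json"
    includes:
      - "OnboardingCase"
      - "Intake"
      - "manual"
```

With checks along these lines, the task would fail when validation reports errors or when caseplan.json is missing the expected content, rather than passing whenever the agent merely ran the command.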

Within-PR duplicates

  • No duplicate clusters detected (only 1 task YAML changed).

Conclusion

  • ⚠ 1 task has issues, max severity Medium. Advisory only — not blocking merge. The timeout increase itself (turn_timeout: 1200) is a sensible addition for a smoke test that scaffolds and validates a case project.

github-actions Bot commented May 7, 2026

Claude finished @charlesliu9's task in 1m 3s.


Review: test(maestro-case): increase case smoke test timeout

  • Read review criteria and project conventions
  • Read PR diff and full changed file
  • Check existing PR conversation
  • Perform review checks
  • Post review

Summary

Adds turn_timeout: 1200 to tests/tasks/uipath-maestro-case/init_validate.yaml, increasing the per-turn timeout from the default 900s (inherited from tests/experiments/default.yaml) to 1200s (20 minutes). This matches the timeout used by comparable complex tasks across uipath-maestro-flow and uipath-human-in-the-loop.

Change-by-Change Review

1. tests/tasks/uipath-maestro-case/init_validate.yaml — line 14

Severity: OK

The change adds a single line, turn_timeout: 1200, under the agent: block at the correct indentation level. The value is reasonable and consistent with other complex smoke/e2e tasks in the repo (e.g., all uipath-maestro-flow single-node tasks and most uipath-human-in-the-loop e2e tasks use 1200). The task involves multi-phase scaffolding and validation, which justifies a longer per-turn timeout than the 900s default.
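For reference, the change amounts to roughly the following fragment. Other keys in the agent: block are omitted because they are not shown in this conversation, and the comment is illustrative and not part of the diff.

```yaml
agent:
  # Per-turn timeout in seconds (20 minutes); overrides the 900s default
  # inherited from tests/experiments/default.yaml.
  turn_timeout: 1200
```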

What's Missing

Nothing identified. This is a minimal, well-scoped configuration change.

Area Ratings

| Area | Status | Notes |
|---|---|---|
| Frontmatter | N/A | No skill frontmatter changed |
| E2E Tests | OK | Test config change is valid YAML, value is consistent with repo conventions |
| Skill Body | N/A | No skill docs changed |
| References & Assets | N/A | No references changed |
| Repo Hygiene | OK | Change scoped to a single test file, no secrets or unrelated changes |

Issues for Manual Review

  • Confirm that 1200s is sufficient: if the test was timing out at 900s, verify that 1200s provides enough headroom. If runs were finishing close to the old limit (~850-900s), the extra 300s is a ~33% buffer over the 900s default, which should be adequate.

Conclusion

Clean, minimal change. No issues found — approve.
