
revert test #559

Draft
xyao-nv wants to merge 7 commits into xyao/exp/ci_stall from xyao/exp/revert_eval_runner_tests


xyao-nv (Collaborator) commented Apr 8, 2026

Summary

Short description of the change (max 50 chars)

Detailed description

  • What was the reason for the change?
  • What has been changed?
  • What is the impact of this change?

xyao-nv force-pushed the xyao/exp/ci_stall branch 2 times, most recently from bc9bb9d to e80dca2 on April 8, 2026 23:12

kellyguo11 left a comment


Review: PR #559 — revert test

Summary: This PR re-enables 5 eval runner tests that were previously skipped with @pytest.mark.skip(reason="Skipping because of CI stalling") by replacing those skip markers with @pytest.mark.with_subprocess.

Context understood: The base branch xyao/exp/ci_stall added infrastructure to separate subprocess-spawning tests from persistent SimulationApp tests (via the with_subprocess marker and CI workflow changes). This PR is the natural follow-up that actually reverts the skips and uses the new marker.

✅ What looks good

  • The change is straightforward: 5 mechanical replacements of skip → with_subprocess.
  • The with_subprocess marker is properly registered in pytest.ini and the CI workflow already has a dedicated step that runs -m with_subprocess tests with ISAACLAB_ARENA_SUBPROCESS_TIMEOUT=900.
  • The marker semantics are correct — these tests all use run_eval_runner_and_check_no_failures(), which spawns eval_runner.py via subprocess.run().
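To reproduce the dedicated CI step locally, something like the following could assemble the pytest invocation. This is a sketch: build_pytest_command is hypothetical, the test paths are placeholders, and the "not with_subprocess" expression for the persistent SimulationApp tests is an assumption about how that other step selects its tests:

```python
import os
import sys


def build_pytest_command(test_paths, subprocess_tests=True, timeout_sec=900):
    """Hypothetical helper: build argv and env for one of the two CI test groups.

    subprocess_tests=True selects only tests marked with_subprocess, as the
    review describes; False selects everything else (an assumed counterpart).
    """
    marker_expr = "with_subprocess" if subprocess_tests else "not with_subprocess"
    argv = [sys.executable, "-m", "pytest", "-m", marker_expr, *test_paths]
    env = dict(os.environ)
    # Same variable and value the review says the CI step sets.
    env["ISAACLAB_ARENA_SUBPROCESS_TIMEOUT"] = str(timeout_sec)
    return argv, env
```

The result can be passed straight to subprocess.run(argv, env=env, check=True).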

⚠️ Potential concern: missing timeout / process group isolation in run_eval_runner_and_check_no_failures

The utility run_subprocess() in isaaclab_arena/tests/utils/subprocess.py was carefully written with:

  • start_new_session=True to isolate GPU child processes
  • timeout=_SUBPROCESS_TIMEOUT_SEC to prevent hangs
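The two properties above work together roughly as follows. This is a sketch of the pattern, not the contents of isaaclab_arena/tests/utils/subprocess.py: run_subprocess_sketch is hypothetical, it is POSIX-only (os.killpg), and reading the default from ISAACLAB_ARENA_SUBPROCESS_TIMEOUT is an assumption:

```python
import os
import signal
import subprocess
import sys

# Assumption: default read from the env var the CI step sets; the real
# _SUBPROCESS_TIMEOUT_SEC may be defined differently.
_SUBPROCESS_TIMEOUT_SEC = float(os.environ.get("ISAACLAB_ARENA_SUBPROCESS_TIMEOUT", "900"))


def run_subprocess_sketch(args, timeout=_SUBPROCESS_TIMEOUT_SEC):
    """Run args in its own session; on timeout, kill the whole process group."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child leads a new process group (pgid == pid)
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Signal every process in the group so grandchildren are not orphaned.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group had already exited
        proc.communicate()  # reap the child and drain the pipes
        raise
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args, output=stdout, stderr=stderr)
    return subprocess.CompletedProcess(args, proc.returncode, stdout, stderr)
```

Because start_new_session=True makes the child the leader of a fresh process group, os.killpg(proc.pid, ...) also reaches any workers it spawned, which a plain proc.kill() would not.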

However, run_eval_runner_and_check_no_failures() in this test file uses a raw subprocess.run(args, capture_output=True, text=True, check=True) call with no timeout and no start_new_session=True. If the original CI stalling was caused by subprocess hangs or orphaned GPU processes, these tests could still stall even with the marker separation, since the subprocess itself has no timeout.

Suggestion: Consider either:

  1. Refactoring run_eval_runner_and_check_no_failures() to use run_subprocess() from the utils module, or
  2. At minimum, adding timeout=_SUBPROCESS_TIMEOUT_SEC and start_new_session=True to the subprocess.run() call.
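Option 2 would look roughly like this. Only the subprocess.run() call is shown; run_eval_runner_minimal_fix is a hypothetical name, any output checks the real helper performs are omitted, and the timeout default is a stand-in for _SUBPROCESS_TIMEOUT_SEC:

```python
import os
import subprocess
import sys

# Assumption: stand-in for _SUBPROCESS_TIMEOUT_SEC from the utils module.
_SUBPROCESS_TIMEOUT_SEC = float(os.environ.get("ISAACLAB_ARENA_SUBPROCESS_TIMEOUT", "900"))


def run_eval_runner_minimal_fix(args, timeout=_SUBPROCESS_TIMEOUT_SEC):
    """Hypothetical: the existing subprocess.run() call plus the two suggested kwargs."""
    return subprocess.run(
        args,
        capture_output=True,
        text=True,
        check=True,              # raise CalledProcessError on a non-zero exit code
        timeout=timeout,         # raise TimeoutExpired instead of blocking forever
        start_new_session=True,  # keep the child out of pytest's process group
    )
```

One caveat: when the timeout fires, subprocess.run() kills only the direct child, so processes spawned by eval_runner.py could outlive it in the new session. That is an argument for option 1, where any group-level cleanup lives in one place.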

This isn't a blocker if the CI workflow-level timeout-minutes: 60 is sufficient, but it would be more robust to have per-subprocess timeouts too.
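Whether 60 minutes is enough is simple arithmetic. A back-of-envelope sketch, assuming the five re-enabled tests each got the 900 s cap the CI step sets and ignoring every other test in that step; worst_case_step_minutes is a hypothetical helper:

```python
def worst_case_step_minutes(num_hung_tests, per_subprocess_timeout_sec):
    """Upper bound if every hung test waits out its full per-subprocess timeout."""
    return num_hung_tests * per_subprocess_timeout_sec / 60


JOB_TIMEOUT_MINUTES = 60  # timeout-minutes from the CI workflow

# Does the job budget cover all five re-enabled tests hanging at once?
fits = worst_case_step_minutes(5, 900) <= JOB_TIMEOUT_MINUTES
```

Under those assumptions, four hangs would use the entire 60-minute budget and five would exceed it.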

PR description

The PR description still has the default template text. A brief note about reverting the skips now that the CI stalling fix is in place would help future readers.

Overall: The change itself is correct and minimal. Approving since the marker infrastructure is in place and CI will validate.
