
refactor(cli): consolidate JFR capture loop into JfrCaptureSession #170

Merged
rlaope merged 1 commit into master from refactor/jfr-capture-session on May 9, 2026

rlaope (Owner) commented on May 9, 2026

Summary

Four CLI commands (gcwhy, gcscore, gcprofile, zgc) each carried their own ~50-line copy of the same JFR capture lifecycle: tempfile → JFR.start → sleep → JFR.dump → JFR.stop → file-empty check → finally delete tempfile. Subtle drift had crept in:

  • gcprofile passed duration= to JFR.start, which races the explicit dump and can yield an empty file. gcwhy already documented this pitfall but the others didn't avoid it.
  • zgc had a unique pre-stop + retry-on-start-fail wrapper.

This PR moves the lifecycle behind JfrCaptureSession, returning an AutoCloseable Capture so try-with-resources handles tempfile cleanup. Failures surface as the new checked JfrCaptureFailed; callers translate that into their own exit-code convention (gcwhy/gcscore/gcprofile keep return-on-fail, zgc keeps CommandExitException(2)).
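A minimal sketch of the shape described above, assuming a hypothetical `DiagnosticCommands` interface that sends one diagnostic command to the target JVM; the real helper's transport, method names, and signatures are not shown in this PR. It follows the lifecycle order from the summary (unconditional pre-stop, `JFR.start` without `duration=`, sleep, `JFR.dump` to a tempfile, `JFR.stop`, empty-file check). On any failure it deletes the tempfile before throwing `JfrCaptureFailed`; on success it hands the file to a `Capture` whose `close()` deletes it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;

// Checked failure type named in the PR; these constructors are assumed.
class JfrCaptureFailed extends Exception {
    JfrCaptureFailed(String message) { super(message); }
    JfrCaptureFailed(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical seam for sending one diagnostic command (e.g. via jcmd) to a target JVM.
interface DiagnosticCommands {
    void run(long pid, String command, List<String> args) throws IOException;
}

final class JfrCaptureSession {
    private final DiagnosticCommands commands;

    JfrCaptureSession(DiagnosticCommands commands) {
        this.commands = commands;
    }

    // Owns the recording file; close() deletes it, so try-with-resources cleans up.
    static final class Capture implements AutoCloseable {
        private final Path file;

        Capture(Path file) { this.file = file; }

        Path file() { return file; }

        @Override
        public void close() {
            try {
                Files.deleteIfExists(file);
            } catch (IOException ignored) {
                // Best-effort cleanup: a leftover temp file should not fail the command.
            }
        }
    }

    Capture capture(long pid, String name, Duration window) throws JfrCaptureFailed {
        Path file;
        try {
            file = Files.createTempFile("argus-jfr-", ".jfr");
        } catch (IOException e) {
            throw new JfrCaptureFailed("could not create temp file", e);
        }
        Capture capture = new Capture(file);
        boolean handedOff = false;
        try {
            try {
                // Unconditional pre-stop of a leftover recording with the same name.
                commands.run(pid, "JFR.stop", List.of("name=" + name));
            } catch (IOException ignored) {
                // Expected when no such recording exists.
            }
            // No duration=: the recording runs until the explicit dump and stop below.
            commands.run(pid, "JFR.start", List.of("name=" + name));
            Thread.sleep(window.toMillis());
            commands.run(pid, "JFR.dump", List.of("name=" + name, "filename=" + file.toAbsolutePath()));
            commands.run(pid, "JFR.stop", List.of("name=" + name));
            if (Files.size(file) == 0) {
                throw new JfrCaptureFailed("JFR.dump produced an empty file");
            }
            handedOff = true;
            return capture;
        } catch (IOException e) {
            throw new JfrCaptureFailed("JFR capture failed for pid " + pid, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new JfrCaptureFailed("interrupted during capture", e);
        } finally {
            if (!handedOff) {
                capture.close(); // failure path: leave no temp file behind
            }
        }
    }
}
```

A caller would hold the result in try-with-resources, e.g. `try (var c = session.capture(pid, name, Duration.ofSeconds(10))) { ... }`, so the tempfile is removed whether parsing succeeds or throws.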

  • New: argus-cli/.../jfr/JfrCaptureSession.java (~111 LOC) and JfrCaptureFailed.java (checked exception)
  • 4 commands migrated; net −47 lines while adding test coverage
  • Tests: JfrCaptureSessionTest covers the unreachable-pid failure path (no leftover temp files) and the Capture#close deletion contract
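The summary says each command translates `JfrCaptureFailed` into its own exit convention: gcwhy/gcscore/gcprofile return on failure, while zgc raises `CommandExitException` with code 2. A hedged sketch of those two styles follows. `CaptureStep`, `Callers`, and the method names are hypothetical, and modeling `CommandExitException` as an unchecked exception carrying an exit code is an assumption about its real shape.

```java
// Stand-ins: the PR names JfrCaptureFailed and CommandExitException, but these shapes are assumed.
class JfrCaptureFailed extends Exception {
    JfrCaptureFailed(String message) { super(message); }
}

class CommandExitException extends RuntimeException {
    final int code;

    CommandExitException(int code, String message) {
        super(message);
        this.code = code;
    }
}

// Hypothetical hook standing in for a JfrCaptureSession call plus analysis.
interface CaptureStep {
    String run() throws JfrCaptureFailed;
}

final class Callers {
    private Callers() {}

    // gcwhy/gcscore/gcprofile style: report the failure and return early.
    static String returnOnFail(CaptureStep step, StringBuilder err) {
        try {
            return "analyzed " + step.run();
        } catch (JfrCaptureFailed e) {
            err.append("JFR capture failed: ").append(e.getMessage());
            return null;
        }
    }

    // zgc style: escalate to a process exit code of 2.
    static String exitOnFail(CaptureStep step) {
        try {
            return "verdict for " + step.run();
        } catch (JfrCaptureFailed e) {
            throw new CommandExitException(2, "JFR capture failed: " + e.getMessage());
        }
    }
}
```

Keeping the exception checked forces every caller to pick one of these translations explicitly instead of letting a capture failure escape as an unexpected stack trace.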

Side effects

  • gcprofile no longer passes duration= to JFR.start. This fixes a latent race where the JVM auto-stops the recording before the explicit dump.
  • zgc drops its retry-once-on-start-fail loop. The helper does an unconditional pre-stop of any leftover recording with the same name, which is the case the retry was guarding against.
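For concreteness, the `jcmd` invocations implied by these side effects can be sketched as plain argument vectors. `JfrCommandPlan` is a hypothetical helper, not code from this PR; the options used (`name=`, `filename=`) are standard `jcmd` JFR options. The leading pre-stop stands in for zgc's old retry loop, and `JFR.start` carries no `duration=`, so the JVM cannot end the recording before the explicit dump.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the jcmd argument vectors one capture issues, in order.
final class JfrCommandPlan {
    private JfrCommandPlan() {}

    static List<List<String>> plan(long pid, String name, String dumpFile) {
        String target = Long.toString(pid);
        List<List<String>> steps = new ArrayList<>();
        // 1. Unconditional pre-stop: clears a leftover recording with this name.
        //    Failure here (no such recording) is the expected case and should be ignored.
        steps.add(List.of("jcmd", target, "JFR.stop", "name=" + name));
        // 2. Start without duration=, so nothing auto-stops the recording before step 3.
        steps.add(List.of("jcmd", target, "JFR.start", "name=" + name));
        // (the caller sleeps for the requested window between steps 2 and 3)
        // 3. Explicit dump to the tempfile.
        steps.add(List.of("jcmd", target, "JFR.dump", "name=" + name, "filename=" + dumpFile));
        // 4. Stop the recording once its contents are on disk.
        steps.add(List.of("jcmd", target, "JFR.stop", "name=" + name));
        return steps;
    }
}
```

Each vector could be passed to `new ProcessBuilder(argv)`; only a failure in the first step should be tolerated, since the later steps are what produce the recording.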

Test plan

  • :argus-cli:test passes (incl. new JfrCaptureSessionTest)
  • Full ./gradlew build -x integrationTest passes
  • Smoke against a live JVM: argus gcwhy <pid> --duration=10 --format=json produces non-empty JSON
  • Smoke: argus zgc <pid> --duration=10 for a ZGC JVM still produces a verdict
  • Smoke: argus gcprofile <pid> --duration=10 produces an allocation profile

Four CLI commands (gcwhy, gcscore, gcprofile, zgc) each carried their own
~50-line copy of the same JFR capture lifecycle: tempfile β†’ JFR.start β†’
sleep β†’ JFR.dump β†’ JFR.stop β†’ file-empty check β†’ finally delete tempfile.
Subtle drift had crept in: gcprofile passed `duration=` to JFR.start,
which races the explicit dump and can yield an empty file (gcwhy already
documented this pitfall but the others didn't avoid it). zgc had a
unique pre-stop + retry-on-start-fail wrapper.

Move the lifecycle behind JfrCaptureSession, returning an AutoCloseable
Capture so try-with-resources handles tempfile cleanup. Failures surface
as the new checked JfrCaptureFailed; callers translate that into their
own exit-code convention (gcwhy/gcscore/gcprofile keep return-on-fail, zgc keeps
exit code 2).

Side effects:
- gcprofile no longer passes duration= to JFR.start; matches gcwhy/gcscore.
- zgc drops its retry-once-on-start-fail loop; the helper does an
  unconditional pre-stop, which addresses the same leftover-recording
  case the retry was guarding against.

Net change in command files: -269 lines.

Signed-off-by: rlaope <piyrw9754@gmail.com>
rlaope merged commit 24a4ef4 into master on May 9, 2026
11 checks passed
rlaope deleted the refactor/jfr-capture-session branch on May 9, 2026 at 03:39