Fix: allow manual restart while process is WAITING_RESTART #1

Draft
jamesastound wants to merge 1 commit into master from fix/restart-delay-allow-restart-while-waiting

@jamesastound
Owner

Title: Fix: allow manual restart while process is WAITING_RESTART (clear stale pid before start)

Summary

This PR fixes an issue where pm2 restart <id> would fail while a process was in WAITING_RESTART (e.g., due to --restart-delay or exp_backoff_restart_delay). The daemon sometimes retained a stale proc.process.pid in its clusters_db, causing startProcessId() to reject a start with "Process with pid already exists" or the CLI to display "Process not found".

What I changed

  • lib/God/ActionMethods.js

    • God.startProcessId(id, cb) now checks whether proc.process.pid is actually running using God.checkProcess(pid). If it is not alive, it sets proc.process.pid = 0 and proceeds with the start. If proc.process.pid is alive, the start is still rejected, as before.
  • test/programmatic/restart-delay.mocha.js

    • Adds a programmatic test that reproduces WAITING_RESTART by launching a failing script (wrong.js) with restart_delay, waiting until the process enters waiting restart, and then calling pm2.restart(<pm_id>) while it is still waiting.
  • test/programmatic/fixtures/restart-delay/wrong.js

    • Minimal reproduction file which throws immediately.
  • Dockerfile.localpm2

    • A local Dockerfile for building an image with the modified pm2 (for maintainers/devs to reproduce the behavior and test).
  • docker_test.sh

    • A script to run in the Docker image to validate the behavior non-interactively: start the failing script with restart_delay, wait for waiting restart, then restart the process by pm_id.
  • test/unit.sh

    • Adds restart-delay.mocha.js to the unit test runner.
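The guard added to God.startProcessId can be sketched as follows. This is a simplified, hypothetical model, not the actual pm2 code: clusters_db is assumed to be a plain object keyed by pm_id, isAlive stands in for God.checkProcess, launch stands in for the code that spawns the app, and the error messages are illustrative.

```javascript
'use strict';

// Hypothetical, simplified model of the stale-pid guard. The real
// God.startProcessId has more checks (status, etc.); only the pid
// handling is modelled. `isAlive` stands in for God.checkProcess and
// `launch` for the code that actually spawns the app.
function startProcessIdSketch(clustersDb, id, isAlive, launch, cb) {
  const proc = clustersDb[id];
  if (!proc) return cb(new Error(id + ' id unknown'));

  const pid = proc.process && proc.process.pid;
  if (pid) {
    let alive;
    try {
      alive = isAlive(pid);
    } catch (err) {
      // As described in this PR, a throwing check is treated as "not alive".
      alive = false;
    }
    if (alive) {
      // A live process still blocks the start, exactly as before the fix.
      return cb(new Error('Process with pid ' + pid + ' already exists'));
    }
    // Stale pid left over from the crashed run: clear it and continue.
    proc.process.pid = 0;
  }
  return launch(proc, cb);
}
```

Treating a throwing check as "not alive" follows the behavior described in this PR; a stricter variant could keep rejecting the start in that case and let the user retry.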

Why this is needed

How I tested

  • Unit test: Added a new test under test/programmatic/restart-delay.mocha.js which polls until the process reaches waiting restart, then restarts it by pm_id. The test verifies that the restart request succeeds.
  • Docker integration (manual): The included Dockerfile.localpm2 builds a container with the modified local pm2 and runs docker_test.sh, which performs the same reproduction and restart non-interactively.
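The polling step in the new test can be expressed as a small promise-based helper. This is a generic sketch, not the code in restart-delay.mocha.js: getStatus is a hypothetical callback that returns (or resolves to) the current status string, and the interval and timeout defaults are arbitrary.

```javascript
'use strict';

// Polls getStatus() until it reports `wanted`, or rejects after timeoutMs.
// `getStatus` is a hypothetical callback; it may return a value or a promise.
function waitForStatus(getStatus, wanted, { intervalMs = 100, timeoutMs = 10000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const tick = async () => {
      let status;
      try {
        status = await getStatus();
      } catch (err) {
        return reject(err); // surface lookup errors instead of retrying forever
      }
      if (status === wanted) return resolve(status);
      if (Date.now() >= deadline) {
        return reject(new Error(`timed out waiting for status "${wanted}" (last: "${status}")`));
      }
      setTimeout(tick, intervalMs); // schedule the next poll
    };
    tick();
  });
}
```

In a pm2 test, getStatus could wrap pm2.describe(<pm_id>, cb) and read pm2_env.status from the first entry; the timeout keeps the test from hanging if the app never reaches waiting restart.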

Example behavior (Docker run)

  • The docker run logs show:
    • PM2 daemon spawn
    • Start & crash of wrong.js into waiting restart
    • pm2 restart attempted while the status is waiting restart; the command returns success: [PM2] wrongtest

Files changed - summary

  • lib/God/ActionMethods.js (modified: add defensive pid check + clear stale pid if not alive)
  • test/programmatic/restart-delay.mocha.js (new)
  • test/programmatic/fixtures/restart-delay/wrong.js (new)
  • test/unit.sh (updated to include new test)
  • Dockerfile.localpm2 (new helper for reproducing changes in Docker)
  • docker_test.sh (new helper to run non-interactive docker verification)

Backwards compatibility & risk

  • This is a defensive runtime check: if proc.process.pid is not running, we clear it and proceed. This should not affect healthy process lifecycles.
  • The main risk is a rare race in which God.checkProcess incorrectly reports a live process as dead. The code clears the pid only when God.checkProcess returns false or throws.
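For reviewers weighing the race described above, a common way to test whether a pid is alive in Node.js is to send signal 0 with process.kill(). This sketch assumes nothing about how God.checkProcess is actually implemented; isPidAlive is a hypothetical helper.

```javascript
'use strict';

// Signal 0 delivers nothing; Node only checks whether the pid can be
// signalled and throws otherwise. ESRCH means no such process; EPERM
// means the process exists but is owned by another user.
function isPidAlive(pid) {
  // Guard against 0 and negative values, which address process groups.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM';
  }
}
```

Even a correct check has an inherent window: the process can exit right after the check returns true, and the OS can recycle a pid, so "alive" only means that some process currently holds that pid, not necessarily the app that pm2 launched.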

CI / Reporting / Notes

  • Please run the full unit suite and e2e tests to ensure no regressions:

    • bash test/unit.sh
    • bash test/e2e.sh
  • You can reproduce using the included Docker container:

    # Build
    docker build -t pm2-local-mod:latest -f Dockerfile.localpm2 .
    
    # Run the integrated test
    docker run --rm -it pm2-local-mod:latest /bin/sh -lc "/usr/src/app/docker_test.sh"
  • The change is small, and the tests added cover the scenario for restart with --restart-delay.

Related issues & PRs

…g stale pid; add test and docker verification