Fix: allow manual restart while process is WAITING_RESTART#1
Draft
jamesastound wants to merge 1 commit intomasterfrom
Draft
Fix: allow manual restart while process is WAITING_RESTART#1jamesastound wants to merge 1 commit intomasterfrom
jamesastound wants to merge 1 commit intomasterfrom
Conversation
…g stale pid; add test and docker verification
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Title: Fix: allow manual restart while process is WAITING_RESTART (clear stale pid before start)
Summary
This PR fixes an issue where
pm2 restart <id>would fail while a process was inWAITING_RESTART(e.g., due to--restart-delayorexp_backoff_restart_delay). The daemon sometimes retained a staleproc.process.pidin itsclusters_db, causingstartProcessId()to reject a start with "Process with pid already exists" or the CLI to display "Process not found".What I changed
lib/God/ActionMethods.js
God.startProcessId(id, cb)now checks whetherproc.process.pidis actually running usingGod.checkProcess(pid). If it is not alive, it clearsproc.process.pid = 0and proceeds to start. Ifproc.process.pidis alive, it must still reject the start as before.test/programmatic/restart-delay.mocha.js
WAITING_RESTARTby launching a failing script (wrong.js) withrestart_delayand waiting until the process enterswaiting restart, then attemptspm2.restart(<pm_id>)while still waiting.test/programmatic/fixtures/restart-delay/wrong.js
Dockerfile.localpm2
docker_test.sh
waiting restart, then restart the process bypm_id.test/unit.sh
restart-delay.mocha.jsinto the unit test runner.Why this is needed
proc.process.pidthat remains set inclusters_dbwhile a process is inwaiting restartcan incorrectly block manual restarts, preventing users from intervening throughpm2 restart <id>. Historically, PRs Fix the bug: process with "restart-delay" can't restart when manually stop Unitech/pm2#3572 and Fix the bug: process with "restart-delay" can't restart when manually stop Unitech/pm2#3584 partially addressed PID cleanup, but there were still edge cases where a value persisted and caused rejected starts. This fix ensures the start path is defensive against a stale pid.How I tested
test/programmatic/restart-delay.mocha.jswhich polls until the process iswaiting restartthen restarts it by pm_id. The test verifies that the restart request succeeds.Dockerfile.localpm2builds a container with the modified local pm2 and runsdocker_test.shwhich performs the same reproduction and restart non-interactively.Example behavior (Docker run)
wrong.jsintowaiting restartpm2 restartattempted while statuswaiting restartand the command returns success: [PM2] wrongtest ✓Files changed - summary
Backwards compatibility & risk
proc.process.pidis not running, we clear it and proceed; this should not affect healthy process lifecycles.God.checkProcessincorrectly reports a process as dead in rare races; the code conservatively defaults to clearing the pid only whenGod.checkProcessreturns false or throws.CI / Reporting / Notes
Please run the full unit suite and e2e tests to ensure no regressions:
bash test/unit.shbash test/e2e.shYou can reproduce using the included Docker container:
The change is small, and the tests added cover the scenario for restart with
--restart-delay.Related issues & PRs