Fix runtime install wait race #24

Open
alceops wants to merge 3 commits into RoonLabs:main from alceops:alce/fix-runtime-install-race

alceops commented Apr 30, 2026

Summary

  • wait for the actual Server/RoonServer launcher and RoonDotnet runtime directory before the fresh-install runtime assertions run
  • keep VERSION as part of the readiness check, but no longer treat it as the sole install-complete sentinel
  • add timeout diagnostics that show which install artifact is still missing

Fixes #23.

Verification

  • bash -n tests/runtime.sh
  • git diff --check

I did not run the full Docker runtime workflow in this worker; the change is limited to the shell readiness predicate used by that workflow.

@gtunes-dev
Collaborator

I don't think this is the right fix. It essentially bakes the smoke tests into the wait function (it adds all the file tests to the wait) instead of maintaining the separation of concerns between:

  1. wait for the installation to complete
  2. verify the post installation state

I prefer the cleanliness of my proposed fix - wait for the log signal (which is already a function of the smoke harness) that indicates that the installation completed. Make no assumptions about what was installed. Then allow the smoke tests to validate the installation itself.

Just my opinion.
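
A minimal sketch of the two-step separation described above, run against stand-ins rather than the real harness. Everything here is an assumption for illustration: a temporary log file plays the role of the container log, a background job plays the role of the installer, and the names fake_install, wait_for_signal, and verify_install are hypothetical. Only the log message and the Server/RoonServer path come from this thread.

```shell
# Hypothetical stand-ins: a temp log file replaces the container log,
# and a background job replaces the real installer.
workdir="$(mktemp -d)"
log="$workdir/install.log"
: > "$log"

fake_install() {
    mkdir -p "$workdir/Server"
    sleep 1
    : > "$workdir/Server/RoonServer"
    # The completion signal is emitted last, after all artifacts exist.
    echo "RoonServer installed successfully" >> "$log"
}

# Step 1: wait for the completion signal only; no opinion about artifacts.
wait_for_signal() {
    local timeout="$1"
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if grep -qF "RoonServer installed successfully" "$log"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Step 2: verify the post-install state separately.
verify_install() {
    [ -f "$workdir/Server/RoonServer" ]
}

fake_install &
if wait_for_signal 10 && verify_install; then
    result="ok"
else
    result="failed"
fi
echo "$result"
wait
rm -rf "$workdir"
```

Because the wait step knows nothing about which files the installer produces, the assertions in the second step can change without touching the wait logic.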

@alceops
Author

alceops commented Apr 30, 2026

Thanks — agreed on keeping install-complete signaling separate from the artifact assertions. I updated the PR in 77947bd to wait on the entrypoint's final Branch: log signal before the runtime assertions/pre-switch stops, and removed the artifact-specific wait logic.

Verification here: bash -n tests/runtime.sh and git diff --check passed; I still did not run the full Docker runtime flow locally.

@gtunes-dev
Collaborator

gtunes-dev commented Apr 30, 2026

There is a very simple approach to this, which is what I proposed in the issue I created. It's just to change wait_for_install to this. That's the entire fix. It's not branch specific, it has no opinion about what the installation completed state is. It just simply waits until the installation is done and returns. If the installation is already done, it sees that and returns immediately.

This is up to the maintainers but I think you should consider modifying your PR to do just this. And test it, too, please. I've tested it pretty extensively and it works.

wait_for_install() {
    local dir="$1"
    local timeout="${2:-180}"
    echo "    Waiting for RoonServer install to complete..."
    wait_for_log "$CONTAINER" "RoonServer installed successfully" "$timeout"
}
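
The wait_for_log helper that this function calls is not shown in the thread. As a rough sketch only, and assuming it takes a container name, a fixed string, and a timeout in seconds, a polling version could look like the following. The body is hypothetical and the real harness helper may differ.

```shell
# Hypothetical sketch of a wait_for_log helper; the harness's real
# implementation is not shown in this thread and may differ.
wait_for_log() {
    local container="$1"        # container name or ID
    local pattern="$2"          # fixed string to look for in the logs
    local timeout="${3:-180}"   # seconds to wait before giving up
    local waited=0
    local logs

    while [ "$waited" -lt "$timeout" ]; do
        # `docker logs` replays the full log each time, so a line printed
        # before the wait started is still found on the first pass.
        logs="$(docker logs "$container" 2>&1)" || logs=""
        case "$logs" in
            *"$pattern"*) return 0 ;;
        esac
        sleep 1
        waited=$((waited + 1))
    done

    echo "    Timed out after ${timeout}s waiting for: $pattern" >&2
    return 1
}
```

Re-reading the whole log on every poll is what lets the wait return immediately when the installation has already finished, which matches the behavior described in the comment above.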

@alceops
Author

alceops commented Apr 30, 2026

Updated again in fd2c3a0 to match the simpler issue proposal: wait_for_install now just waits for the existing RoonServer installed successfully log signal, and I restored the call sites rather than moving branch-specific waits into the runtime checks.

Verification now completed locally:

  • docker build -t roon-docker-runtime-test:alce-pr24 .
  • ./tests/runtime.sh roon-docker-runtime-test:alce-pr24 (41 passed, 0 failed)
  • git diff --check
  • bash -n tests/runtime.sh

@gtunes-dev
Collaborator

Thanks for adapting this @alceops! Very much appreciated!

Development

Successfully merging this pull request may close these issues.

Race condition in runtime test causes spurious build workflow failure

2 participants