... and this time I only noticed after sending the release announcement. Sigh.
This is essentially a variant of #5011.
The failed job: https://github.com/gap-system/gap/actions/runs/3722840754/jobs/6314090227
The immediate cause was reported at gap-packages/agt#15.
This is really annoying and hints at multiple problems:
- of course I should have noticed the missing Windows binaries; I screwed up :-(
- but we know that humans (well, me, but I don't think I am an exception) tend to miss such things, so our automation really should have noticed, e.g. the scripts updating the website could have bailed out and refused to proceed because those files were missing
- the CI tests for the PackageDistro could perhaps also have discovered this problem by checking for broken symlinks (or perhaps for any kind of symlink, as those seem to be a recurring source of issues on Windows)
- we lack a good mechanism to "heal" such issues: after all, it is trivial to work around this one (just delete the offending symlink). But in practice I don't see any good way to achieve this without releasing 4.12.3: I can't just insert this `rm PATH` invocation into the workflow. And I also can't just re-tag, as that would quite likely produce new tarballs with different SHA256 checksums.
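The first two kinds of automation suggested above could be as simple as the following Python sketch. It is hypothetical and not taken from GAP's release scripts or the PackageDistro CI: `missing_assets` checks a list of release file names against glob patterns, so a website-updating script could refuse to proceed when, say, no installer is present; `find_symlinks` walks an unpacked source tree and reports every symlink, marking the dangling ones, which a CI job could treat as errors (or, more strictly, it could fail on any symlink at all).

```python
import fnmatch
import os
from typing import Iterable, List, Tuple


def missing_assets(asset_names: Iterable[str], required_patterns: Iterable[str]) -> List[str]:
    """Return each required glob pattern that matches none of the asset names.

    An empty result means every expected kind of release file is present;
    anything else is a reason for a publishing script to bail out.
    """
    names = list(asset_names)
    return [
        pattern
        for pattern in required_patterns
        if not any(fnmatch.fnmatch(name, pattern) for name in names)
    ]


def find_symlinks(root: str) -> List[Tuple[str, bool]]:
    """Return a sorted list of (path, is_broken) for every symlink under root.

    os.walk does not descend into symlinked directories by default, so each
    link is reported exactly once and link cycles cannot cause a loop.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Links to directories are listed in dirnames, all others in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # os.path.exists follows the link, so False means it dangles.
                found.append((path, not os.path.exists(path)))
    return sorted(found)
```

Patterns such as `["gap-*.tar.gz", "gap-*.exe"]` would be passed as `required_patterns`; these are illustrative guesses, not the exact names of GAP's release files.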
How to resolve this now?
I see these options:
- release GAP 4.12.3 with identical source code, just with `rm -f pkg/agt/doc/mathjax` added (urgh, doesn't seem appealing)
- find a way to inject that `rm` into a re-run of the CI job (I see no way to do that, though I guess if we had `tmate` integration set up for that job, that might allow for it...)
- I could re-tag 4.12.2 after all, after first downloading all relevant tarballs; then, once all CI jobs have run, replace any new tarballs whose SHA256 changed with the original ones
- someone (@ChrisJefferson, perhaps) could build the GAP .exe "manually" and upload it to the release
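The re-tagging option hinges on bookkeeping: keep copies of all originally published tarballs, let CI rebuild everything, then find out which rebuilt files differ so that only those get replaced by the originals, keeping the announced checksums valid. A minimal Python sketch of that comparison, assuming both sets of files have been downloaded into two local directories; the function names and directory layout are hypothetical:

```python
import hashlib
import os
from typing import List, Set


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _regular_files(directory: str) -> Set[str]:
    """Names of the regular files directly inside directory."""
    return {
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    }


def changed_files(old_dir: str, new_dir: str) -> List[str]:
    """Sorted names of files present in both directories whose SHA256 differs.

    These are the rebuilt files that would have to be swapped back for the
    originally published ones; files that exist only in new_dir (such as a
    freshly built installer) are deliberately not reported.
    """
    common = _regular_files(old_dir) & _regular_files(new_dir)
    return sorted(
        name
        for name in common
        if sha256_of(os.path.join(old_dir, name)) != sha256_of(os.path.join(new_dir, name))
    )
```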
Steps to help avoid this kind of mistake in the future
- add something to `dev/releases/README.md` that explicitly reminds us to check that all files are in the release (listing specifically what to look for and how many files there should be, or just suggesting to "compare to the previous release"); see PR #5287 (Improve dev/releases/README.md)
Steps to make recovery from such issues easier in the future
Well, hopefully taking the steps above means this won't happen again. But realistically, it will still happen, just less often. And less often also means we'll have less experience dealing with these problems, so I think it makes sense to prepare for it.
- add something to `dev/releases/README.md` that discusses options for when something goes wrong, ranging from techniques for "healing" certain kinds of problems to warnings about things to watch out for (e.g. for 4.12.0 I thought I was clever: I downloaded a tarball, "fixed" it, then re-uploaded the result, but messed up the file access rights while doing so. Ouch.)
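One possible "healing" technique, which also sidesteps the file-permission accident described above, is to copy a tarball member by member with Python's standard `tarfile` module, skipping only the offending entry, instead of unpacking, editing and re-archiving it by hand. Because each member's `TarInfo` is written back unchanged, permissions, ownership and timestamps are preserved. This is a hypothetical sketch, not existing GAP tooling; the result necessarily has a new SHA256, so any published checksums would have to be updated as well.

```python
import tarfile


def drop_member(src: str, dst: str, unwanted: str) -> int:
    """Copy the tarball src to dst, leaving out every member named unwanted.

    `unwanted` is the full path inside the archive. Input compression is
    autodetected ("r:*"); the output is always written gzip-compressed.
    Returns the number of members that were left out.
    """
    dropped = 0
    with tarfile.open(src, "r:*") as tin, tarfile.open(dst, "w:gz") as tout:
        for member in tin.getmembers():
            if member.name == unwanted:
                dropped += 1
                continue
            # The TarInfo is reused unchanged, so mode (permissions), owner
            # and mtime are written back exactly as recorded in src. Only
            # regular files carry data; links and directories need none.
            data = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, data)
    return dropped
```

For instance, `drop_member("gap-4.12.2.tar.gz", "fixed.tar.gz", "gap-4.12.2/pkg/agt/doc/mathjax")` would drop the link discussed above, assuming that is its full path inside the archive.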