
Release CI once again failed to build Windows executables #5285

@fingolfin

Description


... and this time I only noticed after sending the release announcement. Sigh.

This is essentially a variant of #5011.

The failed job: https://github.com/gap-system/gap/actions/runs/3722840754/jobs/6314090227

The immediate cause, a broken symlink in the agt package, is reported at gap-packages/agt#15.

This is really annoying, and it points to several problems:

  1. Of course I should have noticed the missing Windows binaries; I screwed up :-(
  2. But we know that humans (well, me, but I don't think I am an exception) tend to miss such things, so our automation really should have noticed. For example, the scripts updating the website could have bailed out and refused to publish anything when those files were absent.
  3. The CI tests for the PackageDistro could perhaps also have discovered this problem by checking for broken symlinks (or perhaps for any symlinks at all, as those seem to be a recurring source of issues on Windows).
  4. We lack a good mechanism to "heal" such issues. The workaround itself is trivial (just delete the offending symlink), but in practice I don't see any good way to apply it short of releasing 4.12.3: I can't just insert this rm PATH invocation into the workflow, and I also can't simply re-tag, as that would quite likely produce new tarballs with different SHA256 checksums.
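For point 2, a guard of this kind could be quite small. The following is only a sketch in Python; the asset file names and the helpers missing_assets and require_all_assets are hypothetical, and the real list of expected files would have to come from the release tooling (or from comparing against the previous release).

```python
from pathlib import Path

# Hypothetical asset names for illustration only; the real list would have to
# come from the release scripts (or be compared against the previous release).
EXPECTED_PATTERNS = (
    "gap-{version}.tar.gz",
    "gap-{version}.zip",
    "gap-{version}-x86_64.exe",
)


def missing_assets(asset_dir, version, patterns=EXPECTED_PATTERNS):
    """Return the expected asset names that are not regular files in asset_dir."""
    directory = Path(asset_dir)
    expected = [pattern.format(version=version) for pattern in patterns]
    return [name for name in expected if not (directory / name).is_file()]


def require_all_assets(asset_dir, version):
    """Raise instead of letting a website update proceed with missing assets."""
    missing = missing_assets(asset_dir, version)
    if missing:
        raise RuntimeError("refusing to update: missing " + ", ".join(missing))
```

A website update script would call require_all_assets before touching anything, so an incomplete release stops the update instead of silently going live.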

How to resolve this now?

I see these options:

  1. Release GAP 4.12.3 with identical source code, just with an rm -f pkg/agt/doc/mathjax step added (urgh, not appealing).
  2. Find a way to inject that rm into a re-run of the CI job (I see no way to do that, though if we had tmate integration set up for that job, perhaps that would allow it...).
  3. Re-tag 4.12.2 after all, after first downloading all relevant tarballs; then, once all CI jobs have run, replace any new tarballs whose SHA256 changed with the previously downloaded originals.
  4. Someone (@ChrisJefferson, perhaps) could build the GAP .exe "manually" and upload it to the release.
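For option 3, the comparison step might look roughly like this Python sketch. It assumes the original tarballs were downloaded into one directory and the rebuilt ones into another; the helper names sha256_of and changed_tarballs are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_tarballs(original_dir, rebuilt_dir):
    """Return names of files present in both directories whose SHA256 differs."""
    original = Path(original_dir)
    changed = []
    for new_file in sorted(Path(rebuilt_dir).iterdir()):
        old_file = original / new_file.name
        if new_file.is_file() and old_file.is_file():
            if sha256_of(new_file) != sha256_of(old_file):
                changed.append(new_file.name)
    return changed
```

Every name it returns is a candidate for being replaced by the original download; files that exist only in the rebuilt set (such as the missing .exe) are not reported and would be uploaded as new.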

Steps to help avoid this kind of mistake in the future

  • Teach the PackageDistro to reject package updates containing broken symlinks (or perhaps any symlinks at all): done in PackageDistro#669 ("Forbid symlinks in package tarballs").
  • Teach ReleaseTools to reject broken symlinks (or perhaps any symlinks at all); see ReleaseTools#95 ("Reject broken symlinks or even all").
  • Add a step to dev/releases/README.md that explicitly reminds us to check that all files are present in the release (listing specifically what to look for and how many files there should be, or simply suggesting to "compare to the previous release"); see PR #5287 ("Improve dev/releases/README.md").
  • Teach the website update scripts to check for the presence of all tarballs and refuse to update if any are missing.
  • ...
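A symlink check of the kind proposed for the PackageDistro and ReleaseTools could be sketched in Python with the standard tarfile module. This is an illustration, not what PackageDistro#669 or ReleaseTools#95 actually implement; the helper name symlink_members is hypothetical. It only checks whether each link target exists inside the tarball itself, so a link to another link is not followed further.

```python
import posixpath
import tarfile


def _all_paths(members):
    """All member paths plus their parent directories, normalized."""
    paths = set()
    for m in members:
        path = posixpath.normpath(m.name)
        while path not in ("", ".", "/"):
            paths.add(path)
            path = posixpath.dirname(path)
    return paths


def symlink_members(tar_path):
    """Return (name, linkname, is_broken) for every symlink in the tarball."""
    with tarfile.open(tar_path) as tar:  # compression is auto-detected
        members = tar.getmembers()
    paths = _all_paths(members)
    results = []
    for m in members:
        if m.issym():
            # Symlink targets are relative to the directory holding the link.
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(m.name), m.linkname))
            results.append((m.name, m.linkname, target not in paths))
    return results
```

A strict policy would reject the tarball whenever the list is non-empty; a lenient one would reject only entries whose is_broken flag is set.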

Steps to make recovery from such issues easier in the future

Well, hopefully this just won't happen again, thanks to the steps above. But realistically it will happen, just less often. Less often also means we'll have less experience dealing with these problems, so I think it makes sense to prepare for them.

  • Add a section to dev/releases/README.md that discusses options for when something goes wrong, including techniques for "healing" certain kinds of problems and warnings about things to watch out for. For example, for 4.12.0 I thought I was clever: I downloaded a tarball, "fixed" it, then re-uploaded the result, but I messed up the file access rights while doing so. Ouch.
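One "healing" technique such a section might describe is re-packing a tarball without the offending entry while keeping every other member's metadata intact. The Python sketch below assumes a .tar.gz input; the helper name repack_without is hypothetical. Because each kept member is written back from its original TarInfo, mode bits, ownership and timestamps are carried over rather than taken from a local extraction. The output will still have a different SHA256 than the original.

```python
import tarfile


def repack_without(src_path, dst_path, drop):
    """Copy a .tar.gz to dst_path, omitting members whose names are in drop.

    Kept members are re-added from their original TarInfo headers, so file
    modes (e.g. executable bits), owners and mtimes stay unchanged.
    Returns the names that were actually dropped.
    """
    drop = set(drop)
    dropped = []
    with tarfile.open(src_path, "r:gz") as src, \
            tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            if member.name in drop:
                dropped.append(member.name)
                continue
            # Only regular files carry data; links and dirs are header-only.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
    return dropped
```

Checking the return value guards against a typo in the path to drop, which would otherwise produce an unchanged copy without any warning.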

Labels: os: windows, topic: ci
