refactor(ci): replace Mergify with GitHub-native auto-merge + auto-update #401

Merged
github-actions[bot] merged 2 commits into main from refactor/ditch-mergify-for-gha-auto-merge
May 23, 2026

@coseto6125
Owner

Why

Today's incident showed that Mergify's overhead exceeds its value at this repo's scale (single-author, low PR volume): PRs #386, #387, and #388 sat for hours at `Mergify Merge Queue: neutral / Merge queue is ready` despite all conditions being satisfied, and it took four follow-up PRs (#391, #393, #398 among them) to even partially diagnose the problem.

What this PR does

Removes

  • `.mergify.yml` — full deletion

Adds

  • `.github/workflows/auto-merge.yml` — on every non-draft PR event, enables GitHub native auto-merge (`gh pr merge --auto --squash --delete-branch`). GitHub waits for required checks + up-to-date branch, then merges.
  • `.github/workflows/auto-update-pr-branches.yml` — on push to main (+ 15-min cron safety net), iterates auto-merge-armed PRs and calls `/update-branch` API. Conflicts → one-time marker PR comment + job failure (GitHub emails author).
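
The two workflows above boil down to a handful of `gh` calls. The following is a minimal sketch of that logic, not the workflows' actual contents: the function names `arm_auto_merge` and `update_armed_prs` are hypothetical, and the draft flag is assumed to arrive from the event payload.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from the workflows.

# Arm GitHub-native auto-merge on one PR unless it is a draft.
arm_auto_merge() {
  local pr="$1" is_draft="$2"
  if [ "$is_draft" = "true" ]; then
    echo "skip #$pr (draft)"
    return 0
  fi
  gh pr merge "$pr" --auto --squash --delete-branch
}

# Ask GitHub to merge the base branch into every open PR that has
# auto-merge armed. Returns non-zero if any update failed (e.g. a conflict),
# so the calling job fails and the author gets notified.
update_armed_prs() {
  local pr failed=0
  for pr in $(gh pr list --state open --json number,autoMergeRequest \
      --jq '.[] | select(.autoMergeRequest != null) | .number'); do
    if gh api --method PUT "repos/{owner}/{repo}/pulls/$pr/update-branch" >/dev/null; then
      echo "updated #$pr"
    else
      echo "failed #$pr"
      failed=1
    fi
  done
  return "$failed"
}
```

In a workflow step, `arm_auto_merge "$PR_NUMBER" "$IS_DRAFT"` would run on PR events and `update_armed_prs` on pushes to main; both need `GH_TOKEN` set for `gh` to authenticate.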

Fixes 2 unrelated hardcoded values in release.yml

  • `owner: coseto6125` → `${{ github.repository_owner }}`
  • `coseto6125/homebrew-tap` → `env.HOMEBREW_TAP_REPO` (single source of truth for cross-step reuse)

What we keep

  • `ecp-pr-analyze.yml` workflow — area/risk labels + cross-pr-conflict commit status remain valuable independent signals
  • All required-check branch protection rules
  • ecp:area-* and ecp:risk-* label classification by ecp

What we lose (and why it's acceptable)

| Mergify feature | Substitute | Why OK at this scale |
| --- | --- | --- |
| Speculative trial (PR diff + main tip CI) | `Require up to date` + auto-update-pr-branches | rebase + re-CI catches ~90% of post-merge issues |
| Batched trial (N PRs in one CI run) | none | batch_size was aspirational; zero batches today |
| Per-area parallel queue UI | label-based mental model | single author rarely has multi-PR parallelism |

Post-merge actions

  1. Manual: uninstall Mergify GitHub App from this repo (Settings → Integrations → Mergify → Configure → uninstall). Until done, `Mergify Merge Queue` neutral check still appears but is harmless (not in required checks).
  2. Close test PRs #386 (docs area), #387 (cli area), and #388 (parser area), the `chore(test): Mergify routing` dogfood targets; they are no longer needed.

Test plan

  • Self-PR: workflow YAML passes actionlint (verified locally)
  • After merge + Mergify app uninstall: next PR's lifecycle uses pure GitHub auto-merge
  • Push to main triggers auto-update-pr-branches without error

…date

Delete .mergify.yml + Mergify GitHub App dependency. Replace with two
small GitHub Actions workflows that together cover Mergify's value
for this repo's actual scale (single-author, low PR volume):

1. .github/workflows/auto-merge.yml
   On non-draft PR events, enable GitHub native auto-merge via
   `gh pr merge --auto --squash --delete-branch`. GitHub then waits
   for required checks + up-to-date branch, merges automatically.

2. .github/workflows/auto-update-pr-branches.yml
   On push to main (and 15-min cron safety net), iterate eligible
   open PRs and call /update-branch API — programmatic equivalent of
   the 'Update branch' button. Conflicts → one-time marker comment
   + job failure so GitHub emails the author.

What we lose vs Mergify:
- Speculative trial (PR diff + main tip CI before merge) — substituted
  by 'Require branches to be up to date' + auto-update.
- Batched trial (multiple PRs in one CI run) — never exercised on this
  repo; batch_size config was aspirational.
- Per-area parallel queue UI categorization — ecp:area-* labels (set
  by ecp-pr-analyze.yml) survive and remain readable for humans / LLM
  agents.

What we gain:
- No more 'Mergify Merge Queue: neutral / Merge queue is ready' noise
  blocking dogfood PRs (see today's stuck #386 #387 #388).
- One less third-party service in the merge path.
- Workflows readable end-to-end in the repo, no Mergify dashboard hop.

Also bundle 2 small hardcode fixes in release.yml:
- owner: coseto6125 → owner: ${{ github.repository_owner }}
- coseto6125/homebrew-tap → env.HOMEBREW_TAP_REPO (single source of truth)
@github-actions github-actions Bot enabled auto-merge (squash) May 23, 2026 16:45
actionlint surfaced 3 shellcheck warnings on auto-update-pr-branches.yml:

- SC2221: `*"already up"*` always overrides `*"up to date"*` (the
  former matches "already up to date" too) → drop the redundant pattern.
- SC2222: `*conflict*` always wins over `*"Merge conflict"*` (subset) →
  drop the redundant pattern.
- SC2016: backticks inside the printf single-quoted format string look
  like un-expanded command substitution, but they are intentional
  Markdown formatting (`main`, `%s`). False positive → disable with
  shellcheck directive + comment explaining why.
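
The first two warnings come down to keeping one pattern per `case` branch, and the third to a documented directive. A minimal sketch, assuming hypothetical helper names and message strings (the real script in auto-update-pr-branches.yml is not reproduced here):

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the message returned by an update-branch call.
classify_update_result() {
  local msg="$1"
  case "$msg" in
    *"already up"*) echo "up-to-date" ;;  # also covers "already up to date"
    *conflict*)     echo "conflict" ;;    # also covers "Merge conflict"
    *)              echo "error" ;;
  esac
}

# Hypothetical helper: build the one-time conflict comment body.
conflict_comment() {
  local pr="$1"
  # The backticks below are Markdown formatting, not command substitution.
  # shellcheck disable=SC2016
  printf 'PR #%s conflicts with `main`; please resolve and push.\n' "$pr"
}
```

For example, `classify_update_result "Merge conflict between base and head"` prints `conflict`.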
@github-actions
Contributor

ecp impact cache (0 symbols) — internal, used by ecp dev pr-analyze

[]

@github-actions github-actions Bot added the ecp:risk-low ecp signal label May 23, 2026
@github-actions github-actions Bot merged commit 29e8b3b into main May 23, 2026
20 checks passed
github-actions Bot added a commit that referenced this pull request May 23, 2026
…state (#402)

* diagnostic(ci): instrument all 5 checkout sites to capture flake state

The 'could not read Username for https://github.com' / 'Bad credentials'
failures across today's runs (#388 macos, #393 main-push ubuntu, #401
ubuntu test, etc.) all happen inside actions/checkout@v6.0.2's
'Fetching the repository' step right after a successful 'Setting up
auth' that writes an includeIf-scoped credentials config. Failure looks
consistent with includeIf gitdir path resolution mismatch, but we have
no direct evidence yet — the hypothesis ought to be confirmed before we
swap to a workaround that downgrades security (persist-credentials:
true) or replaces actions/checkout outright.

This PR adds .github/actions/diagnose-checkout-failure (composite
action) and wires every actions/checkout call site in ci.yml to:

  1. continue-on-error: true on the checkout step (id: checkout)
  2. follow-up step that runs only on steps.checkout.conclusion ==
     'failure' and dumps:
     - git version
     - workspace path / readlink chain (catches symlink mismatches)
     - .git/config contents (raw)
     - resolved gitdir from git rev-parse --absolute-git-dir
     - effective config after includeIf evaluation
     - existence of every includeIf-referenced path
     - leftover credentials files in RUNNER_TEMP / /tmp / /github/runner_temp
     - per-component readlink of GITHUB_WORKSPACE (symlink detection)
  3. Re-fails the job at the end so the diagnostic doesn't silently
     convert a real failure into a pass.

Once the next flake fires, the dump will tell us which of:
  - includeIf path doesn't match actual gitdir
  - credentials file got cleaned up between setup and fetch
  - workspace path has a symlink we didn't anticipate
  - the runner had no token in the first place
…actually caused the failure, and we can apply the proportionate fix.

This PR is intentionally diagnostic-only: no behavior change beyond
the failure-path noise. Once root cause is identified and fixed,
revert by removing the diagnostic step / continue-on-error from each
checkout block.
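
A rough sketch of what such a failure-path dump could look like, covering a subset of the items listed above. The function name `dump_checkout_state` is hypothetical and this is not the composite action's actual script; every probe is allowed to fail so the dump itself never masks the original error.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: print the git/auth state of a workspace directory.
dump_checkout_state() {
  local ws="${1:-${GITHUB_WORKSPACE:-$PWD}}"
  echo "== git version"
  git --version 2>&1 || true
  echo "== workspace: $ws"
  echo "resolved: $(readlink -f "$ws" 2>/dev/null || echo unknown)"
  echo "== raw .git/config"
  cat "$ws/.git/config" 2>/dev/null || echo "(missing)"
  echo "== absolute git dir"
  git -C "$ws" rev-parse --absolute-git-dir 2>&1 || true
  echo "== effective config with origins"
  git -C "$ws" config --list --show-origin 2>&1 || true
  echo "== includeIf entries"
  git -C "$ws" config --get-regexp '^includeif\.' 2>&1 || echo "(none)"
}
```

Called as `dump_checkout_state "$GITHUB_WORKSPACE"` from a step guarded by the checkout step's failure condition, it prints one labelled section per probe.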

* ci: re-trigger checks for merge commit 53e93a7

The 'Update branch' merge commit was authored by the GitHub UI using
GITHUB_TOKEN, which by GH Actions design does not trigger downstream
pull_request workflows on the resulting SHA. Only the Mergify check
(lazy-evaluated) registered against 53e93a7; CI / CodeQL / ecp PR analyze
all targeted the prior SHA (8ae2fc0).

This empty commit emits a user-authored push so pull_request:synchronize
fires and the full check matrix runs against the actual PR head.
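
The re-trigger itself is a two-command operation. A minimal sketch, assuming the branch already tracks an upstream and the push is authenticated as a user (for example with a personal access token or SSH key) rather than GITHUB_TOKEN; the function name `retrigger_checks` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push an empty commit so PR workflows run on a new head.
retrigger_checks() {
  local sha="$1"
  git commit --allow-empty -m "ci: re-trigger checks for merge commit $sha" || return 1
  git push
}
```

Running `retrigger_checks 53e93a7` on the PR branch adds one commit with no file changes and pushes it.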

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
coseto6125 added a commit that referenced this pull request May 23, 2026
… ref

Root cause for today's recurring 'could not read Username for https://github.com'
flakes (#388 macos, #393 ubuntu, #395 dep-review, #397 main-push,
#401 ubuntu Test, #402 instrumentation): the runner image ships with a
default credential.helper in /etc/gitconfig that errors with ENXIO when
git falls back to it. actions/checkout sets up http.extraheader scoped
to the repo URL, but on certain runner image revisions the auth setup
escapes our isolation and the system helper gets invoked anyway.

Rather than work around the broken helper (which would leave a
permanent shellcheck-style `-c credential.helper=` debt on every
git command), we eliminate the failure surface entirely: the only steps
that do post-checkout fetches all want the same thing — main's tip
SHA — and GitHub already provides that in the pull_request event
payload (`github.event.pull_request.base.sha`).

# ci.yml — `Detect code changes` job

Was:
  git fetch --no-tags origin "$BASE_REF"
  diff_range="origin/$BASE_REF...HEAD"

Now:
  # Event payload exposes base.sha for free; checkout used default
  # ref (refs/pull/N/merge) so both sides are in local object DB.
  diff_range="$BASE_SHA...HEAD"

Three-dot range still gives merge-base..HEAD semantics — equivalent to
the old behavior, no network needed.
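
To illustrate the three-dot semantics: `git diff A...B` compares B against the merge base of A and B, so commits that landed on the base after the branch point do not show up. A small sketch, with `changed_files` as a hypothetical helper name and the base SHA passed in by the caller:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files changed on HEAD since it diverged from base.
changed_files() {
  local base_sha="$1"
  # Three dots: diff from merge-base(base_sha, HEAD) to HEAD, all local objects.
  git diff --name-only "$base_sha...HEAD"
}
```

In the job this would be called as `changed_files "$BASE_SHA"`, provided both commits are present in the local object database.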

# ecp-pr-analyze.yml — drop `Fetch base ref` + recompute branch point locally

Was:
  - uses: actions/checkout@v6.0.2
    with:
      ref: ${{ github.event.pull_request.head.sha }}     # only PR head ancestors fetched
  - name: Fetch base ref
    run: git fetch origin "$BASE_REF:..."  # network — triggers ENXIO flake
  ...
  BASE=$(git merge-base "origin/$BASE_REF" HEAD)

Now:
  - uses: actions/checkout@v6.0.2
    with:
      fetch-depth: 0
      # No `ref:` override. Default refs/pull/N/merge brings both PR
      # head AND base history into local object DB.
  - name: Compute branch point + switch HEAD to PR head
    run: |
      BASE_TIP=$(git rev-parse HEAD^1)      # merge ref's parent 1 (base tip)
      PR_HEAD=$(git rev-parse HEAD^2)       # merge ref's parent 2 (PR head)
      BRANCH_POINT=$(git merge-base "$PR_HEAD" "$BASE_TIP")
      git checkout "$PR_HEAD"

Branch point is the SAME value the old `git merge-base origin/<base>`
would produce — but derived purely from local objects (the merge ref's
two parents) instead of a network fetch.
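
The parent arithmetic can be tried on any two-parent merge commit. The sketch below assumes HEAD was produced by merging the PR head into the base, in which case git records the base tip as parent 1 and the PR head as parent 2; that ordering is worth confirming against a real refs/pull/N/merge checkout. `merge_ref_points` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive both tips and the branch point from a merge
# commit at HEAD, using local objects only.
merge_ref_points() {
  local base_tip pr_head branch_point
  base_tip="$(git rev-parse HEAD^1)" || return 1   # branch that was merged into
  pr_head="$(git rev-parse HEAD^2)" || return 1    # branch that was merged in
  branch_point="$(git merge-base "$pr_head" "$base_tip")" || return 1
  printf 'base_tip=%s\npr_head=%s\nbranch_point=%s\n' \
    "$base_tip" "$pr_head" "$branch_point"
}
```

Because `git merge-base` is symmetric, the branch point comes out the same whichever parent is treated as the base; only the later `git checkout` depends on picking the right one.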

# Edge cases

- PR with merge conflicts: GitHub doesn't compute refs/pull/N/merge,
  checkout fails. This is correct — conflicted PRs can't merge, so
  ecp impact analysis would be meaningless. Author resolves conflict,
  ref recomputed, next run works.
- Push to main / merge_group / workflow_dispatch: unchanged code path
  (already used BEFORE_SHA / blanket 'code=true', no fetch).

# Result

- One entire class of CI flake eliminated: no post-checkout git fetch
  means no credential-helper invocation means no ENXIO.
- No upstream-bug workaround comment debt.
- Slightly faster CI (one fewer network round-trip per PR job).
- Closes the path that diagnostic instrumentation in PR #402 was
  trying to capture; PR #402 can be closed once this lands.
coseto6125 added a commit that referenced this pull request May 23, 2026
… ref (#404)

coseto6125 added a commit that referenced this pull request May 23, 2026
… redundant (#429)

Branch protection's `strict: true` was the only thing requiring PR branches
to be up-to-date with main before merging. With strict=false (just landed)
and ecp/cross-pr-conflict as a required check (gates against semantic
overlap with sibling PRs labeled `merge-queue`), the rebase-on-every-main-
push cycle is no longer needed:

  - PRs without symbol overlap with siblings → merge in parallel as their
    own CI goes green; no need to be ahead of main
  - PRs with semantic overlap → ecp/cross-pr-conflict stays Pending →
    GitHub native auto-merge waits

The workflow burned ~10 minutes of CI per PR per main-push (each rebase
re-runs the full Test job on 3 OSes). For N PRs that's O(N²) CI cost
spent on a problem ecp/cross-pr-conflict solves at PR-analyze time
(one job per PR, fixed cost).
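
The quadratic growth can be made concrete with a back-of-the-envelope model. This sketch assumes N armed PRs merge one at a time and that every merge forces one full re-run on each PR still open; the model and the function name `rebase_ci_minutes` are illustrative assumptions, with the 10-minute figure taken from the paragraph above.

```shell
#!/usr/bin/env bash
# Hypothetical cost model for rebase-on-every-main-push.
rebase_ci_minutes() {
  local n="$1" minutes_per_run="${2:-10}"
  # (N-1) + (N-2) + ... + 0 = N*(N-1)/2 re-runs in total.
  echo $(( n * (n - 1) / 2 * minutes_per_run ))
}
```

Under these assumptions, `rebase_ci_minutes 5` gives 100 minutes and `rebase_ci_minutes 10` gives 450, whereas the per-PR analyze job grows linearly with N.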

Rationale per ecp-pr-analyze.yml's own design comment:
  > ecp/cross-pr-conflict status is anchored to the PR HEAD commit and
  > computed against merge-base, so a base-branch update doesn't
  > invalidate it.

That assumption explicitly precludes the need for rebase-on-base-move.

If a real cross-PR conflict slips through (missing-cache hole in
detect_cross_pr_conflicts after Mergify's speculative-trial backstop
was removed in #401), the final required-checks wall before release
catches it.
coseto6125 added a commit that referenced this pull request May 24, 2026
Bump workspace package + all 4 crates + intra-workspace path-dep pins
+ Cargo.lock from 0.3.0 to 0.4.0 in preparation for the v0.4.0 tag.

Versions touched:
- workspace.package.version (Cargo.toml)
- ecp-core, ecp-analyzer, ecp-cli per-crate versions
- ecp-mcp inherits via workspace.version
- intra-workspace path-deps: ecp-analyzer -> ecp-core,
  ecp-cli -> {ecp-core, ecp-analyzer, ecp-mcp}

Scope of v0.4.0 (covered in the release notes when the tag is cut):
49 commits between v0.3.0 (PR #330 close-out) and this PR — cypher
perf wins (Accumulator + predicate pushdown + kind-CSR + walk_rel
closure + Binding VarMap), MCP host distribution paths (#395),
GitHub-native auto-merge replaces Mergify (#401), `ecp insight`
telemetry, `ecp dev pr-analyze`, schema-bindings rollouts,
benchmark realign with hot-path query coverage, and the RTK-style
skill rewrite landed in PR #419.
@coseto6125 coseto6125 deleted the refactor/ditch-mergify-for-gha-auto-merge branch May 24, 2026 22:14