fix(gbrain-sync): fold hostname into code-source id hash#1468

Open
0xDevNinja wants to merge 1 commit into garrytan:main from 0xDevNinja:fix/1414-cross-machine-source-id


@0xDevNinja 0xDevNinja commented May 13, 2026

Summary

  • deriveCodeSourceId now keys its 8-char hash off ${hostname}::${absolute repo path} instead of the path alone.
  • Conductor worktrees on a single host stay distinct (path entropy unchanged within a host). Two machines with the same absolute layout against a federated brain stop colliding.
  • One new test asserts distinct ids across simulated hostnames + stable id within the same host.

Fixes #1414.

Why

pathHash = sha1(repoPath).slice(0, 8) was a function of the absolute filesystem path only. Two machines with identical home-dir layouts (chezmoi dotfiles, ansible-provisioned VMs, single-user multi-host fleets) produce identical source ids when both run against a shared brain DB. The result is last-writer-wins on sources.local_path: a bare gbrain sync on the losing machine surfaces a cryptic "Not a git repository" error even though the cwd is a git repo.

The v1.29.0.0 "Conductor worktrees of the same repo coexist as separate sources" promise holds within one host because Conductor worktrees live at different paths. It breaks across hosts because the path is the same on both.
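The collision can be reproduced in isolation. The following is a minimal sketch, not the code from the diff: `shortSha1`, `pathOnlyHash`, `hostPathHash`, and the example path are illustrative names and values, and the host is passed in explicitly so two machines can be simulated in one process.

```typescript
import { createHash } from "node:crypto";

// First 8 hex chars of a SHA-1 digest, mirroring the slice(0, 8) described above.
function shortSha1(input: string): string {
  return createHash("sha1").update(input).digest("hex").slice(0, 8);
}

// Pre-fix shape: the hash depends on the absolute path only.
function pathOnlyHash(repoPath: string): string {
  return shortSha1(repoPath);
}

// Post-fix shape: the hostname is folded into the hashed key.
function hostPathHash(host: string, repoPath: string): string {
  return shortSha1(`${host}::${repoPath}`);
}

// Hypothetical layout shared by two identically provisioned machines.
const repoPath = "/home/dev/src/app";

// Both machines compute the same value because nothing host-specific is hashed.
const legacyOnA = pathOnlyHash(repoPath);
const legacyOnB = pathOnlyHash(repoPath);
console.log("path-only ids collide:", legacyOnA === legacyOnB);

// With the host in the key, the ids should differ
// (barring an 8-hex-char collision, which is very unlikely).
const scopedOnA = hostPathHash("machine-a", repoPath);
const scopedOnB = hostPathHash("machine-b", repoPath);
console.log("host-scoped ids differ:", scopedOnA !== scopedOnB);
```

Within a single host the second argument is the only thing that varies, so worktrees at different paths keep distinct hashes exactly as before.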

Fix

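import { createHash } from "node:crypto";
import { hostname } from "node:os";

// GSTACK_HOSTNAME overrides the real hostname (test-only knob).
const host = process.env.GSTACK_HOSTNAME || hostname();
// Hash host and path together so identical paths on different hosts diverge.
const hostPathHash = createHash("sha1")
  .update(`${host}::${repoPath}`)
  .digest("hex")
  .slice(0, 8);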

os.hostname() is the cheapest stable host identifier that works on every platform without a privileged read. /etc/machine-id is more stable across hostname renames but is Linux-specific. GSTACK_HOSTNAME is a test-only knob; production leaves it unset.
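The override precedence can be sketched as a small helper. This is an illustration under assumptions: `resolveHost` is a hypothetical name (the diff inlines the expression), and the environment is taken as a parameter only so the behaviour is easy to exercise.

```typescript
import { hostname } from "node:os";

// Hypothetical helper: GSTACK_HOSTNAME wins when set to a non-empty value,
// otherwise fall back to the OS-reported hostname.
function resolveHost(
  env: Record<string, string | undefined> = process.env,
): string {
  // `||` (not `??`) means an empty string also falls through to hostname().
  return env.GSTACK_HOSTNAME || hostname();
}

console.log(resolveHost({ GSTACK_HOSTNAME: "machine-a" })); // prints "machine-a"
console.log(resolveHost({}) === hostname()); // prints true
```

One consequence of using `||` is that `GSTACK_HOSTNAME=""` behaves the same as leaving the variable unset, so an empty override cannot produce a hash keyed on an empty host.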

Migration

Legacy path-only-hashed sources age out naturally. In-place migration would force a brain-wide rewrite for a minority workflow; the existing deriveLegacyCodeSourceId orphan-cleanup pattern can pick them up in a follow-up if a one-shot rewrite is preferred.
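If a follow-up cleanup were wanted, one way to tell old and new hashes apart is to recompute both for a given repo. The sketch below is hypothetical throughout: `classifyStoredHash` and `HashKind` are made-up names, it works on bare 8-char hashes rather than full source ids, and it does not reflect the real signature or behaviour of `deriveLegacyCodeSourceId`.

```typescript
import { createHash } from "node:crypto";

function shortSha1(input: string): string {
  return createHash("sha1").update(input).digest("hex").slice(0, 8);
}

type HashKind = "current" | "legacy" | "unrelated";

// Hypothetical classifier: decide whether a stored 8-char hash was produced by
// the host-scoped key, by the old path-only key, or by neither for this repo.
function classifyStoredHash(
  stored: string,
  host: string,
  repoPath: string,
): HashKind {
  if (stored === shortSha1(`${host}::${repoPath}`)) return "current";
  if (stored === shortSha1(repoPath)) return "legacy";
  return "unrelated";
}

// Example values are illustrative only.
const examplePath = "/home/dev/src/app";
const legacyHash = shortSha1(examplePath);
console.log(classifyStoredHash(legacyHash, "machine-a", examplePath)); // prints "legacy"
```

A path-only hash is ambiguous about which machine wrote it, which is one reason a one-shot rewrite is harder than letting such entries age out.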

Out of scope: the TODOS.md P3 entry about cross-remote-host collisions (github.com/acme/foo vs gitlab.com/acme/foo on the same machine). Different axis.

Tests

The new case in test/gstack-gbrain-sync.test.ts, "derives distinct source ids for the same absolute path on different hosts", covers:

  • Same temp repo + same remote + same cwd, GSTACK_HOSTNAME=machine-a vs GSTACK_HOSTNAME=machine-b → distinct gbrain-valid ids.
  • Same host + same path across two invocations → identical id.
  • No-op gbrain shim is dropped on PATH so the dry-run code stage runs.

bun test test/gstack-gbrain-sync.test.ts -t "distinct source ids"
# 1 pass, 0 fail
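The two properties the test checks can be illustrated in-process. This sketch is not the actual test: the real case runs the sync script against a temp repo with a gbrain shim, whereas here `sketchHash`, `withHostname`, and `check` are hypothetical helpers that only exercise the hashing step via the `GSTACK_HOSTNAME` override.

```typescript
import { createHash } from "node:crypto";
import { hostname } from "node:os";

// Same key shape as the fix: `${host}::${repoPath}`, first 8 hex chars of SHA-1.
function sketchHash(repoPath: string): string {
  const host = process.env.GSTACK_HOSTNAME || hostname();
  return createHash("sha1").update(`${host}::${repoPath}`).digest("hex").slice(0, 8);
}

// Run fn with GSTACK_HOSTNAME set to `name`, then restore the previous value.
function withHostname<T>(name: string, fn: () => T): T {
  const previous = process.env.GSTACK_HOSTNAME;
  process.env.GSTACK_HOSTNAME = name;
  try {
    return fn();
  } finally {
    if (previous === undefined) delete process.env.GSTACK_HOSTNAME;
    else process.env.GSTACK_HOSTNAME = previous;
  }
}

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

const samePath = "/tmp/example-repo"; // illustrative path
const onA = withHostname("machine-a", () => sketchHash(samePath));
const onB = withHostname("machine-b", () => sketchHash(samePath));
const onAAgain = withHostname("machine-a", () => sketchHash(samePath));

check(onA !== onB, "same path on different hosts should yield distinct hashes");
check(onA === onAAgain, "same host and path should yield a stable hash");
```

Restoring the environment variable in `finally` keeps the override from leaking into other cases that expect the real hostname.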

The rest of the file matches the pre-existing baseline on upstream/main (10 failures locally because the gbrain CLI is not installed in this environment; unchanged by this diff).



Pre-fix `deriveCodeSourceId` hashed the absolute repo path alone, so two
machines with identical home-dir layouts (chezmoi-managed dotfiles,
ansible-provisioned VMs) derived the same id and clobbered each other's
`local_path` in a federated brain. Last-writer-wins, with cryptic "Not a
git repository" errors on the loser.

Hash key is now `${hostname}::${path}`. Conductor worktrees on a single
host stay distinct (path entropy unchanged within a host); cross-machine
federations stop colliding. Legacy path-only-hashed sources age out
naturally — in-place migration would force a brain-wide rewrite for a
minority workflow, and the existing `deriveLegacyCodeSourceId` orphan
cleanup pattern can pick them up in a follow-up if needed.

`GSTACK_HOSTNAME` env var is a test-only knob; production uses
`os.hostname()`.

Fixes garrytan#1414
Development

Successfully merging this pull request may close these issues.

/sync-gbrain: cross-machine source-id collision when two machines use the same absolute repo path
