
fix: improve git token resolution and resilience in repo pods #525

Merged
jonwiggins merged 1 commit into jonwiggins:main from nethi:fix/git-token-resolution on May 4, 2026

@nethi (Contributor) commented Apr 28, 2026

Summary

  • Fixes a critical issue where git clone hangs during repo pod initialization: when the caller has no workspace context, decrypting the stored token fails its AAD (Additional Authenticated Data) check. This seems to have started once GITHUB_TOKEN secrets began being stored with a workspaceId by default.
  • Adds fallback logic to getServerToken that looks up any available GITHUB_TOKEN in the database and uses its workspace context as the decryption anchor.
  • Enhances git-token-service to correctly propagate workspaceId when resolving tokens.
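
To illustrate the failure mode described above, here is a minimal sketch of AAD-bound encryption. It assumes AES-256-GCM via Node's `node:crypto` and a made-up AAD layout (`name:workspaceId`); the service's actual cipher and AAD format are not shown in this PR, and `aadFor`, `seal`, and `unseal` are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

// Hypothetical AAD layout: the real service may derive its AAD differently.
function aadFor(secretName: string, workspaceId: string | null): Buffer {
  return Buffer.from(`${secretName}:${workspaceId ?? "global"}`, "utf8");
}

// Encrypt a value and bind it to the given AAD.
function seal(key: Buffer, plaintext: string, aad: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(aad);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt; final() throws if the AAD differs from the one used in seal().
function unseal(key: Buffer, sealed: Sealed, aad: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAAD(aad);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

// A token stored with a workspace in its AAD...
const key = randomBytes(32);
const stored = seal(key, "ghp_example", aadFor("GITHUB_TOKEN", "ws-X"));

// ...decrypts only when the reader supplies the same workspace context.
const sameContext = unseal(key, stored, aadFor("GITHUB_TOKEN", "ws-X"));
```

A caller without a workspace context would build a different AAD, so `unseal` throws on the auth tag check instead of returning the token.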

Changes

  • apps/api/src/services/github-token-service.ts: getServerToken now falls back to any stored GITHUB_TOKEN and uses its workspace context to anchor decryption for system-level requests (such as repo-init.sh).
  • apps/api/src/services/git-token-service.ts: Updated getGitToken to pass through the workspace context.
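
The fallback described above can be sketched roughly as follows. This is an in-memory approximation, not the code from this PR: `SecretRow` and `pickDecryptionContext` are hypothetical names, and the real lookup runs as a database query.

```typescript
// Hypothetical shape of a row in the secrets table.
interface SecretRow {
  name: string;
  scope: string;
  workspaceId: string | null;
}

// Decide which workspace context to use for decryption.
// If the caller supplied one, use it; otherwise borrow the context of the
// first GITHUB_TOKEN row found (the fallback described in this PR).
function pickDecryptionContext(
  rows: SecretRow[],
  workspaceId?: string | null,
): string | null {
  if (workspaceId) return workspaceId;
  const anchor = rows.find((row) => row.name === "GITHUB_TOKEN");
  return anchor ? anchor.workspaceId : null;
}

const rows: SecretRow[] = [
  { name: "GITHUB_TOKEN", scope: "global", workspaceId: "ws-A" },
  { name: "GITHUB_TOKEN", scope: "global", workspaceId: "ws-B" },
];

// System-level caller (no workspace): borrows the first row's context.
const systemContext = pickDecryptionContext(rows);
// Workspace-aware caller: its own context wins.
const callerContext = pickDecryptionContext(rows, "ws-B");
```

Note that the fallback result depends on which row comes first; a SQL query with a limit but no explicit ordering gives no guarantee about that.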

Test Validation

  • Unit Test Updates:
    • Updated apps/api/src/services/github-token-service.test.ts to include the secrets table in database mocks.
    • Enhanced the DB mock to support chained .limit(1) and async iterator results for Drizzle queries.
    • Added a new test case: "falls back to any available PAT when GitHub App not configured (server context, no workspaceId)" to verify the fix.
    • Fixed linting issues (changed @ts-ignore to @ts-expect-error).
  • Verification: All 2070 unit tests passed successfully.
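
As a rough idea of what a chainable DB mock for such tests might look like, here is a framework-free sketch. It assumes the query builder is awaited directly (a thenable) rather than iterated, and `SelectMock` is a hypothetical name; the mock in github-token-service.test.ts may be built differently.

```typescript
// Hypothetical stand-in for a Drizzle-style select builder.
// Every chain method returns the same object, and the object is thenable,
// so `await mock.from(t).where(c).limit(1)` resolves to the canned rows.
class SelectMock<T> {
  readonly calls: string[] = [];
  private readonly rows: T[];
  private maxRows: number | undefined;

  constructor(rows: T[]) {
    this.rows = rows;
  }

  from(_table: unknown): this {
    this.calls.push("from");
    return this;
  }

  where(_condition: unknown): this {
    this.calls.push("where");
    return this;
  }

  limit(n: number): this {
    this.calls.push(`limit(${n})`);
    this.maxRows = n;
    return this;
  }

  // Rows the query would resolve to, honoring any limit that was set.
  resolveRows(): T[] {
    return this.maxRows === undefined ? this.rows : this.rows.slice(0, this.maxRows);
  }

  // Makes the builder awaitable.
  then<R>(onFulfilled: (rows: T[]) => R | PromiseLike<R>): Promise<R> {
    return Promise.resolve(this.resolveRows()).then(onFulfilled);
  }
}

const mock = new SelectMock([{ name: "GITHUB_TOKEN" }, { name: "OTHER" }]);
const chained = mock.from("secrets").where("name = GITHUB_TOKEN").limit(1);
```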

- add fallback to find any available GITHUB_TOKEN when workspace context is missing
- propagate workspace context to GitHub token lookup in git-token-service
- resolve AAD decryption failures during system-level repo initialization
@jonwiggins merged commit 15d0faa into jonwiggins:main on May 4, 2026
7 checks passed
jonwiggins added a commit that referenced this pull request May 4, 2026
POST /api/secrets was threading the caller's workspaceId into storeSecret
regardless of scope, so picking "Global" in the setup UI wrote contradictory
(scope='global', workspace_id='ws-X') rows. AAD-bound retrieval from
workspace-less callers (repo-init, /api/internal/git-credentials, the
workflow trigger worker) then auth-tag-failed on decrypt — observed as
hangs and 500s when adding a repo or starting a session.

- storeSecret now rejects scope='global' with a non-null workspaceId
- routes/secrets.ts strips workspaceId on the global write path (and treats
  omitted scope as global up front so the rule applies uniformly)
- new healContradictoryGlobalSecrets() runs at boot inside a pg_advisory_lock
  to re-encrypt existing bad rows under the canonical global AAD; rows
  shadowed by a true global row are dropped to avoid PG's NULLS-distinct
  UNIQUE behavior creating duplicates
- revert PR #525's "find any GITHUB_TOKEN, borrow its workspace_id" fallback
  in getServerToken — now unnecessary and itself nondeterministic
- audit log + response use effectiveScope (not input.scope) so user→global
  downgrade in auth-disabled mode is reported truthfully
- per-row INFO log on heal so an operator can audit which secrets crossed
  workspace boundaries
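
The first two items above (the storeSecret guard and the route-side normalization) can be sketched as below. This is only an illustration under assumptions: `normalizeSecretInput` and `assertScopeConsistent` are hypothetical names, and keeping the caller's workspaceId for non-global scopes is assumed rather than stated in the commit message.

```typescript
// Hypothetical request shape for POST /api/secrets.
interface SecretInput {
  name: string;
  value: string;
  scope?: string;
}

interface NormalizedSecret {
  name: string;
  value: string;
  scope: string;
  workspaceId: string | null;
}

// Route side: treat an omitted scope as global up front, and strip the
// caller's workspaceId on the global write path.
function normalizeSecretInput(
  input: SecretInput,
  callerWorkspaceId: string | null,
): NormalizedSecret {
  const scope = input.scope ?? "global";
  // Assumption: non-global scopes keep the caller's workspace.
  const workspaceId = scope === "global" ? null : callerWorkspaceId;
  return { name: input.name, value: input.value, scope, workspaceId };
}

// Service side: reject the contradictory combination outright.
function assertScopeConsistent(scope: string, workspaceId: string | null): void {
  if (scope === "global" && workspaceId !== null) {
    throw new Error("global secrets must not carry a workspaceId");
  }
}

const normalized = normalizeSecretInput({ name: "GITHUB_TOKEN", value: "ghp_example" }, "ws-X");
assertScopeConsistent(normalized.scope, normalized.workspaceId); // passes
```

Checking in both places means a future caller that bypasses the route still cannot write a (scope='global', workspace_id='ws-X') row through the service.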

The DB-side CHECK constraint is intentionally deferred to a follow-up so it
doesn't 500 in-flight POST /api/secrets calls during a rolling deploy of
this change.
jplorier pushed a commit to jplorier/optio that referenced this pull request May 5, 2026
…gins#525)

Co-authored-by: Ramesh Nethi <r.nethi@gogatewayai.com>
jplorier pushed a commit to jplorier/optio that referenced this pull request May 5, 2026
…nwiggins#509)
