
fix(accounts): reuse existing row on reauth for same ChatGPT identity (closes #788)#799

Closed
ozpool wants to merge 9 commits into Soju06:main from ozpool:fix/reauth-merge-by-chatgpt-identity

@ozpool
Contributor

@ozpool ozpool commented May 25, 2026

Why

OAuth reauth currently creates a second local account row with an __copyN suffix whenever the existing row's deterministic id collides under merge_by_email=False (issue #788). The original row — often deactivated because its refresh token was revoked — is left orphaned while the new row takes the active state. The dashboard then shows two rows for one upstream ChatGPT identity, and history-keyed tables become hard to reconcile.

What

Add an identity-aware path on AccountsRepository.upsert:

  • merge_by_chatgpt_identity=True (set by the OAuth reauth caller) finds the canonical row for the given chatgpt_account_id — preferring the oldest by created_at so long-term history stays attached — and applies the fresh tokens, plan, status, and deactivation-reason updates to that row.
  • If duplicate rows already exist for the same upstream identity, reauth repoints dependent rows to the canonical account before deleting the duplicates: usage history, additional usage history, request logs, sticky sessions, HTTP bridge sessions, API key assignments, and limit-warmup state. Duplicate API key assignments and warmup window/reset collisions are de-duplicated instead of violating unique constraints.
  • The path is intentionally independent of the dashboard importWithoutOverwrite setting. That setting governs the side-by-side import flow (two local rows for the same email when importing twice); reauth always reconciles to one local row per upstream identity.
  • Falls through to the existing deterministic-id behavior when the incoming row has no chatgpt_account_id, so legacy local rows that pre-date the field keep working unchanged.
  • Holds a chatgpt-id-scoped Postgres advisory lock on reauth so concurrent reauths for the same upstream identity serialize.
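A minimal in-memory sketch of the canonical-row selection described above, assuming a trimmed-down `Account` dataclass and a hypothetical `upsert_by_identity` helper (neither is the repository's actual API). It covers only the oldest-row preference, the in-place update, and the no-upstream-id fall-through; dependent-row repointing and locking are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Account:
    # Hypothetical, trimmed-down stand-in for the accounts model.
    id: str
    chatgpt_account_id: Optional[str]
    created_at: datetime
    access_token: str = ""
    plan: str = "free"
    status: str = "active"
    deactivation_reason: Optional[str] = None


def upsert_by_identity(rows: Dict[str, Account], incoming: Account) -> Account:
    """Apply a reauth's fresh fields to the oldest row with the same upstream id."""
    if incoming.chatgpt_account_id is None:
        # Legacy rows: stands in for the existing deterministic-id upsert,
        # which this sketch does not model.
        rows[incoming.id] = incoming
        return incoming
    matches = [r for r in rows.values() if r.chatgpt_account_id == incoming.chatgpt_account_id]
    if not matches:
        # No canonical match: create the row as usual.
        rows[incoming.id] = incoming
        return incoming
    # Prefer the oldest row so long-term history stays attached to it;
    # the id tie-break keeps the choice deterministic for equal timestamps.
    canonical = min(matches, key=lambda r: (r.created_at, r.id))
    canonical.access_token = incoming.access_token
    canonical.plan = incoming.plan
    canonical.status = incoming.status
    canonical.deactivation_reason = incoming.deactivation_reason
    return canonical
```

On its own this step is only the bleed-stopper: a stray `acc__copy1` row sharing the identity is still present afterwards, and folding it away is the separate reconciliation step.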

OAuthService._persist_tokens now passes merge_by_chatgpt_identity=True for both the direct-session and factory-session code paths.

Tests

  • tests/integration/test_repositories.py covers deactivated-row reuse, oldest-row preference when stray __copy rows already exist, dependent-row reconciliation, API key assignment de-dupe, warmup collision de-dupe, no-op create when no canonical match is found, and the no-upstream-id fall-through.
  • tests/integration/test_oauth_flow.py pins the OAuth reauth contract: two reauths for the same upstream identity now land on one row, with the second reauth's plan applied in place. The duplicate-email conflict test was reworked to use two distinct upstream ids, so its precondition (two distinct local rows under one email) no longer relies on OAuth side-by-side creation.
  • openspec/changes/merge-reauth-by-chatgpt-identity/ documents the reauth identity merge and duplicate-row reconciliation contract.

Focused validation after the latest repair:

uv run pytest -q tests/integration/test_repositories.py tests/integration/test_oauth_flow.py tests/unit/test_accounts_repository_locks.py
uv run ruff check app/modules/accounts/repository.py app/modules/oauth/service.py tests/integration/test_repositories.py tests/integration/test_oauth_flow.py tests/unit/test_accounts_repository_locks.py openspec/changes/merge-reauth-by-chatgpt-identity/proposal.md openspec/changes/merge-reauth-by-chatgpt-identity/tasks.md
uv run ty check app/modules/accounts/repository.py app/modules/oauth/service.py tests/integration/test_repositories.py tests/integration/test_oauth_flow.py tests/unit/test_accounts_repository_locks.py

Compatibility note for the renamed test

test_device_oauth_flow_keeps_separate_accounts_when_import_without_overwrite_enabled was pinning behavior that this issue calls out as the bug, so it has been renamed and its assertions inverted rather than kept alongside the new test. The importWithoutOverwrite setting still applies fully to the dashboard import path — only the OAuth reauth code path now reconciles by upstream identity.

Closes #788

ozpool and others added 2 commits May 25, 2026 16:40
OAuth reauth previously created a second local account row with an
`__copyN` suffix when the existing row's deterministic id collided
under merge_by_email=False (issue Soju06#788). The original row, often
deactivated because its refresh token was revoked, was left orphaned
while the new row took the active state — dashboard then showed two
rows for one upstream identity and history-keyed tables (usage,
request logs, sticky sessions) became hard to reconcile.

Add an identity-aware path on `AccountsRepository.upsert`:

- `merge_by_chatgpt_identity=True` (set by the OAuth reauth caller)
  finds the canonical row for the given `chatgpt_account_id` —
  preferring the oldest by `created_at` so long-term history stays
  attached — and applies the fresh tokens, plan, status, and
  deactivation-reason updates to that row.
- The path is intentionally independent of the dashboard
  `importWithoutOverwrite` setting. That setting governs the
  side-by-side import flow (two local rows for the same email when
  importing twice); reauth always reconciles to one local row per
  upstream identity.
- Falls through to the existing deterministic-id behavior when the
  incoming row has no `chatgpt_account_id`, so legacy local rows
  that pre-date the field continue to work unchanged.
- Holds a chatgpt-id-scoped Postgres advisory lock on reauth so
  concurrent reauths for the same upstream identity serialize.
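The PR does not say how the identity-scoped key is turned into a Postgres advisory-lock key. One common approach, sketched here under that assumption, hashes a scoped string such as `chatgpt:<id>` into the signed 64-bit integer that `pg_advisory_xact_lock(bigint)` accepts. `advisory_lock_key` and `lock_statement` are hypothetical names, and the `:key` placeholder assumes SQLAlchemy-style `text()` binding.

```python
import hashlib
from typing import Dict, Tuple


def advisory_lock_key(scope: str, value: str) -> int:
    """Hash a scoped string such as 'chatgpt:<id>' into a signed 64-bit key.

    pg_advisory_xact_lock(bigint) takes a signed 64-bit integer, so the first
    eight bytes of a SHA-256 digest are read as a signed big-endian int.
    """
    digest = hashlib.sha256(f"{scope}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def lock_statement(key: int) -> Tuple[str, Dict[str, int]]:
    # Transaction-scoped advisory lock: Postgres releases it at COMMIT/ROLLBACK.
    return "SELECT pg_advisory_xact_lock(:key)", {"key": key}
```

Prefixing the scope keeps an identity key distinct from an email key built from the same string, and a transaction-scoped lock does not need an explicit release on error paths.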

Wire `OAuthService._persist_tokens` to pass
`merge_by_chatgpt_identity=True` for both the direct-session and
factory-session code paths.

Tests:

- New repository tests cover deactivated-row reuse, oldest-row
  preference when stray `__copy` rows already exist, no-op create
  when no canonical match is found, and the no-upstream-id
  fall-through.
- The existing OAuth flow test that exercised the side-by-side
  outcome was renamed and inverted to pin the new contract: two
  reauths for the same upstream identity now land on one row, with
  the second reauth's plan applied in place.
- The duplicate-email conflict test was reworked so its precondition
  (two distinct local rows under one email) no longer relies on
  OAuth side-by-side, by using two distinct upstream ids.

Scope notes:

- This PR prevents new `__copyN` rows from forming on reauth. It
  intentionally does not migrate pre-existing duplicate rows in a
  live database — that needs a separate reconcile script that
  repoints dependent tables (`usage_history`,
  `additional_usage_history`, `request_logs`, `sticky_sessions`,
  `api_key_account_assignments`, `http_bridge_session_records`,
  `account_limit_warmups`) and de-dupes by chatgpt_account_id.
  Happy to follow up with that as a separate change.
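The repoint-then-dedupe step these notes describe can be sketched with the standard-library `sqlite3` module. The schema is a hypothetical two-table subset of the dependent tables listed above, and `reconcile` is an illustrative helper rather than the repository's code. `UPDATE OR IGNORE` is SQLite-specific: rows whose repoint would violate the unique constraint are skipped and then deleted, which yields the de-duplication the PR aims for.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (id TEXT PRIMARY KEY, chatgpt_account_id TEXT, created_at TEXT);
CREATE TABLE usage_history (id INTEGER PRIMARY KEY, account_id TEXT);
CREATE TABLE api_key_account_assignments (
    api_key_id TEXT,
    account_id TEXT,
    UNIQUE (api_key_id, account_id)
);
"""


def reconcile(conn: sqlite3.Connection, chatgpt_account_id: str) -> str:
    """Fold every row for one upstream identity into the oldest one; return its id."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM accounts WHERE chatgpt_account_id = ? ORDER BY created_at, id",
        (chatgpt_account_id,),
    )]
    if not ids:
        raise LookupError(f"no account rows for {chatgpt_account_id!r}")
    canonical, duplicates = ids[0], ids[1:]
    with conn:  # one transaction: commit on success, roll back on error
        for dup in duplicates:
            # History rows carry no uniqueness constraint, so repoint them all.
            conn.execute(
                "UPDATE usage_history SET account_id = ? WHERE account_id = ?",
                (canonical, dup),
            )
            # Assignments are unique per (api_key_id, account_id): repoint the
            # rows that do not collide, then drop the ones that would have.
            conn.execute(
                "UPDATE OR IGNORE api_key_account_assignments SET account_id = ? WHERE account_id = ?",
                (canonical, dup),
            )
            conn.execute("DELETE FROM api_key_account_assignments WHERE account_id = ?", (dup,))
            conn.execute("DELETE FROM accounts WHERE id = ?", (dup,))
    return canonical
```

Postgres has no `UPDATE OR IGNORE`, so the same effect there needs an explicit step, for example deleting the duplicate's assignments that already exist for the canonical account before repointing the rest.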

Closes Soju06#788

Signed-off-by: ozpool <jittendersingh389@gmail.com>
@Soju06
Owner

Soju06 commented May 26, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

if merge_by_email:
    if dialect_name == "sqlite" and not sqlite_lock_acquired:
        await self._acquire_sqlite_merge_lock()
    elif dialect_name == "postgresql":
        await self._acquire_postgresql_merge_lock(account.email)
else:

P2: Serialize identity-merge reauths by ChatGPT ID

When merge_by_chatgpt_identity=True, the code still takes an email-scoped advisory lock whenever merge_by_email is enabled (the default path used by OAuth reauth). That means two concurrent reauths for the same upstream chatgpt_account_id but different email claims can run without mutual exclusion, both miss the canonical-row lookup, and both insert, recreating duplicate rows for one identity. Because identity reconciliation now runs regardless of merge_by_email, the lock key also needs to be keyed by ChatGPT identity in this branch.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Komzpa Komzpa added the 🤖 codex: ok [@codex review] says no issues found. label May 26, 2026
@Soju06 Soju06 added the hermes: needs-followup Hermes left a blocker/comment that needs follow-up observation label May 27, 2026
Addresses the codex P2 review on Soju06#799.

When merge_by_email=True AND merge_by_chatgpt_identity=True with a
chatgpt_account_id set, the postgres advisory lock was previously only
keyed on account.email. Two concurrent reauths for the same upstream
chatgpt_account_id but different email claims would take different
locks, both miss the _account_by_chatgpt_identity lookup, and both
INSERT — recreating the duplicate-row pattern issue Soju06#788 is meant to
prevent.

Take the identity-keyed advisory lock whenever identity reconciliation
is in play, regardless of merge_by_email. When merge_by_email is also
on, the email-scoped lock is still acquired for the email-merge branch
on the same row. Ordering identity-first, email-second gives a stable
acquisition order across callers so two reauths that overlap on either
key serialize without deadlock.
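To see why the identity-keyed lock closes the race, here is an in-process simulation in which one `asyncio.Lock` per key stands in for a Postgres advisory lock. `KeyedLocks`, `reauth`, and `race` are hypothetical names. Two reauths share a ChatGPT id but carry different email claims: with only email-keyed locks both miss the lookup and insert, while taking `chatgpt:<id>` first serializes them.

```python
import asyncio
from collections import defaultdict
from contextlib import AsyncExitStack
from typing import DefaultDict, List, Tuple


class KeyedLocks:
    """In-process stand-in for advisory locks: one asyncio.Lock per key."""

    def __init__(self) -> None:
        self._locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def acquire_all(self, stack: AsyncExitStack, keys: List[str]) -> None:
        # Acquire in the given order; the stack releases them in reverse on exit.
        for key in keys:
            await stack.enter_async_context(self._locks[key])


async def reauth(locks: KeyedLocks, rows: List[Tuple[str, str]],
                 chatgpt_id: str, email: str, identity_lock: bool) -> str:
    keys = [f"chatgpt:{chatgpt_id}"] if identity_lock else []
    keys.append(email)  # the email-scoped key always comes last
    async with AsyncExitStack() as stack:
        await locks.acquire_all(stack, keys)
        found = any(cid == chatgpt_id for cid, _ in rows)  # canonical-row lookup
        await asyncio.sleep(0)  # yield, as a real database round trip would
        if found:
            return "updated"
        rows.append((chatgpt_id, email))  # INSERT a new row
        return "inserted"


async def race(identity_lock: bool) -> List[Tuple[str, str]]:
    locks = KeyedLocks()
    rows: List[Tuple[str, str]] = []
    await asyncio.gather(
        reauth(locks, rows, "cg-1", "a@example.com", identity_lock),
        reauth(locks, rows, "cg-1", "b@example.com", identity_lock),
    )
    return rows
```

Because every caller orders its keys identity first and email second, acquisitions never form a cycle: any two callers that share a key contend on it without either holding a later key the other is waiting for.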

Coverage: tests/unit/test_accounts_repository_locks.py monkeypatches
the postgres dialect and the two lock methods, then pins the lock
sequence across four scenarios:

- merge_by_email=True + merge_by_chatgpt_identity=True ->
  identity lock fires, email lock still fires
- merge_by_email=False + merge_by_chatgpt_identity=True ->
  identity-only lock (unchanged)
- merge_by_email=False + merge_by_chatgpt_identity=False ->
  per-id fallback lock (unchanged)
- merge_by_email=True + merge_by_chatgpt_identity=False ->
  email-only lock (unchanged)
@ozpool
Contributor Author

ozpool commented May 27, 2026

Addressed the codex P2 in 33ab6cb.

When merge_by_email=True AND merge_by_chatgpt_identity=True with a chatgpt_account_id set, the postgres advisory lock was previously only keyed on account.email. Two concurrent reauths for the same upstream identity but different email claims would take different locks, both miss _account_by_chatgpt_identity, and both insert — recreating the duplicate-row pattern this PR is meant to prevent.

Fix takes the identity-keyed lock whenever identity reconciliation is in play, regardless of merge_by_email. Ordering identity-first, email-second across all callers gives a stable acquisition order so two reauths that overlap on either key serialize without deadlock.

Coverage in tests/unit/test_accounts_repository_locks.py monkeypatches the postgres dialect + the two lock methods and pins the lock sequence across four scenarios:

| merge_by_email | merge_by_chatgpt_identity | identity lock | email lock |
| --- | --- | --- | --- |
| True | True | `chatgpt:<id>` | `<email>` |
| False | True | `chatgpt:<id>` | |
| False | False | `<account.id>` (fallback) | |
| True | False | | `<email>` |
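The lock-sequence table can be read as a small key-selection function. This sketch assumes the lock keys are plain strings shaped as in the table, and that a missing `chatgpt_account_id` falls back like the identity-off case; `lock_keys` is a hypothetical helper, not the repository's method.

```python
from typing import List, Optional


def lock_keys(
    merge_by_email: bool,
    merge_by_chatgpt_identity: bool,
    chatgpt_account_id: Optional[str],
    email: str,
    account_id: str,
) -> List[str]:
    """Return advisory-lock keys in acquisition order (identity first, then email)."""
    keys: List[str] = []
    if merge_by_chatgpt_identity and chatgpt_account_id:
        keys.append(f"chatgpt:{chatgpt_account_id}")
    if merge_by_email:
        keys.append(email)
    if not keys:
        # Neither merge mode applies: fall back to a per-account-id lock.
        keys.append(account_id)
    return keys
```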

uv run pytest tests/integration/test_repositories.py tests/integration/test_accounts_repository.py tests/unit/test_accounts_repository_locks.py clean (16 passed). uv run ruff check + ruff format clean.

@codex review

@Soju06 Soju06 removed the 🤖 codex: ok [@codex review] says no issues found. label May 27, 2026
@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to github.

@Komzpa Komzpa removed the hermes: needs-followup Hermes left a blocker/comment that needs follow-up observation label May 27, 2026
@Komzpa
Collaborator

Komzpa commented May 31, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33ab6cb00a


Comment thread app/modules/oauth/service.py
@Komzpa Komzpa added the 🤖 codex: needs work [@codex review] raised an issue label May 31, 2026
@Soju06
Owner

Soju06 commented May 31, 2026

@codex review


@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.


@Soju06 Soju06 added 🤖 codex: ok [@codex review] says no issues found. and removed 🤖 codex: needs work [@codex review] raised an issue labels May 31, 2026
@Soju06
Owner

Soju06 commented Jun 1, 2026

Blocking issue-coverage / merge-gate note: this PR still shouldn't auto-close #788 as written.

The current implementation is a good bleed-stopper, but the identity path only updates the chosen canonical row and returns:

  • app/modules/accounts/repository.py:160-166 finds the oldest row for chatgpt_account_id, applies fresh token/status metadata, commits, and returns it.
  • It does not repoint dependent rows or delete/merge any other rows with the same chatgpt_account_id.

That means an already-affected database with acc... + acc...__copyN rows still keeps multiple dashboard rows for one upstream identity after reauth, and the stale duplicate can remain active/eligible. The new duplicate-existing regression (tests/integration/test_repositories.py:187-207) proves the oldest row is selected, but it does not assert that the duplicate row is removed or that only one row remains.

#788's expected behavior includes leaving one dashboard row / reconciling history-keyed tables. So please either:

  1. implement the reconciliation/dedupe step for existing same-chatgpt_account_id rows (including FK repoint/dedupe for the dependent tables), or
  2. keep this PR explicitly scoped as a bleed-stopper by changing Closes #788 / the title to Refs/Addresses, adding or linking a follow-up issue for existing-row reconciliation, and making the OpenSpec/tasks state that pre-existing duplicates remain out of scope.

Code/lock direction looks reasonable for the bleed-stopper path; the blocker is the current "closes #788" claim versus the issue's full data-reconciliation acceptance.

@Soju06 Soju06 added the hermes: needs-followup Hermes left a blocker/comment that needs follow-up observation label Jun 1, 2026
@Soju06 Soju06 removed the 🤖 codex: ok [@codex review] says no issues found. label Jun 1, 2026
@Komzpa Komzpa removed the hermes: needs-followup Hermes left a blocker/comment that needs follow-up observation label Jun 1, 2026
@Soju06
Owner

Soju06 commented Jun 1, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 319b877840


Comment thread app/modules/accounts/repository.py Outdated
@Soju06 Soju06 added the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@Soju06 Soju06 removed the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@Komzpa
Collaborator

Komzpa commented Jun 1, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5743bec14


Comment thread app/modules/accounts/repository.py
@Komzpa Komzpa added the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@Soju06 Soju06 removed the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@Soju06
Owner

Soju06 commented Jun 1, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.


@Soju06 Soju06 added the 🤖 codex: ok [@codex review] says no issues found. label Jun 1, 2026
@Soju06 Soju06 added the hermes: needs-followup Hermes left a blocker/comment that needs follow-up observation label Jun 2, 2026
@Soju06
Owner

Soju06 commented Jun 2, 2026

Blocking follow-up: identity reconciliation updates/deletes hot-path account state but does not invalidate the in-process/cross-replica caches.

Evidence on the current head:

  • OAuthService._persist_tokens() calls repo.upsert(account, merge_by_chatgpt_identity=True) directly (app/modules/oauth/service.py:554-558) and there is no cache invalidation in the OAuth path.
  • That repository path can rewrite api_key_accounts (app/modules/accounts/repository.py:245-270), repoint usage/session rows, and delete duplicate accounts rows (app/modules/accounts/repository.py:272-292).
  • The normal account delete path invalidates the same affected caches: account-selection cache, API-key cache, and NAMESPACE_API_KEY cross-instance invalidation (app/modules/accounts/service.py:207-214). Import/reactivate/pause also invalidate account-selection cache.
  • The stale data is used by proxy hot paths: account-selection inputs are cached for 5s unless invalidated (app/modules/proxy/account_cache.py:22-66, consumed at app/modules/proxy/load_balancer.py:410-412), and API-key data is cached for 2s unless invalidated (app/core/auth/api_key_cache.py:18-55).

I reproduced the shape locally with an in-memory SQLite DB: after reauth, DB rows were reconciled to acc_main, but cached API-key/selection data still pointed at deleted acc_dup:

saved_id= acc_main
db_accounts= ['acc_main']
db_assignments= ['acc_main']
cached_api= {'assigned_account_ids': ['acc_dup']}
cached_selection= {'account_ids': ['acc_dup']}

Please invalidate account-selection cache after successful OAuth identity merge, and clear/bump API-key cache invalidation when duplicate reconciliation touches API-key assignments or deletes duplicate accounts. This is part of #788's data-integrity acceptance because immediately after reauth the proxy can otherwise route with stale deleted-account / old-assignment state for the cache TTL window.
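A minimal sketch of the requested fix, assuming a small TTL cache with explicit invalidation. `TTLCache` and `reauth_merge` are hypothetical stand-ins for the account-selection (5s) and API-key (2s) caches named above, and cross-replica invalidation is not modeled. It reproduces the shape of the stale read: the cached value survives the database change until the TTL expires or the cache is invalidated after the merge.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Hypothetical TTL cache with explicit invalidation."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # served from cache, possibly stale
        value = loader()
        self._entries[key] = (now, value)
        return value

    def invalidate(self) -> None:
        self._entries.clear()


def reauth_merge(db: Dict[str, List[str]], caches: List[TTLCache]) -> None:
    # Hypothetical reconciliation: the duplicate row is folded into acc_main.
    db["account_ids"] = ["acc_main"]
    db["assigned_account_ids"] = ["acc_main"]
    # Invalidate after the merge commits so the next read reloads fresh rows.
    for cache in caches:
        cache.invalidate()
```

Without the `invalidate()` calls, a read one second after the merge would still return `acc_dup` from the selection cache.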

Komzpa pushed a commit to Komzpa/codex-lb that referenced this pull request Jun 2, 2026
@Komzpa
Collaborator

Komzpa commented Jun 2, 2026

Folded into #865, which is now rebuilt on main at 031df675dd8dfb3bffa80203b994177fda6d0f4b.

Current #865 status:

  • CI is green for the current head.
  • The current-head Codex review is clean.
  • The folded accounts-identity branch now carries this PR's reviewable scope together with the related account/workspace identity repairs.

Closing this PR to keep the remaining merge queue main-based and non-overlapping.

@Komzpa Komzpa closed this Jun 2, 2026

Labels

🤖 codex: ok [@codex review] says no issues found. hermes: needs-followup Hermes left a blocker/comment that needs follow-up observation


Development

Successfully merging this pull request may close these issues.

bug: reauth should update existing account instead of creating __copy duplicates

3 participants