fix(accounts): reuse existing row on reauth for same ChatGPT identity (closes #788)#799
fix(accounts): reuse existing row on reauth for same ChatGPT identity (closes #788)#799ozpool wants to merge 9 commits into
Conversation
OAuth reauth previously created a second local account row with an `__copyN` suffix when the existing row's deterministic id collided under merge_by_email=False (issue Soju06#788). The original row, often deactivated because its refresh token was revoked, was left orphaned while the new row took the active state — dashboard then showed two rows for one upstream identity and history-keyed tables (usage, request logs, sticky sessions) became hard to reconcile. Add an identity-aware path on `AccountsRepository.upsert`: - `merge_by_chatgpt_identity=True` (set by the OAuth reauth caller) finds the canonical row for the given `chatgpt_account_id` — preferring the oldest by `created_at` so long-term history stays attached — and applies the fresh tokens, plan, status, and deactivation-reason updates to that row. - The path is intentionally independent of the dashboard `importWithoutOverwrite` setting. That setting governs the side-by-side import flow (two local rows for the same email when importing twice); reauth always reconciles to one local row per upstream identity. - Falls through to the existing deterministic-id behavior when the incoming row has no `chatgpt_account_id`, so legacy local rows that pre-date the field continue to work unchanged. - Holds a chatgpt-id-scoped Postgres advisory lock on reauth so concurrent reauths for the same upstream identity serialize. Wire `OAuthService._persist_tokens` to pass `merge_by_chatgpt_identity=True` for both the direct-session and factory-session code paths. Tests: - New repository tests cover deactivated-row reuse, oldest-row preference when stray `__copy` rows already exist, no-op create when no canonical match is found, and the no-upstream-id fall-through. - The existing OAuth flow test that exercised the side-by-side outcome was renamed and inverted to pin the new contract: two reauths for the same upstream identity now land on one row, with the second reauth's plan applied in place. - The duplicate-email conflict test was reworked so its precondition (two distinct local rows under one email) no longer relies on OAuth side-by-side, by using two distinct upstream ids. Scope notes: - This PR prevents new `__copyN` rows from forming on reauth. It intentionally does not migrate pre-existing duplicate rows in a live database — that needs a separate reconcile script that repoints dependent tables (`usage_history`, `additional_usage_history`, `request_logs`, `sticky_sessions`, `api_key_account_assignments`, `http_bridge_session_records`, `account_limit_warmups`) and de-dupes by chatgpt_account_id. Happy to follow up with that as a separate change. Closes Soju06#788 Signed-off-by: ozpool <jittendersingh389@gmail.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
codex-lb/app/modules/accounts/repository.py
Lines 121 to 126 in 7796445
When merge_by_chatgpt_identity=True, the code still takes an email-scoped advisory lock whenever merge_by_email is enabled (the default path used by OAuth reauth). That means two concurrent reauths for the same upstream chatgpt_account_id but different email claims can run without mutual exclusion, both miss the canonical-row lookup, and both insert, recreating duplicate rows for one identity. Because identity reconciliation now runs regardless of merge_by_email, the lock key also needs to be keyed by ChatGPT identity in this branch.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Addresses the codex P2 review on Soju06#799. When merge_by_email=True AND merge_by_chatgpt_identity=True with a chatgpt_account_id set, the postgres advisory lock was previously only keyed on account.email. Two concurrent reauths for the same upstream chatgpt_account_id but different email claims would take different locks, both miss the _account_by_chatgpt_identity lookup, and both INSERT — recreating the duplicate-row pattern issue Soju06#788 is meant to prevent. Take the identity-keyed advisory lock whenever identity reconciliation is in play, regardless of merge_by_email. When merge_by_email is also on, the email-scoped lock is still acquired for the email-merge branch on the same row. Ordering identity-first, email-second gives a stable acquisition order across callers so two reauths that overlap on either key serialize without deadlock. Coverage: tests/unit/test_accounts_repository_locks.py monkeypatches the postgres dialect and the two lock methods, then pins the lock sequence across four scenarios: - merge_by_email=True + merge_by_chatgpt_identity=True -> identity lock fires, email lock still fires - merge_by_email=False + merge_by_chatgpt_identity=True -> identity-only lock (unchanged) - merge_by_email=False + merge_by_chatgpt_identity=False -> per-id fallback lock (unchanged) - merge_by_email=True + merge_by_chatgpt_identity=False -> email-only lock (unchanged)
|
Addressed the codex P2 in 33ab6cb. When Fix takes the identity-keyed lock whenever identity reconciliation is in play, regardless of Coverage in
@codex review |
|
To use Codex here, create a Codex account and connect to github. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33ab6cb00a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
1 similar comment
|
@codex review |
|
Codex Review: Didn't find any major issues. You're on a roll. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Blocking issue-coverage / merge-gate note: this PR still shouldn't auto-close #788 as written. The current implementation is a good bleed-stopper, but the identity path only updates the chosen canonical row and returns:
That means an already-affected database with #788's expected behavior includes leaving one dashboard row / reconciling history-keyed tables. So please either:
Code/lock direction looks reasonable for the bleed-stopper path; the blocker is the current "closes #788" claim versus the issue's full data-reconciliation acceptance. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 319b877840
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5743bec14
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Blocking follow-up: identity reconciliation updates/deletes hot-path account state but does not invalidate the in-process/cross-replica caches. Evidence on the current head:
I reproduced the shape locally with an in-memory SQLite DB: after reauth, DB rows were reconciled to Please invalidate account-selection cache after successful OAuth identity merge, and clear/bump API-key cache invalidation when duplicate reconciliation touches API-key assignments or deletes duplicate accounts. This is part of #788's data-integrity acceptance because immediately after reauth the proxy can otherwise route with stale deleted-account / old-assignment state for the cache TTL window. |
Addresses the codex P2 review on Soju06#799. When merge_by_email=True AND merge_by_chatgpt_identity=True with a chatgpt_account_id set, the postgres advisory lock was previously only keyed on account.email. Two concurrent reauths for the same upstream chatgpt_account_id but different email claims would take different locks, both miss the _account_by_chatgpt_identity lookup, and both INSERT — recreating the duplicate-row pattern issue Soju06#788 is meant to prevent. Take the identity-keyed advisory lock whenever identity reconciliation is in play, regardless of merge_by_email. When merge_by_email is also on, the email-scoped lock is still acquired for the email-merge branch on the same row. Ordering identity-first, email-second gives a stable acquisition order across callers so two reauths that overlap on either key serialize without deadlock. Coverage: tests/unit/test_accounts_repository_locks.py monkeypatches the postgres dialect and the two lock methods, then pins the lock sequence across four scenarios: - merge_by_email=True + merge_by_chatgpt_identity=True -> identity lock fires, email lock still fires - merge_by_email=False + merge_by_chatgpt_identity=True -> identity-only lock (unchanged) - merge_by_email=False + merge_by_chatgpt_identity=False -> per-id fallback lock (unchanged) - merge_by_email=True + merge_by_chatgpt_identity=False -> email-only lock (unchanged)
|
Folded into #865, which is now rebuilt on Current #865 status:
Closing this PR to keep the remaining merge queue |
Why
OAuth reauth currently creates a second local account row with an
__copyNsuffix whenever the existing row's deterministic id collides undermerge_by_email=False(issue #788). The original row — often deactivated because its refresh token was revoked — is left orphaned while the new row takes the active state. The dashboard then shows two rows for one upstream ChatGPT identity, and history-keyed tables become hard to reconcile.What
Add an identity-aware path on
AccountsRepository.upsert:merge_by_chatgpt_identity=True(set by the OAuth reauth caller) finds the canonical row for the givenchatgpt_account_id— preferring the oldest bycreated_atso long-term history stays attached — and applies the fresh tokens, plan, status, and deactivation-reason updates to that row.importWithoutOverwritesetting. That setting governs the side-by-side import flow (two local rows for the same email when importing twice); reauth always reconciles to one local row per upstream identity.chatgpt_account_id, so legacy local rows that pre-date the field keep working unchanged.OAuthService._persist_tokensnow passesmerge_by_chatgpt_identity=Truefor both the direct-session and factory-session code paths.Tests
tests/integration/test_repositories.pycovers deactivated-row reuse, oldest-row preference when stray__copyrows already exist, dependent-row reconciliation, API key assignment de-dupe, warmup collision de-dupe, no-op create when no canonical match is found, and the no-upstream-id fall-through.tests/integration/test_oauth_flow.pypins the OAuth reauth contract: two reauths for the same upstream identity now land on one row, with the second reauth's plan applied in place. The duplicate-email conflict test was reworked so its precondition (two distinct local rows under one email) no longer relies on OAuth side-by-side, by using two distinct upstream ids.openspec/changes/merge-reauth-by-chatgpt-identity/documents the reauth identity merge and duplicate-row reconciliation contract.Focused validation after the latest repair:
Compatibility note for the renamed test
test_device_oauth_flow_keeps_separate_accounts_when_import_without_overwrite_enabledwas pinning behavior that this issue calls out as the bug, so it has been renamed and its assertions inverted rather than kept alongside the new test. TheimportWithoutOverwritesetting still applies fully to the dashboard import path — only the OAuth reauth code path now reconciles by upstream identity.Closes #788