fix(frontend/auth): harden token-expiry retry path#9390
Conversation
Implements PR 1 of dev/specs/2026-05-29-frontend-error-auth-improvements.md: - E2: cap silent-refresh retry via a per-operation counter so a TOKEN_EXPIRED → refresh → TOKEN_EXPIRED loop bails to /login after one attempt instead of spinning forever (clock skew, server-side revoke, or a malformed refreshed token would otherwise loop). - E1: prefix the replayed `authorization` header with `Bearer ` so the retried request authenticates via the header (matches authLink) and no longer silently relies on the access_token cookie fallback. - A2: redirectToLogin() encodes the current path as `?from=…` and LoginPage consumes it as a fallback when location.state?.from is absent (which it is after a hard navigation out of errorLink). - A6: type LoginPage's errors as `RestErrorItem[]` instead of an inline `any`-shaped annotation. - E3: normalise the 2xx-with-errors branch in auth-callback to throw a typed FetchError so the catch handler sees a single shape; type the errors state as `RestErrorItem[] | null`. New unit test in graphqlClientApollo.test.ts covers the retry-cap helper (`bumpAuthRetryCount`). Also adds a one-line clarifying comment on FetchError noting that `status` may be 2xx for "200 with errors" SSO callbacks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
2 issues found across 5 files
Confidence score: 3/5
- There is some merge risk because
frontend/app/src/pages/auth-callback.tsxhas a concrete user-facing failure path: if an error lacks anerrorsarray, the catch block can leave auth-callback stuck on the loading screen. frontend/app/src/shared/api/graphql/graphqlClientApollo.tsxintroduces a?from=fallback, but SSO login buttons do not consume it yet, so users who hard-navigate can still lose their original destination after sign-in.- These are both medium-severity, behavior-impacting auth flow issues rather than housekeeping changes, so this is likely fixable but worth addressing before relying on it broadly. Pay close attention to
frontend/app/src/pages/auth-callback.tsx,frontend/app/src/shared/api/graphql/graphqlClientApollo.tsx- loading-state hangs and redirect-destination loss in SSO paths.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="frontend/app/src/shared/api/graphql/graphqlClientApollo.tsx">
<violation number="1" location="frontend/app/src/shared/api/graphql/graphqlClientApollo.tsx:172">
P2: The new `?from=` fallback is not consumed by the SSO login buttons, so hard-navigated users still lose their original destination when they sign in with SSO.</violation>
</file>
Shadow auto-approve: would not auto-approve because issues were found.
Re-trigger cubic
The `?from=` query param set by `redirectToLogin()` is attacker-controlled once it lands on /login (anyone can craft `/login?from=...`). Previously LoginPage concatenated the value straight into Navigate's `to` prop, so a crafted `?from=//evil.com` or `?from=https://evil.com` could redirect the freshly authenticated user off-origin. Adds a shared `safeInternalPath` helper that: - rejects anything not starting with a single `/` (blocks protocol- relative `//host` and schemed `https://`, plus `javascript:` payloads) - resolves via `new URL(raw, window.location.origin)` and compares `url.origin` against the page origin as the authoritative same-origin check - returns a `Partial<Path>` so callers pass it straight to Navigate without re-splitting pathname/search/hash Also drops the old `(pathname || "") + (search ?? "")` concat in LoginPage's state-from branch — passing the router-state's `Path` object through to Navigate as-is removes the edge case where a falsy pathname combined with a truthy search produced a non-routable string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`redirectToLogin()` hard-navigates expired-token users to
`/login?from=<originalPath>` so they can return after signing in.
Password sign-in honours this via LoginPage's `?from=` fallback, but the
SSO buttons did not — they only read `location.state?.from` (router
state), which is empty on hard nav. Result: SSO users always landed at
`/` after the OIDC round-trip instead of the page they started on.
Mirror LoginPage's resolution order in `login-sso-buttons.tsx`: prefer
`location.state?.from`, fall back to `searchParams.get("from")` validated
through `safeInternalPath`. Serialise the resulting `Partial<Path>` back
to a flat string for the backend's `final_url=...` param.
Also URL-encode `final_url` — a destination containing `&` or `?` would
have corrupted the authorize URL's own query string.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ck failure
The SSO callback's catch handler only set `errors` state when the thrown
value was a FetchError carrying an `errors` array. Network failures
(fetch reject, AbortError) and FetchErrors with no envelope fell through
silently, leaving the user staring at the "Authenticating..." spinner
indefinitely with nothing in the URL or UI to indicate what went wrong.
Add an else branch that logs the unexpected failure and stuffs a generic
RestErrorItem into the `errors` state. The existing `if (errors) return
<Navigate to="/login" state={{ errors }} />` branch then bounces the user
back to /login with a visible "Failed to complete sign-in" message
instead of the hung spinner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
0 issues found across 5 files (changes from recent commits).
Shadow auto-approve: would auto-approve. These changes harden the token-expiry retry path with a per-operation counter, fix the authorization header prefix, add safe redirect handling via safeInternalPath, and normalise SSO callback errors; unit tests cover the new utilities and the retry helper, and the overall impact is isolated to...
Re-trigger cubic
`safeInternalPath` already preserves the hash on the way back in, but `redirectToLogin()` built the `?from=` value from `pathname + search` only, losing any hash. SPAs use the URL hash for in-page UI state (active tab, anchor), so a TOKEN_EXPIRED on `/foo?bar=1#tab=details` would land the user on `/foo?bar=1` after re-auth instead of restoring the original view. Append `window.location.hash` so the round-trip is symmetric with the consumer side.
The TOKEN_EXPIRED path called `retryWithRefreshedToken`, which fetched a new access token via TanStack Query and replayed the operation. If the refresh itself rejected (refresh token expired, server-side revoke, network error), the observer surfaced an error to Apollo and the caller saw a generic network failure — but the stale tokens were left in localStorage and every subsequent query repeated the same fail-to-refresh dance. Call `redirectToLogin()` in the refresh `.catch` so the user is logged out and bounced to /login with `?from=…` instead of staying signed in to a session the server has already disowned.
There was a problem hiding this comment.
0 issues found across 1 file (changes from recent commits).
Shadow auto-approve: would require human review. The changes touch core authentication logic including token refresh, error handling, redirects, and open-redirect protection, which are critical and security-sensitive; even though the AI review found no issues, the blast radius and risk of breakage warrant a human review.
Re-trigger cubic
`LoginPage` and `LoginWithSSOButtons` cast `location.state?.from` to `Partial<Path>` without a runtime guard. In practice only `ProtectedRoute` writes this state today so the value is safe — but the defensive posture is asymmetric: the query-string twin runs through `safeInternalPath`, the state-derived one does not. A future caller that smuggles a string or an external URL into router state would bypass the open-redirect guard entirely. Serialise the state value back to a path string and re-run it through `safeInternalPath`. Same guard, same shape on both sides — the cast becomes load-bearing for typing only, not for trust. Also: drop a comment next to the `code: 0` sentinel in auth-callback so readers know it is forged on the frontend, not a backend code.
There was a problem hiding this comment.
4 issues found across 8 files
Confidence score: 2/5
- There is a high-confidence open-redirect gap in
frontend/app/src/entities/authentication/utils.ts:safeInternalPathcan normalize inputs like/..//evil.cominto protocol-relative//evil.com, which undermines the guard and is a security risk. frontend/app/src/entities/authentication/ui/login-sso-buttons.tsxappears to trustlocation.state.fromfor SSOfinal_urlwithout equivalent sanitization, creating another user-facing redirect bypass path.frontend/app/src/shared/api/graphql/graphqlClientApollo.tsxhas a likely auth-flow bug where TOKEN_EXPIRED retries won’t hit the intended cap/redirect path, so persistent token expiry may fail to send users back to login; hash-loss handling is a smaller UX regression in the same file.- Pay close attention to
frontend/app/src/entities/authentication/utils.ts,frontend/app/src/entities/authentication/ui/login-sso-buttons.tsx, andfrontend/app/src/shared/api/graphql/graphqlClientApollo.tsx- redirect safety and auth recovery behavior need tightening before merge.
Shadow auto-approve: would not auto-approve because issues were found.
Re-trigger cubic
The unit test for `bumpAuthRetryCount` proved the counter increments, but didn't prove the contract it exists to enforce: TOKEN_EXPIRED → refresh → replay; second TOKEN_EXPIRED on the same operation → bail to /login. That contract lived only in errorLink's switch body and had no test. Refactor: - Extract the `onError` callback into an exported `handleGraphQLAuthError` so tests can drive it without spinning up an Apollo link chain. - Introduce a `__navigation` holder that production reads through and tests can swap out. `window.location.assign` is non-configurable in vitest's browser mode, so spying directly on it throws. Tests: - First TOKEN_EXPIRED → returns the retry observable, calls queryClient.fetchQuery, and replays the operation with the Bearer-prefixed new token. - Second TOKEN_EXPIRED on the same operation → no replay, hard-nav to /login?from=…, tokens cleared. - Refresh failure (refresh-token expired / revoked) → observable errors, tokens cleared, hard-nav to /login. Verifies the new branch added in the previous commit. Switch-case bodies in `handleGraphQLAuthError` were split to satisfy biome's `noVoidReturn` rule and TS's narrower return type — semantics unchanged.
`safeInternalPath` gated on the raw input starting with `/` but not `//`,
then trusted WHATWG URL normalization to validate the rest. That left a
class of payloads that pass the leading-/ gate, resolve same-origin via
`URL()`, and yet collapse to a protocol-relative *pathname* — e.g.
new URL('/..//evil.com', 'http://localhost:3000').pathname === '//evil.com'
Origin equality holds (still localhost), so the old guard returned
`{pathname: "//evil.com", ...}`. Downstream callers then serialise that
back into a string for `<Navigate to=…>` or SSO `final_url=…`, where
`//evil.com` is a protocol-relative reference and the browser jumps to
`https://evil.com`. Open-redirect bypass.
Add a post-normalization pathname check that rejects anything starting
with `//`, regardless of origin. Three new test vectors cover `/..//`,
`/../..//`, and `/foo/..//`.
The `bumpAuthRetryCount > 1` cap added in the original PR was dead code.
Reading `@apollo/client/link/error`:
retriedSub = retriedResult.subscribe({
next: observer.next.bind(observer),
error: observer.error.bind(observer),
complete: observer.complete.bind(observer),
});
…the retried observable's `next` is bound directly to the *outer*
observer. The errorHandler is not re-invoked for results that come back
from `forward(operation)`, so a second TOKEN_EXPIRED on the same
operation never re-enters the switch and the counter cannot exceed 1.
The threat the cap was meant to defend against is real, but the defense
needs to live where the retried result actually surfaces — inside
`retryWithRefreshedToken`'s subscriber. Inspect the replayed result for
TOKEN_EXPIRED; if it persists (clock skew, malformed refreshed token,
server-side revoke between refresh and replay), bounce to /login and
error the observer instead of leaking the stale-auth state to the
caller.
Drop `bumpAuthRetryCount` and the three unit tests pinned to it — the
loop is bounded by construction now (Apollo guarantees a single replay
per operation) and the bail is tested via the new
"replayed result that still carries TOKEN_EXPIRED bails to /login" case.
`key={index}` is React's standard antipattern — fine when the list never
reorders, but couples the row identity to its position, so any future
caller that prepends/sorts errors would re-key the wrong rows and
discard their state.
Use `${code}-${message}` instead. The (code, message) tuple is unique
within an error envelope by construction (the backend doesn't emit two
identical entries; the frontend-synthesised `code: 0` row appears at
most once), and the key now travels with the row even if order changes.
`LoginPage` and `LoginWithSSOButtons` were carrying the same four-line `from`-resolution dance: read router state, fall back to `?from=`, run both through `safeInternalPath`, default to `/`. Two identical copies mean two places to drift the next time the rule changes (and the rule *is* the open-redirect guard, so drift is the failure mode you can't afford). Move it into `entities/authentication/utils.ts` as `resolveLoginRedirect(location, searchParams)`. Both callers now reduce to one line, the long explanatory comment lives next to the helper, and the open-redirect rule has a single source of truth. No behaviour change — `pnpm test --run src/entities/authentication src/pages/login src/shared/api/graphql/graphqlClientApollo.test.ts` remains green (52 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oken The "refresh succeeded but the response had no `access_token`" branch inside `retryWithRefreshedToken` errored the observer but left stale tokens in `localStorage` — inconsistent with the sibling `.catch` branch, which clears credentials and hard-navs to /login. A user who hit this (server bug, partial JSON, schema drift) would stay signed-in against tokens the server had already disowned and every subsequent query would wall against the same error. Call `redirectToLogin()` before erroring the observer so the cleanup path is symmetric: refresh-failed and refresh-returned-nothing now both clear tokens and route the user back to /login with `?from=…`. Also collapse `resultHasTokenExpired` to a single `?.some(...) ?? false` expression — equivalent, just smaller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 7 files (changes from recent commits).
Shadow auto-approve: would not auto-approve because issues were found.
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
There was a problem hiding this comment.
0 issues found across 1 file (changes from recent commits).
Shadow auto-approve: would require human review. This PR modifies critical authentication and token-refresh logic, including error handling, redirect validation, and SSO callback processing, which could introduce security vulnerabilities or break the login flow if incorrect.
Re-trigger cubic
Implements PR 1 of dev/specs/2026-05-29-frontend-error-auth-improvements.md:
authorizationheader withBearerso the retried request authenticates via the header (matches authLink) and no longer silently relies on the access_token cookie fallback.?from=…and LoginPage consumes it as a fallback when location.state?.from is absent (which it is after a hard navigation out of errorLink).RestErrorItem[]instead of an inlineany-shaped annotation.RestErrorItem[] | null.New unit test in graphqlClientApollo.test.ts covers the retry-cap helper (
bumpAuthRetryCount). Also adds a one-line clarifying comment on FetchError noting thatstatusmay be 2xx for "200 with errors" SSO callbacks.Impact & rollout
Checklist
uv run towncrier create ...)Summary by cubic
Stops token-refresh loops and hardens auth redirects. Detects persistent TOKEN_EXPIRED on replay, clears tokens, and redirects to /login while preserving the full return path; also normalizes callback errors.
/login?from=…; also bounce when refresh fails or returns noaccess_token. On retry, sendauthorization: Bearer <token>./login?from=…(now preserves the hash); validate both query and router-state targets withsafeInternalPathincluding paths that normalize to//…, reuse the same target for SSO, and URL‑encodefinal_url.FetchError, and on non-envelope failures show a generic error and return to/login(note:FetchError.statuscan be 2xx).RestErrorItem[]and render them with stable, unique keys.Written for commit 03e5ac6. Summary will update on new commits.