Skip to content

fix(proxy): retire stale HTTP bridge pending sessions#858

Open
DZsolt01 wants to merge 9 commits into
Soju06:mainfrom
DZsolt01:fix-http-bridge-stale-pending-retirement
Open

fix(proxy): retire stale HTTP bridge pending sessions#858
DZsolt01 wants to merge 9 commits into
Soju06:mainfrom
DZsolt01:fix-http-bridge-stale-pending-retirement

Conversation

@DZsolt01
Copy link
Copy Markdown

@DZsolt01 DZsolt01 commented May 30, 2026

Summary

Fixes HTTP bridge sessions that can get stuck with stale pending work after an upstream websocket closes before response.completed. Failed precreated replay now terminally fails pending requests, releases the response-create gate, retires the stale bridge session, and removes local/durable ownership so later same-session requests can recover on a fresh bridge instead of looping on proxy_overloaded.

Type of change

  • fix: — bug fix (no behavior change beyond the bug)
  • feat: — new user-facing feature or capability
  • refactor: — internal refactor (no behavior change, no API change)
  • docs: — documentation only
  • chore: / ci: / build: — tooling, CI, packaging
  • Breaking change (also append ! after the type, e.g. feat!: or include BREAKING CHANGE: footer)

Linked issue: Not linked

OpenSpec

  • This PR includes / updates an OpenSpec change
  • Not applicable — bug fix that matches the existing spec
  • Not applicable — docs / CI / chore only
  • This PR touches a codex-faithful path (image pipeline, request/response shape, SSE framing, OAuth flow) and preserves upstream-equivalent behavior

Change directory: openspec/changes/bound-http-bridge-startup-waits/

Changes

  • Retire stale HTTP bridge sessions when precreated replay fails after upstream close or timeout.
  • Clear pending request state, release response_create_gate, release durable bridge ownership, and close the stale upstream websocket.
  • Add response-create gate timeout diagnostics for bridge key hash, pending/queued counts, gate availability, pending request ids, and ages.
  • Add regression coverage for failed replay, stale cleanup, request-log failure during terminal cleanup, and timeout diagnostics.

Test plan

.\.venv\Scripts\python.exe -m pytest tests\unit\test_proxy_utils.py tests\unit\test_http_bridge_cancel_drain.py tests\unit\test_proxy_http_bridge.py -q
# 502 passed

.\.venv\Scripts\python.exe -m ruff check app\modules\proxy\service.py tests\unit\test_proxy_http_bridge.py
# All checks passed!

Linked issue:

Fixes #857

Screenshots / output (optional)

Local codex exec review --uncommitted --full-auto --ephemeral completed and reported no actionable findings.

Checklist

  • Title is in Conventional Commits format (<type>(<scope>)?: <subject>).
  • Linked the related issue / discussion above.
  • Added or updated tests covering the change.
  • Ran uv run pre-commit run local-ci --hook-stage manual --all-files or the relevant make <target> subset locally.
  • If touching specs: openspec validate --specs passes and /opsx:verify is clean.
  • CHANGELOG is not edited by hand (release-please handles it).

@DZsolt01 DZsolt01 marked this pull request as ready for review May 30, 2026 20:55
@Komzpa Komzpa force-pushed the fix-http-bridge-stale-pending-retirement branch from 68875e1 to 489a6cc Compare May 31, 2026 21:39
@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented May 31, 2026

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Soju06 Soju06 added the 🤖 codex: ok [@codex review] says no issues found. label May 31, 2026
@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented Jun 1, 2026

Blocking race found in the stale-session retirement path.

_submit_http_bridge_request() only checks session.closed before it queues/waits. A concurrent request can already be waiting on the same response_create_gate when the reader handles an upstream close/timeout. The reader then fails the old pending request and releases the gate (app/modules/proxy/service.py:6930-6978), and only then retires/unregisters/closes the session via _retire_stale_pending_http_bridge_session() (app/modules/proxy/service.py:6864-6899). There is no post-gate/post-lock check before the waiter appends and sends (app/modules/proxy/service.py:6567-6603).

I reproduced this locally against the PR head: start a second _submit_http_bridge_request() while the old request owns the gate, then have _relay_http_bridge_upstream_messages() process a close with failed precreated replay. Result: the second submit still calls send_text and leaves pending=['new'] on session.closed=True with the session no longer registered in _http_bridge_sessions (send_text_calls=1, registered=False, new_gate_acquired=True). This can orphan a new same-session request on a closed/unregistered bridge and reintroduce leaked gate/admission/pending state.

Please make retirement/admission mutually exclusive: mark the session closing/retiring before releasing the old request gate, and/or re-check session.closed plus current registry ownership after acquiring the gate and before appending/sending. Add a regression test for the concurrent waiter case.

@Soju06 Soju06 added hermes: needs-followup Hermes left a blocker/comment that needs follow-up observation and removed 🤖 codex: ok [@codex review] says no issues found. labels Jun 1, 2026
@DZsolt01
Copy link
Copy Markdown
Author

DZsolt01 commented Jun 1, 2026

Addressed the stale HTTP bridge admission/retirement race follow-ups.

  • Require bridged submit/prewarm paths to still own the registered session before reconnecting or sending.
  • Serialize post-admission validation, enqueue, and upstream send with per-session lifecycle locking, while keeping
    slow I/O off the global bridge registry lock.
  • Retire reader-crashed bridge sessions from local reuse, close upstream resources, and make duplicate retirement
    cleanup idempotent.
  • Fixed prewarm cleanup so it does not decrement visible request queue counts.

@Komzpa Komzpa removed the hermes: needs-followup Hermes left a blocker/comment that needs follow-up observation label Jun 1, 2026
@Komzpa Komzpa force-pushed the fix-http-bridge-stale-pending-retirement branch from 1c7c0f9 to 6e91920 Compare June 1, 2026 14:04
@Komzpa
Copy link
Copy Markdown
Collaborator

Komzpa commented Jun 1, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e919204f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/unit/test_proxy_http_bridge.py Outdated
@Soju06 Soju06 added the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@Soju06 Soju06 removed the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@Komzpa
Copy link
Copy Markdown
Collaborator

Komzpa commented Jun 1, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b525a31522

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/modules/proxy/service.py
Comment thread app/modules/proxy/service.py
@Komzpa Komzpa added the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@Komzpa
Copy link
Copy Markdown
Collaborator

Komzpa commented Jun 1, 2026

@codex review

@Soju06 Soju06 removed the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4594934488

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/modules/proxy/service.py
@Komzpa Komzpa added the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@Komzpa
Copy link
Copy Markdown
Collaborator

Komzpa commented Jun 1, 2026

@codex review

@Soju06 Soju06 removed the 🤖 codex: needs work [@codex review] raised an issue label Jun 1, 2026
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: HTTP bridge stays pending after stream_incomplete and exhausts proxy budget

3 participants