Design proposal: make tab/session lifecycle recovery explicit #352

@honor2030

Before submitting

  • I searched existing issues and PRs.
  • This is a feature/change request, not a single bug report.

Related open PRs I found while checking overlap:

Those PRs all look useful individually, but they also point at a shared lifecycle problem around “what is the currently attached tab/session, and how do helpers recover when it is no longer live?”

Problem

browser-harness currently handles stale tab/session states in several focused places:

  • attach_first_page() picks a page target and enables default domains.
  • set_session updates self.session / self.target_id after switch_tab() and new_tab().
  • default domains are re-enabled after session changes so helpers like wait_for_network_idle() keep working.
  • current_tab has a not_attached path when target_id is missing.
  • stale-session recovery in handle() currently depends on matching specific error text, such as “Session with given id not found” and, in PR #347 (fix(daemon): re-attach on 'no close frame' WebSocket error), “no close frame”.
  • cosmetic/best-effort work such as tab-title unmarking can still become a blocking operation if it talks to a dead renderer.

The result is that each new CDP edge case tends to be fixed locally with another timeout/string-match/reattach patch. That is pragmatic, but it makes it hard to know the intended invariant:

Before a helper performs a renderer-bound operation, what guarantees do we have about the current target/session? If those guarantees fail, where is recovery supposed to happen?

This is mostly visible in stale-target cases: closed tabs, discarded tabs, Chrome restarts, expired sessions, or WebSocket/session close paths.

Proposal

Add a small explicit internal contract for tab/session lifecycle recovery, without changing the public helper API initially.

Possible shape:

  1. Define the daemon’s internal session states / failure classes, e.g.
    • no_session
    • attached
    • target_missing
    • session_stale
    • renderer_unresponsive
    • browser_disconnected
  2. Centralize recovery in one small helper such as _ensure_live_session() / _recover_session() / _with_live_session() that is used by daemon request handling and session-changing paths.
  3. Treat cosmetic operations, such as removing the controlled-tab marker from the old tab, as explicitly best-effort and bounded by a short timeout.
  4. Keep the first implementation slice small and testable.
  5. Add unit tests with mocked CDP/IPC for the contract. Live Chrome discarded-tab / killed-session repros can remain manual or become a later integration fixture if the project wants that.
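
To make steps 1 and 2 concrete, here is a minimal sketch of what the classification plus a single recovery wrapper could look like. Everything in it is hypothetical: the names SessionState, classify_failure and with_live_session are not in the codebase, and the mapping from error text to state is my assumption. Only the two error strings come from the existing handling described above. It is synchronous for brevity; the real daemon may need an async variant.

```python
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SessionState(Enum):
    """Hypothetical failure classes, named after the list in this proposal."""
    NO_SESSION = "no_session"
    ATTACHED = "attached"
    TARGET_MISSING = "target_missing"
    SESSION_STALE = "session_stale"
    RENDERER_UNRESPONSIVE = "renderer_unresponsive"
    BROWSER_DISCONNECTED = "browser_disconnected"


# The only place that knows about error text. The two strings are the ones
# quoted in this issue; which state each maps to is an assumption.
_ERROR_PATTERNS = (
    ("Session with given id not found", SessionState.SESSION_STALE),
    ("no close frame", SessionState.BROWSER_DISCONNECTED),
)

# Assumption: these are the classes worth one reattach-and-retry.
_RECOVERABLE = (SessionState.SESSION_STALE, SessionState.BROWSER_DISCONNECTED)


def classify_failure(exc: Exception) -> Optional[SessionState]:
    """Return a failure class for exc, or None if it is not a lifecycle error."""
    if isinstance(exc, TimeoutError):
        return SessionState.RENDERER_UNRESPONSIVE
    message = str(exc)
    for needle, state in _ERROR_PATTERNS:
        if needle in message:
            return state
    return None


def with_live_session(
    op: Callable[[], T],
    reattach: Callable[[], None],
    retries: int = 1,
) -> T:
    """Run op; on a recoverable lifecycle failure, reattach and retry."""
    attempt = 0
    while True:
        try:
            return op()
        except Exception as exc:
            state = classify_failure(exc)
            if state not in _RECOVERABLE or attempt >= retries:
                raise  # unknown, unrecoverable, or out of retries
            attempt += 1
            reattach()
```

The point of the shape is that string matching lives in one table, so a new CDP edge case becomes one new row rather than another local patch in a helper.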

A deliberately small first PR could be just:

  • introduce the internal recovery helper / state classification;
  • route one existing path through it;
  • add regression coverage for the chosen path;
  • leave broader cleanup for follow-up PRs.
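
As a rough idea of the regression coverage for one path, a mocked-CDP test might look like the sketch below. FakeDaemon, StaleSession and the cdp.send / cdp.attach methods are stand-ins I made up for illustration; they are not the real daemon or client API. The test only pins the contract: one stale-session failure leads to exactly one reattach and one retry on the new session.

```python
import unittest
from unittest.mock import Mock


class StaleSession(Exception):
    """Hypothetical exception raised once an error is classified as stale."""


class FakeDaemon:
    """Stand-in for the daemon; only the parts the test needs."""

    def __init__(self, cdp):
        self.cdp = cdp
        self.session = "session-1"

    def reattach(self):
        # Assumed: the real code would do something like attach_first_page().
        self.session = self.cdp.attach()

    def handle(self, method):
        try:
            return self.cdp.send(method, session=self.session)
        except StaleSession:
            self.reattach()
            return self.cdp.send(method, session=self.session)


class RecoveryContractTest(unittest.TestCase):
    def test_stale_session_reattaches_once_and_retries(self):
        cdp = Mock()
        cdp.attach.return_value = "session-2"
        # First send fails with the stale-session error, second succeeds.
        cdp.send.side_effect = [
            StaleSession("Session with given id not found"),
            {"ok": True},
        ]

        daemon = FakeDaemon(cdp)
        result = daemon.handle("Page.reload")

        self.assertEqual(result, {"ok": True})
        cdp.attach.assert_called_once()
        self.assertEqual(cdp.send.call_count, 2)
        # The retry must use the freshly attached session.
        self.assertEqual(cdp.send.call_args.kwargs["session"], "session-2")
```

No live Chrome is needed for this, which keeps the first PR reviewable; discarded-tab and killed-session repros can stay manual for now.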

Alternatives considered

If this direction seems useful, I’m happy to open a small first PR that starts with one path and keeps the diff focused.
