Skip to content

grpc: fix shutdown ref cycles in Channel and Subchannel#2537

Merged
dfawley merged 2 commits intohyperium:masterfrom
dfawley:cycles
Mar 11, 2026
Merged

grpc: fix shutdown ref cycles in Channel and Subchannel#2537
dfawley merged 2 commits intohyperium:masterfrom
dfawley:cycles

Conversation

@dfawley
Copy link
Copy Markdown
Collaborator

@dfawley dfawley commented Mar 5, 2026

  • Channel ref cycle:

    • LbController impl'd WorkScheduler which was given to child LB policy
    • LbController owns child LB policy directly
  • Subchannel ref cycle:

    • Spawned task holds a ref to the InternalSubchannel
    • Task relies on receiver erroring to exit
    • Sender is held by the InternalSubchannel, creating a direct reference cycle

  • Fixed Channel by creating a separate type to impl WorkScheduler

  • Fixed Subchannel:

    • Removed the "one task" entirely
    • Restructured so that an Arc<Mutex> handles all the state transitions
    • Any spawned tasks await a signal that the outer InternalSubchannel was dropped

dfawley added 2 commits March 9, 2026 14:25
- Channel ref cycle:
  - LbController impl'd WorkScheduler which was given to child LB policy
  - LbController owns child LB policy directly
- Subchannel ref cycle:
  - Spawned task holds a ref to the InternalSubchannel
  - Task relies on receiver erroring to exit
  - Sender is held by the InternalSubchannel, creating a direct reference cycle

- Fixed Channel by creating a separate type to impl WorkScheduler
- Fixed Subchannel:
  - Removed the "one task" entirely
  - Restructured so that an Arc<Mutex<Inner>> handles all the state transitions
  - Any spawned tasks await a signal that the outer InternalSubchannel was dropped
@nathanielford
Copy link
Copy Markdown
Collaborator

(Notes from in-person discussion)

  • The InternalSubchannel/InnerSubchannel distinction is confusing and has the scent of being too complicated.
  • It's probable that what we want is that the task managing the FSM should own the InnerSubchannel, allowing us to skip the data Arc.
  • This would mean the InternalSubchannel should just have send ability to the InnerSubchannel. If we have a watcher that can have the transport filled in, that should be sufficient. This gets us out of both of having a loop that prevents the task from shutting down and also the awkward ownership/packaging the Internal/Inner subchannels has at present.

This refactor should happen on it's own, after this PR. Given that this removes the cyclical references and unblocks us, we're going to push it through.

Copy link
Copy Markdown
Collaborator

@nathanielford nathanielford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See previous notes - there is work to simplify this area of code, but the core pain point is handled for now!)

@dfawley dfawley merged commit 0b66abb into hyperium:master Mar 11, 2026
21 checks passed
@dfawley dfawley deleted the cycles branch March 11, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants