
fix: prevent async functional API from double-consuming interrupt resume values #6739

Open
Abhinaba Banerjee (abhigyan631) wants to merge 4 commits into langchain-ai:main from abhigyan631:fix/async-functional-api-double-resume-consumption

@abhigyan631

Fix for issue #6660.

In async mode, the parent and child scratchpads each captured a reference to the same resume tuple in pending_writes. Because both of them consumed that value, iterations were skipped (e.g., the output was 1, 123, 12345 instead of the expected 1, 12, 123, 1234, 12345).
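As a rough illustration of this failure mode, here is a toy model. ToyScratchpad, consume_null_resume, count_resumed, and NULL_TASK_ID are hypothetical names, not LangGraph's internals, and the sketch assumes that within one run one interrupt() lookup goes through the child scratchpad and a later one goes through the parent. Because the two scratchpads hold distinct lists containing the same resume tuple, removing it from one list leaves it visible in the other.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical sentinel for the task id that marks a "null resume" write.
NULL_TASK_ID = "null-task"

Write = tuple[str, str, Any]  # (task_id, channel, value)


@dataclass
class ToyScratchpad:
    """Hypothetical stand-in for a scratchpad; not LangGraph's real class."""

    pending_writes: list[Write]

    def consume_null_resume(self) -> Optional[Any]:
        # Return the first null-resume value and remove it from THIS list only.
        for i, (task_id, channel, value) in enumerate(self.pending_writes):
            if task_id == NULL_TASK_ID and channel == "resume":
                del self.pending_writes[i]
                return value
        return None


def count_resumed(child: ToyScratchpad, parent: ToyScratchpad) -> int:
    """Assumed order: the first interrupt checks the child, the second the parent."""
    resumed = 0
    if child.consume_null_resume() is not None:
        resumed += 1
    if parent.consume_null_resume() is not None:
        resumed += 1
    return resumed


resume_write: Write = (NULL_TASK_ID, "resume", "resume-value")
# Buggy wiring: two separate lists that both hold the same resume tuple.
parent = ToyScratchpad(pending_writes=[resume_write])
child = ToyScratchpad(pending_writes=[resume_write])
print(count_resumed(child, parent))  # 2: one resume value satisfied two interrupts
```

In this toy, one resume value satisfies two interrupts, which is one plausible way a run could advance two steps per resume and skip every other output.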

With this fix, child scratchpads always delegate the null_resume lookup to their parent, which prevents the double-consumption race condition. Only the parent scratchpad now "owns" the resume-value lookup.
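The delegation described here can be sketched with a hypothetical toy model (ToyScratchpad, consume_null_resume, count_resumed, and NULL_TASK_ID are illustrative names, not LangGraph's actual code). A child scratchpad forwards the null-resume lookup to its parent instead of searching its own pending_writes. The sketch assumes one interrupt() lookup goes through the child and a later one through the parent.

```python
from dataclasses import dataclass
from typing import Any, Optional

NULL_TASK_ID = "null-task"  # hypothetical sentinel for the null-resume task id
Write = tuple[str, str, Any]  # (task_id, channel, value)


@dataclass
class ToyScratchpad:
    """Hypothetical stand-in for a scratchpad; not LangGraph's real class."""

    pending_writes: list[Write]
    parent: Optional["ToyScratchpad"] = None

    def consume_null_resume(self) -> Optional[Any]:
        # A child never searches its own list; only the root owns the lookup.
        if self.parent is not None:
            return self.parent.consume_null_resume()
        for i, (task_id, channel, value) in enumerate(self.pending_writes):
            if task_id == NULL_TASK_ID and channel == "resume":
                del self.pending_writes[i]
                return value
        return None


def count_resumed(child: ToyScratchpad, parent: ToyScratchpad) -> int:
    """Assumed order: the first interrupt checks the child, the second the parent."""
    resumed = 0
    if child.consume_null_resume() is not None:
        resumed += 1
    if parent.consume_null_resume() is not None:
        resumed += 1
    return resumed


resume_write: Write = (NULL_TASK_ID, "resume", "resume-value")
parent = ToyScratchpad(pending_writes=[resume_write])
child = ToyScratchpad(pending_writes=[resume_write], parent=parent)
print(count_resumed(child, parent))  # 1: the value is consumed exactly once
```

Note that the child's own copy of the write stays in its list and is simply ignored, which is the point questioned later in this thread.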

@hinthornw
Collaborator

Do you happen to have a test case that reproduces this?

@abhigyan631
Author

Do you happen to have a test case that reproduces this?

Yes, I've added a regression test, test_repro_issue_6660.py, that reproduces the scenario from the issue (5 async tasks/resumes).

Mason Daugherty (mdrxy) added the bypass-issue-check label (Maintainer override: skip issue-link enforcement) on Mar 24, 2026
@markdascher

#6660 guesses at three possible solutions:

  • the parent scratchpad shouldn't include the null_resume_write?
  • the child scratchpad shouldn't include the null_resume_write?
  • both scratchpads should reference the same pending_writes object, so that consuming null_resume_write in the child automatically consumes it in the parent too?
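To compare these three candidates concretely, the sketch below wires hypothetical toy scratchpads in each way and counts how many interrupts a single resume value satisfies. None of the names (ToyScratchpad, build, count_resumed, NULL_TASK_ID) correspond to real LangGraph code, and the model assumes one interrupt() lookup goes through the child scratchpad and a later one through the parent.

```python
from dataclasses import dataclass
from typing import Any, Optional

NULL_TASK_ID = "null-task"  # hypothetical sentinel for the null-resume task id
Write = tuple[str, str, Any]  # (task_id, channel, value)


@dataclass
class ToyScratchpad:
    """Hypothetical stand-in for a scratchpad; not LangGraph's real class."""

    pending_writes: list[Write]

    def consume_null_resume(self) -> Optional[Any]:
        # Remove and return the first null-resume value found in this list.
        for i, (task_id, channel, value) in enumerate(self.pending_writes):
            if task_id == NULL_TASK_ID and channel == "resume":
                del self.pending_writes[i]
                return value
        return None


def count_resumed(child: ToyScratchpad, parent: ToyScratchpad) -> int:
    """Assumed order: the first interrupt checks the child, the second the parent."""
    resumed = 0
    if child.consume_null_resume() is not None:
        resumed += 1
    if parent.consume_null_resume() is not None:
        resumed += 1
    return resumed


def build(option: str) -> tuple[ToyScratchpad, ToyScratchpad]:
    """Return (child, parent) scratchpads wired according to one option."""
    write: Write = (NULL_TASK_ID, "resume", "resume-value")
    if option == "buggy":  # two lists, same tuple in both
        return ToyScratchpad([write]), ToyScratchpad([write])
    if option == "exclude_from_parent":  # option 1
        return ToyScratchpad([write]), ToyScratchpad([])
    if option == "exclude_from_child":  # option 2
        return ToyScratchpad([]), ToyScratchpad([write])
    if option == "shared_list":  # option 3: one list object, two references
        shared = [write]
        return ToyScratchpad(shared), ToyScratchpad(shared)
    raise ValueError(f"unknown option: {option}")


OPTIONS = ("buggy", "exclude_from_parent", "exclude_from_child", "shared_list")
results = {opt: count_resumed(*build(opt)) for opt in OPTIONS}
print(results)
# {'buggy': 2, 'exclude_from_parent': 1, 'exclude_from_child': 1, 'shared_list': 1}
```

All three options restore single consumption in this toy, so the model cannot rank them; that depends on semantics it leaves out, such as which scratchpad is actually consulted first and how task_resume_write is computed.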

This PR comes up with a fourth option: simply ignore the child scratchpad's pending_writes, but only when calculating the null_resume_write value. That seems confusing: why put the NULL_TASK_ID write into pending_writes only to ignore it later? Also, pending_writes is referenced later to calculate task_resume_write, and without fully understanding the problem in the first place, I can't rule out a similar bug there.

Certainly having a two-line fix is attractive, but I'm curious if it's possible to compare the relative merits of the four different approaches. Unfortunately I don't really have a mental model of what this logic is supposed to do, but (as an outside observer) I worry that the bug may be caused by the logic already being too complicated, rather than not being complicated enough.

Labels

bypass-issue-check (Maintainer override: skip issue-link enforcement), external

4 participants