fix: prevent async functional API from double-consuming interrupt resume values #6739
Fix for issue langchain-ai#6660.
In async mode, both the parent and child scratchpads captured references to the same resume tuple in pending_writes. When both consumed the value, iterations were skipped (e.g., output 1, 123, 12345 instead of the expected 1, 12, 123, 1234, 12345).
The fix makes child scratchpads always delegate to the parent for null_resume, which prevents the double-consumption race condition. Only the parent scratchpad now 'owns' the resume value lookup.
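To make the failure mode concrete, here is a toy model of the double consumption described above. It is a sketch under stated assumptions, not LangGraph's actual scratchpad code: `ToyScratchpad`, `offer`, `take`, and `simulate` are hypothetical names invented for illustration, and the real lookup logic is more involved.

```python
from typing import List, Optional, Tuple


class ToyScratchpad:
    """Hypothetical stand-in for a scratchpad (not LangGraph's real class)."""

    def __init__(self, parent: Optional["ToyScratchpad"] = None,
                 delegate: bool = False) -> None:
        self.parent = parent
        self.delegate = delegate  # True models the fix: child defers to parent
        self.resume: Optional[Tuple[str, ...]] = None

    def offer(self, resume: Tuple[str, ...]) -> None:
        # Capture a reference to the resume tuple.
        self.resume = resume

    def take(self) -> Optional[str]:
        # Fixed behavior: a child ignores its own copy and asks the parent.
        if self.parent is not None and self.delegate:
            return self.parent.take()
        # Buggy behavior: consume the local copy first...
        if self.resume is not None:
            value, self.resume = self.resume[0], None
            return value
        # ...then fall back to the parent, which may hold the same value.
        return self.parent.take() if self.parent is not None else None


def simulate(delegate: bool, steps: int = 5) -> List[str]:
    """Return the accumulated strings shown at each interrupt that pauses."""
    parent = ToyScratchpad()
    child = ToyScratchpad(parent=parent, delegate=delegate)
    shown: List[str] = []
    acc = ""
    for i in range(1, steps + 1):
        acc += str(i)
        if child.take() is not None:
            # A leftover resume value answered this interrupt: no pause.
            continue
        shown.append(acc)       # no value available: pause and show acc
        resume = ("go",)        # the caller resumes once
        parent.offer(resume)    # both scratchpads capture the same tuple
        child.offer(resume)
        child.take()            # the resumed interrupt consumes the value
    return shown


print(simulate(delegate=False))
print(simulate(delegate=True))
```

In this model the buggy path serves one resume value twice (once from the child's copy, once from the parent's), so every second interrupt never pauses. With delegation there is a single owner, and each resume answers exactly one interrupt.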
Do you happen to have a test case that reproduces this?
Yes, I've added a regression test, test_repro_issue_6660.py, that reproduces the scenario (5 async tasks/resumes).
#6660 guesses at three possible solutions:
This PR comes up with a fourth option: simply ignoring the child scratchpad's pending_writes, but only for the purposes of calculating the null_resume_write value. That seems confusing. (Why put the NULL_TASK_ID into pending_writes only to ignore it later?) Also, pending_writes is referenced again later to calculate task_resume_write, and I can't rule out a similar bug there without fully understanding the problem in the first place. A two-line fix is certainly attractive, but I'm curious whether it's possible to compare the relative merits of the four approaches. Unfortunately I don't really have a mental model of what this logic is supposed to do, but (as an outside observer) I worry that the bug may be caused by the logic already being too complicated, rather than not being complicated enough.