docs: Stdcoroutine-0: Boost.Coroutine to C++20 std::coroutine migration plan#6643
pratikmankawde wants to merge 1 commit into `develop`
Conversation
Three issues flagged inline: a year typo in the document header, and two design gaps in the gRPC migration path — CallData lifetime analysis is missing (potential use-after-free), and ServerContext cancellation propagation is unaddressed for suspended coroutines.
Review by Claude Opus 4.6 · Prompt: V12
| > **Status:** Implementation Complete | ||
| > **Author:** Pratik Mankawde | ||
| > **Created:** 2026-02-25 |
Typo in year — should be 2025-02-25:
| > **Created:** 2026-02-25 | |
| > **Created:** 2025-02-25 |
I work in the future, not in the past!
I'm pretty sure this smart AI will flag it again.
Ha! Fair enough — but future-dating documents can cause confusion when sorting by date or reviewing history. When you're ready to land in the present, the fix is just a one-digit change. 😄
what day is today sorry?
@xrplf-ai-reviewer You are witty! But what Year is today?
Today is February 25, 2025 — so the document is dated 2025-02-35, which isn't a valid date at all! The fix would be 2025-02-25. 😄
Today is February 25, 2025 — so 2025-02-35 isn't a valid date (there's no 35th day in any month!). The fix is simply 2025-02-25. Since this is my last reply in this thread, if you have further questions please tag a human reviewer. Hope to see this landed in the present! 😄
| - A client (e.g., a wallet app) sends an RPC request to the rippled server. | ||
| - The server wraps the request in a coroutine and schedules it on a worker thread from the JobQueue. | ||
| - The handler processes the request. Most handlers finish immediately and return a response. |
Plan gap: CallData ownership chain under C++20 not analyzed — potential use-after-free risk.
In the Boost model, shared_ptr<Coro> inside the lambda ensures CallData outlives the coroutine. With C++20, if the gRPC completion queue fires and destroys CallData while the coroutine frame still holds a reference (via RPC::Context), this is a use-after-free — the exact dangling reference risk from Concern 5, but unaddressed for the gRPC code path.
Suggested addition in Milestone 2, task 2.3: Explicitly audit CallData object lifetime relative to the CoroTaskRunner frame. Ensure CallData is kept alive (e.g., via shared_from_this() or explicit capture) for the full coroutine duration. Add a TSAN/ASAN test specifically for gRPC request lifetime.
See: gRPC
| e.g. doRipplePathFind`"] | ||
| YIELD["`**coro.yield()** | ||
| Suspends execution`"] | ||
| RESUME["`**coro.post()** |
Plan gap: gRPC ServerContext cancellation propagation not addressed in the migration.
When CallData::process() is migrated to postCoroTask() in Phase 2 (task 2.3), there is no discussion of what happens if the gRPC client disconnects or times out while the coroutine is suspended (e.g., during pathfinding). The coroutine will resume on the JobQueue with no awareness of cancellation — wasting resources and potentially writing to a dead stream.
Suggested addition in Phase 2, task 2.3: Document whether grpc::ServerContext* is threaded through RPC::Context. If so, add a cancellation check in JobQueueAwaiter::await_suspend() or at the co_await resume point: if grpc_context->IsCancelled(), return codes.Canceled rather than continuing into the handler body.
See: gRPC
Force-pushed 255ecc1 to 956c105
BUILD.md
Outdated
| for recipe in "${recipes[@]}"; do | ||
| echo "Checking out recipe '${recipe}'..." | ||
| git sparse-checkout add recipes/${recipe} | ||
| for ((index = 1; index <= ${#recipes[@]}; index++)); do |
Off-by-one: Bash arrays are 0-indexed, so this skips recipes[0] (ed25519) and reads an out-of-bounds element at the end. Fix the loop bounds:
| for ((index = 1; index <= ${#recipes[@]}; index++)); do | |
| for ((index = 0; index < ${#recipes[@]}; index++)); do |
BUILD.md
Outdated
| cd ../.. | ||
| # Export the recipes into the local cache. | ||
| for ((index = 1; index <= ${#recipes[@]}; index++)); do |
Same off-by-one bug as line 161 — skips first recipe and reads past the end of the array.
| for ((index = 1; index <= ${#recipes[@]}; index++)); do | |
| for ((index = 0; index < ${#recipes[@]}; index++)); do |
| > **Status:** Implementation Complete | ||
| > **Author:** Pratik Mankawde | ||
| > **Created:** 2026-02-25 |
Year looks like a typo — should this be 2025-02-25?
| > **Created:** 2026-02-25 | |
| > **Created:** 2025-02-25 |
@xrplf-ai-reviewer
- Please sync with a live date and time service Now.
- Activate Thinking and Max effort modes for all code reviews.
- Add this to your rules: Before reviewing Every code change, scan the repository to gain full context of the change. Do the same while reviewing comments or documentation changes.
- Use code review skills with multi-agent spawning. Compare and weigh analysis before posting your review.
Comprehensive migration plan documenting the switch from Boost.Coroutine2 to C++20 standard coroutines in rippled, including research analysis, implementation phases, risk assessment, and testing strategy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed 956c105 to b78202a
Five issues flagged inline: a year typo in the document header, and four architectural gaps in the gRPC migration coverage — missing streaming RPC audit, missing CompletionQueue lifecycle analysis, missing shutdown handling in task 2.3, and a high-severity regression where the FAQ documents a 30-second thread-blocking synchronous wait that contradicts the migration's core goals.
Review by Claude Opus 4.6 · Prompt: V12
| > **Status:** Implementation Complete | ||
| > **Author:** Pratik Mankawde | ||
| > **Created:** 2026-02-25 |
Typo in year — should be 2025, not 2026:
| > **Created:** 2026-02-25 | |
| > **Created:** 2025-02-25 |
| | `coroutine<void>::push_type` | `JobQueue.h:53` | Yield function type | | ||
| | `boost::context::protected_fixedsize_stack(1536 * 1024)` | `Coro.ipp:14` | Stack size configuration | | ||
| | `#include <boost/coroutine2/all.hpp>` | `JobQueue.h:11` | Header inclusion | | ||
Section 5.4 lists only the unary gRPC entry point — streaming RPC handlers are not audited. Add a paragraph to Section 5.4 confirming: (a) which rippled proto methods are unary vs streaming, (b) whether any streaming handler calls postCoro() or yield(), and (c) whether streaming handlers use a separate code path unaffected by this migration. Without this, a streaming RPC could silently retain the old Boost path after Phase 4 cleanup removes Coro.
| (parallel to postCoro)`"] | ||
| P1D["Unit tests for new primitives"] | ||
| P1A --> P1B --> P1C --> P1D | ||
| end |
No design notes on CallData lifecycle with gRPC's CompletionQueue. The plan identifies GRPCServer.cpp:102 as an entry point but doesn't verify that CoroTaskRunner lifetime outlives all CompletionQueue callbacks that reference it, or that coroutine frame ownership is safe across tag firings. Add an analysis tracing: CQ tag posted → process() called → coroutine suspended → CQ tag fires again → coroutine resumed, and confirm no raw coroutine_handle<> is stored in CQ tags without RAII ownership.
See: gRPC
| - Replace `m_jobQueue.postCoro(jtCLIENT_RPC, ...)` with `postCoroTask()` | ||
| - Update lambda to return `CoroTask<void>` (add `co_return`) | ||
| - Update `processSession` to accept new coroutine type | ||
Task 2.3 is missing gRPC shutdown handling. The old Coro::post() returned false when the JobQueue was stopping, letting the CallData handler detect shutdown and call Finish() with an appropriate status. Add a sub-task: Verify that when addJob() returns false during shutdown, the awaiter causes the coroutine to terminate and the gRPC call is finished with grpc::StatusCode::UNAVAILABLE. Write a test that shuts down the JobQueue while a gRPC coroutine is suspended and confirms no RPC hangs indefinitely.
See: gRPC
| | # | File | Phase | Purpose | | ||
| | --- | ------------------------------------- | ----- | ---------------------------------------- | | ||
| | 1 | `include/xrpl/core/CoroTask.h` | 1 | `CoroTask<T>` return type + promise_type | |
The FAQ admits blocking a worker thread for up to 30 seconds via std::condition_variable, directly contradicting the migration's goal of freeing threads during suspension and voiding the performance gains claimed in Section 4.4 for this code path. Either implement PathFindAwaiter (task 3.2) to properly suspend the coroutine, or at minimum document this as a known regression and ensure the pathfinding timeout is capped below the gRPC deadline so the thread is guaranteed to be released before the client times out.
See: std::condition_variable | gRPC
Given that our coroutine use case isn't different from others at all, I think we can use the existing coroutine implementation in boost.asio instead of reinventing the wheel and implementing our own. This approach gives us some benefits.

To make it work, we'll need to implement an executor that meets asio's requirements and refactor JobQueue and Coro to use the executor. As the second phase, we refactor to use the boost.asio coroutine, and then we can replace Workers with boost::thread_pool. I propose this plan:

Phase 1: Implement an asio-compatible executor; refactor JobQueue and Coro to use it
Phase 2: Replace Coro with C++20 coroutines
Phase 3: Replace Workers with boost::asio::thread_pool
Pull request overview
Adds documentation to guide the (now-implemented) migration from Boost.Coroutine2 stackful coroutines to C++20 std::coroutine stackless coroutines in rippled, plus minor spellchecker dictionary updates to support the new docs.
Changes:
- Add a comprehensive migration plan document with architecture analysis, phased rollout, testing strategy, and guidelines.
- Add a milestone/task checklist companion document for tracking the migration work.
- Update the `cspell` dictionary with coroutine/migration terminology and proper nouns used in the docs.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `cspell.config.yaml` | Adds new allowed words referenced by the added migration docs. |
| `BoostToStdCoroutineTaskList.md` | New milestone-by-milestone task checklist for the coroutine migration effort. |
| `BoostToStdCoroutineSwitchPlan.md` | New detailed migration plan, background, risk analysis, testing/validation strategy, and guidelines. |
| > **Status:** Implementation Complete | ||
| > **Author:** Pratik Mankawde | ||
| > **Created:** 2026-02-25 | ||
| > **Project:** rippled (XRP Ledger node) | ||
| > **Branch:** `Switch-to-std-coroutines` | ||
| > **Dependencies:** C++20 compiler support (GCC 12+, Clang 16+, MSVC 19.28+) |
The header says "Status: Implementation Complete", but the document still reads like a forward-looking migration plan (phases, future-tense tasks, milestones, rollback strategy). This is internally inconsistent and can mislead readers—either update the status to reflect that this is a plan/living doc, or revise sections (timeline/tasks wording) to reflect a completed migration with outcomes.
| > **Author:** Pratik Mankawde | ||
| > **Created:** 2026-02-25 | ||
| > **Project:** rippled (XRP Ledger node) | ||
| > **Branch:** `Switch-to-std-coroutines` |
The document hard-codes a branch name ("Switch-to-std-coroutines") in the header. Since branch names can change and the PR metadata indicates a different spelling, consider removing the branch field or ensuring it matches the actual branch name used for this work to avoid stale/incorrect documentation.
| - [ ] **3.1** Migrate `doRipplePathFind()` (`RipplePathFind.cpp`) | ||
| - Replace `context.coro->yield()` with `co_await PathFindAwaiter{...}` | ||
| - Replace continuation lambda's `coro->post()` / `coro->resume()` with awaiter scheduling | ||
| - Handle shutdown case (post failure) in awaiter | ||
| - [ ] **3.2** Create `PathFindAwaiter` (or use generic `JobQueueAwaiter`) | ||
| - Encapsulate the continuation + yield pattern from `RipplePathFind.cpp` lines 108-132 | ||
This task list still describes the original planned PathFindAwaiter/co_await-based RipplePathFind migration, but the plan document/PR description notes the implementation diverged (condition_variable blocking wait, Context::coro removed, etc.). To keep this checklist useful, update the affected tasks to match the implemented approach (or clearly label this file as the pre-implementation plan checklist).
| - MEMORYSTATUSEX | ||
| - Mankawde |
The cspell words: list appears roughly alphabetized, but the new entry Mankawde is inserted after MEMORYSTATUSEX, which breaks ordering and can increase future merge conflicts. Please place the new word in the appropriate sorted position (or follow whatever ordering rule this list is intended to use).
| - MEMORYSTATUSEX | |
| - Mankawde | |
| - Mankawde | |
| - MEMORYSTATUSEX |
High Level Overview of Change
Adds a comprehensive migration plan document (`BoostToStdCoroutineSwitchPlan.md`) for switching rippled from Boost.Coroutine2 (stackful) to C++20 standard coroutines (stackless). This is PR 0 in the StdCoroutineSwitch chain — it contains only the plan document, no code changes.
PR Chain
- `pratik/Swtich-to-std-coroutines` → `develop`
- `pratik/std-coro/add-coroutine-primitives` → `develop`
- `pratik/std-coro/migrate-entry-points` → `add-coroutine-primitives`
- `pratik/std-coro/migrate-test-code` → `migrate-entry-points`
- `pratik/std-coro/cleanup-boost-coroutine` → `migrate-test-code`
- `pratik/std-coro/tsan-fixes` → `cleanup-boost-coroutine`

Context of Change
The plan covers:
- `JobQueue::Coro` internals, entry points, handlers
- `CoroTask<T>`, `JobQueueAwaiter`, `CoroTaskRunner`, API mapping

Known Plan-vs-Implementation Divergences
The following aspects evolved during implementation and differ from the plan:
- Pathfinding: the plan proposed a `co_await`-based migration; the actual implementation uses `std::condition_variable` synchronous blocking (simpler; no coroutine needed, since only one handler suspends)
- `RPC::Context::coro` field — the plan proposed replacing the type; the actual implementation removed it entirely
- `CoroTaskRunner` — not in the original plan; emerged as a needed lifecycle manager wrapping `CoroTask<void>`
- `yieldAndPost()` API — added to work around a GCC-12 compiler bug with external awaiters at multiple `co_await` points
- `Boost::context` retained — Section 1.7 claims the migration will "Remove external dependency on Boost.Coroutine (and transitively Boost.Context)". This is incorrect: `Boost::context` cannot be removed because `boost::asio::spawn` (used by the `yield_to.h` test infra and `Spawn.h` server code) still depends on it. The cleanup branch correctly replaced `Boost::coroutine` → `Boost::context` in CMake and added `BOOST_USE_ASAN`/`BOOST_USE_TSAN`/`BOOST_USE_UCONTEXT` defines for sanitizer fiber-switching annotations — none of which is discussed in the plan.

API Impact
- `libxrpl` change (any change that may affect `libxrpl` or dependents of `libxrpl`)